iSCSI/mdadm shenanigans

I’ve spent the better part of the last six months wrestling with a problem with Open-iSCSI on CentOS 7 and 8. Here’s the scenario:

I set up four target virtual machines (A, B, C, D), each with two extra disks/block devices. I added these disks as backstores/LUNs in the iSCSI target configuration (via targetcli). I then set up a fifth server (which I call the client) to act as an iSCSI initiator, and logged in to the iSCSI disks from all of the targets using iscsiadm. From these iSCSI disks I created an mdadm RAID10 array (six disks in the array, with two spares), then formatted and mounted the resulting /dev/md0 array. This all worked with no problems.
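For reference, the setup can be sketched roughly like this. The device names, IQN, and portal address are placeholders rather than the values from my actual lab, and ACL/auth setup is omitted for brevity:

```shell
# On each target server (A-D): expose the two extra disks as block
# backstores and map them as LUNs. Names and the IQN are hypothetical.
targetcli /backstores/block create name=disk1 dev=/dev/sdb
targetcli /backstores/block create name=disk2 dev=/dev/sdc
targetcli /iscsi create iqn.2020-01.com.example.servera:target1
targetcli /iscsi/iqn.2020-01.com.example.servera:target1/tpg1/luns create /backstores/block/disk1
targetcli /iscsi/iqn.2020-01.com.example.servera:target1/tpg1/luns create /backstores/block/disk2
targetcli saveconfig

# On the client: discover and log in to every target, then build, format,
# and mount the array from the eight imported disks.
iscsiadm -m discovery -t sendtargets -p servera
iscsiadm -m node --loginall=all
mdadm --create /dev/md0 --level=10 --raid-devices=6 --spare-devices=2 /dev/sd[b-i]
mkfs.xfs /dev/md0
mount /dev/md0 /mnt
```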

Part of the test was to shut down one of the target servers and see whether mdadm would begin rebuilding the array with the configured spares. When I shut down the target server, the client did notice the disconnect, but I hadn't waited long enough to see whether mdadm started rebuilding with a spare before another, larger problem surfaced: the target server lost its backstores when it rebooted.
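From the client side, the rebuild (or the lack of one) is easy to monitor, something along these lines, assuming the array really is /dev/md0:

```shell
# Watch the kernel's view of the array once a second; an active rebuild
# shows up as a "recovery" progress line.
watch -n 1 cat /proc/mdstat

# Or ask mdadm for the per-device view, including which disks are spares.
mdadm --detail /dev/md0
```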

I went through several iterations reproducing the problem. I had initially found this problem on the Linux Academy’s Playground servers. I then set up my own local VirtualBox VMs and was able to replicate the problem. I even set up these VMs with Arch Linux. At first Arch didn’t reproduce the problem, but then I remembered that Arch doesn’t install mdadm by default. Once I installed mdadm on the Arch servers, the problem came back.

So, the problem was this: on reboot, the mdadm subsystem on each target server would see that the attached extra disks were Linux mdadm RAID members and claim them, which locked out the target configuration. When the LIO subsystem (which handles iSCSI) tried to restore the storage objects/backstores, mdadm already had the devices loaded, so LIO/iSCSI reported them as already in use. targetcli confirmed this: zero storage objects (and no LUNs anymore, either).
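Once you know what to look for, the conflict is easy to confirm on an affected target server (device names here are placeholders; exact output will vary):

```shell
# mdadm has already claimed the disks as RAID members...
cat /proc/mdstat
lsblk -f /dev/sdb /dev/sdc    # FSTYPE column reads linux_raid_member

# ...so LIO came up empty: no storage objects, no LUNs.
targetcli ls /backstores/block
```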

The fix turned out to be simple. I had to create or modify /etc/mdadm.conf on the target servers, and ensure it had only the following contents:

AUTO -all
ARRAY <ignore> uuid=UUID_of_RAID_members

The UUID of the RAID members was visible in lsblk -f on a server in the failure state, or in lsblk -f on the client. Note that I needed the UUID of the RAID members (identical for every iSCSI block device in the array), not the UUID of the md0 filesystem. I rebooted all the target servers after making this change, and then rebooted the client. The RAID array on the client came back OK, so I had finally solved the problem!
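Two ways to pull out the right UUID, with hypothetical device names; remember that it is the array UUID shared by all the members, not the filesystem UUID on /dev/md0:

```shell
# On the client (or a server in the failed state): the UUID shown for the
# linux_raid_member entries is the one mdadm.conf needs.
lsblk -f

# mdadm can also report it directly from any member device:
mdadm --examine /dev/sdb | grep 'Array UUID'
```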

midnight-commander shortcuts I will use…

These are my shortcuts. Some are defaults (marked default); the others are ones I've set, to match a shortcut on some other system. Here's the list, in no particular order:

Shortcut         Effect
Meta+. (Alt+.)   Toggle hidden files/directories. (default)
Control+o        Toggle the full-screen subshell. (default)
Control+p        Previous command in subshell history (matches ksh Emacs mode on the AIX machine at MCLC).
Control+n        Next command in subshell history (matches ksh Emacs mode on the AIX machine at MCLC).

That’s it for now. Replacing Alt-Tab (default shell completion) with Shift-Tab, and swapping OtherPanel toggle (Tab) with the Completion command still elude me.

Increase random entropy pool in Debian sid

Hopefully this will be a short post. I saw some folks in IRC (one of the many #debian channels I'm connected to) chatting about /dev/random and /dev/urandom, and how to increase the kernel's available entropy pool. The entropy pool is what seeds the random numbers a Linux system generates: the more entropy available, the less predictable the output of pseudorandom number generator (PRNG) devices like /dev/random and /dev/urandom. This has a direct impact on computer cryptography: if your random number pool is low on entropy, its sequence of random numbers can be guessed relatively easily. I found this Wikipedia article, which briefly describes the technology on various operating systems.

The file (well, an entry in the /proc pseudo-filesystem) that shows how much entropy my system currently has is /proc/sys/kernel/random/entropy_avail. The value changes over time; to watch it change I used this command:

watch -n 1 cat /proc/sys/kernel/random/entropy_avail

This showed my entropy fluctuating between 100 and 200, which is pretty low and not very useful (or secure). I did some research to find a way to increase this entropy pool. Probably the best option is a hardware random number generator (HRNG), sometimes called a true random number generator (TRNG). These cost money, money I don't have to spend. I found randomsound, but running it did not appear to affect my entropy one way or the other (probably because my home machine doesn't have a mic). I found this blog post, but it initially suggests a questionable method of increasing entropy. Its update, quietly hidden at the top of the post, gives the solution I came upon.

The solution was to use haveged. It uses nondeterministic timing variations available in modern CPU hardware as its random source. When I ran it with the default options, my entropy pool shot up to between 1024 and 4096. Much improved. In a comment further down on Chris's blog, someone suggested using the value of /proc/sys/kernel/random/poolsize as the lower threshold, via the -w option. Debian provides an /etc/default/haveged file where you can place these options:

DAEMON_ARGS="-w $(cat /proc/sys/kernel/random/poolsize)"

Currently, poolsize is set to 4096. Should a new kernel from the Debian team change the pool size, haveged will automatically pick up the new value. I have successfully set this on my main workstation machines at work and at home. I will set this on my laptop and my VPS systems, and see how it goes.
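As a quick sanity check (no root needed), you can read both values straight out of /proc; once haveged has been running for a bit, the available entropy should sit near the pool size:

```shell
# Read the kernel's entropy pool size and the current entropy estimate;
# haveged's -w threshold is meant to track the first value.
poolsize=$(cat /proc/sys/kernel/random/poolsize)
avail=$(cat /proc/sys/kernel/random/entropy_avail)
echo "poolsize=$poolsize avail=$avail"
```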

UPDATE: All but one of my VPS systems were able to use haveged. The outlier is on an OpenVZ VM system, where I don't have access to those particular parts of the kernel (even as root). I have relegated that VPS to being just a toy, since I can't really use it for much else. I will probably cancel that subscription altogether. We'll have to see.