FS#25132 - [mdadm] mkinitcpio hook does not contain any "mdadm" call
Attached to Project:
Arch Linux
Opened by Phillip Keldenich (Zac) - Friday, 15 July 2011, 08:05 GMT
Last edited by Tobias Powalowski (tpowa) - Sunday, 22 January 2012, 10:52 GMT
Details
Description:
After updating from 0.7.1, my system fails to boot automatically and i am dropped to recovery shell because the mdadm hook fails to create my correctly specified and configured md0 device; thereafter, cryptsetup fails because /dev/md0, where it expects an encrypted root raid0, does not exist. I can, however, boot the machine by executing: mdadm --assemble md0 cryptsetup luksOpen /dev/md0 root exit So the mdadm.conf file works and so does cryptsetup. Additional Information: The mdadm hook file looks like this: run_hook () { input="$(cat /proc/cmdline)" mdconfig="/etc/mdadm.conf" # for partitionable raid, we need to load md_mod first! modprobe md_mod 2>/dev/null # If md is specified on commandline, create config file from those parameters. if [ "$(echo $input | grep "md=")" ]; then #Create initial mdadm.conf # scan all devices in /proc/partitions echo DEVICE partitions > $mdconfig for i in $input; do case $i in # raid md=[0-9]*,/*) device="$(echo "$i" | sed -e 's|,/.*||g' -e 's|=||g')" array="$(echo $i | cut -d, -f2-)" echo "ARRAY /dev/$device devices=$array" >> $mdconfig ;; # partitionable raid md=d[0-9]*,/*) device="$(echo "$i" | sed -e 's|,/.*||g' -e 's|=|_|g')" array="$(echo $i | cut -d, -f2-)" echo "ARRAY /dev/$device devices=$array" >> $mdconfig ;; # raid UUID md=[0-9]*,[0-9,a-z]*) device="$(echo "$i" | sed -e 's|,.*||g' -e 's|=||g')" array="$(echo $i | cut -d, -f2-)" echo "ARRAY /dev/$device UUID=$array" >> $mdconfig ;; # partitionable raid UUID md=d[0-9]*,[0-9,a-z]*) device="$(echo "$i" | sed -e 's|,.*||g' -e 's|=|_|g')" array="$(echo $i | cut -d, -f2-)" echo "ARRAY /dev/$device UUID=$array" >> $mdconfig ;; esac done # If i add mdadm --assemble md0 here, everything works fine for me... fi } --------------------------------------------------------------------------- Is it normal that this file does not even try to execute mdadm and is a no-op if /etc/mdadm.conf is already existing/there are no md=... arguments? 
---------------------------------------------------------------------------
Output of mkinitcpio -v:

$ sudo mkinitcpio -v
==> Starting dry run: 2.6.39-ARCH
adding file: /lib/firmware/tigon/tg3_tso5.bin
adding file: /lib/firmware/tigon/tg3_tso.bin
adding file: /lib/firmware/tigon/tg3.bin
adding file: /lib/firmware/iwlwifi-5150-2.ucode
adding file: /lib/firmware/iwlwifi-5000-5.ucode
adding file: /lib/firmware/iwlwifi-6000g2b-5.ucode
adding file: /lib/firmware/iwlwifi-6000g2a-5.ucode
adding file: /lib/firmware/iwlwifi-6050-5.ucode
adding file: /lib/firmware/iwlwifi-6000-4.ucode
adding file: /lib/firmware/iwlwifi-100-5.ucode
adding file: /lib/firmware/iwlwifi-1000-5.ucode
-> Parsing hook: [base]
adding dir: /proc
adding dir: /sys
adding dir: /dev
adding dir: /run
adding dir: //usr/bin
adding dir: //usr/sbin
adding file: /bin/busybox
adding symlink: /lib/libc-2.14.so -> /lib/libc.so.6
adding file: /lib/libc-2.14.so
adding symlink: /lib/ld-2.14.so -> /lib/ld-linux-x86-64.so.2
adding file: /lib/ld-2.14.so
adding file: /sbin/modprobe
adding file: /sbin/blkid
adding symlink: /lib/libblkid.so.1.1.0 -> /lib/libblkid.so.1
adding file: /lib/libblkid.so.1.1.0
adding symlink: /lib/libuuid.so.1.3.0 -> /lib/libuuid.so.1
adding file: /lib/libuuid.so.1.3.0
adding file: /init_functions
adding file: /init
adding file: /etc/modprobe.d/usb-load-ehci-first.conf
-> Parsing hook: [udev]
adding file: /sbin/udevd
adding symlink: /lib/librt-2.14.so -> /lib/librt.so.1
adding file: /lib/librt-2.14.so
adding symlink: /lib/libpthread-2.14.so -> /lib/libpthread.so.0
adding file: /lib/libpthread-2.14.so
adding file: /sbin/udevadm
adding file: /lib/udev/rules.d/50-firmware.rules
adding file: /lib/udev/rules.d/50-udev-default.rules
adding file: /lib/udev/rules.d/60-persistent-storage.rules
adding file: /lib/udev/rules.d/80-drivers.rules
adding file: /lib/udev/firmware
adding file: /lib/udev/ata_id
adding file: /lib/udev/path_id
adding file: /lib/udev/scsi_id
adding file: /lib/udev/usb_id
adding file: /etc/udev/udev.conf
adding file: /hooks/udev
-> Parsing hook: [autodetect]
-> Parsing hook: [scsi]
-> Parsing hook: [usbinput]
-> Parsing hook: [keymap]
adding file: /hooks/keymap
-> Parsing hook: [mdadm]
Custom /etc/mdadm.conf file will be used in initramfs for assembling arrays.
adding file: /etc/mdadm.conf
adding file: /sbin/mdadm
adding file: /lib/udev/rules.d/64-md-raid.rules
adding file: /hooks/mdadm
-> Parsing hook: [encrypt]
adding file: /sbin/cryptsetup
adding symlink: /lib/libcryptsetup.so.1.2.0 -> /lib/libcryptsetup.so.1
adding file: /lib/libcryptsetup.so.1.2.0
adding symlink: /lib/libpopt.so.0.0.0 -> /lib/libpopt.so.0
adding file: /lib/libpopt.so.0.0.0
adding file: /lib/libdevmapper.so.1.02
adding symlink: /lib/libgcrypt.so.11.6.0 -> /lib/libgcrypt.so.11
adding file: /lib/libgcrypt.so.11.6.0
adding symlink: /lib/libgpg-error.so.0.7.0 -> /lib/libgpg-error.so.0
adding file: /lib/libgpg-error.so.0.7.0
adding symlink: /lib/libudev.so.0.11.5 -> /lib/libudev.so.0
adding file: /lib/libudev.so.0.11.5
adding file: /sbin/dmsetup
adding file: /lib/udev/rules.d/10-dm.rules
adding file: /lib/udev/rules.d/13-dm-disk.rules
adding file: /lib/udev/rules.d/95-dm-notify.rules
adding file: /lib/udev/rules.d/11-dm-initramfs.rules
adding file: /hooks/encrypt
-> Parsing hook: [filesystems]
==> Generating module dependencies
==> Dry run complete, use -g IMAGE to generate a real image
------------------------------------------------------------------------------------
The /etc/mdadm.conf file contains a valid configuration for /dev/md0.
------------------------------------------------------------------------------------
Finally, a shortened version of my /etc/mkinitcpio.conf:

MODULES=""
BINARIES=""
FILES=""
HOOKS="base udev autodetect scsi usbinput keymap mdadm encrypt filesystems"
COMPRESSION="xz"

Steps to reproduce:
I do not know; does everything work fine if you use an encrypted root on a raid0 device?
Closed by Tobias Powalowski (tpowa)
Sunday, 22 January 2012, 10:52 GMT
Reason for closing: Fixed
Additional comments about closing: if ARRAY is defined it's fixed!
It is strange that the mdadm hook tries to do anything at all; all of that is unnecessary and will probably have no effect.
Renaming the file was the only thing I changed.
The fact that you end up without ahci after changing a file completely unrelated to mkinitcpio tells me you have no idea what you are doing, and your problem is most likely a case of PEBKAC. If it is not, please provide the correct debugging information.
This is just a standard mdadm raid1 setup, no lvm or luks.
My system is booting correctly with the changed hook - I do not care whether udev creates my root device automatically or my changed version of the mdadm hook file does.
This might also be something wrong in my udev configuration, though I did not touch any of that. As a user, I was not able to figure out which component was failing; the only thing I noticed was that there was a mkinitcpio update, that the regenerated initramfs did not boot correctly with an unchanged configuration, and that the md was no longer built by the time the mdadm hook ran, which it had been with the former version.
From this I concluded this might well be a bug in mkinitcpio because none of the other components or the configuration changed.
I then changed the hook file to fix the annoying 20 second wait at startup and the need to bring up my md manually.
@Zac, your last comment stated that you lost ahci in your initramfs after moving mdadm.conf, which are entirely unrelated, so I am pretty sure you are doing something wrong.
Before I had a rather specific "DEVICE /dev/disk/by-id/ata-Hitachi*" line, which stopped working.
The default "DEVICE partitions" was OK.
- devices linked from /dev/disk/by-id/
- devices linked from /dev/md/
- "ordinary" devices, e.g. /dev/sdXN
- devices linked from /dev/disk/by-uuid/
- devices linked from /dev/disk/by-label/
(I could be wrong about this, though; I find udev rules cryptic and don't understand them very well...)
If udev tries "mdadm -I" on a device that is a RAID member, but isn't mentioned in a DEVICE line (say, because udev is calling "mdadm -I /dev/sda1" but mdadm.conf names that partition as /dev/disk/by-id/something-part1 instead), the call will fail and the array will not assemble.
Small rant:
Doing RAID assembly with udev when there is a custom mdadm.conf strikes me as overly complex -- the arrays could be assembled by just calling "mdadm --assemble" for each ARRAY named in mdadm.conf (as the original reporter is doing manually). That's a lot simpler and more transparent in case of problems than a big mess of udev rules that assemble the array if and only if udev happens to, more or less by brute force, try all of the devices that might be mentioned in DEVICE and ARRAY lines.
Sorry, but I'm getting tired of RAID arrays breaking on upgrades -- "persistent" device names aren't looking so persistent.
~Felix.
PS: Jens, the specific names you had were better, and you should try to get them working again. "partitions" makes it too easy to accidentally pull/replace the wrong device when a disk fails, because it's difficult to tell which one failed.
My mdadm.conf contains: DEVICE partitions
udev reads attributes on block devices which are created earlier in the ruleset. Anything marked with an ID_FS_TYPE of 'linux_raid_member' or 'isw_raid_member' invokes mdadm --incremental to try to start the array. If an array is started, the kernel creates an md* device, which triggers creation of the aforementioned symlinks.
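From memory, the triggering rule in 64-md-raid.rules looks roughly like this (paraphrased, not quoted verbatim):

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"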
I'm not sure I follow the griping about (lack of?) persistent naming. Label your filesystem. Boot with root=LABEL=mysweetfs or root=/dev/disk/by-label/mysweetfs. Furthermore, inclusion of an mdadm.conf gives the array an appropriate name on incremental assembly, so you can identify it by something like /dev/md/mysweetraid. You're far from out of options here. It's not really clear to me what's broken to the point that people are unable to boot...
https://bbs.archlinux.org/viewtopic.php?id=81674
Short version:
Relying on drives (specifically, the *data* on the drives) to identify themselves is dangerous, for a variety of reasons.
Examples of when using UUID's or labels for raid members breaks down:
- During a drive failure, the system will go looking for the remaining RAID members. It will blindly add *any* devices that it finds with matching UUID's -- even if they're in a ridiculous location like an external USB port. Were you trying to recover a failed drive plugged into USB temporarily? Oops, you just booted to it and probably corrupted all the data badly.
- Suppose you and a friend both label your root device "root". His power supply blows, so you plug his drive into your eSATA port to retrieve some files for him. Oops, your system just booted to his drive because it initializes faster than yours.
- Suppose one of your RAID members fails. Which drive was it that failed, and what is it connected to?
Telling the system to boot from a path you know to be "the first SATA port on the motherboard inside the machine" is safer and avoids these problems.
~Felix.
Wait... does this mean that there is now *no way* to persistently name RAID members? That we're down to the equivalent of the old RAID auto-detect/assemble and that's our only choice?
Conversely, calling mdadm -I on a device that's already in an assembled array is also harmless (though it does output a warning message about the device being busy). I think udev would be finished with its attempts to create arrays by the time this runs, though?
I've tested this on a virtual machine; will test it on my real server at the office later.
~Felix.
# Note: with plain "read _ array", $array would swallow the whole rest of the
# line (device name plus options); the trailing "_" keeps only the device name.
grep '^ARRAY' /etc/mdadm.conf | while read -r _ array _; do
    mdadm --assemble "$array"
done
I'm not even sure we want something like this, but I think we can all agree that we don't want a broken patch.
Well, I think there's an mdadm command line that'll cause mdadm to go through mdadm.conf, parse it all properly, and do the assembling. "mdadm --assemble --scan" or something?
There is (was) also a program called "mdassemble" that existed before raid assembly was moved over to udev rules, that did all of this quite nicely. But talking about mdassemble (which we're reinventing here, poorly) is basically asking for the whole "let's do this in udev" change to be completely thrown out, so I'm not sure that'll get us anywhere.
~Felix.
It would help a lot if anyone here would actually provide details about their raid setup - I cannot reproduce this problem, auto-assembly works fine for all my systems.
As I understand the current rules auto-assembly behaviour, it only looks at the actual device nodes -- which may not be persistently named -- and not any symlinks to them. This doesn't matter if assembly is done by reading identifiers off each partition (UUID's) and then matching them up into arrays, e.g. if the two pieces of a RAID-1 are /dev/sda1 and /dev/sdb1 one boot but /dev/sdb1 and /dev/sdc1 the next boot, it'll still assemble correctly because udev will just pass those to mdadm -I one at a time, and mdadm will figure out which array each new device should go into.
But if assembly is to be done by saying, "I want this partition on this physical disk and this partition on this physical disk in my array," (which I vastly prefer), this cannot be done by triggering on device names that are subject to change across boots. The conundrum is this:
- udev will only trigger on /dev/sdX names
- so, /dev/sdX names must be used in mdadm.conf; otherwise, mdadm will not know what to do with the argument given to -I
- but, /dev/sdX names are subject to change, and thus *cannot* be used in mdadm.conf
Where this becomes really apparent is when a hard drive fails -- even though /dev/sdX names are normally quite stable, if a disk is not detected or simply not present, everything after that disk will shift its name by one place. /dev/sdc will become /dev/sdb if the existing /dev/sdb dies. In the old days, the /dev/ names were stable -- /dev/hdb always meant "disk in the primary slave position" and it would stay /dev/hdb even if the primary master disk died. Using /dev/hdX naming, one *could* write a good mdadm.conf for these udev rules. But those days are gone, and if one wants to refer to disks based on where they're connected, in a way that remains stable even if earlier disks are removed, one needs to use /dev/disk/by-path.
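For example, an mdadm.conf pinned to physical ports might look like this (the by-path names are invented for illustration; check ls -l /dev/disk/by-path for the real ones on your machine):

DEVICE /dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1 /dev/disk/by-path/pci-0000:00:1f.2-ata-2-part1
ARRAY /dev/md0 devices=/dev/disk/by-path/pci-0000:00:1f.2-ata-1-part1,/dev/disk/by-path/pci-0000:00:1f.2-ata-2-part1

Of course, as described above, the udev rules will still hand mdadm the /dev/sdX names, so this runs straight into the same -I mismatch -- which is exactly the conundrum.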
(Ultimately, I'm still of the opinion that removing the association between where a disk is connected and its default name in /dev/ was a mistake -- I would much prefer it if /dev/sda meant "SATA connector 1" instead of "the first disk that initializes, regardless of where it is" -- but that's neither here nor there :) )
~Felix.
In any case, I now finally know why this hook fails as it is (at least in your case) - using your configuration, you actually prevent mdadm from auto-assembling. You are right that udev calls mdadm before it creates the symlinks, thus your mdadm.conf is not working.
There is a fundamental problem with your approach though: The device creation is asynchronous, meaning it can take time until the devices show up - mkinitcpio might call the mdadm assembly before the devices are created. This problem is actually solved by udev auto-assembly.
I could think of this solution: Split the mdadm into two hooks:
1) mdadm - only add /sbin/mdadm, the udev rule and /etc/mdadm.conf, no runtime hook.
2) mdadm_manual - add the above, omit the udev rule and add the old runtime hook.
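Roughly sketched, using the old-style install () format and the add_binary/add_file helpers from the existing install hooks (this is how I imagine it, not a finished patch; the file list follows the mkinitcpio -v output above):

# /lib/initcpio/install/mdadm -- ships the udev rule, no runtime hook
install ()
{
    add_binary "/sbin/mdadm"
    add_file "/etc/mdadm.conf"
    add_file "/lib/udev/rules.d/64-md-raid.rules"
}

# /lib/initcpio/install/mdadm_manual -- same files, but the old runtime
# hook instead of the udev rule
install ()
{
    add_binary "/sbin/mdadm"
    add_file "/etc/mdadm.conf"
    SCRIPT="mdadm_manual"
}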
I've never run into that -- doesn't putting sata and pata before mdadm in HOOKS solve this? Or does initcpio not wait for hooks to finish running?
Anyway, based on earlier experiments I've done, I'm not convinced that matching UUID's is completely safe -- for instance, I've been able to cause bad RAID assembly, and in some cases data corruption by:
- dd'ing a partition onto an external USB drive -- when auto-assembling, mdadm doesn't care where something is and can pull it into the array. This is bad if the reason the disk was on USB was because you were in the middle of trying to recover it.
- plugging in the components of an array from a different system. all the components are there, so mdadm starts it -- maybe the reason I plugged those drives in was for data recovery, and the disks shouldn't be used except for dd'ing onto backups.
- maliciously creating a disk with the same UUID and superblock. mdadm will assume that such a device is clean and add it to the array, even though the data might differ completely from what's on the good disk. (Granted, attacks of this kind are quite unlikely, but possible if one has physical access to USB ports but not inside the machine's case. Where might this be true? Something like a "Userful" setup, commonly used in schools and libraries.) What bothers me about this is that an unprivileged user can (on their own machine) create a disk with the correct UUID and superblock, then plug it into a machine on which they do not have root, and have mdadm consider the disk on a level playing field with the internal SATA ports.
- marking a disk 'failed' -- apparently, doing this does not prevent the 'failed' disk from being a member of an array in future. e.g. try this:
1. make a RAID-1 array /dev/md0 of /dev/sda1 and /dev/sdb1
2. mdadm /dev/md0 --fail /dev/sdb1 ; mdadm /dev/md0 --remove /dev/sdb1
3. now stop /dev/md0 and then mdadm --assemble --scan
mdadm recognizes that the failed disk (sdb1) should not be paired with the good one (sda1) -- but it still makes it part of an array (in the example I just tried, it calls it /dev/md/0_0). So now there are two arrays:
/dev/md0 -- consisting of the good disk, /dev/sda1
/dev/md127 -- consisting of the failed disk, /dev/sdb1
All of these scenarios are, in my opinion, serious problems that can be avoided by being explicit about what is and is not part of each array. If we rely on mdadm to scan and dynamically assemble things based on their UUID, it is difficult (if not impossible, as in that last case) to stop a particular disk from being used -- either because it is known to be a bad disk, or because it doesn't belong in the running system.
~Felix.
Physical access is root access. If we're going to consider this scenario and account for it, why not dispatch ravenous badgers to protect your computer and cover all sorts of other scenarios?
>> I've never run into that -- doesn't putting sata and pata before mdadm in HOOKS solve this? Or does initcpio not wait for hooks to finish running?
Not all items in HOOKS have a runtime component. Use "lsinitcpio -a /path/to/image". The last item in the output is a list of hooks which do have, and will run, a script for bootstrap.
>> - dd'ing a partition onto an external USB drive -- when auto-assembling, mdadm doesn't care where something is and can pull it into the array. This is bad if the reason the disk was on USB was because you were in the middle of trying to recover it.
Change the UUID of the newly created partition on the external.
>> marking a disk 'failed' -- apparently, doing this does not prevent the 'failed' disk from being a member of an array in future.
This is why you zero the superblock with mdadm after removing it from the array.
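That is:

mdadm --zero-superblock /dev/sdb1    # after --fail and --remove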
>> plugging in the components of an array from a different system. all the components are there, so mdadm starts it.
I don't even understand this point. If the disks are to be used as backup, why is there evidence of an array on it? Zero it out.
">> Physical access is root access."
There are different levels of physical access -- keyboard, mouse, display, and USB port should not be equivalent to "complete access to the innards." A brief description of the "Userful" system I mentioned:
There is one large computer, which is in a locked cabinet or perhaps another room. Users never see this box. It is connected to multiple sets of keyboards, monitors, mice (i.e. it's running multi-headed X), each with their own USB port on the end of an extension cable or a USB hub. Users can plug in their own USB drives and work with files on them.
">> marking a disk 'failed' -- apparently, doing this does not prevent the 'failed' disk from being a member of an array in future.
This is why you zero the superblock with mdadm after removing it from the array."
That's possible when you're simulating a failure explicitly with mdadm, but disks don't zero out their own superblocks when they disappear from the array for another reason (temporary failure due to bad cabling, power supply issues, etc.). Their reappearance is quite likely to be at the next power cycle, so zeroing will have to be done by booting to alternate media. If the reboot happens unattended, you could suddenly have a system with twice as many arrays as it should have, which could be a mess to sort out. And if filesystems are mounted by UUID, who knows what would get mounted (since UUID's would still be the same across array pieces). Actually, this sounds fun to simulate; I'm going to try it next time I'm at work and report back :)
Also, is it safe to zero the superblock if you might want to recover data off that piece of the array later?
">> plugging in the components of an array from a different system. all the components are there, so mdadm starts it.
I don't even understand this point. If the disks are to be used as backup, why is there evidence of an array on it? Zero it out."
I didn't mean that the array was being used for any regular purpose; this is a data recovery case. The system that owned the array originally died (say, fried motherboard or something) and the disks have been moved to a different system to check their health and possibly recover them. So, zeroing out the superblocks is not a good idea, since you might want to try starting the array again (but on your own terms -- not automatically, and not before you've had a chance to check the disks).
~Felix.
The setup is as follows:
/dev/md0 /dev/sd[abcd]1 (raid1)
/dev/md1 /dev/sd[abcd]2 (raid0)
/dev/md2 /dev/sd[ab]3 (raid0)
/dev/md3 /dev/sd[cd]3 (raid0)
/dev/md4 /dev/md[23] (raid1)
As you can see, I am using raid0+1 where two striping arrays are mirrored. I have just upgraded my system after a pretty long time, and I notice that md4 is not being built. The rest is ok. I will now test the patch suggestion.
*Edit* Yes, the patch worked for me. Thanks.
# cat /etc/mdadm.conf
ARRAY /dev/md1 level=raid0 num-devices=3 metadata=0.90 UUID=fc6fcb91:68a75a29:b88acab5:adec4669
ARRAY /dev/md2 level=raid0 num-devices=3 metadata=0.90 UUID=de4e3121:f431727a:a6faa410:2054c963
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=0.90 UUID=708ad3f4:71151a9e:2886212e:c7b400f1
ARRAY /dev/md3 level=raid0 num-devices=3 metadata=0.90 UUID=8e83828a:f27d4277:52349015:4ece80f2
ARRAY /dev/md4 level=raid0 num-devices=3 metadata=0.90 UUID=24c0389f:5157f6ca:58a3bbf9:facd50e1
ARRAY /dev/md5 level=raid1 num-devices=2 metadata=0.90 UUID=b5ac45b4:5d186d3c:530cf6cd:548af09e
ARRAY /dev/md6 level=raid1 num-devices=2 metadata=0.90 UUID=78aa9709:15503fd7:35e7b8f4:67d09af7
I have a raid10 setup for / and /home .
With or without the mdadm hook, before or after the udev hook... nothing works for me.
I get stuck at boot time while attempting to mount "md5". I can manually add md5 and md6, but the system doesn't boot. I also tried patching the mdadm hook. No joy.
This box serves email, CVS and file sharing. Important box. Help appreciated.
I managed to boot the system after a couple of resets. Well, sort of -- manual intervention was needed (mdadm -A --scan), which didn't work on previous reboots.
The only thing I can think of is some kind of "race condition" between the udev and mdadm hooks setting up the md arrays. When mdadm prevailed with the "correct" mdX names for my setup, I could start it up with "mdadm -A --scan".
Still doesn't boot automatically.
Q: How can the udev hook set up a raid array whose members are themselves raid arrays? How do we tell it to use the names we want?
To take udev out of the equation completely, delete or move the file /lib/udev/rules.d/64-md-raid.rules and re-run mkinitcpio. Then mdadm will be the only thing running the assembly, and it should happen in the correct order and bring all your arrays up.
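Something like this (the preset name may differ on your system):

mv /lib/udev/rules.d/64-md-raid.rules /lib/udev/rules.d/64-md-raid.rules.disabled
mkinitcpio -p linux

udev only reads files ending in .rules, so renaming the file is enough to disable it.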
~Felix.
http://sprunge.us/KLZX
(or if someone comes here years from now, send me a message via the forums if it has expired and you want it.)
As an aside, I don't want to suggest that there should be no more attempts to fix this bug. There should, I suppose. But at the same time, udev sucks :) It used to be two steps forward from devfs, but now it is one or two steps backward from that. I recently saw it for the matured but untamed beast that it has become: it conflicts too much with other, suddenly deprecated applications, is too "dark" or obscure to understand, and has too many odd configuration files to entertain any casual or professional sysadmin; in general they would rather play with more interesting parts of Linux until udev itself becomes deprecated. I'll hang out the flag when that happens... and meanwhile I'll just write speedy workarounds :)
$ cat /lib/initcpio/hooks/mdadm
# vim: set ft=sh:
run_hook ()
{
    input="$(cat /proc/cmdline)"
    mdconfig="/etc/mdadm.conf"
    # assemble every ARRAY in mdadm.conf directly, instead of waiting on udev
    [ -f "$mdconfig" ] && mdadm -A --scan
}
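(Remember that files under /lib/initcpio/hooks/ are copied into the image at build time, so the initramfs has to be regenerated with mkinitcpio after any change like this.)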
The default mdadm hook does not work.
EDIT: (replaced filename mdadm.simple with mdadm)