FS#42884 - [linux] [systemd] [mkinitcpio] failed boot with root on btrfs multi-device
Attached to Project: Arch Linux
Opened by Radek Podgorny (rpodgorny) - Sunday, 23 November 2014, 15:13 GMT
Last edited by freswa (frederik) - Wednesday, 12 February 2020, 11:53 GMT
Details
After a recent kernel update (3.17.2 to 3.17.3), all my systems with root on a btrfs multi-device setup fail to boot with:

BTRFS: open_ctree failed ...

This turns out to be the result of a failure to read the device list. A simple "btrfs device scan" and mounting root by hand fixes things. After some googling, it seems the btrfs module is no longer being loaded at boot. Adding it to MODULES= in /etc/mkinitcpio.conf fixes things, but that's just a workaround. I'm not sure whether this is the kernel's, udev's or initcpio's (or the btrfs initcpio hook's) fault, but it used to work perfectly and is now broken. Unfortunately I can't pinpoint the exact versions by experimenting, as all my multi-device systems are production machines, sorry.

See the linked BBS threads for more info:
https://bbs.archlinux.org/viewtopic.php?id=189845
https://bbs.archlinux.org/viewtopic.php?id=189987

All in all, it would be nice to publish an official notice because:
1) it breaks working systems
2) it takes quite some time to debug
3) it's quite easy to ruin your filesystem while trying to "fix" it, because the error is cryptic and fools you into thinking something went wrong with the filesystem itself

Thanks!
Closed by freswa (frederik)
Wednesday, 12 February 2020, 11:53 GMT
Reason for closing: None
Additional comments about closing: Seems stalled. Please request re-open if this is still an issue.
It's likely that this is a kernel problem. It's unfortunate that you aren't willing to help debug this.
Anyway, it's not that I'm unwilling to debug this; I just have no machine to test it on. :-( Hopefully others will help; I've linked this report in the BBS threads. Just doing my part by posting this bug report instead of ranting on the forums. ;-)
Again, I refer you to the hook's help message itself:
$ mkinitcpio -H btrfs
This hook provides support for multi-device btrfs volumes. This hook
is only needed for initramfs images which do not use udev.
With udev in the initramfs, udev rules handle device discovery and assembly -- see /usr/lib/udev/rules.d/64-btrfs.rules (included via the udev hook)
HOOKS="base udev autodetect modconf block filesystems keyboard fsck"
"btrfs" was also in there though I forget the exact position from before I modified it.
BTRFS 4-disk RAID1 filesystem.
About half the time it would fail to mount and the only 'fix' I found was to reboot and retry.
The btrfs mkinitcpio hook seems to be gone in the latest mkinitcpio:
pacman -Qi mkinitcpio
Name : mkinitcpio
Version : 18-2
...
mkinitcpio -L
==> Available hooks
autodetect encrypt keyboard mdadm_udev pata¹ scsi¹ sd-vconsole systemd usr
base filesystems keymap memdisk pcmcia sd-encrypt shutdown udev virtio¹
block fsck lvm2 mmc¹ resume sd-lvm2 sleep usb¹
consolefont fw¹ mdadm modconf sata¹ sd-shutdown strip usbinput²
¹ This hook is deprecated in favor of 'block'
² This hook is deprecated in favor of 'keyboard'
---
This hook was probably what made my boot solid previously.
I suspect there was a race between the udev hook and the modprobe for btrfs.
The fix below seems to have resolved the problem here:
1. Edit /etc/mkinitcpio.conf.
2. Make sure your MODULES line has at least: MODULES="btrfs xor raid6_pq"
3. Regenerate the image with 'mkinitcpio -p linux', where linux is the name of the preset for the kernel you need to regenerate (here it's linux-stable-git).
4. Reboot.

The xor and raid6_pq modules are added to be sure all modules needed for btrfs RAID5/6 are loaded too.
You may also want to modify your BINARIES line to include at least: BINARIES="/usr/bin/btrfsck"
which could save your bacon if /usr/bin happens to be on an unmountable btrfs filesystem.
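The steps above come down to two lines in the configuration file. A minimal excerpt, assuming the quoted-string syntax for MODULES and BINARIES used throughout this thread; only the variables discussed here are shown, and everything else in the file stays as it was:

```shell
# /etc/mkinitcpio.conf (excerpt): workaround from the comment above.
# btrfs is loaded explicitly by the initramfs instead of on demand;
# xor and raid6_pq are listed for btrfs RAID5/6, as suggested above.
MODULES="btrfs xor raid6_pq"

# Optional: ship btrfsck in the image for repairs from the emergency shell.
BINARIES="/usr/bin/btrfsck"
```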
It hasn't been part of mkinitcpio for years. btrfs-progs owns it.
Looks like aur/btrfs-progs-git is not adding the hook, which would explain why it's gone here.
Edit: I'm an idiot and should know more about how early boot works.
https://bugs.freedesktop.org/show_bug.cgi?id=88483
I am the one who posted the bug above. However, adding the "btrfs xor raid6_pq" modules doesn't help, and apparently it is not a "btrfs device scan" problem. Since that one looks Arch-specific too, I guess I'll file a bug report here as well...
Adding 'btrfs' to the MODULES section and removing it from the HOOKS section of /etc/mkinitcpio.conf apparently solves the problem on my system.
2. Root and home on subvolumes.
3. I just add btrfs to MODULES= in /etc/mkinitcpio.conf; it always works.
So there must be something wrong in the systemd hook.
I have the following setup. I tried other setups, but they failed as well. This seems to be the only config that worked at least for home1.
/etc/crypttab
home1 UUID=xxxx1 /root/key1 luks
home2 UUID=xxxx2 /root/key2 luks
home3 UUID=xxxx3 /root/key3 luks
/etc/fstab
/dev/mapper/home1 /home btrfs defaults,noatime,compress=lzo,device=/dev/mapper/home1,device=/dev/mapper/home2,device=/dev/mapper/home3 0 0
By no means a fix, but a potential workaround to others who are suffering this issue has been added to the ArchWiki and is also explained in a blog post here: https://blog.samcater.com/fix-for-btrfs-open_ctree-failed-when-running-root-fs-on-raid-1-or-raid10-arch-linux/
In a nutshell, use a single disk as an identifier and let btrfs discover the rest of the array during boot up. Group identifiers in /etc/fstab seem to contribute to the problem. Like I say, a workaround and not a fix, but others may find it useful.
"So long as the chosen disk survives, everything is fine. In theory I have a 25% chance of that particular disk failing and leaving me locked out."
...still better than nothing. ;-)
I suspect this is just a timing issue.
My root partition is a BTRFS RAID0 array of 2 usb sticks.
When the issue occurs, this is what happens:
1. first drive gets detected by the kernel
2. udev hook tries to mount it
3. btrfs array is incomplete so that fails
4. second drive gets detected by the kernel
5. if I try to mount the array in the emergency console, it works without any error or any other manual action because the array is now complete
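The sequence above is a plain race: the mount is attempted while the array is still incomplete. Below is a toy shell model of it with no real btrfs involved: each "member device" is a file that appears in a temporary directory half a second after the previous one (the directory, file names and delays are made up for illustration). One handler tries once as soon as the first device shows up, like steps 2 and 3; the other polls until every member is present. It assumes a `sleep` that accepts fractions of a second, as GNU coreutils does.

```shell
# Toy model of the race, not real btrfs: each "member device" is a file that
# appears in $devdir some time after the previous one.
devdir=$(mktemp -d)
members="one two"

# Stands in for the kernel detecting the members one after another.
detect_devices() {
    touch "$devdir/one"
    sleep 0.5
    touch "$devdir/two"
}

# The "array" can only be mounted once every member is present.
array_complete() {
    for m in $members; do
        [ -e "$devdir/$m" ] || return 1
    done
    return 0
}

# Mirrors steps 2-3: wait for the first device, then try to mount exactly once.
mount_on_first_device() {
    while [ ! -e "$devdir/one" ]; do sleep 0.05; done
    array_complete
}

# A waiting handler: poll until the array is complete, up to $1 tenths of a second.
mount_when_complete() {
    local tries
    tries=${1:-50}
    until array_complete; do
        [ "$tries" -gt 0 ] || return 1
        sleep 0.1
        tries=$((tries - 1))
    done
    return 0
}

detect_devices &
if mount_on_first_device; then naive=ok; else naive=failed; fi
if mount_when_complete 50; then waiting=ok; else waiting=failed; fi
wait
rm -rf "$devdir"
echo "mount on first device: $naive; mount when complete: $waiting"
```

Run as-is, the first handler typically fails and the second succeeds, which matches the emergency-shell observation in step 5.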
Now if I just add the btrfs mkinitcpio hook before udev, here is what happens:
1. btrfs hook scans for btrfs drives
2. the scan blocks until all drives are available
3. udev hook mounts the array without issue
EDIT:
Adding the btrfs hook was not enough to fix the issue. In the end, to fix it for good I had to:
* Add a sleep hook, just after the block hook in /etc/mkinitcpio.conf (order is important, more on that below)
* Add sleepdevice=/dev/xxx (/dev/disk/by-id/yyy or /dev/disk/by-uuid/zzz work too) to the kernel command line boot parameters (in /etc/default/grub in my case). That is needed so that the sleep hook waits for the device to appear and continues the boot process as soon as it is available. It is important that the sleep hook comes after the block hook; otherwise block devices won't be detected no matter how long you wait.
* Add a btrfs hook just after the sleep hook. Having the block device is not enough if btrfs has not identified it as part of a RAID array, so this hook is needed to scan the new devices.
Now I finally have a reliable boot :)
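The three changes can be collected into two configuration excerpts. This is a sketch: the HOOKS line reuses the one quoted earlier in this thread as an assumed starting point (only "sleep btrfs" is inserted after "block"), and /dev/disk/by-uuid/zzz is the same placeholder used in the list; substitute a real member device of the array and keep any kernel parameters you already have.

```shell
# /etc/mkinitcpio.conf (excerpt): "sleep" right after "block", "btrfs" right after "sleep".
HOOKS="base udev autodetect modconf block sleep btrfs filesystems keyboard fsck"

# /etc/default/grub (excerpt): tell the sleep hook which device to wait for.
# /dev/disk/by-uuid/zzz is a placeholder; append to any parameters already set here.
GRUB_CMDLINE_LINUX="sleepdevice=/dev/disk/by-uuid/zzz"
```

Neither file takes effect by itself: regenerate the image with `mkinitcpio -p linux` and the GRUB configuration with `grub-mkconfig -o /boot/grub/grub.cfg`.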
It was running happily, but I had made a mistake in the partitioning of its µSD card (mounting without lzo enabled), and thus root was too small/full to install some stuff.
So I repartitioned, re-created the subvolumes (like @, @home...) and restored state from backup. As the card is big enough but a bit worn, I thought it a good idea to halve it and make a RAID1-on-one-disk setup, which led to the strange error message:
sector size 4096 not supported yet, only support 65536 - from https://patchwork.kernel.org/patch/8836881/
I do not think aarch64 has such huge pages, but I don't know. So maybe this constant SECTOR_SIZE is wrong for some reason? - might be related or not.
It does not matter on which platform I generate the btrfs system.
I also tried "-m dup -d dup" instead of two partitions - the effect is the same, the ctree fails to be opened.
There are differences from desbma's observation: I can't mount anything in the emergency shell.
Also, I have btrfs in the MODULES section, not in the HOOKS section of mkinitcpio.conf.
What I also find strange is that the odroid image has btrfs support compiled into the initrd, while the pi3 image has it only as a module.
I tried ignoring the pi3-boot style of booting by UUID and used the raw /dev/mmcblk0p2 instead (p1 being the /boot filesystem), and running "bash makscr" in /boot didn't change anything either.
So the really strange thing here is, that it happens on single device setup with dup or two raid1 parts as well...
However, I will check at some point whether anything changes when adding the sleep hook, block ordering and sleepdevice setup (if I figure out how to chroot into the SD card and run mkinitcpio on a different machine)...
1. Run `while ! dev=$(blkid -lt UUID="$FS_UUID" -o device); do true; done; mount "$dev" /mnt/` in a terminal.
2. Run `losetup -f one.img; losetup -f two.img` in another terminal.

Whether or not the btrfs module is loaded beforehand, the `mount` command could fail. If it does not, add `sleep 1` (or even less) between the two `losetup` commands; in reality the time gap could be even bigger.
This is exactly what could happen with the mkinitcpio init, the udev hook or the btrfs hook. Neither udev nor `btrfs device scan` guarantees that all the required btrfs devices have shown up before the mount handler is called.
We need to wait until `btrfs device ready $root` returns 0 in the btrfs hook (which will be needed whether or not the udev hook is used). And because that needs $root resolved, we need to do `resolve_device "$root"` in the hook as well.
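A minimal sketch of the runtime part of such a hook, assuming mkinitcpio's `resolve_device` helper from its init functions and the `btrfs device ready` and `btrfs device scan` subcommands of btrfs-progs. The hook file name, the `btrfs_wait_timeout` kernel parameter, the 10-second default and the 0.1 s polling interval are hypothetical choices for illustration, not an existing hook. A matching install file that puts the `btrfs` binary into the image (for example with mkinitcpio's `add_binary`) is assumed and not shown; the hook would be listed in HOOKS after udev and block so it runs before the mount handler.

```shell
# Hypothetical runtime hook, e.g. /usr/lib/initcpio/hooks/btrfs-wait (name assumed).
# Sketch only: keep checking until the kernel reports that every member of the
# multi-device root filesystem is registered, then let the mount handler run.
run_hook() {
    local dev tries

    # resolve_device (mkinitcpio init functions) turns root=UUID=...,
    # root=LABEL=... or a /dev path into a device node.
    dev=$(resolve_device "$root") || return 1

    # btrfs_wait_timeout is a hypothetical kernel command-line parameter in
    # seconds; mkinitcpio makes key=value parameters available as variables.
    tries=$(( ${btrfs_wait_timeout:-10} * 10 ))

    # `btrfs device ready` exits 0 once all member devices are registered.
    while ! btrfs device ready "$dev" >/dev/null 2>&1; do
        if [ "$tries" -le 0 ]; then
            echo "btrfs-wait: $dev is still incomplete, giving up" >&2
            return 1
        fi
        # Register any member devices that appeared since the last check.
        btrfs device scan >/dev/null 2>&1
        sleep 0.1
        tries=$((tries - 1))
    done
    return 0
}
```

Calling `btrfs device scan` on every iteration is meant to keep the hook useful without the udev hook as well, since nothing else would register late-arriving members in that case.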