FS#42884 - [linux] [systemd] [mkinitcpio] failed boot with root on btrfs multi-device

Attached to Project: Arch Linux
Opened by Radek Podgorny (rpodgorny) - Sunday, 23 November 2014, 15:13 GMT
Last edited by freswa (frederik) - Wednesday, 12 February 2020, 11:53 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Dave Reisner (falconindy)
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 20
Private No

Details

after a recent kernel update (3.17.2 to 3.17.3) all my systems with root on a btrfs multi-device setup fail to boot with:

BTRFS: open_ctree failed

...this turns out to be the result of a failure to read the device list. a simple "btrfs device scan" and mounting root by hand fixes things. after some googling, it seems the btrfs module is no longer being loaded at boot. adding it to MODULES= in /etc/mkinitcpio.conf fixes things but that's just a workaround.
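
A minimal sketch for checking whether the workaround is already present in a config file. It assumes the MODULES= assignment fits on one line; `list_modules` and `has_module` are hypothetical helper names, not part of mkinitcpio:

```shell
# Print the module names from the MODULES= line of a mkinitcpio.conf-style
# file. Handles the quoted form (MODULES="a b") and the array form
# (MODULES=(a b)); a multi-line assignment is NOT handled (assumption).
list_modules() {
    sed -n 's/^MODULES=//p' "$1" | tr -d "\"'()"
}

# Succeed if module $2 is listed in config file $1.
has_module() {
    for m in $(list_modules "$1"); do
        [ "$m" = "$2" ] && return 0
    done
    return 1
}

# Example (not run here):
#   has_module /etc/mkinitcpio.conf btrfs || echo "btrfs not in MODULES"
```

The initramfs still has to be regenerated after editing the file for the change to take effect.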

i'm not sure if this is the kernel's, udev's or initcpio's (or the btrfs initcpio hook's) fault but it used to work perfectly and is now broken.

unfortunately i can't pinpoint the exact versions by experiments as all my multi-device systems are production, sorry. see the linked threads in bbs for more info:

https://bbs.archlinux.org/viewtopic.php?id=189845
https://bbs.archlinux.org/viewtopic.php?id=189987

all in all, it would be nice to make an official notice because:
1) it breaks working systems
2) it takes quite some time to debug this
3) it's quite simple to ruin your filesystem while trying to "fix" it because the error is quite cryptic and fools you into thinking something went wrong with the filesystem itself

thanks!

Closed by  freswa (frederik)
Wednesday, 12 February 2020, 11:53 GMT
Reason for closing:  None
Additional comments about closing:  Seems stalled. Please request re-open if this is still an issue.
Comment by Dave Reisner (falconindy) - Sunday, 23 November 2014, 15:24 GMT
Mentioning udev and the btrfs hook in the same sentence doesn't make a whole lot of sense -- if you use the udev hook (or systemd), then the btrfs hook is useless. The hook's help message even tells you as much.

It's likely that this is a kernel problem. It's unfortunate that you aren't willing to help debug this.
Comment by Radek Podgorny (rpodgorny) - Sunday, 23 November 2014, 15:42 GMT
i'm not that familiar with the details of the boot process but it seems to me the btrfs hook is still needed for multi-device systems since it runs 'btrfs device scan' which is essential.

anyway, it's not that i'm unwilling to debug this, it's just that i have no machine to test it on. :-( hopefully, others will help - i've linked this report on the bbs threads. just doing my part by posting this bug report instead of just ranting at the forums. ;-)
Comment by Dave Reisner (falconindy) - Sunday, 23 November 2014, 15:49 GMT
> i'm not that familiar with the details of the boot process but it seems to me the btrfs hook is still needed for multi-device systems
Again, I refer you to the hook's help message itself:

$ mkinitcpio -H btrfs
This hook provides support for multi-device btrfs volumes. This hook
is only needed for initramfs images which do not use udev.

With udev in the initramfs, udev rules handle device discovery and assembly -- see /usr/lib/udev/rules.d/64-btrfs.rules (included via the udev hook)
Comment by David M. (Davidma) - Sunday, 23 November 2014, 20:03 GMT
It also happened to me earlier this week with the linux-mainline (AUR) and linux-lts kernels in addition to the standard Arch kernel. Therefore I doubt it is isolated to any one kernel. I have since used the MODULES= workaround, but I can confirm that my HOOKS= section in mkinitcpio.conf included udev:

HOOKS="base udev autodetect modconf block filesystems keyboard fsck"

"btrfs" was also in there though I forget the exact position from before I modified it.

BTRFS 4 disk RAID1 filesystem.
Comment by ... (spider007) - Sunday, 23 November 2014, 21:31 GMT
I noticed I could only mount the 'last' disk of my 4 disk RAID10 system (manually). I didn't enable the hook, so maybe this is timing related. It's a very nasty issue, quite time-consuming and invasive.
Comment by ed tomlinson (edt) - Sunday, 07 December 2014, 22:32 GMT
I was having problems with mounting my home fs which is a btrfs raid1 volume mounted by uuid with x-systemd.automount
About half the time it would fail to mount and the only 'fix' I found was to reboot and retry.

The btrfs mkinitcpio hook seems to be gone in the latest mkinitcpio

pacman -Qi mkinitcpio
Name : mkinitcpio
Version : 18-2
...

mkinitcpio -L
==> Available hooks
autodetect encrypt keyboard mdadm_udev pata¹ scsi¹ sd-vconsole systemd usr
base filesystems keymap memdisk pcmcia sd-encrypt shutdown udev virtio¹
block fsck lvm2 mmc¹ resume sd-lvm2 sleep usb¹
consolefont fw¹ mdadm modconf sata¹ sd-shutdown strip usbinput²

¹ This hook is deprecated in favor of 'block'
² This hook is deprecated in favor of 'keyboard'
---

This hook was probably what made my boot solid previously.
I suspect there was a race between the udev hook and the modprobe for btrfs.
The fix below seems to have resolved the problem here.

edit /etc/mkinitcpio.conf
make sure your MODULES line has at least: MODULES="btrfs xor raid6_pq"
regenerate the image with 'mkinitcpio -p linux' where linux is the name of the image you need to regenerate (here it's linux-stable-git)
reboot

The xor and raid6_pq modules are added to be sure all modules needed for btrfs raid5/6 are loaded too.

You may also want to modify your BINARIES to include at least: BINARIES="/usr/bin/btrfsck"
which could save your bacon if /usr/bin happens to be in an unmountable btrfs fs.
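
After regenerating, it can be worth confirming that the module actually landed in the image. A rough sketch, assuming the listing comes from `lsinitcpio` (one path per line); `listing_has_module` is a hypothetical helper and the image path in the comment is only the usual default, not something confirmed in this report:

```shell
# Read a file listing on stdin (one path per line) and succeed if it
# contains a kernel module file named $1, compressed or not
# (e.g. .../btrfs.ko or .../btrfs.ko.gz).
listing_has_module() {
    grep -Eq "(^|/)$1\.ko(\.[a-z0-9]+)?\$"
}

# Example (not run here; the image path is an assumption):
#   lsinitcpio /boot/initramfs-linux.img | listing_has_module btrfs \
#       || echo "btrfs module missing from image"
```

Note that this only matches module files, so a listing that contains just the /usr/bin/btrfs binary does not count.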
Comment by Dave Reisner (falconindy) - Sunday, 07 December 2014, 22:40 GMT
> The btrfs mkinitcpio hook seems to be gone in the latest mkinitcpio
It hasn't been part of mkinitcpio for years. btrfs-progs owns it.
Comment by ed tomlinson (edt) - Sunday, 07 December 2014, 23:23 GMT
I had some corruption issues that needed the git btrfs-progs along with a few patches.
Looks like the aur/btrfs-progs-git is not adding the hook, which would explain why it's gone here.
Comment by smikims (smikims) - Friday, 12 December 2014, 11:15 GMT
Is it possible this is at all related to a typo I found in initcpio-hook-udev in the systemd package? The first line is #!/usr/bin/ash when I'm assuming it should be #!/usr/bin/bash.

Edit: I'm an idiot and should know more about how early boot works.
Comment by Dave Reisner (falconindy) - Friday, 12 December 2014, 12:20 GMT
That's not a typo. We use busybox ash in early userspace. Regardless, the script is sourced, not executed, so it's only there to serve as a hint to the reader about what syntax is acceptable.
Comment by Paolo (palmaway) - Friday, 16 January 2015, 07:30 GMT
Possibly related:
https://bugs.freedesktop.org/show_bug.cgi?id=88483
I am the one who posted the bug above. However, adding the "btrfs xor raid6_pq" modules doesn't help, and apparently it is not a "btrfs device scan" problem. Since it looks like that one is Arch specific too, I guess I'll file a bug report here too...
Comment by Dave Reisner (falconindy) - Sunday, 15 March 2015, 14:08 GMT
Is this still a problem with systemd 219?
Comment by Javier Viñal (fjvinal) - Wednesday, 01 April 2015, 09:45 GMT
The issue continues with the latest systemd.
Adding 'btrfs' to the MODULES section and removing it from the HOOKS section of /etc/mkinitcpio.conf apparently solves the problem on my system.
Comment by Carlos Silva (r3pek) - Friday, 08 May 2015, 11:33 GMT
Don't know what you guys are doing, but I never touched mkinitcpio.conf and my BTRFS RAID5 volume(s) works like a charm (/{,home,etc}). I actually never had any problem with it.
Comment by rainer (raneon) - Monday, 18 May 2015, 19:25 GMT
I do have this issue as well with my vm with 2 encrypted disks btrfs RAID1 (/home) and the kernel 4.0.3. Usually in the past months I just needed to boot twice to get the vm up and running, but today I had to try it 7 times and that is why I found this bug report.
Comment by rainer (raneon) - Friday, 10 July 2015, 13:36 GMT
I still have this issue on linux 4.1.1 and systemd 221.
Comment by Aleksa Sarai (cyphar) - Friday, 17 July 2015, 11:07 GMT
There is a similar bug for booting with a `vfat` formatted drive in /etc/fstab. You need to add `vfat` to MODULES= in /etc/mkinitcpio.conf. This bug was introduced in 4.1.2-2-ARCH.
Comment by Nicholas Yim (nyim) - Thursday, 30 July 2015, 02:59 GMT
1. btrfs raid1 with two SSDs
2. root and home on subvolumes
3. I just added btrfs to MODULES= in /etc/mkinitcpio.conf, and it always works
Comment by rainer (raneon) - Saturday, 03 October 2015, 11:38 GMT
I still have this issue with linux 4.2.2-1 and systemd 226. Config: VM with 3 encrypted disks BTRFS RAID1. Without btrfs in modules in mkinitcpio.conf I had more than 15 failed boots, so it seems to get worse.
Comment by rainer (raneon) - Tuesday, 20 October 2015, 21:34 GMT
I thought that btrfs in modules in /etc/mkinitcpio.conf would be a solution, as my system started straight away when I last added it via chroot. But now, after updating to linux 4.2.3-1 and systemd 227-1, I again had to try to boot more than 15 times. Getting the system up and running now seems to depend on some lucky random constellation during boot, but I don't know which one. I would appreciate it if somebody would step in to reproduce these issues. I don't know what a normal user like me could do to get this fixed. This is such an annoying bug and it appears to get worse with the latest kernel/systemd versions.
Comment by Jiachen Yang (farseerfc) - Thursday, 29 October 2015, 15:32 GMT
I have been experiencing this problem for several months, then I found that removing the systemd hook from the HOOKS array solved my problem.
So there must be something wrong in the systemd hook.
Comment by rainer (raneon) - Sunday, 01 November 2015, 23:06 GMT
I do not have systemd in HOOKS in /etc/mkinitcpio.conf.
Comment by rainer (raneon) - Sunday, 22 November 2015, 23:20 GMT
I've noticed that my virtual server starts without any issue if it happens to pick up my disk 1. With disk 2 or 3 it does not work and systemd never finishes booting. This explains why I lately needed so many reboots to get it up and running.

I have the following setup. I tried other setups, but they failed as well. This seems to be the only config that worked at least for home1.

/etc/crypttab
home1 UUID=xxxx1 /root/key1 luks
home2 UUID=xxxx2 /root/key2 luks
home3 UUID=xxxx3 /root/key3 luks

/etc/fstab
/dev/mapper/home1 /home btrfs defaults,noatime,compress=lzo,device=/dev/mapper/home1,device=/dev/mapper/home2,device=/dev/mapper/home3 0 0




Comment by Sam (cellardoor) - Tuesday, 09 August 2016, 19:21 GMT
Dear All,

By no means a fix, but a potential workaround to others who are suffering this issue has been added to the ArchWiki and is also explained in a blog post here: https://blog.samcater.com/fix-for-btrfs-open_ctree-failed-when-running-root-fs-on-raid-1-or-raid10-arch-linux/

In a nutshell, use a single disk as an identifier and let btrfs discover the rest of the array during boot up. Group identifiers in /etc/fstab seem to contribute to the problem. Like I say, a workaround and not a fix, but others may find it useful.
Comment by Radek Podgorny (rpodgorny) - Tuesday, 09 August 2016, 19:24 GMT
more precisely, a partial workaround:

"So long as the chosen disk survives, everything is fine. In theory I have a 25% chance of that particular disk failing and leaving me locked out."

...still better than nothing. ;-)
Comment by Dave Reisner (falconindy) - Saturday, 10 December 2016, 03:05 GMT
Is this still a problem with systemd v232 in the initramfs?
Comment by Michael Werner (Xaseron) - Tuesday, 24 January 2017, 11:31 GMT
For me the problem still persists and i'm still on the last working systemd version, which is 229.
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 05 March 2017, 16:49 GMT
For what it's worth, this is also an issue for me right now on systemd 232.
Comment by desbma (desbma) - Sunday, 07 May 2017, 12:52 GMT
I am hit by this bug too, randomly but very frequently (I'd say 8 boots out of 10).

I suspect this is just a timing issue.
My root partition is a BTRFS RAID0 array of 2 usb sticks.

When the issue occurs this is what happens:
1. first drive gets detected by the kernel
2. udev hook tries to mount it
3. btrfs array is incomplete so that fails
4. second drive gets detected by the kernel
5. if I try to mount the array in the emergency console, it works without any error or any other manual action because the array is now complete

Now if I just add the btrfs mkinitcpio hook before udev, here is what happens:
1. btrfs hook scans for btrfs drives
2. the scan blocks until all drives are available
3. udev hook mounts the array without issue

EDIT:
Adding the btrfs hook was not enough to fix the issue. In the end, to fix it for good I had to:
* Add a sleep hook, just after the block hook in /etc/mkinitcpio.conf (order is important, more on that below)
* Add sleepdevice=/dev/xxx (/dev/disk/by-id/yyy or /dev/disk/by-uuid/zzz work too) in the kernel command line boot parameters (in /etc/default/grub in my case). That is needed so that the sleep hook waits for the device to appear, and continues the boot process as soon as it is available. It is important that the sleep hook is after the block hook otherwise block devices won't be detected no matter how long you wait.
* Add a btrfs hook just after the sleep hook. Having the block device is not enough if btrfs has not identified that it is part of a RAID array, so that hook is needed to scan the new devices.

Now I finally have a reliable boot :)
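
The ordering constraint from the list above can be checked mechanically. This is only a sketch: `hook_pos` and `order_ok` are hypothetical helpers, and they verify relative order (block before sleep before btrfs), not strict adjacency. The HOOKS value and the `zzz` placeholder in the comments are illustrative, taken from the description above:

```shell
# Print the 1-based position of hook $2 in the space-separated list $1;
# fail (and print nothing) if the hook is absent.
hook_pos() {
    i=0
    for h in $1; do
        i=$((i + 1))
        if [ "$h" = "$2" ]; then
            echo "$i"
            return 0
        fi
    done
    return 1
}

# Succeed only if block, sleep and btrfs are all present, in that order.
order_ok() {
    b=$(hook_pos "$1" block) || return 1
    s=$(hook_pos "$1" sleep) || return 1
    f=$(hook_pos "$1" btrfs) || return 1
    [ "$b" -lt "$s" ] && [ "$s" -lt "$f" ]
}

# Illustrative configuration (not run here):
#   /etc/mkinitcpio.conf:  HOOKS="base udev autodetect modconf block sleep btrfs filesystems keyboard fsck"
#   kernel command line:   ... sleepdevice=/dev/disk/by-uuid/zzz
```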
Comment by Sven Witterstein (grizzlyfred) - Thursday, 08 June 2017, 20:56 GMT
I want to report that I got hit by presumably the same bug on aarch64 on a pi3.
It was running happily, but I had made a mistake in the partitioning of its µSD (mounting without lzo enabled), and thus root was too small/full to install some stuff.

Thus I repartitioned and re-subvolumed (like @, @home...) and restored state from backup - as the card is big enough, but a bit worn, I thought it a good idea to halve it and make a raid1-on-one-disk setup, which led to the strange error message:
sector size 4096 not supported yet, only support 65536 - from https://patchwork.kernel.org/patch/8836881/

I do not think aarch64 has such huge pages, but I don't know. So maybe this constant SECTOR_SIZE is wrong for some reason? - might be related or not.

It does not matter on which platform I generate the btrfs system.
I also tried "-m dup -d dup" instead of two partitions - the effect is the same, the ctree fails to be opened.

There are differences from desbma's observation - I can't mount anything in the emergency shell.

Also, I have btrfs in the "modules" section, not in the HOOKS section for mkinitcpio.conf.

What I find strange also is, that the odroid-image has btrfs support compiled into initrd, and the pi3 image has a module only.

I tried ignoring the pi3-boot style of booting by UUID and using raw /dev/mmcblk0p2 (p1 being the /boot-fs), and also "bash makscr" in /boot didn't do anything.

So the really strange thing here is, that it happens on single device setup with dup or two raid1 parts as well...

However, I will check if anything changes when adding the sleep / block / sleepdevice stuff some time (if I figure out how to chroot and make initcpio on the sdcard on a different machine...)
Comment by Tom Yan (tom.ty89) - Thursday, 13 July 2017, 22:52 GMT
It's only natural it happens tbh. We have nothing that guards the possible race condition anyway. Try this:

1. Run `while ! dev=$(blkid -lt UUID=$FS_UUID -o device); do { true; } done; mount $dev /mnt/` in a terminal
2. Run `losetup -f one.img ; losetup -f two.img` in another terminal

btrfs module loaded beforehand or not, the `mount` command could fail. If it does not, add `sleep 1` (or even less) between the two `losetup` commands, for in reality the time gap could possibly be even bigger.

This is exactly what could happen with the mkinitcpio init, udev hook or btrfs hook. Neither udev nor `btrfs device scan` guarantees that all the required btrfs devices have shown up before the mount handler is called.

We need to wait until `btrfs device ready $root` returns 0 in the btrfs hook (which will be needed whether or not the udev hook is used). And because it needs $root resolved, we need to do `resolve_device "$root"` in the hook as well.
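
A rough sketch of such a wait loop, written in POSIX sh so it would also run under busybox ash. `wait_until` is a hypothetical helper and the 30 second timeout is an arbitrary assumption; this is not what the btrfs hook currently does:

```shell
# Run a command once per second until it succeeds or $1 seconds have passed.
# Usage: wait_until TIMEOUT COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if the timeout expires first.
wait_until() {
    remaining=$1
    shift
    until "$@"; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Intended use inside a hook (not run here); $root is assumed to have been
# resolved to a block device path already:
#   wait_until 30 btrfs device ready "$root" || echo "btrfs: devices still missing"
```

Polling with a timeout keeps the boot from hanging forever if a member device never appears, at the cost of dropping to the emergency shell in that case.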
Comment by Leonidas Spyropoulos (inglor) - Tuesday, 31 October 2017, 10:12 GMT
I had this happen on my laptop with btrfs. The workaround I used was to include `crc32c` and `btrfs` in the MODULES array in mkinitcpio. It could be just random, but I haven't hit the issue since.
Comment by Justin Capella (justincapella) - Thursday, 07 March 2019, 21:15 GMT
Just curious whether any of you have used the systemd initrd; it has generators, device dependencies and mounts... I wonder if btrfs was built in before and is a module by default now? Also, what are you using for the device path, /dev/disk/by-uuid etc.?
