FS#69771 - [grub] grub-mkconfig no longer correctly detects zfs root pool name

Attached to Project: Arch Linux
Opened by Tessa N (tessaracht) - Thursday, 25 February 2021, 00:58 GMT
Last edited by Christian Hesse (eworm) - Friday, 26 February 2021, 07:47 GMT
Task Type: Bug Report
Category: Upstream Bugs
Status: Closed
Assigned To: Ronald van Haren (pressh), Christian Hesse (eworm)
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:
Some updates were installed since I last rebooted my system, and now grub-mkconfig (with ZPOOL_VDEV_NAME_PATH=1 set in /etc/environment) no longer correctly detects my root ZFS pool name (rpool). In /boot/grub/grub.cfg, regenerated after the latest linux kernel package update (linux=5.11.1.arch1-1), the kernel command line is set to "root=ZFS=/ROOT/default" instead of the correct "root=ZFS=rpool/ROOT/default", which leaves my system unbootable.

Steps to reproduce:
Run "grub-mkconfig -o /boot/grub/grub.cfg", then inspect the generated file: the kernel command line contains "root=ZFS=/ROOT/default", with the pool name missing.
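
A quick way to check a generated config for this symptom might look like the sketch below. The function name check_zfs_root is hypothetical and not part of grub; it only assumes that the entries appear as "root=ZFS=..." tokens, as quoted in the description.

```shell
#!/bin/sh
# Hypothetical helper (not part of grub): list the root=ZFS= values in a
# grub.cfg and flag the ones where the pool name is missing.
# Usage: check_zfs_root [path-to-grub.cfg]
check_zfs_root() {
    cfg="${1:-/boot/grub/grub.cfg}"
    # Collect each distinct root=ZFS=... token from the config.
    roots=$(grep -o 'root=ZFS=[^ ]*' "$cfg" | sort -u)
    if [ -z "$roots" ]; then
        echo "no root=ZFS= entries found in $cfg"
        return 2
    fi
    bad=0
    for r in $roots; do
        case "$r" in
            # A dataset path starting with "/" means the pool name was dropped.
            root=ZFS=/*) echo "BAD (pool name missing): $r"; bad=1 ;;
            *)           echo "ok: $r" ;;
        esac
    done
    return "$bad"
}
```

Calling check_zfs_root with no argument reads /boot/grub/grub.cfg; a non-zero return status means at least one entry lacks the pool name (or no entry was found at all).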

Notes:
I've only marked this as high, but per the wiki it might count as critical severity if it's affecting other ZFS root users? feel free to adjust as you see fit. ;)

Closed by  Christian Hesse (eworm)
Friday, 26 February 2021, 07:47 GMT
Reason for closing:  Upstream
Additional comments about closing:  upstream issue, follow the linked github issue for explanation and workaround
Comment by Tessa N (tessaracht) - Thursday, 25 February 2021, 01:02 GMT
ahhh notably it looks like it was working prior to grub package version 2.04-10, which was installed on my system on the 24th. whatever version I had previously was fine. my zfs package versions are zfs-dkms=2.0.3-1 and zfs-utils=2.0.3-1, but I doubt that's where this issue is coming from.
Comment by Christian Hesse (eworm) - Thursday, 25 February 2021, 07:32 GMT
Well, we do not have zfs packages in our official repositories, so I handle this with low priority. And as I do not use zfs myself I cannot test.

Anyway... The only change in 2.04-10 is a build system call to `objcopy`, unlikely to break features.
Are you sure this did not break with other updates? You could verify by downgrading the grub package.
Comment by Tessa N (tessaracht) - Thursday, 25 February 2021, 18:14 GMT
hmm yeah, I tried downgrading grub and that didn't fix this problem. I've been doing some testing of the code in /etc/grub.d/10_linux, and it seems like this line is the culprit:

rpool=`${grub_probe} --device ${GRUB_DEVICE} --target=fs_label 2>/dev/null || true`

however, in comparison with my working Ubuntu systems, they've got a whole chunk of code in /etc/grub.d/10_linux_zfs that Arch doesn't seem to have, which seems to do the heavy lifting there, and is sourced and called right before that rpool line above. not sure if that's an Ubuntu specific set of code though. it looks like this is a known issue upstream that still isn't fixed yet (https://github.com/zfsonlinux/grub/issues/22), but it's very unclear to me how this worked fine on this Arch install for the last month or so. very puzzling.
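
To see whether that grub-probe call is really what returns an empty pool name, a diagnostic along these lines could help. This is only a sketch: pool_from_dataset and compare_rpool are hypothetical names, and it assumes the root dataset is mounted at / so that findmnt reports something like rpool/ROOT/default as its source.

```shell
#!/bin/sh
# Hypothetical helper: the pool name is the first component of a dataset name.
pool_from_dataset() {
    printf '%s\n' "${1%%/*}"
}

# Hypothetical diagnostic: compare the label grub-probe reports with the pool
# of the dataset mounted at /. Needs root privileges, grub-probe and findmnt.
compare_rpool() {
    dataset=$(findmnt -n -o SOURCE /)          # e.g. rpool/ROOT/default
    device=$(grub-probe --target=device /)
    # $device is left unquoted on purpose, mirroring the line in 10_linux.
    label=$(grub-probe --device $device --target=fs_label 2>/dev/null || true)
    expected=$(pool_from_dataset "$dataset")
    echo "grub-probe fs_label: '$label'  expected pool: '$expected'"
    [ "$label" = "$expected" ]
}
```

If compare_rpool prints an empty fs_label while the expected pool is rpool, the generated "root=ZFS=/ROOT/default" follows directly from the empty rpool variable.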

the upstream issue has a workaround, which is to just specify the whole root= option in /etc/default/grub. that ends up after the autogenerated entries in the kernel options, and so takes precedence. it's a little hacky, but it should work ok. we can probably close this since it seems to be a fully upstream problem and not Arch's responsibility, but yeah, I still wish I understood how it worked in the first place! 🤷‍♀️
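
For reference, the workaround might look like the fragment below. The comment above does not name the variable; GRUB_CMDLINE_LINUX is an assumption here, chosen because its contents are appended to the kernel command line after the autogenerated root= entry. The dataset name is the one from the task description.

```shell
# /etc/default/grub (fragment)
# Assumption: GRUB_CMDLINE_LINUX is the variable used for the workaround.
# Its value lands after the autogenerated root= option, so this later
# root= should take precedence.
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/default"
```

After editing the file, regenerate the config with "grub-mkconfig -o /boot/grub/grub.cfg" and check that the linux lines now end with the full dataset name.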
Comment by Christian Hesse (eworm) - Friday, 26 February 2021, 07:46 GMT
The linked github issue indicates that this is influenced by filesystem features. Possibly you (or "something") enabled a new zfs feature that breaks the functionality?
