FS#77569 - [dracut] boot failure with systemd >= 253

Attached to Project: Arch Linux
Opened by Toolybird (Toolybird) - Saturday, 18 February 2023, 23:21 GMT
Last edited by Toolybird (Toolybird) - Thursday, 13 July 2023, 00:35 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Christian Hesse (eworm), Giancarlo Razzolini (grazzolini)
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 4
Private: No

Details

This is mainly a heads-up because I haven't figured it out yet. It's the first time in yonks I've had to recover an Arch system with an installation USB flash drive :(

- standard LVM setup (root on a plain logical volume: no RAID, thin provisioning, LUKS or anything like that)
- dracut
- system fails to boot when systemd 253 is present in the initrd

Removing "quiet" and adding "rd.debug" to the kernel parameters shows that it's udev related. It seems to hang somewhere around "dm-pre-udev.sh".
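
For anyone wanting to reproduce the debug boot, the parameter change can be sketched as a small string transformation. This is only an illustration: the function name `debug_cmdline` and the sample `root=` value are made up, and how the resulting line gets into the boot loader configuration depends on the boot loader in use.

```python
def debug_cmdline(cmdline: str) -> str:
    """Drop 'quiet' and make sure 'rd.debug' is present on a kernel command line.

    Assumption: parameters are separated by whitespace and none of them
    contains quoted spaces, so a plain split() is good enough.
    """
    params = [p for p in cmdline.split() if p != "quiet"]
    if "rd.debug" not in params:
        params.append("rd.debug")
    return " ".join(params)


# Hypothetical command line, only for illustration.
print(debug_cmdline("root=/dev/mapper/vg-root rw quiet"))
# prints: root=/dev/mapper/vg-root rw rd.debug
```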

Downgrading systemd and regenerating initrd works. Upgrading systemd but leaving initrd containing old systemd also works. Bit of a head-scratcher so far. I'm curious if anyone else is affected.

Closed by Toolybird (Toolybird)
Thursday, 13 July 2023, 00:35 GMT
Reason for closing: Fixed
Additional comments about closing: dracut 059-2
Comment by Toolybird (Toolybird) - Sunday, 19 February 2023, 04:55 GMT
Comment by loqs (loqs) - Sunday, 19 February 2023, 12:27 GMT
If the issue is still present under dracut 059, what happens if you change the list of paths that are not made read-only [1]? I would suggest first changing /dev/shm/ to /dev/; if that fails, try adding /etc/, /var/ and /usr/.

[1] https://github.com/systemd/systemd/commit/ca6ce62d2a437432082b5c6e5d4275d56055510f#diff-0d21c797b372d6ceff84bd203e533ad48b9ed4fee07c24e2b4bc64f4ee6bac0fR3766
Comment by Janis König (LeonardK) - Sunday, 19 February 2023, 17:33 GMT
@loqs: I compiled dracut from git to have 059-* instead of 056 but nothing changed.

FWIW, I also run LVM, though with LUKS on the individual logical volumes. Since this breaks even before cryptsetup comes into play, I'm probably hitting the same bug.

I've uploaded the /run/initramfs/rdsosreport.txt and journalctl output for systemd-253 with rd.debug enabled. While the logs mention linux-hardened, the same behavior can be seen with the plain linux kernel as well.

Note that `dmsetup` claims no devices exist, there are no `/dev/dm*` devices, and `/dev/disk/by-uuid` doesn't contain any mapped disks. `lvm` does list the physical and logical volumes as well as the volume groups, but the volume group `vgmain` (in my case) doesn't show up under `/dev/vgmain/*`.

From the log it does seem like lvm_scan is never run. Maybe it doesn't exist in the right place, its hook is not registered, or udev doesn't settle properly? Unfortunately I have no idea about the actual underlying workings. I have uploaded a tarball of the contents of a faulty and a correctly generated initramfs on the linked GitHub issue (too big for Flyspray).
Comment by loqs (loqs) - Sunday, 19 February 2023, 18:11 GMT
@LeonardK what if you rebuild systemd 253 with the attached patch?
Comment by Toolybird (Toolybird) - Sunday, 19 February 2023, 21:27 GMT
Thanks for the test patch @loqs, but no difference here.
Comment by Janis König (LeonardK) - Sunday, 19 February 2023, 21:40 GMT
Same for me unfortunately :/
Comment by Janis König (LeonardK) - Sunday, 19 February 2023, 21:52 GMT
I also rebuilt adding the other directories you mentioned:

- STRV_MAKE("/sys", "/run", "/proc", "/dev/shm", "/tmp"));
+ STRV_MAKE("/sys", "/run", "/proc", "/dev", "/tmp", "/etc/", "/var", "/usr"));

but unfortunately no change.
Comment by loqs (loqs) - Sunday, 19 February 2023, 22:00 GMT
Can you determine what is left mounted read only?
Edit:
Skip the whole remount and see if the issue is still present.
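
As a rough sketch of how the read-only question could be answered, the snippet below filters text in the `/proc/mounts` format for entries carrying the `ro` option. It assumes such a listing can be captured from inside the environment where the remount applies, which is not shown here; the function name and the sample lines are invented for illustration.

```python
def read_only_mounts(mounts_text: str) -> list[str]:
    """Return the mount points whose option list contains 'ro'.

    Expects lines in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Invented sample data, not taken from the affected systems.
sample = """\
rootfs / rootfs ro 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
"""
print(read_only_mounts(sample))
```

Comparing the result against the path list in the patch would show which directories are still read-only.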
Comment by Toolybird (Toolybird) - Sunday, 19 February 2023, 22:35 GMT
@loqs, that patch works. But that is to be expected because it effectively reverts the intent of the original MR. Maybe we just need to fix dracut? Unfortunately, I don't get an emergency shell when it fails so I cannot see the state of things.
Comment by loqs (loqs) - Sunday, 19 February 2023, 22:45 GMT
Something other than /dev /etc /proc /run /sys /tmp /usr /var seems to need write access.
Could you somehow place strace at the top of the two generators and find what call is getting blocked?
I would suggest using kernel audit rules but that would involve getting auditd installed and configured in the initrd.
Try asking upstream dracut?
Comment by Toolybird (Toolybird) - Monday, 20 February 2023, 01:02 GMT
Comment by Toolybird (Toolybird) - Monday, 20 February 2023, 04:00 GMT
Upstream systemd produced a patch [1]. Works for me.

[1] https://github.com/systemd/systemd/pull/26494
Comment by Christian Hesse (eworm) - Friday, 03 March 2023, 14:45 GMT
Upstream systemd will not fix / work around this. Please wait for dracut to fix this.
Comment by Toolybird (Toolybird) - Friday, 03 March 2023, 17:22 GMT
So those of us affected are basically screwed? Fedora have applied the patch [1] as a temporary workaround until dracut is fixed. Any chance you could do the same?

[1] https://src.fedoraproject.org/rpms/systemd/c/4bdd16eba5c409a5aa0afcc16f6e284f20793e06?branch=rawhide
Comment by Christian Hesse (eworm) - Friday, 03 March 2023, 17:47 GMT
Fedora uses dracut by default, we do not. Anyway, a commit exists for dracut, though it is yet to be merged. Perhaps that should be evaluated and pushed to the dracut package.

https://github.com/aafeijoo-suse/dracut/commit/4bde75fabe31a5c048fd75e533b94e91c3faa83b
Comment by loqs (loqs) - Friday, 03 March 2023, 20:28 GMT
Seems there are four commits that need to be cherry-picked to 056, which is the last signed release. At least there are no merge commits and the patch applies and builds. That is as far as I tested.
Comment by Christian Hesse (eworm) - Saturday, 04 March 2023, 18:44 GMT
Can anybody test this? I will happily push a fixed dracut package after confirmation.
Comment by Toolybird (Toolybird) - Sunday, 05 March 2023, 04:33 GMT
@loqs, thanks for the patch, it works a treat.

@eworm, please push, thanks.
Comment by Toolybird (Toolybird) - Friday, 07 July 2023, 22:11 GMT
dracut 059-1 just broke this again.

@grazzolini, it appears you dropped the patch that fixed this. The patch is *not* included in 059 upstream.
Comment by Toolybird (Toolybird) - Saturday, 08 July 2023, 00:02 GMT
Our GitLab merge requests don't appear to be "live" yet. But I've tested this [1] and can confirm it solves the problem.

[1] https://gitlab.archlinux.org/toolybird/dracut/-/commit/0f48723a277d8f1ee6e24cb21fd5c0ee01bd2846
