FS#54655 - [mkinitcpio-busybox] Boot failure after upgrade to 1.26.1

Attached to Project: Arch Linux
Opened by Dario Giovannetti (kynikos) - Saturday, 01 July 2017, 07:31 GMT
Last edited by Eli Schwartz (eschwartz) - Thursday, 26 October 2017, 03:11 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Dave Reisner (falconindy)
Bartłomiej Piotrowski (Barthalion)
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 7
Private No

Details

Today I upgraded the system on a laptop (an old Acer 1810TZ), and among the changes there was 'mkinitcpio-busybox' 1.25.1-2 -> 1.26.1-1 (x86_64).

After the upgrade, which automatically triggers regeneration of the initramfs image, the system no longer boots. This is what is displayed after the boot loader stage:

starting version 232
ERROR: resume: hibernation device '/dev/mapper/grp-swap' not found
ERROR: device '/dev/mapper/grp-root' not found. Skipping fsck.
mount: you must specify the filesystem type
You are now being dropped into an emergency shell.
sh: can't access tty; job control turned off
[rootfs ]#

If I downgrade mkinitcpio-busybox back to 1.25.1-2 and recreate the image, the system boots again normally.

I have performed the same package update on other machines successfully; only this laptop shows the problem.

Other package versions:
linux 4.11.7-1
mkinitcpio 23-1
systemd 232-8

Closed by  Eli Schwartz (eschwartz)
Thursday, 26 October 2017, 03:11 GMT
Reason for closing:  Fixed
Additional comments about closing:  mkinitcpio-busybox 1.27.2-1 seems to have fixed this
Comment by somini (_somini_) - Saturday, 01 July 2017, 18:08 GMT
Just got hit by this same bug. I can confirm downgrading and regenerating the initramfs image works.
I have the same package versions as you.

I'm using LUKS with LVM, so the workaround for getting the system to boot, so that you can downgrade the package, is the following:
- Use `blkid` to find the LUKS volume containing the real root.
- `cryptsetup luksOpen $luksvolume luks` will prompt for the password.
- `mount $lvmroot /new_root` will mount the real root in the proper place
- Use CTRL-d to exit the recovery shell and booting will continue normally.
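
The steps above can be sketched as the following emergency-shell session. The device path `/dev/sda1` and the volume path `/dev/mapper/internal-root` are placeholders, not values taken from this comment; substitute whatever `blkid` reports on the affected machine. The `run` helper and the `DRY_RUN` variable are hypothetical additions so that the commands can be previewed before anything is executed.

```shell
#!/bin/sh
# Sketch of the manual unlock from the initramfs emergency shell.
# Placeholder paths (assumptions): adjust to the actual layout.
luksvolume=${luksvolume:-/dev/sda1}
lvmroot=${lvmroot:-/dev/mapper/internal-root}

# Hypothetical helper: with DRY_RUN=1 (the default here) the command
# is only printed; with DRY_RUN=0 it is executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

unlock_and_mount() {
    run blkid                                   # find the crypto_LUKS partition
    run cryptsetup luksOpen "$luksvolume" luks  # prompts for the passphrase
    run mount "$lvmroot" /new_root              # mount the real root where init expects it
}

unlock_and_mount
# Afterwards, press Ctrl-D to leave the emergency shell and continue booting.
```

Setting `DRY_RUN=0` would run the commands for real; that only makes sense inside the emergency shell itself.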
Comment by Dario Giovannetti (kynikos) - Sunday, 02 July 2017, 02:53 GMT
I too use LVM on LUKS on that machine. In case it helps to solve the bug, its mkinitcpio.conf has:

HOOKS="base udev autodetect modconf block keymap encrypt lvm2 resume filesystems keyboard fsck shutdown"
Comment by Tom Yan (tom.ty89) - Thursday, 13 July 2017, 18:54 GMT
@_somini_ So it seems not even the encrypt hook is successfully run. What do you use exactly for cryptdevice=? Paste your /proc/cmdline maybe?

@kynikos Is it the same case for you? Can you confirm that in the emergency shell, you can see your disks (/dev/sd*) but no mapper at all (i.e. not even /dev/dm-0)? If so can you also paste your /proc/cmdline?

P.S. I also use the encrypt hook for my LUKS-encrypted root partition, but I cannot reproduce your issue. I am not using LVM, but that doesn't seem relevant to me (since it's LVM on LUKS).
Comment by Robert Alessi (ralessi) - Friday, 14 July 2017, 01:10 GMT
I have the same issue on my laptop with lvm on luks. But it so happens that I got a new one on which I just installed a new system, the same way (lvm on luks), with no issue. There is one difference though: I did not set the resume device on the latter yet. My guess is this issue may originate with the definition of a resume device. I'll run some more tests later on.
*EDIT*: wrong guess. I just set the resume device, and the laptop boots.
Comment by Tom Yan (tom.ty89) - Friday, 14 July 2017, 10:11 GMT
Can you help check the value of the following variables in the emergency shell?

cryptdevice
cryptdev
resolved
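
One possible way to print all three at once from the shell is sketched below. The function name `show_crypt_vars` and the `UNSET` marker are made up for illustration; the marker only serves to tell a variable that is not set at all apart from one that is set but empty.

```shell
# Print each variable, showing UNSET for any that does not exist.
# ${var-word} substitutes word only when var is unset, so an empty
# value is printed as empty.
show_crypt_vars() {
    for name in cryptdevice cryptdev resolved; do
        eval "value=\${$name-UNSET}"
        printf '%s=%s\n' "$name" "$value"
    done
}

show_crypt_vars
```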
Comment by Robert Alessi (ralessi) - Friday, 14 July 2017, 10:56 GMT
I have this in the emergency shell:

lsblk -f
sda
`-sda1 crypto_LUKS <UUID>

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz root=UUID=<uuid> rw cryptdevice=UUID=<uuid>:root cryptkey=rootfs:/path/to/key resume=UUID=<uuid> quiet
Comment by Tom Yan (tom.ty89) - Friday, 14 July 2017, 13:36 GMT
No offense, but this is not really helpful because it doesn't reveal much about why the encrypt hook does not open the encrypted partition as expected. Though at least I can see from your cmdline that you use UUIDs.

My speculation is that resolve_device() somehow fails in some corner case. That's why I asked for the three variables, which might reveal how and when it failed. In cases like this it would be more helpful if you could provide your exact UUIDs, so that others can test with them and see if they can reproduce the issue.
Comment by Dave Reisner (falconindy) - Friday, 14 July 2017, 13:54 GMT
Boot with rd.debug and rd.log=file, then post /run/initramfs/init.log
Comment by Dario Giovannetti (kynikos) - Friday, 14 July 2017, 14:05 GMT
(I was writing this answer before Dave posted, I'll try his suggestion now)

My lsblk is also:
sda
`-sda1 crypto_LUKS

I confirm I get stuck at the encrypt hook too; I don't see any mapped devices in the emergency console either, including any /dev/dm-#.

My /proc/cmdline is:
BOOT_IMAGE=/vmlinuz-linux root=/dev/mapper/grp-root rw cryptdevice=/dev/sda1:lvm cryptkey=rootfs:/boot/root_keyfile resume=/dev/mapper/grp-swap quiet
Note that I don't use UUIDs there.

In the emergency shell, only 'cryptdevice' is set for me, I don't see any 'cryptdev' or 'resolved' variables.

One thing that I see is that ralessi and I are both using a root keyfile. In my case it's because I'm using GRUB cryptodisk and want to avoid having to type the passphrase twice, if it matters.
Comment by Dario Giovannetti (kynikos) - Friday, 14 July 2017, 14:43 GMT
OK, so I've done some more testing and attached the logs from rd.debug: init.log.fail was saved from the emergency shell, before manually unlocking the partition; init.log.fail-full-boot is the full log, including the events after manually unlocking the partition; init.log.ok is from a normal boot with an image made with mkinitcpio-busybox 1.25.1-2.
Comment by Dave Reisner (falconindy) - Friday, 14 July 2017, 14:56 GMT
So, the difference is that in a failed boot, resolve_device is called with an empty argument because root= isn't set. That's weird, because I can't replicate the failure in the unit tests. And sadly, command line parsing isn't logged because we don't know if we're logging until after parsing has occurred...
Comment by Dave Reisner (falconindy) - Friday, 14 July 2017, 15:08 GMT
Oh, ignore that... resolve_device is called with an empty arg from the cryptsetup hook because 'IFS=: read cryptdev cryptname cryptoptions' (followed by the heredoc) results in an empty "cryptdev". I can't replicate that out of band either...
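
For reference, here is a minimal standalone sketch of that construct. It is not the hook itself: the wrapper function `split_cryptdevice` is made up for illustration, `-r` is added to `read`, and the sample value is the cryptdevice= parameter from one of the command lines posted in this report.

```shell
# Split a cryptdevice= value on ':' using the same IFS/read/heredoc
# pattern as quoted above. The IFS assignment applies only to this one
# read call.
split_cryptdevice() {
    IFS=: read -r cryptdev cryptname cryptoptions <<EOF
$1
EOF
}

split_cryptdevice '/dev/sda1:lvm'
printf 'cryptdev=%s cryptname=%s cryptoptions=%s\n' \
    "$cryptdev" "$cryptname" "$cryptoptions"
```

A shell that behaves as expected prints `cryptdev=/dev/sda1 cryptname=lvm cryptoptions=`. In the failing boots, the first field came out empty instead.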
Comment by Robert Alessi (ralessi) - Friday, 14 July 2017, 16:42 GMT
@tom.ty89 I had no intention to hide my UUIDs. I just thought innocently that this information was useless or irrelevant to the point. Just tell me if it's needed or if anything else that I may be able to provide is.

Otherwise, I have the same settings as @kynikos. It may be worth mentioning that I haven't used UUIDs on my second laptop which boots fine.
Comment by 8argd1+6qaasvkxuygvk (8argd16qaasvkxuygvk) - Friday, 14 July 2017, 21:20 GMT
I am also getting the error. I found that dropping to a GRUB shell and manually typing "cryptomount -a", "linux ...", "initrd ...", "boot" resolves the issue.
Comment by Dario Giovannetti (kynikos) - Friday, 14 July 2017, 22:04 GMT
I should have tried this before, and I don't know if it's obvious or helpful at all, but if I remove the cryptkey parameter from the command line (everything else unaltered), I'm of course asked for the passphrase again, but then the boot succeeds.
Comment by Robert Alessi (ralessi) - Saturday, 15 July 2017, 07:50 GMT
Many thanks, @kynikos, for this very good point. I confirm that removing the cryptkey parameter made my laptop boot again. At least, we know how the bug is triggered.
Comment by Tom Yan (tom.ty89) - Sunday, 16 July 2017, 04:39 GMT
Not really. For one, I have an encrypted /boot and a keyfile in the initramfs as well. I also keep it under /boot, so I am setting cryptkey= too, and the new version of busybox works here.

It also doesn't explain why the `read` that splits cryptdevice= fails, especially when the variable itself seems fine in either case.

In any case, if removing cryptkey= from the cmdline helps, you can consider moving the keyfile to the fallback path /crypto_keyfile.bin instead. It should save you from entering the passphrase twice.
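
A rough sketch of what that could look like, assuming the keyfile is embedded into the image through FILES in /etc/mkinitcpio.conf (the thread does not show how the keyfile gets into the initramfs, so that part is an assumption):

```shell
# /etc/mkinitcpio.conf (fragment, assumed setup)
# Embed the keyfile at the path the encrypt hook falls back to when
# cryptkey= is absent from the kernel command line.
FILES="/crypto_keyfile.bin"
```

The keyfile would have to be copied to /crypto_keyfile.bin on the real root first, cryptkey= dropped from the kernel command line, and the image regenerated afterwards.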
Comment by somini (_somini_) - Tuesday, 18 July 2017, 22:43 GMT
@tom.ty89 Sorry for the delay, but my /proc/cmdline is similar to kynikos's:
> BOOT_IMAGE=/vmlinuz-linux root=/dev/mapper/internal-root rw cryptdevice=UUID=4571e505-026c-4012-bd64-a08c0929142d:lvm cryptkey=rootfs:/lvm_keyfile resume=/dev/mapper/internal-swap quiet
Comment by Tom Yan (tom.ty89) - Friday, 21 July 2017, 07:12 GMT
I think it's just some tricky upstream bug anyway. Yesterday I experienced it _once_ on a machine that I have been using.
Comment by Neil Darlow (neildarlow) - Wednesday, 02 August 2017, 11:18 GMT
Are you sure this isn't just a failure of the lvm2 hook?

My system stopped booting after this update. I managed to use an archlinux iso to boot the system and mounted the LVM volumes.

I thought the problem was kernel related and attempted to reinstall the kernel manually. At the initramfs build stage I noticed an error being displayed to the effect that the lvm2 hook has no build function.

I tried rebooting after image generation and it dropped me to a shell; from poking around, it looks like there's no LVM support present in the initramfs.
Comment by somini (_somini_) - Thursday, 03 August 2017, 18:29 GMT
If the problem is with the LVM support, why does the system boot correctly when we manually unlock the LUKS volume?

Plus, the LVM hook hasn't been changed in over 1 year:
https://git.archlinux.org/svntogit/packages.git/log/trunk/lvm2_hook?h=packages/lvm2
Comment by regid (regid1) - Monday, 07 August 2017, 13:12 GMT
The following might be related:
How to reproduce the bug, or a related one?
I just tried to install my machine. I have only one machine, so this is written from memory, since I had to use a public PC:
1) Install Arch from scratch on a BIOS PC.
2) Use the following HD partitioning: a separate primary boot partition, a separate primary partition for swap, and an extended partition with one logical partition holding all the LVM2 filesystems. Within the LVM2 make a separate root LV, tmp LV, usr LV and home LV.
3) Install base and the extlinux boot loader, with dependency packages such as systemd-sysvcompat, of course.
4) Mark the usr filesystem ro in /etc/fstab.
5) For the mkinitcpio.conf use HOOKS="base systemd sd-lvm2 fsck". Again, this is written from memory. I hope I didn't forget something. And yes, this is with a systemd configuration, not the classical one.
6) Finish the installation.
7) Reboot

Yes, no encryption is mentioned. I am aware of it.
Comment by Dave Reisner (falconindy) - Monday, 07 August 2017, 13:19 GMT
> yes, this is with a systemd configuration
Then it isn't related to this bug.
Comment by regid (regid1) - Monday, 07 August 2017, 13:41 GMT
Dave Reisner (falconindy): It is LVM2 and mkinitcpio related. Do you think I should delete my comment, or take it elsewhere (where?)?
Comment by Dave Reisner (falconindy) - Monday, 07 August 2017, 13:44 GMT
This bug is about busybox, which isn't at all involved when using systemd. So, both.

I don't see any bug or other problem in your comment, so I don't know where to direct you.
Comment by somini (_somini_) - Tuesday, 24 October 2017, 21:40 GMT
I think this was fixed, at least I can't reproduce it anymore.
I installed mkinitcpio-busybox-1.27.2-1 today, regenerated the boot system and the root partition is found as before.

I got hit by an unrelated bug: the encrypt hook can't read the keyfiles on the initramfs, so I now need to input my boot password 3 times (boot, lvm, boot partition in the real system), but the partitions are found correctly.
Comment by Dario Giovannetti (kynikos) - Wednesday, 25 October 2017, 11:49 GMT
Confirmed, 1.27.2-1 fixes this bug for me too, thank you. No problems with keyfiles for me, all seems fine :)
