FS#72658 - [linux] Kernel 5.15(.1) appears frozen at initrd, because LUKS prompt does not display
Attached to Project: Arch Linux
Opened by Ronan (ronjouch) - Sunday, 07 November 2021, 22:06 GMT
Last edited by Jan Alexander Steffens (heftig) - Monday, 15 November 2021, 18:46 GMT
Details
This is a follow-up to [BBS » Kernel & Hardware » [SOLVED][Testing] Kernel 5.15.1 stuck at "Loading initial ramdisk ..."](https://bbs.archlinux.org/viewtopic.php?pid=2001959), where I asked for help with testing kernels 5.15 and 5.15.1 being unbootable because they are stuck at `Loading initial ramdisk ...`.

Thanks to forum people pointing me to the kernel option `earlyprintk=efi,keep`, I discovered that the LUKS prompt, although absent from the normal boot screen, is actually present in the `earlyprintk=efi,keep` logs, drowned among other log lines and not visible unless I set the earlyprintk option. Then, if I type my LUKS password (with no apparent LUKS prompt, while the last line on screen is `Loading initial ramdisk ...`), the system *does* boot.

-> It feels like kernel 5.15.1 has a regression where it outputs the LUKS prompt at an incorrect loglevel, or maybe this is a race condition, or output isn't properly flushed to the screen, or logs fail to be enabled before the LUKS prompt is written.

Additional info:

* linux + linux-headers 5.15.1.arch1-1 from the [Testing] repo
* Typical LUKS grub/mkinitcpio setup done following the installation guide; see [forum post for the full details](https://bbs.archlinux.org/viewtopic.php?pid=2001925#p2001925)
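For anyone wanting to repeat the diagnosis, here is a minimal sketch for checking whether a parameter such as `earlyprintk=efi,keep` actually reached the kernel command line. The helper name `has_kernel_param` is made up for illustration, and falling back to `/proc/cmdline` assumes a running Linux system.

```shell
# Hypothetical helper (not part of any package): print the first word of a
# kernel command line that matches the given parameter name.
# $1: parameter name, e.g. earlyprintk
# $2: command line string; defaults to the running kernel's /proc/cmdline
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    # Unquoted on purpose: split the command line on whitespace.
    for word in $cmdline; do
        case "$word" in
            "$param"|"$param"=*)
                printf '%s\n' "$word"
                return 0
                ;;
        esac
    done
    return 1
}
```

On the affected system this would be called as `has_kernel_param earlyprintk`. With GRUB, the parameter is typically added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `grub-mkconfig -o /boot/grub/grub.cfg`; that assumes a standard Arch GRUB layout, so adjust paths as needed.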
Closed by Jan Alexander Steffens (heftig)
Monday, 15 November 2021, 18:46 GMT
Reason for closing: Fixed
Additional comments about closing: linux 5.15.2.arch1-1
I'm building a -custom kernel now to confirm this. Will report back tomorrow.
First, I don't have anything valuable to report about attempting to boot with a custom `CONFIG_SYSFB_SIMPLEFB=n` ABS kernel build. My -custom build is always unbootable and stuck at "Loading initial ramdisk ...". So, my custom build is worse than 5.15.1.arch1-1 from [testing], for which at least I'm able to boot when typing my password blindly to the invisible LUKS prompt. Not sure what I screwed up: I simply followed the ABS wiki with one config change, i.e. setting my custom flag, disabling the docs build, `makepkg -s`, install, reboot.
Secondly and maybe most importantly, through more testing, I narrowed the conditions causing the problem. My problem is limited to booting with the laptop lid closed, and boot display happening over an external monitor (a 27" BenQ GW2765 over HDMI).
So, my revised summary of this bug is: "Display of LUKS prompt over external monitor, which was already graphically corrupted in 5.14, regressed to invisible in 5.15, causing boot to appear stuck at 'Loading initial ramdisk ...'". Details:
==== 1. Stable kernel 5.14.16, laptop lid open ====
1.0. When booting 5.14.16 with my laptop lid **open** (and display only happens on my laptop monitor), "Loading initial ramdisk ..." succeeds, I get a correct LUKS prompt, and I'm able to boot.
==== 2. Stable kernel 5.14.16, laptop lid closed ====
2.0. When booting 5.14.16 with my laptop lid **closed** (displaying through external HDMI monitor, and typing on an external USB keyboard), "Loading initial ramdisk ..." succeeds, I get a "corrupted but reactive to keyboard input" LUKS prompt, and I'm able to boot.
2.1. By "corrupted but reactive to keyboard input" above, I mean that in kernel 5.14.x, although the LUKS prompt did display and react to keyboard input when viewed on an external monitor, it appeared severely graphically corrupted. See the newly attached screenshot kernel-514-displaying-luks-prompt-already-corrupted.jpg.
2.2. This corruption of the LUKS prompt when on an external monitor / lid closed wasn't always present. This is a recent-ish regression (I'd say a few months, maybe years, somewhere in 5.12 / 5.13 / 5.14), one that I didn't bother to report, either here or upstream. Sorry.
==== 3. Testing kernel 5.15.1, laptop lid open ====
3.0. When booting 5.15.1 with my laptop lid **open**, "Loading initial ramdisk ..." succeeds, I get a correct LUKS prompt, and I'm able to boot.
==== 4. Testing kernel 5.15.1, laptop lid closed ====
4.0. When booting 5.15.1 with my laptop lid **closed**, boot sequence display stays frozen at "Loading initial ramdisk ..." and does *not* show the LUKS prompt.
4.1. Said differently, while 5.14.16 displays LUKS garbage (see point 2.1. and screenshot), it is garbage looking like a LUKS prompt and garbage that reacts to keyboard input. On the contrary, 5.15.1 stays frozen at "Loading initial ramdisk" and displays nothing about LUKS, not even garbage, leaving me to think the boot crashed / was aborted.
4.2. At this point, no keyboard password input will cause the display to update, display stays frozen...
4.3. ... until I do one of two effectful things:
A. Ctrl+Alt+Del to reboot
B. Type my LUKS password and press Enter, which as mentioned above, will successfully decrypt my LUKS, get back a working graphical mode, and boot to my DE.
4.4. Also of interest: with the laptop lid closed, earlyprintk logs will *not* display. The display always remains "frozen at the last frame of 'Loading initial ramdisk'". Again, the only two effectful things I found when in this state are to Ctrl+Alt+Del, or to enter my password + hit Enter.
==== Conclusion ====
Does that ring a bell? Any boot flags for me to try? (I already tried these forum suggestions: nomodeset i915.modeset=0 acpi=off iommu=soft). Should I file a bug upstream?
Also, tomorrow or during the week I'll try a few things to get more data and check whether that's a software or hardware issue: 1. different display, 2. different HDMI cable, 3. different distro.
```
CONFIG_FB_UVESA=m
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y
```
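To compare options like these between kernel builds (for example `CONFIG_SYSFB_SIMPLEFB`, discussed earlier in this task), a small sketch along these lines may help. `kconfig_value` is a hypothetical helper, and it assumes a plain-text config file in the usual `CONFIG_X=value` / `# CONFIG_X is not set` format.

```shell
# Hypothetical helper: print the value of a kernel config option from a
# plain-text config file.
# $1: option name, e.g. CONFIG_FB_EFI
# $2: path to the config file
# Prints the value (y, m, ...) or "n" when the option is unset or missing.
kconfig_value() {
    opt="$1"
    file="$2"
    # Keep only what follows "OPTION=" on the first matching line.
    val=$(sed -n "s/^${opt}=//p" "$file" | head -n 1)
    printf '%s\n' "${val:-n}"
}
```

For the running kernel, something like `zcat /proc/config.gz > running.config` followed by `kconfig_value CONFIG_SYSFB_SIMPLEFB running.config` should work, assuming the kernel exposes `/proc/config.gz`.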
Adding this information to the other Flyspray bug, requesting closure of this one, and will follow up upstream regarding the pre-existing graphical glitch.
@heftig If you have no system to reproduce the issue yourself, I can try to test some (fixed) packages.
I now have two questions:
1. Is upstream already aware of the issue? Should I file a bug upstream?
2. When I write "like it used to do in 5.14.16" it means that, although now visible, the LUKS prompt is visually glitched when displayed through an external monitor. See comment above, section " ==== 2. Stable kernel 5.14.16, laptop lid closed ====", and screenshot "kernel-514-displaying-luks-prompt-already-corrupted.jpg".
-> Do you have any suggestions on how to troubleshoot this, and should I file an upstream bug report? I plan to try in this order: lts, other display, other HDMI cable, other distro. Anything else?
No, it's not caused by simpledrm.ko.zst.
I have built a kernel with a patched config from #72645. It works fine, and it does not contain simpledrm.ko.zst.
1) This was a configuration issue, not an upstream issue.
2) Try putting i915 into the initramfs ( MODULES=(i915) in mkinitcpio.conf )
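The suggestion in 2) could be applied roughly as follows. This is only a sketch: `add_i915_module` is a made-up helper name, it assumes a single-line `MODULES=(...)` entry, and it relies on GNU sed's `-i` option.

```shell
# Hypothetical helper: make sure i915 is listed in the MODULES=(...) array
# of a mkinitcpio.conf-style file. $1: path to the config file.
add_i915_module() {
    conf="$1"
    # Already listed: nothing to do.
    if grep -Eq '^MODULES=\(([^)]* )?i915( [^)]*)?\)' "$conf"; then
        return 0
    fi
    if grep -Eq '^MODULES=\( *\)' "$conf"; then
        # Empty array: MODULES=() becomes MODULES=(i915)
        sed -i -E 's/^MODULES=\( *\)/MODULES=(i915)/' "$conf"
    else
        # Non-empty array: append i915 after the existing entries
        sed -i -E 's/^MODULES=\(([^)]*)\)/MODULES=(\1 i915)/' "$conf"
    fi
}
```

Run as root, this would be `add_i915_module /etc/mkinitcpio.conf` followed by `mkinitcpio -P` to regenerate the initramfs images; keeping a backup of the config file first seems prudent.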
Interestingly, the system is able to boot if I load the key from an SD card (and therefore don't need the prompt). After setting `MODULES=(i915)` it looks like the graphics initialization is indeed happening a bit earlier than before (noticeable because it briefly turns the screen completely black), but that's apparently not soon enough.