FS#72658 - [linux] Kernel 5.15(.1) appears frozen at initrd, because LUKS prompt does not display

Attached to Project: Arch Linux
Opened by Ronan (ronjouch) - Sunday, 07 November 2021, 22:06 GMT
Last edited by Jan Alexander Steffens (heftig) - Monday, 15 November 2021, 18:46 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

This is a follow-up to [BBS » Kernel & Hardware » [SOLVED][Testing] Kernel 5.15.1 stuck at "Loading initial ramdisk ..."](https://bbs.archlinux.org/viewtopic.php?pid=2001959), where I asked for help with [testing] kernels 5.15 and 5.15.1 being unbootable because they get stuck at `Loading initial ramdisk ...`.

Thanks to forum members pointing me to the kernel option `earlyprintk=efi,keep`, I discovered that the LUKS prompt is not actually missing. It shows up in the `earlyprintk=efi,keep` output, drowned among other log lines, and it is not visible at all unless I set that option!

Then, if I type my LUKS password blindly (with no visible LUKS prompt, the last line on screen being `Loading initial ramdisk ...`), the system *does* boot.

-> It looks like kernel 5.15.1 has a regression: either it outputs the LUKS prompt at an incorrect loglevel, or there is a race condition, or the output isn't properly flushed to the screen, or logging isn't enabled before the LUKS prompt is written.

Additional info:
* linux + linux-headers 5.15.1.arch1-1 from the [Testing] repo
* Typical LUKS grub/mkinitcpio setup done following the installation guide; see [forum post for the full details](https://bbs.archlinux.org/viewtopic.php?pid=2001925#p2001925)
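For anyone reproducing the `earlyprintk=efi,keep` diagnostic mentioned above on a GRUB setup, here is a minimal sketch of making the option persistent. It assumes the usual `/etc/default/grub` location. The option is shown on its own here. In practice it should be appended to whatever parameters the line already contains, and those parameters should be left unchanged. For a one-off test, you can skip editing the file: press `e` on the boot entry in the GRUB menu and append the option to the `linux` line.

```
# /etc/default/grub (excerpt)
# earlyprintk=efi prints early kernel messages to the EFI framebuffer;
# ",keep" keeps that early console enabled after the real console takes over.
# Shown alone for clarity: append it to the parameters already on this line.
GRUB_CMDLINE_LINUX_DEFAULT="earlyprintk=efi,keep"
```

After editing the file, regenerate the configuration with `grub-mkconfig -o /boot/grub/grub.cfg` and reboot.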


Closed by Jan Alexander Steffens (heftig)
Monday, 15 November 2021, 18:46 GMT
Reason for closing: Fixed
Additional comments about closing: linux 5.15.2.arch1-1
Comment by Ronan (ronjouch) - Sunday, 07 November 2021, 22:10 GMT
Note: loqs from the forum says: "Possibly the same framebuffer change highlighted in https://bugs.archlinux.org/task/72645 ".

I'm building a -custom kernel now to confirm this. Will report back tomorrow.
Comment by Ronan (ronjouch) - Monday, 08 November 2021, 04:11 GMT
I have two pieces of news:

First, I don't have anything useful to report about booting a custom ABS kernel build with `CONFIG_SYSFB_SIMPLEFB=n`. My -custom build is always unbootable and stuck at "Loading initial ramdisk ...". It is therefore worse than 5.15.1.arch1-1 from [testing], which at least boots when I type my password blindly into the invisible LUKS prompt. I'm not sure what I got wrong. I simply followed the ABS wiki page with one config change: I set my custom flag, disabled the docs build, ran `makepkg -s`, installed, and rebooted.

Secondly, and maybe most importantly, more testing let me narrow down the conditions causing the problem. It only occurs when booting with the laptop lid closed, with the boot display going to an external monitor (a 27" BenQ GW2765 over HDMI).

So, my revised summary of this bug is: "Display of LUKS prompt over external monitor, which was already graphically corrupted in 5.14, regressed to invisible in 5.15, causing boot to appear stuck at 'Loading initial ramdisk ...'". Details:

==== 1. Stable kernel 5.14.16, laptop lid open ====

1.0. When booting 5.14.16 with my laptop lid **open** (display happens only on the laptop screen), "Loading initial ramdisk ..." succeeds, I get a correct LUKS prompt, and I'm able to boot.

==== 2. Stable kernel 5.14.16, laptop lid closed ====

2.0. When booting 5.14.16 with my laptop lid **closed** (displaying through external HDMI monitor, and typing on an external USB keyboard), "Loading initial ramdisk ..." succeeds, I get a "corrupted but reactive to keyboard input" LUKS prompt, and I'm able to boot.

2.1. By "corrupted but reactive to keyboard input" above, I mean that in kernel 5.14.x, although the LUKS prompt did display and react to keyboard input when viewed on an external monitor, it appeared severely graphically corrupted. See the newly attached screenshot kernel-514-displaying-luks-prompt-already-corrupted.jpg .

2.2. This corruption of the LUKS prompt on an external monitor with the lid closed wasn't always present. It is a recent-ish regression (I'd say a few months, maybe longer, somewhere in 5.12 / 5.13 / 5.14) that I didn't bother to report, either here or upstream. Sorry.

==== 3. Testing kernel 5.15.1, laptop lid open ====

3.0. When booting 5.15.1 with my laptop lid **open**, "Loading initial ramdisk ..." succeeds, I get a correct LUKS prompt, and I'm able to boot.

==== 4. Testing kernel 5.15.1, laptop lid closed ====

4.0. When booting 5.15.1 with my laptop lid **closed**, boot sequence display stays frozen at "Loading initial ramdisk ..." and does *not* show the LUKS prompt.

4.1. In other words, 5.14.16 displays garbage (see point 2.1. and screenshot), but that garbage looks like a LUKS prompt and reacts to keyboard input. By contrast, 5.15.1 stays frozen at "Loading initial ramdisk" and displays nothing about LUKS, not even garbage, which led me to think the boot had crashed or been aborted.

4.2. At this point, typing the password does not update the display, which stays frozen...

4.3. ... until I do one of two effectful things:
A. Ctrl+Alt+Del to reboot
B. Type my LUKS password and press Enter, which, as mentioned above, successfully unlocks my LUKS volume, restores a working graphical mode, and boots to my DE.

4.4. Also of interest: with the laptop lid closed, earlyprintk logs do *not* display. The display always remains frozen at the last frame of "Loading initial ramdisk". Again, I found only two effectful actions in this state: pressing Ctrl+Alt+Del, or entering my password and hitting Enter.

==== Conclusion ====

Does that ring a bell? Any boot flags for me to try? (I already tried these forum suggestions: `nomodeset`, `i915.modeset=0`, `acpi=off`, `iommu=soft`.) Should I file a bug upstream?

Also, tomorrow or during the week I'll try a few things to get more data and check whether that's a software or hardware issue: 1. different display, 2. different HDMI cable, 3. different distro.
Comment by Ronan (ronjouch) - Tuesday, 09 November 2021, 03:25 GMT
I confirm that the framebuffer kernel config options suggested at https://bugs.archlinux.org/task/72645 and pointed to at https://bbs.archlinux.org/viewtopic.php?pid=2002047#p2002047 fix the missing LUKS prompt for me (Intel Skylake GT2 / HD Graphics 520 / i915):

```
CONFIG_FB_UVESA=m
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y
```

I'm adding this information to the other Flyspray bug and requesting closure of this one. I will follow up upstream regarding the pre-existing graphical glitch.
Comment by Marius (Martchus) - Tuesday, 09 November 2021, 16:53 GMT
I've just noticed that this ticket has been re-opened. Considering this separately would make sense indeed. Note that adjusting the initramfs might also help, see my comment in https://bugs.archlinux.org/task/72645 and what Alpine did: https://gitlab.alpinelinux.org/alpine/aports/-/commit/5f853a3eba702b41918edbe939cd099065f51633. I haven't had the time to test it, though (and I'm honestly also not sure how to transfer this change to Arch Linux). Otherwise we could of course also just change the config like it was mentioned in previous comments. (I'm not sure what the best approach is.)

@heftig If you have no system to reproduce the issue yourself, I can try to test some (fixed) packages.
Comment by loqs (loqs) - Tuesday, 09 November 2021, 17:12 GMT
@Martchus simpledrm and simplefb are built into the Arch kernel, so this would seem to be a different issue from the one addressed by the Alpine fix.
Comment by Jan Alexander Steffens (heftig) - Tuesday, 09 November 2021, 17:19 GMT
Should be improved in linux 5.15.1.arch1-2
Comment by Akatsuki Rui (akiirui) - Tuesday, 09 November 2021, 17:27 GMT
@heftig Hmm, 5.15.1.arch1-2 still has the #72645 issue.
Comment by Marius (Martchus) - Tuesday, 09 November 2021, 17:33 GMT
Unfortunately this doesn't solve the issue for me. However, I'm also unable to log in when typing the password "blindly" into the garbled output, so it may be a different issue in my case after all.
Comment by Marius (Martchus) - Tuesday, 09 November 2021, 17:47 GMT
Looks like the file `/usr/lib/modules/<version>/kernel/drivers/gpu/drm/tiny/simpledrm.ko.zst` is absent as of linux 5.15 but exists in 5.14.16 (which is the latest kernel that boots on my system). I am not sure whether this is related, I've just noticed it because the driver came up in the Alpine ticket.
Comment by Ronan (ronjouch) - Tuesday, 09 November 2021, 17:48 GMT
@heftig OP here, I confirm that [testing] linux 5.15.1.arch1-2 fixes the issue: the LUKS prompt now appears like it used to do in 5.14.16. Feel free to close this bug.

I now have two questions:

1. Is upstream already aware of the issue? Should I file a bug upstream?

2. By "like it used to do in 5.14.16", I mean that the LUKS prompt is now visible but visually glitched when displayed on an external monitor. See my comment above, section " ==== 2. Stable kernel 5.14.16, laptop lid closed ====", and the screenshot "kernel-514-displaying-luks-prompt-already-corrupted.jpg".
-> Do you have any suggestions on how to troubleshoot this, and should I file an upstream bug report? I plan to try in this order: lts, other display, other HDMI cable, other distro. Anything else?
Comment by Akatsuki Rui (akiirui) - Tuesday, 09 November 2021, 17:52 GMT
@Martchus
No, it's not caused by simpledrm.ko.zst.
I built a kernel with the patched config from #72645, and it works fine even though it does not contain simpledrm.ko.zst.
Comment by Jan Alexander Steffens (heftig) - Tuesday, 09 November 2021, 18:00 GMT
@ronjouch

1) This was a configuration issue, not an upstream issue.

2) Try putting i915 into the initramfs (`MODULES=(i915)` in `mkinitcpio.conf`)
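Here is a minimal sketch of that suggestion. It assumes the default `/etc/mkinitcpio.conf` location and an otherwise empty `MODULES` array. If the array already lists modules, `i915` would be added alongside them rather than replacing them.

```
# /etc/mkinitcpio.conf (excerpt)
# Include the Intel i915 DRM driver in the initramfs and load it early,
# with the intent of starting kernel mode setting before the LUKS prompt.
MODULES=(i915)
```

The initramfs images then need to be regenerated, for example with `mkinitcpio -P`, which rebuilds all presets.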
Comment by Marius (Martchus) - Tuesday, 09 November 2021, 18:13 GMT
Adding i915 and invoking `mkinitcpio -P` didn't help. Also reinstalling the kernel (after setting `MODULES=(i915)`) didn't help.

Interestingly, the system is able to boot if I load the key from an SD card (and therefore don't need the prompt). After setting `MODULES=(i915)`, the graphics initialization does seem to happen a bit earlier than before (noticeable because it briefly turns the screen completely black), but apparently not early enough.
Comment by Marius (Martchus) - Monday, 15 November 2021, 11:17 GMT
Just for the record, in my case the issue was fixed by https://bugs.archlinux.org/task/72645.
