Arch Linux

FS#70236 - [linux][linux-zen] 5.11 - Very slow boot process. Soft lockup.

Attached to Project: Arch Linux
Opened by env (ENV25) - Tuesday, 30 March 2021, 19:33 GMT
Last edited by Andreas Radke (AndyRTR) - Wednesday, 31 March 2021, 06:28 GMT
Task Type: Bug Report
Category: Kernel
Status: Assigned
Assigned To: Jan Alexander Steffens (heftig)
Architecture: x86_64
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 0
Private: No

Details

Description:

Soft lockups during early boot. Takes very long to boot. Many stack traces in dmesg once finished booting (see attachment).

This only happens in 5.11 kernels (both arch and zen), 5.10 lts kernel works fine.
The messages "watchdog: BUG: soft lockup - CPU#x stuck for 23s! [xxxxx/xx]" are very vague; I can't tell what exactly the problem is. I don't know how to interpret the stack traces either, so help would be welcome.
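One way to get a first overview of such messages is to tally them by CPU and task. The following is a minimal Python sketch, assuming the dmesg output has been saved as text and that the lines follow the format quoted above. The function name `summarize_soft_lockups` and the sample lines are made up for illustration and are not taken from the attached dump.

```python
import re
from collections import Counter

# Matches lines such as:
#   watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/0:1]
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+)\]"
)

def summarize_soft_lockups(dmesg_text):
    """Count soft-lockup reports per (cpu, task) pair in dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            counts[(int(match["cpu"]), match["task"])] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = """\
[   48.000000] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/0:1]
[   76.000000] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/0:1]
[   80.000000] ACPI: some unrelated message
"""

for (cpu, task), n in summarize_soft_lockups(sample).items():
    print(f"CPU#{cpu} {task}: {n} report(s)")
```

If most reports name the same task, the stack trace printed right after one of those lines is probably the most useful one to read first.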

Additional info:

* package version(s)

kernels
5.11.10.arch1-1
5.11.10.zen1-1

* config and/or log files etc.

Laptop: HP Pavilion Laptop 13-an0xxx
CPU: i3-8145U
Boot: UEFI sd-boot
Graphics: xorg w/ i915 (no nvidia or radeon)
DE: plasma & sddm

dmesg dump is attached.

Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 08:27 GMT
What's your modprobe and mkinitcpio config?
Comment by env (ENV25) - Wednesday, 31 March 2021, 09:38 GMT
I don't have anything in `/etc/modprobe.d` or `/etc/sysctl.d`.

In mkinitcpio.conf I replaced udev with systemd and I have i915 module added. I use kernel-install to automatically concatenate microcode.
Same issues with default settings and separate microcode.

I have "root=PARTLABEL=ROOT rw resume=PARTLABEL-swap quiet splash" in my kernel command-line for quiet boot (no plymouth).
Same issue with "root=PARTLABEL=ROOT rw".

Also note the "soft lockup" messages happen before systemd's initramfs messages, so it probably happens before initramfs.

I also have some ACPI errors, but I've always had those.
Comment by env (ENV25) - Wednesday, 31 March 2021, 09:39 GMT
Oops I meant "resume=PARTLABEL=SWAP".
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:07 GMT
Where do these options come from? You say you have nothing in /etc/modprobe.d and they're not in your command line, either.

[ 95.255064] Setting dangerous option enable_guc - tainting kernel
[ 95.255067] Setting dangerous option enable_fbc - tainting kernel
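A hedged Python sketch of how one might look for the origin of such module options: it checks a kernel command line for `i915.<param>=<value>` entries and scans the configuration directories listed in modprobe.d(5) for `options i915 ...` lines. The helper names and the sample command line with `i915.enable_guc=2` are hypothetical. Options embedded in the initramfs are not covered here.

```python
import re
from pathlib import Path

# Configuration directories documented in modprobe.d(5).
MODPROBE_DIRS = ("/etc/modprobe.d", "/run/modprobe.d", "/usr/lib/modprobe.d")

def cmdline_module_options(cmdline, module):
    """Return {param: value} for `module.param=value` tokens on a kernel command line."""
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name.startswith(module + "."):
            options[name[len(module) + 1:]] = value
    return options

def modprobe_option_lines(module, dirs=MODPROBE_DIRS):
    """Yield (path, line) for every `options <module> ...` line in *.conf files."""
    pattern = re.compile(r"^\s*options\s+" + re.escape(module) + r"\s+\S")
    for directory in dirs:
        # A missing directory simply yields no files.
        for conf in sorted(Path(directory).glob("*.conf")):
            for line in conf.read_text(errors="replace").splitlines():
                if pattern.match(line):
                    yield str(conf), line.strip()

# Hypothetical command line; on a live system, read /proc/cmdline instead.
example_cmdline = "root=PARTLABEL=ROOT rw i915.enable_guc=2 quiet splash"
print("command line:", cmdline_module_options(example_cmdline, "i915"))
for path, line in modprobe_option_lines("i915"):
    print(f"{path}: {line}")
```

The values the driver actually ended up with can also be read from the files under /sys/module/i915/parameters/ on the running system.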
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:09 GMT
Also, the dump is from kernel 5.11.8. Is this still happening with 5.11.11?
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:11 GMT
The lockups seem to happen when ACPI is initialized? Is your system firmware up-to-date?
Comment by env (ENV25) - Wednesday, 31 March 2021, 11:36 GMT
The dump was from a few days ago, when I posted it in forums. I did not have time to make a new one.

I've attached a new dmesg dump.

Yes, this still happens in 5.11.11. Compared to the previous version, it doesn't seem to happen consistently. Sometimes it boots correctly, roughly a 50% chance.

I did not know it was possible to update firmware. I'll look it up.
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:46 GMT
It might be helpful to trigger magic sysrq 't' (show task states) while it's hanging.
Comment by env (ENV25) - Wednesday, 31 March 2021, 12:30 GMT
It seems all bios and firmware updates are in Windows EXEs. I'll try to run them in my old laptop and see what happens.

I tried to do sysrq 't' but it didn't work, I think. I'll try something else later.
https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html
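One possible reason for a keyboard SysRq 't' doing nothing is that the `kernel.sysrq` setting does not permit it. The Python sketch below, based on the bitmask described in the admin guide linked above, checks whether a given `kernel.sysrq` value would allow the task-state dump. It assumes that 't' belongs to the "debugging dumps of processes" group (bit 0x8); the function name is illustrative.

```python
# Bit in kernel.sysrq that permits "debugging dumps of processes etc.".
# Assumption: the 't' (show task states) function is gated by this bit.
SYSRQ_ENABLE_DUMP = 0x8

def keyboard_sysrq_allows_task_dump(sysrq_value):
    """Return True if this kernel.sysrq value permits the keyboard 't' dump."""
    if sysrq_value == 1:
        # A value of 1 enables every SysRq function.
        return True
    # Other values are a bitmask of allowed function groups; 0 disables all.
    return bool(sysrq_value & SYSRQ_ENABLE_DUMP)

for value in (0, 1, 16, 24):
    print(value, keyboard_sysrq_allows_task_dump(value))
```

The current value can be read from /proc/sys/kernel/sysrq. According to the same guide, writing a command character to /proc/sysrq-trigger as root is not restricted by this mask, although that requires a working shell, which may not exist yet during an early-boot hang.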

I've seen a few issues, in this bugtracker and elsewhere, that are similar to mine but not exactly the same. Is this kind of thing common?
Comment by Paul Kerry (paulkerry) - Wednesday, 31 March 2021, 18:53 GMT
To upgrade HP bios using the windows exe file - see https://bbs.archlinux.org/viewtopic.php?pid=1870408#p1870408
Comment by env (ENV25) - Thursday, 01 April 2021, 13:41 GMT
I'm out of luck, I can't update my ssd firmware or my UEFI bios. I'll need a windows USB stick.

The ssd firmware might fix acpi errors. It seems there was an issue with Drive Self Test.

https://support.hp.com/us-en/drivers/selfservice/HP-Pavilion-13-an0000-Laptop-PC/23238359/
Comment by Jan Alexander Steffens (heftig) - Thursday, 01 April 2021, 13:58 GMT
Might be worth trying to have the BIOS update the firmware from the downloaded .exe, directly. At least Dell supports this. Maybe HP does, too.
Comment by env (ENV25) - Saturday, 10 April 2021, 13:01 GMT
Update: Issue persists in kernel 5.11.12.arch1-1. I did not get time to try updating the firmware; I am planning to try using Windows PE:
https://wiki.archlinux.org/index.php/Windows_PE
Comment by env (ENV25) - Monday, 03 May 2021, 08:30 GMT
Issue continues even after BIOS update.

BIOS update instructions:
https://gist.github.com/eNV25/c8001491dc0440656ff7b0ae18993ba1
