Arch Linux


FS#70236 - [linux][linux-zen] 5.11 - Very slow boot process. Soft lockup.

Attached to Project: Arch Linux
Opened by env (ENV25) - Tuesday, 30 March 2021, 19:33 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Thursday, 14 October 2021, 21:49 GMT
Task Type: Bug Report
Category: Kernel
Status: Closed
Assigned To: Jan Alexander Steffens (heftig)
Architecture: x86_64
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

Soft lockups during early boot. Takes very long to boot. Many stack traces in dmesg once finished booting (see attachment).

This only happens with 5.11 kernels (both arch and zen); the 5.10 LTS kernel works fine.
The messages "watchdog: BUG: soft lockup - CPU#x stuck for 23s! [xxxxx/xx]" are very vague, so I can't tell what exactly the problem is. I don't know how to interpret the stack traces either; help would be welcome.

Additional info:

* package version(s)

kernels
5.11.10.arch1-1
5.11.10.zen1-1

* config and/or log files etc.

Laptop: HP Pavilion Laptop 13-an0xxx
CPU: i3-8145U
Boot: UEFI sd-boot
Graphics: xorg w/ i915 (no nvidia or radeon)
DE: plasma & sddm

dmesg dump is attached.
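
One way to make a long dmesg dump easier to read is to count the soft lockup reports per CPU and task. The following is a minimal sketch, not a diagnostic tool: it assumes the bracketed part of the message has the form `[task:pid]`, and the function name `summarize_soft_lockups` is made up for illustration.

```python
import re
from collections import Counter

# Matches the message quoted in the description, e.g.
#   watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/2:1:123]
# Assumption: the bracketed part is "<task name>:<pid>".
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(dmesg_text):
    """Count soft lockup reports per (cpu, task) pair in a dmesg dump."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            counts[(int(match.group("cpu")), match.group("task"))] += 1
    return counts
```

It could be run on a saved dump, for example `summarize_soft_lockups(open("dmesg.txt").read())`, where `dmesg.txt` is a hypothetical file name. If one CPU or one task dominates the counts, the stack traces for that task are the ones worth reading first.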

Closed by Sven-Hendrik Haase (Svenstaro)
Thursday, 14 October 2021, 21:49 GMT
Reason for closing: Fixed
Additional comments about closing: 2021-10-12: A task closure has been requested. Reason for request: This doesn't happen anymore in the same way.
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 08:27 GMT
What's your modprobe and mkinitcpio config?
Comment by env (ENV25) - Wednesday, 31 March 2021, 09:38 GMT
I don't have anything in `/etc/modprobe.d` or `/etc/sysctl.d`.

In mkinitcpio.conf I replaced the udev hook with systemd and added the i915 module. I use kernel-install to automatically concatenate the microcode.
I get the same issues with the default settings and separate microcode.

I have "root=PARTLABEL=ROOT rw resume=PARTLABEL-swap quiet splash" in my kernel command-line for quiet boot (no plymouth).
Same issue with "root=PARTLABEL=ROOT rw".

Also note the "soft lockup" messages happen before systemd's initramfs messages, so it probably happens before initramfs.

I also have some ACPI errors, but I've always had those.
Comment by env (ENV25) - Wednesday, 31 March 2021, 09:39 GMT
Oops I meant "resume=PARTLABEL=SWAP".
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:07 GMT
Where do these options come from? You say you have nothing in /etc/modprobe.d and they're not in your command line, either.

[ 95.255064] Setting dangerous option enable_guc - tainting kernel
[ 95.255067] Setting dangerous option enable_fbc - tainting kernel
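
A rough way to look for where such options are set is to scan the modprobe configuration directories and the kernel command line for entries that target the module. The sketch below assumes the usual modprobe.d locations and the `options <module> <param>=<value>` syntax; the function name `find_module_options` is hypothetical, and a copy of the configuration embedded in the initramfs would not be found this way.

```python
from pathlib import Path

# Directories modprobe normally reads *.conf files from (see modprobe.d(5)).
DEFAULT_DIRS = ("/etc/modprobe.d", "/run/modprobe.d", "/usr/lib/modprobe.d")


def find_module_options(module, dirs=DEFAULT_DIRS, cmdline=""):
    """Return (source, setting) pairs that set options for `module`."""
    found = []
    for directory in dirs:
        base = Path(directory)
        if not base.is_dir():
            continue  # skip locations that do not exist on this system
        for conf in sorted(base.glob("*.conf")):
            for line in conf.read_text(errors="replace").splitlines():
                parts = line.split("#", 1)[0].split()  # drop trailing comments
                if len(parts) >= 3 and parts[0] == "options" and parts[1] == module:
                    found.extend((str(conf), opt) for opt in parts[2:])
    # Options can also be passed as "<module>.<param>=<value>" at boot.
    prefix = module + "."
    for token in cmdline.split():
        if token.startswith(prefix):
            found.append(("kernel command line", token[len(prefix):]))
    return found
```

For the case in this thread, a call such as `find_module_options("i915", cmdline=open("/proc/cmdline").read())` would list every place found that sets an i915 option, together with the file it came from.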
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:09 GMT
Also, the dump is from kernel 5.11.8. Is this still happening with 5.11.11?
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:11 GMT
The lockups seem to happen when ACPI is initialized? Is your system firmware up-to-date?
Comment by env (ENV25) - Wednesday, 31 March 2021, 11:36 GMT
The dump was from a few days ago, when I posted it on the forums. I did not have time to make a new one.

I've attached a new dmesg dump.

Yes, this still happens in 5.11.11. Compared to the previous version, it doesn't seem to happen consistently. Sometimes it boots correctly, roughly 50% of the time.

I did not know it was possible to update firmware. I'll look it up.
Comment by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 11:46 GMT
It might be helpful to trigger magic sysrq 't' (show task states) while it's hanging.
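
Whether the keyboard combination for 't' is accepted depends on the value in /proc/sys/kernel/sysrq. As a small sketch based on the kernel's sysrq documentation: a value of 1 enables every function, and otherwise the value is a bitmask in which bit 0x8 permits debugging dumps of processes, the group 't' belongs to. The function names below are made up for illustration.

```python
# Bit in /proc/sys/kernel/sysrq that permits "debugging dumps of processes",
# the group the 't' command belongs to (per the kernel sysrq documentation).
SYSRQ_ENABLE_DUMP = 0x8


def keyboard_sysrq_t_allowed(sysrq_value):
    """Return True if Alt+SysRq+t is permitted for the given sysrq setting."""
    if sysrq_value == 1:  # 1 enables every SysRq function
        return True
    return bool(sysrq_value & SYSRQ_ENABLE_DUMP)


def read_sysrq_setting(path="/proc/sys/kernel/sysrq"):
    """Read the current sysrq setting as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```

`keyboard_sysrq_t_allowed(read_sysrq_setting())` reports whether the current setting would accept the key combination. A value of 16, for example, permits only the sync command, so 't' would be ignored from the keyboard.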
Comment by env (ENV25) - Wednesday, 31 March 2021, 12:30 GMT
It seems all BIOS and firmware updates are distributed as Windows EXEs. I'll try to run them on my old laptop and see what happens.

I tried to do sysrq 't' but it didn't work, I think. I'll try something else later.
https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html

I've seen a few issues, in this bugtracker and elsewhere, that are similar to mine but not exactly the same. Is this kind of thing common?
Comment by Paul Kerry (paulkerry) - Wednesday, 31 March 2021, 18:53 GMT
To upgrade HP bios using the windows exe file - see https://bbs.archlinux.org/viewtopic.php?pid=1870408#p1870408
Comment by env (ENV25) - Thursday, 01 April 2021, 13:41 GMT
I'm out of luck, I can't update my ssd firmware or my UEFI bios. I'll need a windows USB stick.

The ssd firmware might fix acpi errors. It seems there was an issue with Drive Self Test.

https://support.hp.com/us-en/drivers/selfservice/HP-Pavilion-13-an0000-Laptop-PC/23238359/
Comment by Jan Alexander Steffens (heftig) - Thursday, 01 April 2021, 13:58 GMT
Might be worth trying to have the BIOS update the firmware from the downloaded .exe, directly. At least Dell supports this. Maybe HP does, too.
Comment by env (ENV25) - Saturday, 10 April 2021, 13:01 GMT
Update: The issue persists in kernel 5.11.12.arch1-1. I have not had time to try updating the firmware; I am planning to try using Windows PE:
https://wiki.archlinux.org/index.php/Windows_PE
Comment by env (ENV25) - Monday, 03 May 2021, 08:30 GMT
Issue continues even after BIOS update.

BIOS update instructions:
https://gist.github.com/eNV25/c8001491dc0440656ff7b0ae18993ba1
Comment by env (ENV25) - Saturday, 10 July 2021, 18:02 GMT
I want to try bisecting this issue.

What PKGBUILD should I use?
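
For context, bisecting is a binary search over an ordered list of revisions between a known-good and a known-bad one. The sketch below only illustrates that search; it is not how `git bisect` is implemented, and `is_bad` is a hypothetical callback standing in for "build this revision, boot it, and check for soft lockups". Because the lockup was reported to occur only about half the time, the sketch lets each revision be tried several times before it is treated as good.

```python
def bisect_first_bad(revisions, is_bad, repeats=1):
    """Return the first revision for which is_bad() reports a failure.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    every revision after the first bad one is also bad.
    """
    def looks_bad(rev):
        # An intermittent bug needs several attempts before a revision can
        # be called good; a single failure is enough to call it bad.
        return any(is_bad(rev) for _ in range(repeats))

    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if looks_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

Each step halves the remaining range, so the number of revisions that must be built grows only with the logarithm of the range. Raising `repeats` multiplies the number of test boots but lowers the chance of wrongly marking an intermittently failing revision as good.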
Comment by env (ENV25) - Thursday, 22 July 2021, 07:35 GMT
<s>This issue doesn't happen in linux 5.13.4.arch1-1, but it still occurs in linux-zen 5.13.4.zen1-1.</s>

This issue happens in both linux 5.13.4.arch1-1 and linux-zen 5.13.4.zen1-1. It happens when rebooting but not after poweroff.

I wasn't able to find the commit. My laptop is too slow to compile or bisect linux.
Comment by env (ENV25) - Tuesday, 12 October 2021, 12:58 GMT
You can close this task. This doesn't happen anymore in the same way.
