FS#70663 - [linux] 5.12.0-arch1-1 - fails to boot - watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-
Attached to Project:
Arch Linux
Opened by James (thx1138) - Friday, 30 April 2021, 15:45 GMT
Last edited by Andreas Radke (AndyRTR) - Monday, 14 June 2021, 16:36 GMT
Details
Upgrade to linux 5.12.arch1-1
System log throws:
...
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-udevd: 241]
...
RIP: 0010:smp_call_function_single+0xf7/0x140
...
Call Trace:
? __flush_tlb_all+0x30/0x30
? __flush_tlb_all+0x30/0x30
on_each_cpu+0x39/0x90
...
and repeats indefinitely.
smp_call_function_single is defined in kernel/smp.c
For now, reverting to 5.11 or lts.
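For anyone who wants to check a saved log for the same message, here is a minimal sketch. The regular expression is an assumption based only on the single watchdog line quoted above, and `parse_soft_lockup` is a hypothetical helper name; other kernel versions may format the message differently.

```python
import re

# Pattern assumed from the one message quoted in this report:
#   watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-udevd: 241]
# \s* after the colon tolerates the task/PID pair with or without a space.
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^:\]]+):\s*(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return (cpu, seconds, task, pid) for a soft lockup line, else None."""
    m = SOFT_LOCKUP.search(line)
    if m is None:
        return None
    return int(m["cpu"]), int(m["secs"]), m["task"], int(m["pid"])


if __name__ == "__main__":
    sample = "watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-udevd: 241]"
    print(parse_soft_lockup(sample))
```

Feeding it each line of a saved journal and counting the non-None results would show how often the lockup repeats and whether it is always the same CPU and task.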
Closed by Andreas Radke (AndyRTR)
Monday, 14 June 2021, 16:36 GMT
Reason for closing: Fixed
Additional comments about closing: 5.12.10.arch1-1
Mobile Intel 945PM Express Chipset
ICH7-M
https://bugs.archlinux.org/task/70236
7c70f3a7488d2fa62d32849d138bf2b8420fe788 is the first bad commit
commit 7c70f3a7488d2fa62d32849d138bf2b8420fe788
Merge: 20bf195e9391 4d12b7275386
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon Feb 22 13:29:55 2021 -0800
Merge tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd updates from Chuck Lever:
"Here are a few additional NFSD commits for the merge window:
Optimization:
- Cork the socket while there are queued replies
Fixes:
- DRC shutdown ordering
- svc_rdma_accept() lockdep splat"
* tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Further clean up svc_tcp_sendmsg()
SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg()
SUNRPC: Use TCP_CORK to optimise send performance on the server
svcrdma: Hold private mutex while invoking rdma_accept()
nfsd: register pernet ops last, unregister first
fs/nfsd/nfsctl.c | 14 ++++++-------
include/linux/sunrpc/svcsock.h | 2 ++
net/sunrpc/svcsock.c | 35 ++++++++++++++++----------------
net/sunrpc/xprtrdma/svc_rdma_transport.c | 6 +++---
4 files changed, 29 insertions(+), 28 deletions(-)
--------------
There is a small chance that this bisect is not precise, because sometimes the system can boot to a temporarily working state, then lock up after a short time. I did not test every successful initial boot extensively.
This particular commit does not produce the same "watchdog: BUG: soft lockup" log message. Instead, after sometimes booting to an Xorg display, the system just completely freezes, with not so much as the system log still working.
Trying to bisect, I arrived at a different set of commits though.
7a800a20ae6329e803c5c646b20811a6ae9ca136 showed the issue described, where a seemingly working kernel will lock up rather quickly.
f007a3d66c5480c8dae3fa20a89a06861ef1f5db worked flawlessly, without any hiccups doing random internet browsing while I was compiling the next bisect step.
However, there are six commits between those that did not boot and left me stuck with a black screen right after the bootloader (so no systemd startup message or similar). The system did not react to any input (Alt+SysRq) or to a short press of the PC's power button, so a hard shutdown was necessary.
Attached is the git log for the offending commits (including the good and bad ones), so as not to needlessly fill up the comments with long logs.
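To illustrate why the result here is a set of commits rather than a single one, here is a small simulation of bisecting when some commits cannot be tested. It is only a sketch under the assumption that everything before the first bad commit is good and everything after it is bad; it is not git's actual algorithm, and `bisect_with_skips` and the commit labels are hypothetical. In a real bisect, untestable commits are marked with `git bisect skip`.

```python
def bisect_with_skips(commits, test):
    """Narrow down the first bad commit in an oldest-to-newest list.

    test(commit) returns "good", "bad" or "skip" (cannot be tested, e.g.
    the kernel does not boot at all). commits[0] is assumed good and
    commits[-1] bad. Returns the remaining candidates for the first bad
    commit.
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    skipped = set()
    while True:
        # Commits strictly between the known-good and known-bad ends
        # that have not been marked untestable yet.
        untested = [i for i in range(lo + 1, hi) if i not in skipped]
        if not untested:
            break
        mid = untested[len(untested) // 2]
        verdict = test(commits[mid])
        if verdict == "good":
            lo = mid
        elif verdict == "bad":
            hi = mid
        else:
            skipped.add(mid)
    # Everything after the newest good commit, up to and including the
    # oldest bad one, could be the first bad commit.
    return commits[lo + 1 : hi + 1]


if __name__ == "__main__":
    # Hypothetical labels; the real hashes are in the attached git log.
    history = ["known-good"] + [f"unbootable-{n}" for n in range(1, 7)] + ["known-bad"]
    verdicts = {"known-good": "good", "known-bad": "bad"}
    candidates = bisect_with_skips(history, lambda c: verdicts.get(c, "skip"))
    print(len(candidates), candidates)
```

With six untestable commits between the good and the bad one, the sketch leaves seven candidates: the six skipped commits plus the known-bad one.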
In case it helps narrow down the issue, the hardware in use is an Intel i7-6700K (non-overclocked) CPU, 32GB of RAM (at the lowest XMP profile, 2133 or whatever the relevant numbers are), and an AMD Radeon RX 480 GPU. Storage is a bcache setup using a 3TB HDD and half of a 256GB M.2 SSD, which might be relevant since the offending commits concern the block subsystem.
I will try to get the kernel log from as close as possible to the lockup when I find the time for it.
Well, turns out I should've googled (or at least looked at the bcache wiki entry) first, which points to a known bug involving bcache and 5.12: https://www.spinics.net/lists/linux-bcache/msg10077.html
I still find it interesting that I get the same symptoms that James describes, but other than that the issues don't seem to be related.