FS#55141 - [linux] "BUG: unable to handle kernel paging request" with 4.12.3, 4.12.5

Attached to Project: Arch Linux
Opened by Jeff Cook (jeffcookio) - Monday, 14 August 2017, 16:35 GMT
Last edited by Toolybird (Toolybird) - Sunday, 28 May 2023, 06:07 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

After 36-48 hours of heavy KVM usage, I get the attached output, which begins with:

Aug 14 06:30:28 kvm_master kernel: BUG: unable to handle kernel paging request at ffffffffc0955f96
kernel: IP: report_bug+0x94/0x120
kernel: PGD a67a0c067
kernel: P4D a67a0c067
kernel: PUD a67a0e067
kernel: PMD 495886067
kernel: PTE 800000049bbdd161
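
The last line of the page-table walk above can be decoded by hand. Below is a minimal Python sketch. It assumes the standard x86-64 layout for a 4-KiB page-table entry: flag bits 0 to 8, the physical frame in bits 12 to 51, and NX in bit 63. `decode_pte` and `PTE_FLAGS` are illustrative names only. They do not come from the kernel or the report.

```python
# Illustrative decoder for an x86-64 4-KiB page-table entry (assumed standard layout).
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write_through",
    4: "cache_disable",
    5: "accessed",
    6: "dirty",
    7: "pat",
    8: "global",
    63: "no_execute",
}

# Physical page address occupies bits 12..51 (assumes the 52-bit architectural maximum).
PFN_MASK = ((1 << 52) - 1) & ~0xFFF


def decode_pte(pte: int) -> dict:
    """Return the physical page address and which flag bits are set."""
    flags = {name: bool((pte >> bit) & 1) for bit, name in PTE_FLAGS.items()}
    return {"phys_addr": pte & PFN_MASK, "flags": flags}


if __name__ == "__main__":
    decoded = decode_pte(0x800000049BBDD161)  # PTE value from the oops above
    print(hex(decoded["phys_addr"]))
    for name, is_set in decoded["flags"].items():
        print(f"{name:14} {is_set}")
```

For `800000049bbdd161`, the sketch reports a page that is present, accessed, dirty, global and non-executable, but not writable. The faulting address `ffffffffc0955f96` is not the instruction pointer, and a read from a present page would not fault. The most plausible reading is therefore a write to a read-only page. Treat that as a hint, not a diagnosis. The full oops in the attachment would settle it.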

I've experienced this on 4.12.3 and 4.12.5 so far (on 4.12.3 I also hit a different KVM lockup, which may since have been resolved). I do *not* experience this on 4.11.9, the last 4.11 kernel that Arch packaged. Things are stable on 4.11.9.

It does not seem to be triggered by anything specific; it just happens after some usage. I make extensive use of PCI passthrough (a USB controller plus two GPUs, each GPU passed to its own Win10 VM), and the host runs several VMs (4 Linux VMs + 2 Windows VMs).

When this error occurs, the other VMs and the host system initially remain responsive, but they too fail after a few commands are issued. Messages indicating a soft CPU lockup are emitted regularly:

kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
kernel: Tasks blocked on level-1 rcu_node (CPUs 0-15): P2837
kernel: (detected by 23, t=990092 jiffies, g=2933189, c=2933188, q=9740206)

See attached for full oops, /proc/cpuinfo, and /proc/meminfo.

Closed by Toolybird (Toolybird)
Sunday, 28 May 2023, 06:07 GMT
Reason for closing: No response
Additional comments about closing: Plus it's old and stale. If still an issue, please follow up with upstream.
Comment by Jeff Cook (jeffcookio) - Monday, 14 August 2017, 16:41 GMT
Clarification: this crash doesn't trigger explicit "soft CPU lockup" messages in the journal (that's the other crash that I haven't observed on 4.12.5), so describing the failure as a "soft CPU lockup" was probably dumb. This crash _does_ trigger the pasted rcu_preempt message.
Comment by loqs (loqs) - Monday, 14 August 2017, 18:18 GMT
Please try 4.12.6-1 (you might also want to try 4.13-rc5). If that does not resolve the issue, bisect the kernel to find the bad commit and report the issue upstream.
Comment by Lily (voidlily) - Monday, 21 August 2017, 01:20 GMT
I'm also having this issue on 4.12.8-2-ck. I'm going to downgrade to 4.11-ck and see if that at least gets me running again for now. My oopses were the same as the reporter's.
Comment by Jeff Cook (jeffcookio) - Monday, 21 August 2017, 05:38 GMT
Comment by mattia (nTia89) - Sunday, 27 February 2022, 14:00 GMT
I cannot reproduce the issue. Is it still valid?
