FS#55141 - [linux] "BUG: unable to handle kernel paging request" with 4.12.3, 4.12.5
Attached to Project: Arch Linux
Opened by Jeff Cook (jeffcookio) - Monday, 14 August 2017, 16:35 GMT
Last edited by Toolybird (Toolybird) - Sunday, 28 May 2023, 06:07 GMT
Details
Description:
After 36-48 hours of heavy KVM usage, I get the attached output, which begins with:

Aug 14 06:30:28 kvm_master kernel: BUG: unable to handle kernel paging request at ffffffffc0955f96
kernel: IP: report_bug+0x94/0x120
kernel: PGD a67a0c067
kernel: P4D a67a0c067
kernel: PUD a67a0e067
kernel: PMD 495886067
kernel: PTE 800000049bbdd161

I've experienced this on 4.12.3 and 4.12.5 so far (I also experienced another KVM lockup on 4.12.3, which may have been resolved). I do *not* experience this on 4.11.9, the last 4.11 kernel that was packaged by Arch; things are stable on 4.11.9. It does not seem to be triggered by anything specific; it just happens after some usage.

I am making extensive use of PCI passthrough (USB controller + GPUs; one GPU to one VM, another GPU to another, both Win10) and have several VMs running on the host (4 Linux VMs + 2 Windows VMs). When this error occurs, the other VMs and the host system initially remain responsive, but they too fail after a few commands are issued. Messages indicating a soft CPU lockup are emitted regularly:

kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
kernel: Tasks blocked on level-1 rcu_node (CPUs 0-15): P2837
kernel: (detected by 23, t=990092 jiffies, g=2933189, c=2933188, q=9740206)

See attached for the full oops, /proc/cpuinfo, and /proc/meminfo.
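As a reading aid for the oops above, the last page-table line (PTE 800000049bbdd161) can be decoded bit by bit. The sketch below assumes the standard x86-64 layout for a 4-KiB page-table entry (flag bits 0-8 and 63, physical frame in bits 12-51); `decode_pte`, `PTE_FLAGS` and `PFN_MASK` are hypothetical helper names, not anything from the report or the kernel. For this value it reports the page as present, accessed, dirty, global and no-execute, but not writable.

```python
# Decode the flag bits of an x86-64 4-KiB page-table entry (PTE).
# Assumption: standard x86-64 PTE layout; bit 7 is PAT for 4-KiB pages.

PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disabled",
    5: "accessed",
    6: "dirty",
    7: "pat",
    8: "global",
    63: "no-execute",
}

# Bits 12..51 hold the physical address of the 4-KiB frame.
PFN_MASK = 0x000F_FFFF_FFFF_F000


def decode_pte(pte: int) -> tuple[int, list[str]]:
    """Return (physical frame address, names of the flag bits that are set)."""
    flags = [name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1]
    return pte & PFN_MASK, flags


if __name__ == "__main__":
    # Value copied from the "kernel: PTE 800000049bbdd161" line of the oops.
    phys, flags = decode_pte(0x800000049BBDD161)
    print(hex(phys))             # physical frame address
    print(flags)                 # flag names that are set
    print("writable" in flags)   # whether the mapping allows writes
```

Whether the faulting access was actually a write cannot be read from the PTE alone; the page-fault error code in the attached full oops would be needed to confirm that.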
Closed by Toolybird (Toolybird)
Sunday, 28 May 2023, 06:07 GMT
Reason for closing: No response
Additional comments about closing: Plus it's old and stale. If still an issue, please follow up with upstream.
Comment by Jeff Cook (jeffcookio) - Monday, 14 August 2017, 16:41 GMT
Clarification: this crash doesn't trigger explicit "soft CPU lockup" messages in the journal (that's the other crash, which I haven't observed on 4.12.5), so describing the failure as a "soft CPU lockup" was probably dumb. This crash _does_ trigger the pasted rcu_preempt message.

Comment by loqs (loqs) - Monday, 14 August 2017, 18:18 GMT
Please try 4.12.6-1 (you might also want to try 4.13-rc5). If that does not resolve the issue, bisect the kernel to find the bad commit and report the issue upstream.

Comment by Lily (voidlily) - Monday, 21 August 2017, 01:20 GMT
I'm also having this issue on 4.12.8-2-ck. I'm going to downgrade to 4.11-ck and see if that at least gets me running again for now. My oopses were the same as the reporter's.

Comment by Jeff Cook (jeffcookio) - Monday, 21 August 2017, 05:38 GMT
Similar bug report upstream: https://bugzilla.kernel.org/show_bug.cgi?id=196685

Comment by mattia (nTia89) - Sunday, 27 February 2022, 14:00 GMT
I cannot reproduce the issue. Is it still valid?
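The bisection suggested by loqs is a binary search between a known-good kernel (here 4.11.9) and a known-bad one (4.12.3): each step builds and boots the midpoint commit, runs the workload, and halves the remaining range. The sketch below only illustrates that search; `first_bad`, `is_bad` and the commit names are hypothetical, and it simplifies the kernel's non-linear merge history to a linear list, which `git bisect` itself does not require.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad (the same monotonicity
    assumption git bisect relies on).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In a real bisect this step means: build and boot commits[mid],
        # run the KVM workload long enough, then mark it good or bad.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    # Hypothetical linear history in which "c5" introduced the regression.
    history = [f"c{i}" for i in range(10)]
    tested = []

    def is_bad(commit: str) -> bool:
        tested.append(commit)  # record which commits had to be tested
        return int(commit[1:]) >= 5

    print(first_bad(history, is_bad), tested)
```

The number of test runs grows with log2 of the number of commits in the range, which matters here: since the reported crash took 36-48 hours of load to appear, every "good" verdict needs roughly that long to be trustworthy.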