Arch Linux

FS#55141 - [linux] "BUG: unable to handle kernel paging request" with 4.12.3, 4.12.5

Attached to Project: Arch Linux
Opened by Jeff Cook (jeffcookio) - Monday, 14 August 2017, 16:35 GMT
Last edited by Doug Newgard (Scimmia) - Tuesday, 15 August 2017, 15:21 GMT
Task Type: Bug Report
Category: Kernel
Status: Assigned
Assigned To: Tobias Powalowski (tpowa)
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 1
Private: No

Details

Description:

After 36-48 hours of heavy KVM usage, I get the attached output, which begins with:

Aug 14 06:30:28 kvm_master kernel: BUG: unable to handle kernel paging request at ffffffffc0955f96
kernel: IP: report_bug+0x94/0x120
kernel: PGD a67a0c067
kernel: P4D a67a0c067
kernel: PUD a67a0e067
kernel: PMD 495886067
kernel: PTE 800000049bbdd161

I've experienced this on 4.12.3 and 4.12.5 so far (I also experienced another KVM lockup on 4.12.3 which may have been resolved). I do *not* experience this on 4.11.9, the last 4.11 kernel that was packaged by Arch. Things are stable on 4.11.9.

This does not seem to be triggered by anything specific; it just happens after some usage. I am making extensive use of PCI passthrough (USB controller + GPUs; one GPU to one VM, another GPU to another, both Win10) and have several VMs running on the host (4 Linux VMs + 2 Windows VMs).

When this error occurs, other VMs and the host system initially remain responsive, but they too fail after a few commands are issued. Messages indicating a soft CPU lockup are emitted regularly:

kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
kernel: Tasks blocked on level-1 rcu_node (CPUs 0-15): P2837
kernel: (detected by 23, t=990092 jiffies, g=2933189, c=2933188, q=9740206)

See attached for full oops, /proc/cpuinfo, and /proc/meminfo.
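
For comparing other journals against this report, the two signatures quoted above can be extracted mechanically. What follows is only a minimal sketch: the function name, the dictionary keys, and the regular expressions are illustrative and were written against the excerpts pasted here, so other kernel versions may format these lines differently. The `hz` default of 300 is an assumption about the kernel's CONFIG_HZ and should be checked against the running kernel's configuration.

```python
import re

# Patterns written against the excerpts in this report (illustrative only).
PAGING_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: ([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
STALL_RE = re.compile(r"\(detected by (\d+), t=(\d+) jiffies")


def summarize_oops(lines, hz=300):
    """Collect the faulting address, IP symbol, and RCU stall duration.

    hz is an assumed CONFIG_HZ value, used only to convert jiffies to seconds.
    """
    summary = {}
    for line in lines:
        if (m := PAGING_RE.search(line)):
            summary["fault_address"] = int(m.group(1), 16)
        elif (m := IP_RE.search(line)):
            summary["ip_symbol"] = m.group(1)
            summary["ip_offset"] = int(m.group(2), 16)
        elif (m := STALL_RE.search(line)):
            summary["stall_detected_by_cpu"] = int(m.group(1))
            summary["stall_seconds"] = int(m.group(2)) / hz
    return summary


# Lines copied from the description above.
SAMPLE = [
    "Aug 14 06:30:28 kvm_master kernel: BUG: unable to handle kernel paging request at ffffffffc0955f96",
    "kernel: IP: report_bug+0x94/0x120",
    "kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:",
    "kernel: (detected by 23, t=990092 jiffies, g=2933189, c=2933188, q=9740206)",
]

print(summarize_oops(SAMPLE))
```

Feeding it the output of `journalctl -k` split into lines would give a quick way to tell this oops apart from the separate soft-lockup crash seen on 4.12.3.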

Comment by Jeff Cook (jeffcookio) - Monday, 14 August 2017, 16:41 GMT
Clarification: this crash doesn't trigger explicit "soft CPU lockup" messages in the journal (that's the other crash that I haven't observed on 4.12.5), so describing the failure as a "soft CPU lockup" was probably dumb. This crash _does_ trigger the pasted rcu_preempt message.
Comment by loqs (loqs) - Monday, 14 August 2017, 18:18 GMT
Please try 4.12.6-1 (you might also want to try 4.13-rc5). If that does not resolve the issue, bisect the kernel to find the bad commit and report the issue upstream.
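
The idea behind a bisect can be sketched as a binary search over an ordered history. This is only an illustration of the search itself, not of the `git bisect` tool: `first_bad` and `is_bad` are hypothetical names, and the sketch assumes the first entry is known good, the last is known bad, and the fault never disappears once introduced.

```python
def first_bad(versions, is_bad):
    """Return the first entry of versions for which is_bad() is True.

    Assumes versions is ordered oldest to newest, versions[0] is known good,
    versions[-1] is known bad, and every entry after the first bad one is
    also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # fault already present: look earlier
        else:
            lo = mid  # still good: look later
    return versions[hi]


# Hypothetical history of ten commits where the fault first appears at index 6.
history = list(range(10))
culprit = first_bad(history, lambda commit: commit >= 6)
```

Each call to `is_bad` stands for building and running one kernel. Since the crash reported here takes 36-48 hours of load to appear, a build would need to run at least that long before it could be marked good.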
Comment by Lily (voidlily) - Monday, 21 August 2017, 01:20 GMT
I'm also having this issue on 4.12.8-2-ck. I'm going to downgrade to 4.11-ck and see if that at least gets me running again for now. My oopses were the same as the reporter's.
Comment by Jeff Cook (jeffcookio) - Monday, 21 August 2017, 05:38 GMT
