FS#55816 - Kernel panic: BUG: unable to handle kernel paging request / acpi_safe_halt

Attached to Project: Arch Linux
Opened by Jakub Okoński (farnoy) - Saturday, 30 September 2017, 16:27 GMT
Last edited by Jan de Groot (JGC) - Friday, 06 October 2017, 11:10 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

My system freezes, sometimes after a couple of hours, sometimes after 20+ hrs of uptime.

The graphics output freezes and I can no longer ssh into the machine. I was able to capture the kernel output by using netconsole to send it to another computer.
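For anyone wanting to capture similar output: netconsole sends kernel messages as plain UDP datagrams, so the receiving computer only needs something listening on the target port. The following is a minimal sketch of such a receiver, not the setup used in this report. The function names `open_listener` and `read_messages` are made up for illustration, and port 6666 is assumed to match whatever target port netconsole was configured with on the crashing machine.

```python
import socket


def open_listener(bind_addr="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams.

    The port is an assumption: it must match the target port that
    netconsole was configured with on the sending machine.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def read_messages(sock, max_packets=None, timeout=None):
    """Yield (sender_ip, text) pairs, one per received datagram.

    Stops after max_packets datagrams, or when no datagram arrives
    within `timeout` seconds. With both left as None it runs until
    interrupted.
    """
    sock.settimeout(timeout)
    count = 0
    while max_packets is None or count < max_packets:
        try:
            data, sender = sock.recvfrom(65535)
        except socket.timeout:
            return
        count += 1
        # Kernel messages are normally ASCII; replace anything undecodable
        # instead of failing in the middle of a panic dump.
        yield sender[0], data.decode("utf-8", errors="replace").rstrip("\n")


# Example use on the receiving machine (blocks until interrupted):
#
#     listener = open_listener()
#     with open("netconsole.log", "a") as log:
#         for ip, text in read_messages(listener):
#             log.write(f"{ip} {text}\n")
#             log.flush()
```

Flushing after every message matters here, because the last lines before a freeze are the interesting ones.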


Additional info:
* package version(s)
core/linux 4.13-3-1
* config and/or log files etc.
I attached the dmesg output that was sent through netconsole; it shows the kernel panic details.

I am using an AMD Ryzen Threadripper 1950X on an X399 Taichi motherboard. I have heard of such problems before and have already tried disabling a couple of options in the BIOS: "Global C-state Control", SMT, and OpCache.

For what it's worth, I also get blue screens on Windows 10 with messages like DRIVER_IRQL_NOT_LESS_OR_EQUAL or PAGE_FAULT_IN_NONPAGED_AREA.


Steps to reproduce:
I don't have any reliable way of reproducing it; it just happens after some time.

Closed by Jan de Groot (JGC)
Friday, 06 October 2017, 11:10 GMT
Reason for closing: Not a bug
Comment by Jan de Groot (JGC) - Saturday, 30 September 2017, 16:46 GMT
If Windows 10 also crashes, this is likely a hardware fault, probably a memory incompatibility issue.
Comment by Jakub Okoński (farnoy) - Saturday, 30 September 2017, 17:00 GMT
Is there a way to know for certain? I already ran memtest86 for 10 hours on this configuration and it didn't find anything.
Comment by Jan de Groot (JGC) - Monday, 02 October 2017, 08:18 GMT
The problem with memtest86 is that it stresses the memory, not the rest of the system. The memory itself probably isn't faulty.

I had systems in the past that would run memtest86 for hours without issues, but as soon as I started compiling it would crash: the chipset couldn't handle 3 memory sticks without becoming unstable.
The same happened with another system: memtest86 passed fine, but as soon as the video drivers were upgraded, the system would freeze after rendering 2 frames in 3D. Replacing the memory solved that issue.

Anyway, is the trace you attached always the same type of trace, or is it random? You could try installing linux-lts to see whether an older kernel is also affected.
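One rough way to check whether the traces are the same or random is to reduce each captured log to a short signature and count how often each signature occurs. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the relevant lines contain the markers "BUG:" and "RIP:", which may differ between kernel versions. Long hexadecimal values are masked so that differing addresses do not make otherwise identical crashes look different.

```python
import re
from collections import Counter

# Long hex runs (addresses, register values) vary between crashes,
# so they are masked before comparing.
HEX_VALUE = re.compile(r"\b(?:0x)?[0-9a-f]{8,}\b")

# Assumed markers for the lines that identify a crash; adjust to
# match what the captured logs actually contain.
MARKERS = ("BUG:", "RIP:")


def crash_signature(log_text):
    """Return a tuple of the marker lines with addresses masked."""
    signature = []
    for line in log_text.splitlines():
        for marker in MARKERS:
            idx = line.find(marker)
            if idx != -1:
                # Drop the timestamp prefix by slicing from the marker.
                signature.append(HEX_VALUE.sub("<addr>", line[idx:].strip()))
                break
    return tuple(signature)


def summarize(logs):
    """Count how many of the given log texts share each signature."""
    return Counter(crash_signature(text) for text in logs)
```

If `summarize` over all captured logs yields a single dominant signature, that points towards one specific code path; many different signatures would fit better with unstable hardware.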
Comment by Jakub Okoński (farnoy) - Monday, 02 October 2017, 09:39 GMT
I downclocked my memory after your first message and I've only had one crash on Windows since then. I have netconsole running on Linux, so I'll report back when I get another crash.
Comment by Doug Newgard (Scimmia) - Friday, 06 October 2017, 03:20 GMT
So what's the verdict?
Comment by Jakub Okoński (farnoy) - Friday, 06 October 2017, 10:39 GMT
It hasn't happened in the week since I downclocked the memory, so we can close this. Thanks for the help!
