Arch Linux

FS#67131 - [linux] 5.7.6 Hard system lockup with no journal information

Attached to Project: Arch Linux
Opened by LaserEyess (LaserEyess) - Saturday, 27 June 2020, 13:13 GMT
Last edited by freswa (frederik) - Monday, 29 June 2020, 22:09 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Assigned
Assigned To: Tobias Powalowski (tpowa), Jan Alexander Steffens (heftig), Levente Polyak (anthraxx)
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 1
Private: No


Description: Hard system lockup after upgrading to 5.7.6. There is no information whatsoever in any logs I can find, and the machine stops responding completely; even pinging it doesn't work. I suspect it is related to amdgpu or drm because it happens when I start my window manager. Downgrading to 5.7.5 fixes this completely. Interestingly enough, this bug seems to be in 5.4.49 as well. Potentially some backported fix gone wrong?

Additional info:
* linux 5.7.6
* mesa 20.1.2-1
* sway version 1.5-rc1-c8224270

Steps to reproduce:
1. Reboot
2. Start sway
3. Use computer as normal

I have captured a log with drm.debug=1 and debug=1 in my kernel cmdline (it is way larger than 2 MB). The end of the log is where the freeze occurs, but there is nothing interesting there.
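
For anyone trying to reproduce this with the same debug settings, a minimal sketch for confirming that the running kernel actually booted with both parameters. The helper names `parse_cmdline` and `missing_params` are made up for illustration, and `shlex` only approximates how the kernel splits its command line, so treat this as a sanity check rather than an exact parser.

```python
import shlex
from typing import Dict, List, Optional


def parse_cmdline(text: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(text):
        # Split on the first '=' only, so root=UUID=... keeps its full value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(text: str, wanted: Dict[str, str]) -> List[str]:
    """Return the names in `wanted` that are absent or set to another value."""
    params = parse_cmdline(text)
    return [name for name, value in wanted.items() if params.get(name) != value]


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the currently running kernel.
    with open("/proc/cmdline") as f:
        print(missing_params(f.read(), {"drm.debug": "1", "debug": "1"}))
```

An empty list means both parameters from this report are active on the current boot.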

My normal kernel cmdline is attached (cmdline.txt)
   cmdline (0.1 KiB)

Comment by loqs (loqs) - Saturday, 27 June 2020, 16:31 GMT
5.7.6 [1] and 5.4.49 [2] share many backports. Can you bisect either of the affected stable branches and locate the causal commit?


Possibly related

The same commit was backported to 5.4.49
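
To show why bisecting is practical even when two stable releases are many commits apart, here is a rough sketch of the search that `git bisect` automates. It is not git's actual implementation. The commit names, the count of 400, and the `is_bad` callback are hypothetical stand-ins for "build this commit, boot it, and see whether it locks up".

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumes `commits` is ordered oldest to newest, that the last entry is
    known bad, and that every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # culprit is mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return commits[hi], tests


if __name__ == "__main__":
    # Hypothetical history: 400 commits between the good and bad releases.
    commits = [f"c{i:03d}" for i in range(400)]
    culprit, tests = first_bad_commit(commits, lambda c: c >= "c137")
    print(culprit, tests)
```

Each test halves the remaining range, so the number of kernels to build and boot grows only logarithmically with the number of commits. The catch is that every verdict must be reliable: each candidate kernel has to be used long enough to be reasonably sure the lockup would have shown up.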
Comment by LaserEyess (LaserEyess) - Sunday, 28 June 2020, 00:05 GMT
I can make an attempt to bisect, but unfortunately I don't have time this weekend. I tried booting in 5.7.6 again and did not experience the crash for an hour. I'm going to do some more debugging during the week when I have time.

Upstream bug report for amdgpu:
Comment by LaserEyess (LaserEyess) - Monday, 29 June 2020, 21:57 GMT
Patch from AMD

Been using it for about 30 minutes now, no crashes whatsoever. There's a second confirmation in that thread as well, so I think this patch fixes this.
Comment by LaserEyess (LaserEyess) - Wednesday, 01 July 2020, 00:54 GMT
Another crash after ~24 hours. I'm unsure if it's related, but this is a paste of `journalctl -b-1 -k -e`. The actual crash happened between 20:15 and 20:30; I wasn't at the computer at the time.

This is with the patch in the previous comment applied.
Comment by J. Andrew Lanz-O'Brien (jlanzobr) - Wednesday, 01 July 2020, 12:24 GMT
I am affected by this bug as well. Ryzen 3800X and Radeon 5700XT. Downgrading to 5.7.5 completely resolves the issue.