FS#68396 - [mesa] 20.2.1-1 [amdgpu] RX580 hard crash and glitched screens

Attached to Project: Arch Linux
Opened by Jarmo (JATothrim) - Friday, 23 October 2020, 17:57 GMT
Task Type Bug Report
Category Packages: Extra
Status Unconfirmed
Assigned To No-one
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No

Details

Description:

The following messages appear in dmesg, and both of my screens become completely garbled:

kernel: [drm:gfx_v8_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=8, emitted seq=9
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1067 thread Xorg:cs0 pid 1068
kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
kernel: amdgpu: cp is busy, skip halt cp
kernel: amdgpu: rlc is busy, skip halt rlc
....
kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x870600
kernel: snd_hda_intel 0000:09:00.1: No response from codec, disabling MSI: last cmd=0x00820000
kernel: amdgpu 0000:09:00.0: amdgpu: failed to suspend display audio
kernel: amdgpu: cp is busy, skip halt cp
kernel: amdgpu: rlc is busy, skip halt rlc
kernel: amdgpu 0000:09:00.0: amdgpu: GPU BACO reset
kernel: snd_hda_intel 0000:09:00.1: No response from codec, resetting bus: last cmd=0x00820000
....

The system also crashes instantly once Xorg or LightDM starts at boot.
It may survive if I switch to an empty VT as soon as the screen glitching begins.
I have bisected the changes and determined that the crash is not caused by a hardware or kernel issue.

This was caused by the mesa-20.2.1-1-x86_64 upgrade.
In fact, the system may crash immediately after pacman has finished upgrading the package!?

Additional info:
GPU:
glxinfo:
Vendor: X.Org (0x1002)
Device: Radeon RX 580 Series (POLARIS10, DRM 3.38.0, 5.8.13-7-rzen+, LLVM 10.0.1) (0x67df)
Version: 20.1.8
Accelerated: yes
Video memory: 8192MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.6
Max compat profile version: 4.6
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
lspci:
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
CPU:
AMD Ryzen 7 2700 Eight-Core Processor


Steps to reproduce:

1) Install mesa-20.2.1-1-x86_64 on the AMDGPU hardware listed above.
2) Launch any GPU-accelerated program
3) Enjoy pretty glitched screens


I marked this as critical because I can't run 'pacman -Syu': it crashes the system.

2020-10-27: A task closure has been requested. Reason for request: Issue is solved.
Comment by Vitali Malyshkin (vitalyam13gmail.com) - Friday, 23 October 2020, 19:57 GMT
Hello! I have the same GPU, with an AMD FX8320 CPU, and I don't have such problems. I also play games on Steam without any problems with my video card.
Comment by Jarmo (JATothrim) - Saturday, 24 October 2020, 00:38 GMT
@Vitali Malyshkin: You have mesa-20.2.1-1 installed?

Since reporting this problem, I did a full upgrade again, and I still had to revert to mesa-20.1.8-1 to get to the desktop.
I will test more with the Arch kernel (it also crashed; I'm on a self-built v5.8.16 mainline kernel now) and verify further that the trouble isn't my own fault.

Nevertheless, any ideas how to fix this?

To-do list for myself:
- Boot into the -arch kernel.
- Check what the amdgpu module parameters are set to.
- Double-check that the hardware is sane.
Comment by loqs (loqs) - Saturday, 24 October 2020, 02:31 GMT
Can you reproduce the issue with mesa 20.2.0-2? If it is not in your package cache, you can obtain it from [1].
Could you bisect the mesa package to locate the causal commit?

[1] https://wiki.archlinux.org/index.php/Arch_Linux_Archive
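Bisecting means repeatedly testing the build halfway between the last known-good and first known-bad versions until the two are adjacent. For mesa source commits one would normally use `git bisect`; the sketch below only illustrates the same halving over any ordered list of builds (for example, package versions fetched from the Arch Linux Archive). The function name `first_bad` and the `is_bad` callback are hypothetical: `is_bad` stands in for manually installing a build and checking whether the crash occurs.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is True.

    Assumes builds are ordered oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and every build after the first bad one is bad too.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    good, bad = 0, len(builds) - 1
    # Halve the range until the good and bad indices are neighbours.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid
        else:
            good = mid
    return builds[bad]


if __name__ == "__main__":
    # Hypothetical build names; the lambda simulates "crashes from build-3 on".
    builds = ["build-0", "build-1", "build-2", "build-3", "build-4"]
    print(first_bad(builds, lambda b: b >= "build-3"))  # prints build-3
```

Each test halves the remaining range, so a range of N builds needs roughly log2(N) install-and-test rounds.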
Comment by Vitali Malyshkin (vitalyam13gmail.com) - Saturday, 24 October 2020, 05:10 GMT
@Jarmo: Yes, I have mesa 20.2.1-1 and the system works fine. I haven't installed any Xorg graphics drivers like xf86-video... and others. I use Wayland now, but for you I tried an Xorg session in my GNOME desktop, and Xorg works fine too. I don't know how to help you :(
Comment by Vitali Malyshkin (vitalyam13gmail.com) - Saturday, 24 October 2020, 05:16 GMT
@Jarmo: Have you tried linux 5.9.1.arch1-1? I use it.
Comment by Jarmo (JATothrim) - Saturday, 24 October 2020, 13:24 GMT
I forgot to mention my main desktop setup:
LightDM, Xorg and Cinnamon.
The system also crashes with just LightDM starting, or when running "startxfce4" on a VT.
I haven't tested any Wayland desktops yet.

I have some non-default amdgpu module params:
"options amdgpu dc=1 gpu_recovery=1 send_sigterm=1 mcbp=1 mes=1 moverate=1024"
I'll test without these to see if they are the problem.
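To check which of these options the loaded amdgpu module actually picked up, a minimal Python sketch follows. It assumes the kernel exposes module parameters as files under `/sys/module/amdgpu/parameters/` (only parameters the driver makes readable appear there); the helper names `parse_modprobe_options` and `read_loaded_params` are hypothetical.

```python
from pathlib import Path


def parse_modprobe_options(line: str) -> tuple[str, dict[str, str]]:
    """Split an 'options <module> key=value ...' line into the module name and a dict."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "options":
        raise ValueError(f"not a modprobe options line: {line!r}")
    params = {}
    for item in fields[2:]:
        key, _, value = item.partition("=")
        params[key] = value
    return fields[1], params


def read_loaded_params(module: str, sysfs_root: str = "/sys/module") -> dict[str, str]:
    """Read current parameter values of a loaded module (assumed sysfs layout)."""
    param_dir = Path(sysfs_root) / module / "parameters"
    values = {}
    if not param_dir.is_dir():
        return values  # module not loaded, or it exposes no parameters
    for entry in sorted(param_dir.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            pass  # skip parameters that are not readable
    return values


if __name__ == "__main__":
    configured = "options amdgpu dc=1 gpu_recovery=1 send_sigterm=1 mcbp=1 mes=1 moverate=1024"
    module, wanted = parse_modprobe_options(configured)
    loaded = read_loaded_params(module)
    # Compare each configured option with what the running driver reports.
    for key, value in wanted.items():
        print(f"{key}: configured={value} loaded={loaded.get(key, '<unavailable>')}")
```

Dropping one option at a time from the configured line and rebooting then narrows down which parameter triggers the crash.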

After that I will try bisecting the packages on 5.9.1-arch1-1...
Comment by Jarmo (JATothrim) - Saturday, 24 October 2020, 14:57 GMT
Apparently the "options amdgpu ... mcbp=1 ..." module parameter was causing the mayhem:
mesa 20.2.1-1 works fine if I boot without "mcbp=1". Ohff!

Now the question is: what changed between mesa-20.1.8 and mesa-20.2.x that interacts with this module parameter?

This problem is solved now, so it can be closed.
