FS#74346 - [linux] 5.17.1 NULL pointer dereference in amdgpu
Attached to Project:
Arch Linux
Opened by Lars Beckers (extmind) - Monday, 04 April 2022, 13:55 GMT
Last edited by Jelle van der Waa (jelly) - Thursday, 14 September 2023, 17:55 GMT
Details
Description:
Successfully resumed after suspend, changed display configuration, and changes did apply. But shortly after the system stopped responding to anything. Log shows a kernel trace, stating a NULL pointer dereference.

Additional info:
* linux 5.17.1-arch1-1
* Hardware: Thinkpad T14s with "AMD Ryzen 7 PRO 4750U with Radeon Graphics" (iGPU)
* attached kernel log, retrieved after a forced reboot

Steps to reproduce:
Did not happen previously when changing displays during the same boot.
Closed by Jelle van der Waa (jelly)
Thursday, 14 September 2023, 17:55 GMT
Reason for closing: Deferred
Additional comments about closing: Old kernel, please retry with the latest
(The logs I've attached were made with initcall_debug, no_console_suspend and ignore_loglevel, to get a decent amount of debugging output.) But I guess my issue is a completely different bug and not related to yours?
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV370 [Radeon X300] (prog-if 00 [VGA controller])
The dmesg output for the NULL pointer dereference is:
[ 9.660937] [drm] amdgpu kernel modesetting enabled.
[ 9.661025] amdgpu: CRAT table not found
[ 9.661028] amdgpu: Virtual CRAT table created for CPU
[ 9.661040] amdgpu: Topology: Add CPU node
[ 9.661296] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x5B70 0x1002:0x0F03 0x00).
[ 9.661302] amdgpu 0000:01:00.1: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 9.661305] amdgpu 0000:01:00.1: amdgpu: Fatal error during GPU init
[ 9.661318] amdgpu: probe of 0000:01:00.1 failed with error -12
[ 9.661338] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 9.661384] #PF: supervisor write access in kernel mode
[ 9.661411] #PF: error_code(0x0002) - not-present page
[ 9.661440] PGD 0 P4D 0
[ 9.661454] Oops: 0002 [#1] PREEMPT SMP NOPTI
(full snippet with backtrace included as attachment)
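For anyone reading the excerpt above, here is a minimal sketch of how the two numeric codes in it can be decoded. It assumes the standard x86 page-fault error code bit layout (present, write, user, reserved, instruction fetch) and the usual convention that the kernel reports negative errno values. The function names are hypothetical helpers for illustration, not part of any kernel tooling.

```python
import errno

# x86 page-fault error code bits (assumed standard layout).
PF_PROT = 0x01   # 0: page not present, 1: protection violation
PF_WRITE = 0x02  # 0: read access, 1: write access
PF_USER = 0x04   # 0: supervisor mode, 1: user mode
PF_RSVD = 0x08   # reserved bit set in a paging structure
PF_INSTR = 0x10  # fault caused by an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Hypothetical helper: split a #PF error code into readable fields."""
    return {
        "mode": "user" if code & PF_USER else "supervisor",
        "access": "write" if code & PF_WRITE else "read",
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


def errno_name(kernel_error: int) -> str:
    """Hypothetical helper: map a negative kernel error to its errno name."""
    return errno.errorcode.get(abs(kernel_error), "unknown")


if __name__ == "__main__":
    # Values taken from the dmesg excerpt: "Oops: 0002" and "error -12".
    print(decode_pf_error_code(0x0002))
    print(errno_name(-12))
```

Read this way, error code 0x0002 is a supervisor-mode write to a not-present page, which matches the two "#PF:" lines, and -12 is ENOMEM, the error returned by the failed probe just before the oops.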
This has progressively gotten worse. Let me know what else to send and I'm happy to do it.
# inxi -c0 -C --gpu --memory --machine --sensors --system
Fixes: cfbb6b004744 ("drm/amdgpu: Rework reset domain to be refcounted.")
Signed-off-by: Zhang Boyang <zhangboyang.id@gmail.com>
Link: https://lore.kernel.org/lkml/a8bce489-8ccc-aa95-3de6-f854e03ad557@suddenlinkmail.com/
Link: https://lore.kernel.org/lkml/AT9WHR.3Z1T3VI9A2AQ3@att.net/
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
<snip>
Not sure when we will get it, but it couldn't happen quick enough. Every kernel update locks the box on reboot.
Edit:
6.0 with change from [1] applied:
https://drive.google.com/file/d/1ZKZVSs4tlwVlpNQuq7cwcawAbKTURZQI/view?usp=sharing linux-6.0-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1O8Blidk_8tCMigf759aoB2-olI_XTfNx/view?usp=sharing linux-headers-6.0-1-x86_64.pkg.tar.zst
[1] https://lore.kernel.org/all/20220930214110.1074226-2-zhangboyang.id%40gmail.com/