FS#50397 - [linux] radeon: ring 0 stalled for more than 10250msec
Attached to Project:
Arch Linux
Opened by Wiktor (typh00nz) - Sunday, 14 August 2016, 21:44 GMT
Last edited by freswa (frederik) - Sunday, 13 September 2020, 14:01 GMT
Details
Description:
xf86-video-ati 1:7.7.0-1 breaks my OS, causing a black screen and freezing the whole machine.
-- Reboot --
kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10250msec
kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
kernel: BUG: unable to handle kernel paging request at ffffc90400f70ffc
kernel: IP: [<ffffffffa06e7ff5>] radeon_ring_backup+0xd5/0x170 [radeon]
kernel: RIP [<ffffffffa06e7ff5>] radeon_ring_backup+0xd5/0x170 [radeon]
-- Reboot --
Additional info:
4.7.0-1-ARCH, xf86-video-ati 1:7.7.0-1, radeon hd 6850m
Steps to reproduce:
Noticed while playing dota2:
Card: Advanced Micro Devices [AMD/ATI] RV770 [Radeon HD 4870]
Display Server: X.Org 1.19.3 driver: N/A Resolution: 1920x1080@60.00hz, 1920x1080@60.00hz
GLX Renderer: Gallium 0.4 on AMD RV770 (DRM 2.49.0 / 4.11.3-2-ck-nehalem, LLVM 4.0.0)
GLX Version: 3.0 Mesa 17.1.0
It happens on the stock Arch kernel too. I use the xf86-video-ati driver, and this has been occurring since I began using this GPU about 1 year ago.
Some kernels seem to work better than others; dmesg often shows output akin to this:
perf: interrupt took too long (2711 > 2500), lowering kernel.perf_event_max_sample_rate to 73000
perf: interrupt took too long (3512 > 3388), lowering kernel.perf_event_max_sample_rate to 56000
perf: interrupt took too long (4459 > 4390), lowering kernel.perf_event_max_sample_rate to 44000
perf: interrupt took too long (5613 > 5573), lowering kernel.perf_event_max_sample_rate to 35000
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
hawker64 kernel: radeon 0000:02:00.0: scheduling IB failed (-2).
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB
--------------------------------------------------------------
Mar 26 19:22:12 hawker64 kernel: [drm:radeon_uvd_cs_parse [radeon]] *ERROR* Illegal UVD message type (-1)!
Mar 26 19:22:12 hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: ring 0 stalled for more than 10360msec
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x00000000004ead16 last fence id 0x00000000004eaddd on ring 0)
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: failed to get a new IB (-35)
Mar 26 19:22:23 hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: couldn't schedule ib'
Mar 26 19:22:23 hawker64 kernel: [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!
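The negative numbers in parentheses in several of the messages above look like negated Linux errno values returned by the driver. A minimal sketch for decoding them, assuming the usual Linux x86 numbering (the table is hardcoded so the result does not depend on the machine running the script; `decode_codes` is a hypothetical helper name). Note that not every parenthesized number is an errno: the `(-1)` in "Illegal UVD message type (-1)" is a message type, so it is reported as unknown here.

```python
import re

# Linux errno values for the codes seen in these logs (x86 numbering,
# hardcoded on purpose so the result is the same on any host OS).
LINUX_ERRNO = {2: "ENOENT", 22: "EINVAL", 35: "EDEADLK"}

# Matches a negative number in parentheses, e.g. "(-35)" or "(-2)".
CODE_RE = re.compile(r"\((-\d+)\)")

def decode_codes(line):
    """Return (code, errno name) pairs for each '(-N)' found in a log line."""
    return [(int(c), LINUX_ERRNO.get(-int(c), "unknown"))
            for c in CODE_RE.findall(line)]

sample = [
    "radeon 0000:02:00.0: scheduling IB failed (-2).",
    "radeon 0000:02:00.0: failed to get a new IB (-35)",
    "[drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!",
]
for line in sample:
    print(decode_codes(line), line)
```

Under that assumption, the recurring `-35` would be EDEADLK, which fits a driver refusing new work while it considers the GPU locked up.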
Sorry, I have no further logs available beyond the original error posted by the OP; here is what I often see: http://archlinux.uk/misc/gpulockup.html
Thankfully I have not experienced this for some time now; currently on kernel 4.13.9-1-ck-nehalem.
(radeon ring/fence errors)
Sometimes I won't experience the hangs for a couple of months; rolling back or switching between linux-ck & linux often suffices.
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000001f1c last fence id 0x0000000000002015 on ring 0)
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: failed to get a new IB (-35)
Feb 20 18:09:12 blade kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: Saved 7961 dwords of commands on ring 0.
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: GPU softreset: 0x00000019
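The "GPU lockup" line carries two fence ids, and the difference between them gives a rough idea of how much work was queued when the ring stalled. A small sketch, assuming "current" is the last fence the GPU signalled and "last" is the last one the driver emitted (that reading is my assumption, not something stated in this report; `fence_gap` is a hypothetical helper):

```python
import re

# Pulls both fence ids and the ring number out of a "GPU lockup" line.
FENCE_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)

def fence_gap(line):
    """Return (ring, last - current), or None if the line does not match."""
    m = FENCE_RE.search(line)
    if m is None:
        return None
    current = int(m.group(1), 16)
    last = int(m.group(2), 16)
    return int(m.group(3)), last - current

line = ("Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: GPU lockup "
        "(current fence id 0x0000000000001f1c "
        "last fence id 0x0000000000002015 on ring 0)")
print(fence_gap(line))  # (0, 249)
```

For the lockup logged above that is 249 fences outstanding on ring 0.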
-------------------------------------------------------------------------------------------------
Linux blade 4.15.4-1-ck-nehalem #1 SMP PREEMPT Sun Feb 18 09:18:16 EST 2018 x86_64 GNU/Linux
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV770 [Radeon HD 4870]
Subsystem: PC Partner Limited / Sapphire Technology RV770 [Radeon HD 4870]
Kernel driver in use: radeon
Kernel modules: radeon
Of course it could well be faulty hardware on my end, but my google-fu seems to validate this as an ongoing driver issue. If symptoms persist I'm going to swap out the GPU, or I might well still be moaning here in a decade.
regards.
01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RS780L [Radeon 3000]
[drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
[drm:radeon_cs_parser_relocs [radeon]] *ERROR* gem object lookup failed >
[drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -2!
radeon 0000:01:05.0: ring 0 stalled for more than 10276msec
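The stall messages reported in this task so far can be compared side by side with a short script. This is only a sketch (`stalls` is a hypothetical helper); the sample lines are copied from the reports above.

```python
import re

# Captures PCI address, ring number and stall duration in milliseconds.
STALL_RE = re.compile(
    r"radeon (\S+): ring (\d+) stalled for more than (\d+)msec"
)

def stalls(lines):
    """Return (pci_address, ring, msec) for every stall message found."""
    found = []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            found.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return found

sample = [
    "kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10250msec",
    "Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: ring 0 stalled for more than 10360msec",
    "radeon 0000:01:05.0: ring 0 stalled for more than 10276msec",
]
for dev, ring, msec in stalls(sample):
    print(dev, "ring", ring, "stalled", msec, "ms")
```

All three durations sit a few hundred milliseconds above 10 s, on three different cards. That would be consistent with a fixed lockup timeout of about 10 seconds in the driver rather than with anything card-specific, though I have not checked the timeout value against the kernel source.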
I suggest you try 4.18-rc2 / amd-staging-drm-next and if the issue is still present there report it upstream and work with upstream on a resolution.
It's probably not; I've seen a similar or the same radeon stall error after resuming from suspend to RAM for the past months.
Related bug:
https://bugs.archlinux.org/task/55611
xorg-server 1.19
xf86-video-ati 1:7.10
Related forum threads:
https://bbs.archlinux.org/viewtopic.php?id=237659
https://bbs.archlinux.org/viewtopic.php?pid=1787035
CPU: AMD Phenom II X4 P960
iGPU: Radeon Mobility 4570M
dGPU: Radeon 6470M
Linux 4.17.3
Xorg: 1.20
I'm experiencing something similar (random hang after sleep; I get my cursor, but nothing works except Alt+SysRq+B (not the entire magic sequence, just B)). It turned out the system was temporarily frozen while trying to register a USB device; while I still couldn't fix the problem, at least it doesn't force me to hard reboot anymore.
In case (a), the lockup usually follows a resume from hibernation at random times.
Relevant kernel log is:
------------------8<--------------------
radeon 0000:01:00.0: ring 0 stalled for more than 10210msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000004ebd5a last fence id 0x00000000004ebd9b on ring 0)
radeon 0000:01:00.0: Saved 2073 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 512M enabled (table at 0x0000000000142000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0x00000000daf5e3fe
radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0x00000000295680ce
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 5 succeeded in 1 usecs
[drm] UVD initialized successfully.
------------------>8--------------------
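The log above prints the same status registers twice, once before and once after the soft reset, so the two dumps can be diffed to see what the reset actually changed. A rough sketch (`split_dumps` and `changed` are hypothetical helpers; the register values are copied from the log, and no meaning is assigned to individual bits):

```python
import re

# Matches lines such as "R_008680_CP_STAT = 0x80028645".
REG_RE = re.compile(r"(R_[0-9A-F]{6}_\w+)\s*=\s*(0x[0-9A-Fa-f]+)")

LOG = """\
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
"""

def split_dumps(lines):
    """Group register lines into dumps; a repeated name starts a new dump."""
    dumps = [{}]
    for line in lines:
        m = REG_RE.search(line)
        if not m:
            continue
        name, value = m.group(1), int(m.group(2), 16)
        if name in dumps[-1]:
            dumps.append({})
        dumps[-1][name] = value
    return dumps

def changed(before, after):
    """Registers present in both dumps whose value differs."""
    return {n: (before[n], after[n])
            for n in before if n in after and before[n] != after[n]}

before, after = split_dumps(LOG.splitlines())
for name, (old, new) in changed(before, after).items():
    print(f"{name}: {old:#010x} -> {new:#010x}")
```

Only SRBM_STATUS, CP_BUSY_STAT and CP_STAT differ between the two dumps; GRBM_STATUS, GRBM_STATUS2 and DMA_STATUS_REG read the same before and after the reset.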
In case (b), the lockup may happen after a normal boot. Relevant kernel log looks the same as above:
------------------8<--------------------
kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10134msec
Sep 12 15:43:27 FDGKD5J-ITI-A402B kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000128ee7 last fence id 0x0000000000128ef7 on ring 0)
radeon 0000:01:00.0: Saved 505 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 512M enabled (table at 0x0000000000142000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0x00000000645f1bf2
radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0x000000006c69701c
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 5 succeeded in 1 usecs
[drm] UVD initialized successfully.
------------------>8--------------------
So I'd blame KMS. Please gimme hope before I ditch this piece of kr@p.