FS#50397 - [linux] radeon: ring 0 stalled for more than 10250msec

Attached to Project: Arch Linux
Opened by Wiktor (typh00nz) - Sunday, 14 August 2016, 21:44 GMT
Last edited by freswa (frederik) - Sunday, 13 September 2020, 14:01 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 11
Private No

Details

Description:

xf86-video-ati 1:7.7.0-1 breaks my OS, causing a black screen and freezing the whole machine.


-- Reboot --
kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10250msec
kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
kernel: BUG: unable to handle kernel paging request at ffffc90400f70ffc
kernel: IP: [<ffffffffa06e7ff5>] radeon_ring_backup+0xd5/0x170 [radeon]
kernel: RIP [<ffffffffa06e7ff5>] radeon_ring_backup+0xd5/0x170 [radeon]
-- Reboot --



Additional info:

4.7.0-1-ARCH,
xf86-video-ati 1:7.7.0-1,
radeon hd 6850m,



Steps to reproduce:

Noticed while playing dota2:

Closed by  freswa (frederik)
Sunday, 13 September 2020, 14:01 GMT
Reason for closing:  No response
Comment by cirrus (cirrus) - Thursday, 01 June 2017, 22:52 GMT
I experience this using:
Card: Advanced Micro Devices [AMD/ATI] RV770 [Radeon HD 4870]
Display Server: X.Org 1.19.3 driver: N/A Resolution: 1920x1080@60.00hz, 1920x1080@60.00hz
GLX Renderer: Gallium 0.4 on AMD RV770 (DRM 2.49.0 / 4.11.3-2-ck-nehalem, LLVM 4.0.0)
GLX Version: 3.0 Mesa 17.1.0
It happens on the stock Arch kernel too. I use the xf86-video-ati driver, and this has been occurring since I began using this GPU about 1 year ago.
Some kernels seem to work better than others; dmesg often shows output akin to this:
perf: interrupt took too long (2711 > 2500), lowering kernel.perf_event_max_sample_rate to 73000
perf: interrupt took too long (3512 > 3388), lowering kernel.perf_event_max_sample_rate to 56000
perf: interrupt took too long (4459 > 4390), lowering kernel.perf_event_max_sample_rate to 44000
perf: interrupt took too long (5613 > 5573), lowering kernel.perf_event_max_sample_rate to 35000
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
hawker64 kernel: radeon 0000:02:00.0: scheduling IB failed (-2).
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB
--------------------------------------------------------------
Mar 26 19:22:12 hawker64 kernel: [drm:radeon_uvd_cs_parse [radeon]] *ERROR* Illegal UVD message type (-1)!
Mar 26 19:22:12 hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: ring 0 stalled for more than 10360msec
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x00000000004ead16 last fence id 0x00000000004eaddd on ring 0)
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: failed to get a new IB (-35)
Mar 26 19:22:23 hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: couldn't schedule ib'
Mar 26 19:22:23 hawker64 kernel: [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!
Sorry, I have no further logs available other than the original error posted by OP; here is what I often see: http://archlinux.uk/misc/gpulockup.html
Thankfully I have not experienced this for some time now; currently on kernel 4.13.9-1-ck-nehalem.
(radeon ring/fence errors)
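
For anyone collecting these from their own journal, here is a minimal Python sketch that pulls the stall duration and the two fence ids out of lines shaped like the ones above. The function name `summarize` and the `pending` field are made up for illustration. It assumes "current fence id" is the last fence the GPU signaled and "last fence id" is the last one submitted, so their difference is roughly how many fences were still outstanding at lockup time.

```python
import re

# Patterns follow the message shapes quoted in this report.
STALL_RE = re.compile(r"radeon (\S+): ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"radeon (\S+): GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def summarize(log_text):
    """Return (stalls, lockups) found in a chunk of kernel log text."""
    stalls, lockups = [], []
    for line in log_text.splitlines():
        m = STALL_RE.search(line)
        if m:
            stalls.append(
                {"device": m.group(1), "ring": int(m.group(2)), "msec": int(m.group(3))}
            )
            continue
        m = LOCKUP_RE.search(line)
        if m:
            current = int(m.group(2), 16)
            last = int(m.group(3), 16)
            lockups.append(
                {
                    "device": m.group(1),
                    "ring": int(m.group(4)),
                    # Assumption: last - current = fences still outstanding.
                    "pending": last - current,
                }
            )
    return stalls, lockups


# Two lines copied from the log above.
SAMPLE = """\
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: ring 0 stalled for more than 10360msec
Mar 26 19:22:23 hawker64 kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x00000000004ead16 last fence id 0x00000000004eaddd on ring 0)
"""

stalls, lockups = summarize(SAMPLE)
print(stalls)
print(lockups)
```

Feeding it the saved output of `journalctl -k` should work the same way, since it only looks at the message text and ignores the timestamp and hostname prefix.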
Comment by mattia (nTia89) - Tuesday, 03 October 2017, 20:01 GMT
is this issue still valid?
Comment by cirrus (cirrus) - Tuesday, 30 January 2018, 13:05 GMT
This issue seems to appear only on certain kernels, for me at least.
Sometimes I won't experience the hangs for a couple of months; rolling back or switching between linux-ck & linux often suffices.
Comment by cirrus (cirrus) - Tuesday, 20 February 2018, 20:19 GMT
still ..
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000001f1c last fence id 0x0000000000002015 on ring 0)
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: failed to get a new IB (-35)
Feb 20 18:09:12 blade kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: Saved 7961 dwords of commands on ring 0.
Feb 20 18:09:12 blade kernel: radeon 0000:02:00.0: GPU softreset: 0x00000019

-------------------------------------------------------------------------------------------------
Linux blade 4.15.4-1-ck-nehalem #1 SMP PREEMPT Sun Feb 18 09:18:16 EST 2018 x86_64 GNU/Linux

02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV770 [Radeon HD 4870]
Subsystem: PC Partner Limited / Sapphire Technology RV770 [Radeon HD 4870]
Kernel driver in use: radeon
Kernel modules: radeon

ofc it could well be faulty hardware on my end, but my google-fu seems to validate this as an ongoing driver issue. If symptoms persist I'm going to swap out the GPU, or I might well still be moaning here in a decade.
regards.
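
Side note on the numbers in parentheses: in messages like "failed to get a new IB (-35)" they look like negated Linux errno values. Below is a small Python sketch that decodes them. The table only covers the codes quoted in this report and uses the generic Linux numbering (2 = ENOENT, 22 = EINVAL, 35 = EDEADLK); the helper name `decode` is made up. Not every parenthesized number is an errno (the "Illegal UVD message type (-1)" line reports a message type, for instance), so treat the result as a hint only.

```python
import re

# Linux errno numbers (generic numbering), limited to codes seen in this report.
ERRNO_NAMES = {2: "ENOENT", 22: "EINVAL", 35: "EDEADLK"}

# A negative integer in parentheses, e.g. "(-35)".
CODE_RE = re.compile(r"\((-\d+)\)")


def decode(line):
    """Return the errno name for the first '(-N)' in line, or None if absent."""
    m = CODE_RE.search(line)
    if not m:
        return None
    code = -int(m.group(1))
    return ERRNO_NAMES.get(code, f"errno {code}")


# Message texts copied from the logs in this report.
for msg in (
    "radeon 0000:02:00.0: failed to get a new IB (-35)",
    "radeon 0000:02:00.0: scheduling IB failed (-2).",
    "[drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!",
):
    print(decode(msg), "<-", msg)
```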
Comment by sgar (garnica) - Monday, 25 June 2018, 18:25 GMT
I have the same issue running linux-zen 4.17.2-1

01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RS780L [Radeon 3000]

[drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
[drm:radeon_cs_parser_relocs [radeon]] *ERROR* gem object lookup failed >
[drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -2!
radeon 0000:01:05.0: ring 0 stalled for more than 10276msec
Comment by loqs (loqs) - Monday, 25 June 2018, 19:02 GMT
@garnica as you will have noticed, no action has been taken on this bug report in the past 21 months, nor is any likely in the future.
I suggest you try 4.18-rc2 / amd-staging-drm-next and if the issue is still present there report it upstream and work with upstream on a resolution.
Comment by Archie The Penguin (opus10) - Tuesday, 26 June 2018, 05:35 GMT
>ofc it could well be faulty hardware on my end
It's probably not; I've seen a similar or identical radeon stall error after resuming from suspend to RAM over the past months.

Related bug:
https://bugs.archlinux.org/task/55611
Comment by sgar (garnica) - Tuesday, 26 June 2018, 08:43 GMT
Error is gone when downgrading to:

xorg-server 1.19
xf86-video-ati 1:7.10

Related forum threads:
https://bbs.archlinux.org/viewtopic.php?id=237659
https://bbs.archlinux.org/viewtopic.php?pid=1787035
Comment by Maxim (Zeben) - Sunday, 08 July 2018, 14:40 GMT
I've got the same issue on an HP Pavilion G6 laptop. Solved by removing the xf86-video-ati driver; now the OS uses the Mesa-related modesetting driver, I guess, but all acceleration-related things now work...
CPU: AMD Phenom II X4 P960
iGPU: Radeon Mobility 4570M
dGPU: Radeon 6470M
Linux 4.17.3
Xorg: 1.20
Comment by kev levrone (kevlevrone) - Friday, 10 May 2019, 08:41 GMT
Did you try waiting a couple of minutes and see if it gets back to normal? https://goo.gl/KyvnZF
I'm experiencing something similar (random hang after sleep; I get my cursor, but nothing works except alt-stamp-b (not the entire magic sequence, just b)). It turned out the system was temporarily frozen while trying to register a USB device; while I still couldn't fix the problem, at least it doesn't force me to hard reboot anymore.
Comment by Marco Emilio Poleggi (sphakka) - Thursday, 12 September 2019, 14:32 GMT
For me (my card is a Radeon HD 2400 PRO/XT, RV610) this happens with both (a) `linux-4.19.72-1-lts` and (b) `linux-5.2.13.arch1-1` w/ or w/o `xf86-video-ati` but *with* KMS enabled (also in initramfs). It's a GPU lockup: the GPU resets successfully but X never comes back (tasks reported stuck for at least ~500s). I don't know what may trigger it as nothing special is running on my PC -- the heaviest apps running are Firefox, Emacs and LibreOffice. No gaming or other graphically demanding app.

In case (a), the lockup usually follows a resume from hibernation at random times.
Relevant kernel log is:

------------------8<--------------------
radeon 0000:01:00.0: ring 0 stalled for more than 10210msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000004ebd5a last fence id 0x00000000004ebd9b on ring 0)
radeon 0000:01:00.0: Saved 2073 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 512M enabled (table at 0x0000000000142000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0x00000000daf5e3fe
radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0x00000000295680ce
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 5 succeeded in 1 usecs
[drm] UVD initialized successfully.
------------------>8--------------------


In case (b), the lockup may happen after a normal boot. Relevant kernel log looks the same as above:

------------------8<--------------------
kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10134msec
Sep 12 15:43:27 FDGKD5J-ITI-A402B kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000128ee7 last fence id 0x0000000000128ef7 on ring 0)
radeon 0000:01:00.0: Saved 505 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 512M enabled (table at 0x0000000000142000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0x00000000645f1bf2
radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0x000000006c69701c
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 5 succeeded in 1 usecs
[drm] UVD initialized successfully.
------------------>8--------------------

So I'd blame KMS. Please gimme hope before I ditch this piece of kr@p.
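
In case it helps anyone compare dumps like the two above: each soft reset prints the status registers twice, once before and once after the reset. Here is a rough Python sketch that lists which registers changed between the two passes. It does not try to interpret individual bits, and the function name `reset_diff` is just illustrative.

```python
import re

# Matches "<NAME> = 0x..." and "<NAME>=0x..." as printed in the dumps above.
REG_RE = re.compile(r"(\w+)\s*=\s*(0x[0-9A-Fa-f]+)")


def reset_diff(log_text):
    """Map register name -> (first value, last value) for registers that changed."""
    seen = {}
    for line in log_text.splitlines():
        m = REG_RE.search(line)
        if m:
            seen.setdefault(m.group(1), []).append(int(m.group(2), 16))
    return {
        name: (vals[0], vals[-1])
        for name, vals in seen.items()
        if len(vals) >= 2 and vals[0] != vals[-1]
    }


# A subset of the lines from the dump above: before the reset, then after.
SAMPLE = """\
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80028645
radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
"""

changed = reset_diff(SAMPLE)
for name, (before, after) in changed.items():
    print(f"{name}: 0x{before:08X} -> 0x{after:08X}")
```

On the sample, GRBM_STATUS is filtered out because it reads the same before and after the reset, while the SRBM and CP registers are reported as changed.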
