FS#22791 - [kernel26] drm - radeon GPU lockup

Attached to Project: Arch Linux
Opened by Linas (Linas) - Sunday, 06 February 2011, 22:20 GMT
Last edited by Andreas Radke (AndyRTR) - Monday, 22 August 2011, 08:29 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan de Groot (JGC)
Thomas Bächler (brain0)
Andreas Radke (AndyRTR)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
While I was using the system normally, the screen went black, then repainted the previous screen. However, I could not move the mouse, switch to another virtual terminal, etc. I had to restart.

messages.log logged the following error:
Feb 6 15:39:30 localhost kernel: ------------[ cut here ]------------
Feb 6 15:39:30 localhost kernel: WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x376/0x3e0 [radeon]()
Feb 6 15:39:30 localhost kernel: Hardware name: Aspire M1641
Feb 6 15:39:30 localhost kernel: GPU lockup (waiting for 0x0004EBED last fence id 0x0004EBEC)
Feb 6 15:39:30 localhost kernel: Modules linked in: fuse nls_cp437 vfat fat ext4 jbd2 crc16 saa7134_alsa tda1004x saa7134_dvb videobuf_dvb dvb_core raid1 usb_storage md_mod snd_hda_codec_atihdmi snd_hda_codec_realtek tda827x tda8290 tuner ir_lirc_codec lirc_dev ir_sony_decoder ir_jvc_decoder ir_rc6_decoder ir_rc5_decoder saa7134 ohci_hcd ir_nec_decoder radeon v4l2_common videodev ttm v4l1_compat v4l2_compat_ioctl32 drm_kms_helper videobuf_dma_sg videobuf_core ir_common ir_core drm tveeprom firewire_ohci snd_hda_intel firewire_core snd_hda_codec forcedeth ehci_hcd crc_itu_t psmouse sg i2c_algo_bit usbcore snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm i2c_nforce2 snd_timer processor i2c_core snd soundcore snd_page_alloc wmi serio_raw thermal button evdev pcspkr ext3 jbd mbcache sr_mod cdrom sd_mod pata_acpi pata_amd ahci libahci libata scsi_mod
Feb 6 15:39:30 localhost kernel: Pid: 2125, comm: X Not tainted 2.6.36-ARCH #1
Feb 6 15:39:30 localhost kernel: Call Trace:
Feb 6 15:39:30 localhost kernel: [<ffffffff8105511a>] warn_slowpath_common+0x7a/0xb0
Feb 6 15:39:30 localhost kernel: [<ffffffff810551f1>] warn_slowpath_fmt+0x41/0x50
Feb 6 15:39:30 localhost kernel: [<ffffffffa03935c6>] radeon_fence_wait+0x376/0x3e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffff81075de0>] ? autoremove_wake_function+0x0/0x40
Feb 6 15:39:30 localhost kernel: [<ffffffffa03ab3d1>] radeon_ib_get+0x121/0x1e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffffa03acbe9>] radeon_cs_ioctl+0x89/0x1e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffffa03aad28>] ? radeon_gem_wait_idle_ioctl+0xe8/0x110 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffffa02b3914>] drm_ioctl+0x3d4/0x4b0 [drm]
Feb 6 15:39:30 localhost kernel: [<ffffffff811ef1f0>] ? rb_insert_color+0x110/0x150
Feb 6 15:39:30 localhost kernel: [<ffffffffa03acb60>] ? radeon_cs_ioctl+0x0/0x1e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffff811bc425>] ? tomoyo_init_request_info+0x35/0x60
Feb 6 15:39:30 localhost kernel: [<ffffffff8113e835>] do_vfs_ioctl+0x95/0x530
Feb 6 15:39:30 localhost kernel: [<ffffffff8113ed51>] sys_ioctl+0x81/0xa0
Feb 6 15:39:30 localhost kernel: [<ffffffff8100d249>] ? do_device_not_available+0x9/0x10
Feb 6 15:39:30 localhost kernel: [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
Feb 6 15:39:30 localhost kernel: ---[ end trace f12aadc94aef2dfe ]---
Feb 6 15:39:30 localhost kernel: [drm] Disabling audio support
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: GPU softreset
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008010_GRBM_STATUS=0xE57C24E0
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008014_GRBM_STATUS2=0x00113303
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_000E50_SRBM_STATUS=0x200010C0
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008010_GRBM_STATUS=0xA0003030
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008014_GRBM_STATUS2=0x00000003
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_000E50_SRBM_STATUS=0x200080C0
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: GPU reset succeed
Feb 6 15:39:30 localhost kernel: [drm] ring test succeeded in 1 usecs
Feb 6 15:39:30 localhost kernel: [drm] ib test succeeded in 1 usecs
Feb 6 15:39:30 localhost kernel: [drm] Enabling audio support

I am using kernel 2.6.36.3-1 because of the recent problems with 2.6.37 (blank screen when running X), together with xorg-server 1.9.3.901.
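
For anyone who wants to check their own logs for the same message, here is a minimal sketch in Python that picks out the "GPU lockup" line quoted above and extracts the two fence IDs. The function name find_lockups is made up for this example, and the pattern only covers the exact wording seen in this report; other kernel versions may word the message differently.

```python
import re

# Pattern for the radeon message exactly as it appears in the log above, e.g.
# "GPU lockup (waiting for 0x0004EBED last fence id 0x0004EBEC)".
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)\)"
)


def find_lockups(lines):
    """Return (line, waited_for, last_fence_id) for every lockup message.

    `lines` can be any iterable of strings, for example an open log file.
    The two IDs are returned as integers.
    """
    hits = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            waited_for = int(match.group(1), 16)
            last_fence = int(match.group(2), 16)
            hits.append((line.rstrip("\n"), waited_for, last_fence))
    return hits
```

It could be used as find_lockups(open(path, errors="replace")), where path points at whichever file holds the kernel messages on the system in question (messages.log in this report). For the line quoted above, the two IDs differ by exactly one.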

Closed by  Andreas Radke (AndyRTR)
Monday, 22 August 2011, 08:29 GMT
Reason for closing:  Upstream
Additional comments about closing:  Upstream has been informed, there's nothing more we can do here.
Comment by Andreas Radke (AndyRTR) - Monday, 07 February 2011, 05:46 GMT
Please give more details:

- do you use KMS? It is strongly recommended nowadays
- post early dmesg log for drm module loading
- try with and without ati-dri module
- make sure your system is fully up to date (-Syu it from a good mirror!), give versions for kernel, libdrm, libgl, mesa, ati-dri, xf86-video-ati, xorg-server
- when does it crash?
- post full Xorg.0.log

And finally look for upstream bug reports!
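
The version part of the checklist above can be gathered in one go. This is only a sketch: it assumes pacman is on the PATH and that `pacman -Q` prints one "name version" pair per line. The helper names are invented for this example, and the package list is simply the one from the checklist.

```python
import subprocess

# Packages named in the checklist above.
PACKAGES = ["kernel26", "libdrm", "libgl", "mesa", "ati-dri",
            "xf86-video-ati", "xorg-server"]


def parse_pacman_query(output):
    """Parse `pacman -Q` output ("name version" per line) into a dict."""
    versions = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # ignore anything that is not "name version"
            versions[parts[0]] = parts[1]
    return versions


def installed_versions(packages=PACKAGES):
    """Ask pacman for the installed versions of the given packages.

    Packages that do not show up in pacman's output are simply left out
    of the result.
    """
    result = subprocess.run(["pacman", "-Q", *packages],
                            capture_output=True, text=True, check=False)
    return parse_pacman_query(result.stdout)
```

Printing the result of installed_versions() gives a list that can be pasted straight into a comment here.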

Comment by Linas (Linas) - Monday, 07 February 2011, 12:39 GMT
- do you use KMS? It is strongly recommended nowadays
Yes

- post early dmesg log for drm module loading
Feb 6 13:11:33 localhost kernel: [drm] Initialized drm 1.1.0 20060810
Feb 6 13:11:33 localhost kernel: firewire_ohci: Added fw-ohci device 0000:01:07.0, OHCI v1.10, 4 IR + 8 IT contexts, quirks 0x11
Feb 6 13:11:33 localhost kernel: Linux video capture interface: v2.00
Feb 6 13:11:33 localhost kernel: [drm] radeon defaulting to kernel modesetting.
Feb 6 13:11:33 localhost kernel: [drm] radeon kernel modesetting enabled.
Feb 6 13:11:33 localhost kernel: ACPI: PCI Interrupt Link [LNEB] enabled at IRQ 10
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: PCI INT A -> Link[LNEB] -> GSI 10 (level, low) -> IRQ 10
Feb 6 13:11:33 localhost kernel: [drm] initializing kernel modesetting (RV635 0x1002:0x9598).
Feb 6 13:11:33 localhost kernel: [drm] register mmio base: 0xFEBF0000
Feb 6 13:11:33 localhost kernel: [drm] register mmio size: 65536
Feb 6 13:11:33 localhost kernel: ATOM BIOS: RV635
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: VRAM: 512M 0x00000000 - 0x1FFFFFFF (512M used)
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: GTT: 512M 0x20000000 - 0x3FFFFFFF
Feb 6 13:11:33 localhost kernel: [drm] Detected VRAM RAM=512M, BAR=256M
Feb 6 13:11:33 localhost kernel: [drm] RAM width 128bits DDR
Feb 6 13:11:33 localhost kernel: [TTM] Zone kernel: Available graphics memory: 2027754 kiB.
Feb 6 13:11:33 localhost kernel: [TTM] Initializing pool allocator.
Feb 6 13:11:33 localhost kernel: [drm] radeon: 512M of VRAM memory ready
Feb 6 13:11:33 localhost kernel: [drm] radeon: 512M of GTT memory ready.
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: radeon: using MSI.
Feb 6 13:11:33 localhost kernel: [drm] radeon: irq initialized.
Feb 6 13:11:33 localhost kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Feb 6 13:11:33 localhost kernel: [drm] Loading RV635 Microcode
(...)
Feb 6 13:11:33 localhost kernel: input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
Feb 6 13:11:33 localhost kernel: IR JVC protocol handler initialized
Feb 6 13:11:33 localhost kernel: IR Sony protocol handler initialized
Feb 6 13:11:33 localhost kernel: [drm] ring test succeeded in 1 usecs
Feb 6 13:11:33 localhost kernel: [drm] radeon: ib pool ready.
Feb 6 13:11:33 localhost kernel: [drm] ib test succeeded in 0 usecs
Feb 6 13:11:33 localhost kernel: [drm] Enabling audio support
Feb 6 13:11:33 localhost kernel: [drm] Radeon Display Connectors
Feb 6 13:11:33 localhost kernel: [drm] Connector 0:
Feb 6 13:11:33 localhost kernel: [drm] VGA
Feb 6 13:11:33 localhost kernel: [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
Feb 6 13:11:33 localhost kernel: [drm] Encoders:
Feb 6 13:11:33 localhost kernel: [drm] CRT2: INTERNAL_KLDSCP_DAC2
Feb 6 13:11:33 localhost kernel: [drm] Connector 1:
Feb 6 13:11:33 localhost kernel: [drm] HDMI-A
Feb 6 13:11:33 localhost kernel: [drm] HPD1
Feb 6 13:11:33 localhost kernel: [drm] DDC: 0x7e60 0x7e60 0x7e64 0x7e64 0x7e68 0x7e68 0x7e6c 0x7e6c
Feb 6 13:11:33 localhost kernel: [drm] Encoders:
Feb 6 13:11:33 localhost kernel: [drm] DFP1: INTERNAL_UNIPHY
Feb 6 13:11:33 localhost kernel: [drm] Connector 2:
Feb 6 13:11:33 localhost kernel: [drm] DVI-I
Feb 6 13:11:33 localhost kernel: [drm] HPD2
Feb 6 13:11:33 localhost kernel: [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
Feb 6 13:11:33 localhost kernel: [drm] Encoders:
Feb 6 13:11:33 localhost kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
Feb 6 13:11:33 localhost kernel: [drm] DFP2: INTERNAL_KLDSCP_LVTMA
(...)
Feb 6 13:11:33 localhost kernel: [drm] Internal thermal controller with fan control
Feb 6 13:11:33 localhost kernel: [drm] radeon: power management initialized
Feb 6 13:11:33 localhost kernel: tuner 2-004b: chip found @ 0x96 (saa7133[0])
Feb 6 13:11:33 localhost kernel: [drm] fb mappable at 0xD0141000
Feb 6 13:11:33 localhost kernel: [drm] vram apper at 0xD0000000
Feb 6 13:11:33 localhost kernel: [drm] size 7258112
Feb 6 13:11:33 localhost kernel: [drm] fb depth is 24
Feb 6 13:11:33 localhost kernel: [drm] pitch is 6912
Feb 6 13:11:33 localhost kernel: tda829x 2-004b: setting tuner address to 61
Feb 6 13:11:33 localhost kernel: tda829x 2-004b: type set to tda8290+75a
Feb 6 13:11:33 localhost kernel: hda_codec: ALC1200: SKU not ready 0x411111f0
Feb 6 13:11:33 localhost kernel: hda_codec: ALC1200: BIOS auto-probing.
Feb 6 13:11:33 localhost kernel: Console: switching to colour frame buffer device 210x65
Feb 6 13:11:33 localhost kernel: fb0: radeondrmfb frame buffer device
Feb 6 13:11:33 localhost kernel: drm: registered panic notifier
Feb 6 13:11:33 localhost kernel: [drm] Initialized radeon 2.6.0 20080528 for 0000:02:00.0 on minor 0

- make sure your system is fully up to date (-Syu it from a good mirror!), give versions for kernel, libdrm, libgl, mesa, ati-dri, xf86-video-ati, xorg-server
kernel26: 2.6.36.3-1
libdrm: 2.4.23-1
libgl: 7.10-1
mesa: 7.10-1
ati-dri: 7.10-1
xf86-video-ati: 6.13.2-2 (upgraded later to 6.14.0-1)
xorg-server: 1.9.3.901-1 (upgraded later to 1.9.4-1)

- when does it crash?
It was in normal operation (I was writing an email). Not something reproducible.

- try with and without ati-dri module
I could try removing ati-dri. What do you expect to see? It's not like I could check if whatever race condition happened was fixed.

- post full Xorg.0.log
I don't have the Xorg.0.log of that run. I have the one after restarting, though.


FWIW, I grepped older logs for "lockup", and the only other instance I found was after the recent upgrade that broke X and led me to downgrade the kernel. In that upgrade, the following packages from the list above were upgraded:
libdrm (2.4.22-3 -> 2.4.23-1), libgl (7.9.0.git20101207-2 -> 7.10-1), ati-dri (7.9.0.git20101207-2 -> 7.10-1), kernel26 (2.6.36.3-1 -> 2.6.37-5), mesa (7.9.0.git20101207-2 -> 7.10-1), xorg-server-common (1.9.2-2 -> 1.9.3.901-1), xorg-server (1.9.2-2 -> 1.9.3.901-1)
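
The package list above uses the form "name (old -> new)". To compare such lists between two machines or two upgrades, a small Python sketch like the following could turn them into tuples. The function name is hypothetical, and the pattern is written only for the notation used in this comment; it tolerates a missing closing parenthesis, but nothing beyond that is assumed about other formats.

```python
import re

# Matches "name (old -> new)"; the closing parenthesis is optional so that
# a slightly malformed entry is still picked up.
UPGRADE_RE = re.compile(r"([\w.+-]+) \(([^\s(),]+) -> ([^\s(),]+)\)?")


def parse_upgrades(text):
    """Return a list of (package, old_version, new_version) tuples."""
    return [match.groups() for match in UPGRADE_RE.finditer(text)]


def changed_packages(text):
    """Return a dict mapping each package name to its (old, new) versions."""
    return {name: (old, new) for name, old, new in parse_upgrades(text)}
```

With the list above as input, changed_packages(text)["kernel26"] would hold the 2.6.36.3-1 and 2.6.37-5 pair, which is the kernel change that was later reverted.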

Maybe it's the same bug, but 2.6.36 has some big lock which hides it most of the time? It may be completely unrelated as well. Strangely, this backtrace only appeared once, although the X failure was consistent:

Feb 4 09:01:12 localhost kernel: X D 000000010004cfde 0 1885 1884 0x00400004
Feb 4 09:01:12 localhost kernel: ffff880139651988 0000000000000086 ffff880139651858 ffffffff00000000
Feb 4 09:01:12 localhost kernel: 00000000000132c0 ffff880136d2a9a0 ffff880139651fd8 ffff880139651fd8
Feb 4 09:01:12 localhost kernel: ffff880139651fd8 ffff880136d2ac80 ffff880139651fd8 ffff880139650000
Feb 4 09:01:12 localhost kernel: Call Trace:
Feb 4 09:01:12 localhost kernel: [<ffffffff8104e168>] ? update_curr+0xd8/0x210
Feb 4 09:01:12 localhost kernel: [<ffffffff81015bee>] ? __switch_to_xtra+0x14e/0x180
Feb 4 09:01:12 localhost kernel: [<ffffffff81066476>] ? lock_timer_base.clone.23+0x36/0x70
Feb 4 09:01:12 localhost kernel: [<ffffffff813a58a6>] __mutex_lock_slowpath+0x136/0x310
Feb 4 09:01:12 localhost kernel: [<ffffffff813a5a91>] mutex_lock+0x11/0x30
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f10d9>] radeon_ring_lock+0x29/0x60 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa0318d0b>] r600_gpu_is_lockup+0xfb/0x220 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d824e>] radeon_fence_wait+0x34e/0x3e0 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffff81077db0>] ? autoremove_wake_function+0x0/0x40
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d846c>] radeon_fence_wait_next+0x8c/0xb0 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f1087>] radeon_ring_alloc+0x47/0x70 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f10e4>] radeon_ring_lock+0x34/0x60 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa0318d0b>] r600_gpu_is_lockup+0xfb/0x220 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d824e>] radeon_fence_wait+0x34e/0x3e0 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffff811f7c90>] ? rb_insert_color+0x110/0x150
Feb 4 09:01:12 localhost kernel: [<ffffffff81077db0>] ? autoremove_wake_function+0x0/0x40
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d8b3c>] radeon_sync_obj_wait+0xc/0x10 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa025b359>] ttm_bo_wait+0xf9/0x1b0 [ttm]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f063e>] radeon_gem_wait_idle_ioctl+0x8e/0x110 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa01f5474>] drm_ioctl+0x3d4/0x4b0 [drm]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f05b0>] ? radeon_gem_wait_idle_ioctl+0x0/0x110 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffff8101717a>] ? save_i387_xstate+0x10a/0x230
Feb 4 09:01:12 localhost kernel: [<ffffffff81067a26>] ? recalc_sigpending+0x16/0x40
Feb 4 09:01:12 localhost kernel: [<ffffffff8100b34d>] ? do_signal+0x17d/0x7c0
Feb 4 09:01:12 localhost kernel: [<ffffffff810162ac>] ? fpu_finit+0x1c/0x30
Feb 4 09:01:12 localhost kernel: [<ffffffff81146075>] do_vfs_ioctl+0x95/0x530
Feb 4 09:01:12 localhost kernel: [<ffffffff81146591>] sys_ioctl+0x81/0xa0
Feb 4 09:01:12 localhost kernel: [<ffffffff8100bf12>] system_call_fastpath+0x16/0x1b
(it repeated three times, each one just after the previous)
Comment by Andreas Radke (AndyRTR) - Saturday, 19 February 2011, 16:04 GMT
Sorry. I can't help here when you use an outdated kernel. Please make a full -Syu and make sure to have the latest official kernel26, xorg-server, xf86-video-ati and mesa packages. If it is still crashing in the kernel drm module you need to file an upstream kernel bug.
Comment by Eric (eric2) - Saturday, 26 February 2011, 18:13 GMT
I have this problem too... once a week, so not very often.
In my case (maybe also in your case), the mouse CAN be moved, it's just that the mouse pointer becomes invisible.
When this happens again, try to move the mouse around to menus, taskbar or links in your browser window... after a while the mouse pointer reappears and everything is normal again.

My system is up to date, except for mysql.
kernel26: 2.6.37-5 (also happens with kernel patched with Con Kolivas' patch)
libdrm: 2.4.23-2
libgl, mesa and ati-dri: 7.10.0.git20110215-1
xf86-video-ati: 6.14.0-1
xorg-server: 1.9.4-1
I've had this problem for more than a month, so possibly the previous versions of the packages above had the same problem.

It is not possible to reproduce, but I think this problem happened more often while working with fullscreen windows.

Maybe related: since I've had this problem, I see this very often in my kernel.log:
[drm:radeon_vga_detect] *ERROR* VGA-1: probed a monitor but no|invalid EDID
   drm.log (4.5 KiB)
   xorg.log (42.4 KiB)
Comment by Sergej Pupykin (sergej) - Monday, 28 February 2011, 17:50 GMT
lockups were fixed in 2.6.38

edid spam in upstream: https://bugs.freedesktop.org/show_bug.cgi?id=34457
Comment by Sergej Pupykin (sergej) - Monday, 28 February 2011, 17:51 GMT
Comment by Andreas Radke (AndyRTR) - Thursday, 24 March 2011, 18:22 GMT
My dmesg is clear now with kernel 2.6.38 - can we close this now?
Comment by Eric (eric2) - Thursday, 07 April 2011, 07:19 GMT
Still happens with kernel26 2.6.38.2-1
Comment by Maël Lavault (moimael) - Wednesday, 13 April 2011, 16:42 GMT
Still happens for me too, and very frequently with gnome-shell, even with the latest versions and the testing repo. It seems to be a problem with the gallium ati driver and the HDMI output.
Comment by Andreas Radke (AndyRTR) - Thursday, 14 April 2011, 19:47 GMT
Maybe this is something new. Please bring it upstream or it won't get fixed.
Comment by Eric (eric2) - Thursday, 14 April 2011, 22:20 GMT
You can find many similar bugs when you search for "gpu lockup" or "radeon_fence" in the bug trackers of different distros and on freedesktop.org. Some of those bugs were fixed and some are still (?) not fixed.

I've posted my report here:
https://bugs.freedesktop.org/show_bug.cgi?id=34313
