FS#22791 - [kernel26] drm - radeon GPU lockup
Attached to Project:
Arch Linux
Opened by Linas (Linas) - Sunday, 06 February 2011, 22:20 GMT
Last edited by Andreas Radke (AndyRTR) - Monday, 22 August 2011, 08:29 GMT
Details
Description:
Running the system normally, the screen went black, then painted back to the previous screen. However, I could not move the mouse, switch to another virtual terminal, etc. Had to restart. messages.log logged the following error:
Feb 6 15:39:30 localhost kernel: ------------[ cut here ]------------
Feb 6 15:39:30 localhost kernel: WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x376/0x3e0 [radeon]()
Feb 6 15:39:30 localhost kernel: Hardware name: Aspire M1641
Feb 6 15:39:30 localhost kernel: GPU lockup (waiting for 0x0004EBED last fence id 0x0004EBEC)
Feb 6 15:39:30 localhost kernel: Modules linked in: fuse nls_cp437 vfat fat ext4 jbd2 crc16 saa7134_alsa tda1004x saa7134_dvb videobuf_dvb dvb_core raid1 usb_storage md_mod snd_hda_codec_atihdmi snd_hda_codec_realtek tda827x tda8290 tuner ir_lirc_codec lirc_dev ir_sony_decoder ir_jvc_decoder ir_rc6_decoder ir_rc5_decoder saa7134 ohci_hcd ir_nec_decoder radeon v4l2_common videodev ttm v4l1_compat v4l2_compat_ioctl32 drm_kms_helper videobuf_dma_sg videobuf_core ir_common ir_core drm tveeprom firewire_ohci snd_hda_intel firewire_core snd_hda_codec forcedeth ehci_hcd crc_itu_t psmouse sg i2c_algo_bit usbcore snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm i2c_nforce2 snd_timer processor i2c_core snd soundcore snd_page_alloc wmi serio_raw thermal button evdev pcspkr ext3 jbd mbcache sr_mod cdrom sd_mod pata_acpi pata_amd ahci libahci libata scsi_mod
Feb 6 15:39:30 localhost kernel: Pid: 2125, comm: X Not tainted 2.6.36-ARCH #1
Feb 6 15:39:30 localhost kernel: Call Trace:
Feb 6 15:39:30 localhost kernel: [<ffffffff8105511a>] warn_slowpath_common+0x7a/0xb0
Feb 6 15:39:30 localhost kernel: [<ffffffff810551f1>] warn_slowpath_fmt+0x41/0x50
Feb 6 15:39:30 localhost kernel: [<ffffffffa03935c6>] radeon_fence_wait+0x376/0x3e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffff81075de0>] ? autoremove_wake_function+0x0/0x40
Feb 6 15:39:30 localhost kernel: [<ffffffffa03ab3d1>] radeon_ib_get+0x121/0x1e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffffa03acbe9>] radeon_cs_ioctl+0x89/0x1e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffffa03aad28>] ? radeon_gem_wait_idle_ioctl+0xe8/0x110 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffffa02b3914>] drm_ioctl+0x3d4/0x4b0 [drm]
Feb 6 15:39:30 localhost kernel: [<ffffffff811ef1f0>] ? rb_insert_color+0x110/0x150
Feb 6 15:39:30 localhost kernel: [<ffffffffa03acb60>] ? radeon_cs_ioctl+0x0/0x1e0 [radeon]
Feb 6 15:39:30 localhost kernel: [<ffffffff811bc425>] ? tomoyo_init_request_info+0x35/0x60
Feb 6 15:39:30 localhost kernel: [<ffffffff8113e835>] do_vfs_ioctl+0x95/0x530
Feb 6 15:39:30 localhost kernel: [<ffffffff8113ed51>] sys_ioctl+0x81/0xa0
Feb 6 15:39:30 localhost kernel: [<ffffffff8100d249>] ? do_device_not_available+0x9/0x10
Feb 6 15:39:30 localhost kernel: [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
Feb 6 15:39:30 localhost kernel: ---[ end trace f12aadc94aef2dfe ]---
Feb 6 15:39:30 localhost kernel: [drm] Disabling audio support
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: GPU softreset
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008010_GRBM_STATUS=0xE57C24E0
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008014_GRBM_STATUS2=0x00113303
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_000E50_SRBM_STATUS=0x200010C0
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008010_GRBM_STATUS=0xA0003030
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_008014_GRBM_STATUS2=0x00000003
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: R_000E50_SRBM_STATUS=0x200080C0
Feb 6 15:39:30 localhost kernel: radeon 0000:02:00.0: GPU reset succeed
Feb 6 15:39:30 localhost kernel: [drm] ring test succeeded in 1 usecs
Feb 6 15:39:30 localhost kernel: [drm] ib test succeeded in 1 usecs
Feb 6 15:39:30 localhost kernel: [drm] Enabling audio support
Using kernel 2.6.36.3-1 due to the recent problems with 2.6.37 (blank screen when running X).
xorg-server 1.9.3.901
Closed by Andreas Radke (AndyRTR)
Monday, 22 August 2011, 08:29 GMT
Reason for closing: Upstream
Additional comments about closing: Upstream has been informed, there's nothing more we can do here.
- do you use KMS? It is strongly recommended nowadays
- post early dmesg log for drm module loading
- try with and without ati-dri module
- make sure your system is fully up to date (-Syu it from a good mirror!), give versions for kernel, libdrm, libgl, mesa, ati-dri, xf86-video-ati, xorg-server
- when does it crash?
- post full Xorg.0.log
And finally look for upstream bug reports!
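For reference, the version list requested above could be collected in one step. This is only a minimal sketch, assuming pacman is on the PATH; the helper names `build_query` and `installed_versions` are hypothetical, and the package names are simply the ones listed above.

```python
import subprocess

# Package names taken from the list requested above.
PACKAGES = ["kernel26", "libdrm", "libgl", "mesa", "ati-dri",
            "xf86-video-ati", "xorg-server"]


def build_query(packages):
    """Return the pacman command line that prints installed versions."""
    return ["pacman", "-Q", *packages]


def installed_versions(packages=PACKAGES):
    """Run pacman and return its output, one 'name version' line per package."""
    result = subprocess.run(build_query(packages),
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(installed_versions(), end="")
```

The output can be pasted into the report as-is.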
Yes, I use KMS.
- post early dmesg log for drm module loading
Feb 6 13:11:33 localhost kernel: [drm] Initialized drm 1.1.0 20060810
Feb 6 13:11:33 localhost kernel: firewire_ohci: Added fw-ohci device 0000:01:07.0, OHCI v1.10, 4 IR + 8 IT contexts, quirks 0x11
Feb 6 13:11:33 localhost kernel: Linux video capture interface: v2.00
Feb 6 13:11:33 localhost kernel: [drm] radeon defaulting to kernel modesetting.
Feb 6 13:11:33 localhost kernel: [drm] radeon kernel modesetting enabled.
Feb 6 13:11:33 localhost kernel: ACPI: PCI Interrupt Link [LNEB] enabled at IRQ 10
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: PCI INT A -> Link[LNEB] -> GSI 10 (level, low) -> IRQ 10
Feb 6 13:11:33 localhost kernel: [drm] initializing kernel modesetting (RV635 0x1002:0x9598).
Feb 6 13:11:33 localhost kernel: [drm] register mmio base: 0xFEBF0000
Feb 6 13:11:33 localhost kernel: [drm] register mmio size: 65536
Feb 6 13:11:33 localhost kernel: ATOM BIOS: RV635
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: VRAM: 512M 0x00000000 - 0x1FFFFFFF (512M used)
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: GTT: 512M 0x20000000 - 0x3FFFFFFF
Feb 6 13:11:33 localhost kernel: [drm] Detected VRAM RAM=512M, BAR=256M
Feb 6 13:11:33 localhost kernel: [drm] RAM width 128bits DDR
Feb 6 13:11:33 localhost kernel: [TTM] Zone kernel: Available graphics memory: 2027754 kiB.
Feb 6 13:11:33 localhost kernel: [TTM] Initializing pool allocator.
Feb 6 13:11:33 localhost kernel: [drm] radeon: 512M of VRAM memory ready
Feb 6 13:11:33 localhost kernel: [drm] radeon: 512M of GTT memory ready.
Feb 6 13:11:33 localhost kernel: radeon 0000:02:00.0: radeon: using MSI.
Feb 6 13:11:33 localhost kernel: [drm] radeon: irq initialized.
Feb 6 13:11:33 localhost kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Feb 6 13:11:33 localhost kernel: [drm] Loading RV635 Microcode
(...)
Feb 6 13:11:33 localhost kernel: input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
Feb 6 13:11:33 localhost kernel: IR JVC protocol handler initialized
Feb 6 13:11:33 localhost kernel: IR Sony protocol handler initialized
Feb 6 13:11:33 localhost kernel: [drm] ring test succeeded in 1 usecs
Feb 6 13:11:33 localhost kernel: [drm] radeon: ib pool ready.
Feb 6 13:11:33 localhost kernel: [drm] ib test succeeded in 0 usecs
Feb 6 13:11:33 localhost kernel: [drm] Enabling audio support
Feb 6 13:11:33 localhost kernel: [drm] Radeon Display Connectors
Feb 6 13:11:33 localhost kernel: [drm] Connector 0:
Feb 6 13:11:33 localhost kernel: [drm] VGA
Feb 6 13:11:33 localhost kernel: [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
Feb 6 13:11:33 localhost kernel: [drm] Encoders:
Feb 6 13:11:33 localhost kernel: [drm] CRT2: INTERNAL_KLDSCP_DAC2
Feb 6 13:11:33 localhost kernel: [drm] Connector 1:
Feb 6 13:11:33 localhost kernel: [drm] HDMI-A
Feb 6 13:11:33 localhost kernel: [drm] HPD1
Feb 6 13:11:33 localhost kernel: [drm] DDC: 0x7e60 0x7e60 0x7e64 0x7e64 0x7e68 0x7e68 0x7e6c 0x7e6c
Feb 6 13:11:33 localhost kernel: [drm] Encoders:
Feb 6 13:11:33 localhost kernel: [drm] DFP1: INTERNAL_UNIPHY
Feb 6 13:11:33 localhost kernel: [drm] Connector 2:
Feb 6 13:11:33 localhost kernel: [drm] DVI-I
Feb 6 13:11:33 localhost kernel: [drm] HPD2
Feb 6 13:11:33 localhost kernel: [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
Feb 6 13:11:33 localhost kernel: [drm] Encoders:
Feb 6 13:11:33 localhost kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
Feb 6 13:11:33 localhost kernel: [drm] DFP2: INTERNAL_KLDSCP_LVTMA
(...)
Feb 6 13:11:33 localhost kernel: [drm] Internal thermal controller with fan control
Feb 6 13:11:33 localhost kernel: [drm] radeon: power management initialized
Feb 6 13:11:33 localhost kernel: tuner 2-004b: chip found @ 0x96 (saa7133[0])
Feb 6 13:11:33 localhost kernel: [drm] fb mappable at 0xD0141000
Feb 6 13:11:33 localhost kernel: [drm] vram apper at 0xD0000000
Feb 6 13:11:33 localhost kernel: [drm] size 7258112
Feb 6 13:11:33 localhost kernel: [drm] fb depth is 24
Feb 6 13:11:33 localhost kernel: [drm] pitch is 6912
Feb 6 13:11:33 localhost kernel: tda829x 2-004b: setting tuner address to 61
Feb 6 13:11:33 localhost kernel: tda829x 2-004b: type set to tda8290+75a
Feb 6 13:11:33 localhost kernel: hda_codec: ALC1200: SKU not ready 0x411111f0
Feb 6 13:11:33 localhost kernel: hda_codec: ALC1200: BIOS auto-probing.
Feb 6 13:11:33 localhost kernel: Console: switching to colour frame buffer device 210x65
Feb 6 13:11:33 localhost kernel: fb0: radeondrmfb frame buffer device
Feb 6 13:11:33 localhost kernel: drm: registered panic notifier
Feb 6 13:11:33 localhost kernel: [drm] Initialized radeon 2.6.0 20080528 for 0000:02:00.0 on minor 0
- make sure your system is fully up to date (-Syu it from a good mirror!), give versions for kernel, libdrm, libgl, mesa, ati-dri, xf86-video-ati, xorg-server
kernel26: 2.6.36.3-1
libdrm: 2.4.23-1
libgl: 7.10-1
mesa: 7.10-1
ati-dri: 7.10-1
xf86-video-ati: 6.13.2-2 (upgraded later to 6.14.0-1)
xorg-server: 1.9.3.901-1 (upgraded later to 1.9.4-1)
- when does it crash?
It was in normal operation (I was writing an email). Not something reproducible.
- try with and without ati-dri module
I could try removing ati-dri. What do you expect to see? It's not like I could check if whatever race condition happened was fixed.
- post full Xorg.0.log
I don't have the Xorg.0.log of that run. I have the one after restarting, though.
FWIW, I grepped older logs for "lockup", and the only other instance was from after the recent upgrade that broke X and led me to downgrade the kernel. That upgrade changed the following packages from the list above:
libdrm (2.4.22-3 -> 2.4.23-1), libgl (7.9.0.git20101207-2 -> 7.10-1), ati-dri (7.9.0.git20101207-2 -> 7.10-1), kernel26 (2.6.36.3-1 -> 2.6.37-5), mesa (7.9.0.git20101207-2 -> 7.10-1), xorg-server-common (1.9.2-2 -> 1.9.3.901-1), xorg-server (1.9.2-2 -> 1.9.3.901-1)
Maybe it's the same bug, but 2.6.36 has some big lock that hides it most of the time? It may be completely unrelated as well. Strangely, this backtrace appeared only once, although the X failure was consistent:
Feb 4 09:01:12 localhost kernel: X D 000000010004cfde 0 1885 1884 0x00400004
Feb 4 09:01:12 localhost kernel: ffff880139651988 0000000000000086 ffff880139651858 ffffffff00000000
Feb 4 09:01:12 localhost kernel: 00000000000132c0 ffff880136d2a9a0 ffff880139651fd8 ffff880139651fd8
Feb 4 09:01:12 localhost kernel: ffff880139651fd8 ffff880136d2ac80 ffff880139651fd8 ffff880139650000
Feb 4 09:01:12 localhost kernel: Call Trace:
Feb 4 09:01:12 localhost kernel: [<ffffffff8104e168>] ? update_curr+0xd8/0x210
Feb 4 09:01:12 localhost kernel: [<ffffffff81015bee>] ? __switch_to_xtra+0x14e/0x180
Feb 4 09:01:12 localhost kernel: [<ffffffff81066476>] ? lock_timer_base.clone.23+0x36/0x70
Feb 4 09:01:12 localhost kernel: [<ffffffff813a58a6>] __mutex_lock_slowpath+0x136/0x310
Feb 4 09:01:12 localhost kernel: [<ffffffff813a5a91>] mutex_lock+0x11/0x30
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f10d9>] radeon_ring_lock+0x29/0x60 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa0318d0b>] r600_gpu_is_lockup+0xfb/0x220 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d824e>] radeon_fence_wait+0x34e/0x3e0 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffff81077db0>] ? autoremove_wake_function+0x0/0x40
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d846c>] radeon_fence_wait_next+0x8c/0xb0 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f1087>] radeon_ring_alloc+0x47/0x70 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f10e4>] radeon_ring_lock+0x34/0x60 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa0318d0b>] r600_gpu_is_lockup+0xfb/0x220 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d824e>] radeon_fence_wait+0x34e/0x3e0 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffff811f7c90>] ? rb_insert_color+0x110/0x150
Feb 4 09:01:12 localhost kernel: [<ffffffff81077db0>] ? autoremove_wake_function+0x0/0x40
Feb 4 09:01:12 localhost kernel: [<ffffffffa02d8b3c>] radeon_sync_obj_wait+0xc/0x10 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa025b359>] ttm_bo_wait+0xf9/0x1b0 [ttm]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f063e>] radeon_gem_wait_idle_ioctl+0x8e/0x110 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffffa01f5474>] drm_ioctl+0x3d4/0x4b0 [drm]
Feb 4 09:01:12 localhost kernel: [<ffffffffa02f05b0>] ? radeon_gem_wait_idle_ioctl+0x0/0x110 [radeon]
Feb 4 09:01:12 localhost kernel: [<ffffffff8101717a>] ? save_i387_xstate+0x10a/0x230
Feb 4 09:01:12 localhost kernel: [<ffffffff81067a26>] ? recalc_sigpending+0x16/0x40
Feb 4 09:01:12 localhost kernel: [<ffffffff8100b34d>] ? do_signal+0x17d/0x7c0
Feb 4 09:01:12 localhost kernel: [<ffffffff810162ac>] ? fpu_finit+0x1c/0x30
Feb 4 09:01:12 localhost kernel: [<ffffffff81146075>] do_vfs_ioctl+0x95/0x530
Feb 4 09:01:12 localhost kernel: [<ffffffff81146591>] sys_ioctl+0x81/0xa0
Feb 4 09:01:12 localhost kernel: [<ffffffff8100bf12>] system_call_fastpath+0x16/0x1b
(it repeated three times, each one just after the previous)
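The search through older logs mentioned above can be sketched roughly as follows. It assumes the lockup message has the same format as in the 2.6.36 log from the description ("GPU lockup (waiting for ... last fence id ...)"); the function name `find_lockups` is hypothetical, and the log paths are passed on the command line. Note that it would not match the hung-task backtrace just shown, which contains no such message.

```python
import re
import sys

# Message format as it appears in the 2.6.36 log quoted in the description.
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)\)"
)


def find_lockups(lines):
    """Return (line, waiting_fence, last_fence) for every lockup message."""
    hits = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            waiting = int(match.group(1), 16)
            last = int(match.group(2), 16)
            hits.append((line.rstrip("\n"), waiting, last))
    return hits


if __name__ == "__main__":
    # Usage: python find_lockups.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for text, waiting, last in find_lockups(handle):
                print(f"{path}: waiting={waiting:#x} last={last:#x}")
                print(f"  {text}")
```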
In my case (maybe also in your case), the mouse CAN be moved, it's just that the mouse pointer becomes invisible.
When this happens again, try to move the mouse around to menus, taskbar or links in your browser window... after a while the mouse pointer reappears and everything is normal again.
My system is up to date, except for mysql.
kernel26: 2.6.37-5 (also happens with kernel patched with Con Kolivas' patch)
libdrm: 2.4.23-2
libgl, mesa and ati-dri: 7.10.0.git20110215-1
xf86-video-ati: 6.14.0-1
xorg-server: 1.9.4-1
I've had this problem for more than a month, so the previous versions of the packages above possibly had the same problem.
I cannot reproduce it on demand, but I think it happened more often while working with fullscreen windows.
This may be related: since I've had this problem, I see the following very often in my kernel.log:
[drm:radeon_vga_detect] *ERROR* VGA-1: probed a monitor but no|invalid EDID
The EDID spam is reported upstream: https://bugs.freedesktop.org/show_bug.cgi?id=34457
I've posted my report here:
https://bugs.freedesktop.org/show_bug.cgi?id=34313