FS#17705 - [xf86-video-intel] Hangcheck timer elapsed... GPU hung

Attached to Project: Arch Linux
Opened by Christoph Meister (Sleepy) - Monday, 04 January 2010, 17:24 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 15 February 2012, 08:06 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan de Groot (JGC)
Architecture i686
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 20
Private No

Details

Description:

For a few days now (I think unrelated to any update, but I'm not 100% sure), I've been experiencing severe lockups. They usually happen when scrolling complex pages in Firefox, but also with Chromium, OpenOffice and mplayer. The whole screen freezes, but I can still move the mouse (and mpd also keeps on playing). Switching to tty1 is possible, but restarting X doesn't work: the screen just turns black, making a reboot necessary. It happened every 3 hours on Sunday, and twice so far today.

It's a MSI Wind PC 2723:
Intel Atom N230
Intel GMA950 (Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02))


Additional info:
* package version(s):
kernel26 2.6.32.2-2
xf86-video-intel 2.9.1-1
mesa 7.7-1
intel-dri 7.7-1

* config and/or log files etc.
/var/log/everything.log:

[...]
Jan 4 10:52:42 server -- MARK --
Jan 4 11:15:19 server -- MARK --
Jan 4 11:35:19 server -- MARK --
Jan 4 11:51:00 server kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 4 11:51:00 server kernel: render error detected, EIR: 0x00000000
Jan 4 11:51:00 server kernel: i915: Waking up sleeping processes
Jan 4 11:51:00 server kernel: reboot required
Jan 4 11:51:00 server kernel: [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 574670 at 574669)
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 4 11:51:00 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[...]

Steps to reproduce:
...

Closed by  Tobias Powalowski (tpowa)
Wednesday, 15 February 2012, 08:06 GMT
Reason for closing:  Upstream
Comment by Christoph Meister (Sleepy) - Monday, 04 January 2010, 18:32 GMT
Could perhaps be the same as FS#17123. Symptoms sound similar, although the logs look different.
Comment by Christoph Meister (Sleepy) - Wednesday, 06 January 2010, 18:24 GMT
Anybody got any idea? This is happening all the time...

Jan 6 19:01:30 server kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 6 19:01:30 server kernel: render error detected, EIR: 0x00000000
Jan 6 19:01:30 server kernel: i915: Waking up sleeping processes
Jan 6 19:01:30 server kernel: reboot required
Jan 6 19:01:30 server kernel: [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 1878375 at 1878373)
Jan 6 19:01:30 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 6 19:01:30 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged


Jan 6 19:18:23 server kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 6 19:18:23 server kernel: render error detected, EIR: 0x00000000
Jan 6 19:18:23 server kernel: i915: Waking up sleeping processes
Jan 6 19:18:23 server kernel: reboot required
Jan 6 19:18:23 server kernel: [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 63725 at 63724)
Jan 6 19:18:23 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Jan 6 19:18:23 server kernel: [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
Comment by Andrea Scarpino (BaSh) - Thursday, 07 January 2010, 11:22 GMT
This still happens with xf86-video-intel 2.10 too.
Comment by Christoph Meister (Sleepy) - Sunday, 10 January 2010, 10:07 GMT
I finally rolled back to Kernel 2.6.31.6-1 and everything works fine (so far). Disabling KMS didn't do anything for me, but it sure looks like the new features of 2.6.32 are responsible for this crash. Still using xf86-video-intel 2.9.1-1 right now.
Comment by Tomas M. (eldragon) - Monday, 11 January 2010, 12:34 GMT
Comment by Christoph Meister (Sleepy) - Sunday, 07 February 2010, 08:42 GMT
I updated my system two days ago, after keeping the kernel on ignore for about a month. I've now been running the standard 2.6.32.7-1 kernel for about 40 hours and everything is fine. No crashes, no glitches.
Comment by héctor (hacosta) - Saturday, 13 March 2010, 14:17 GMT
This bug started to show up again with the latest update
Comment by Tomas M. (eldragon) - Saturday, 13 March 2010, 14:52 GMT
last update of what?
Comment by Christoph Meister (Sleepy) - Saturday, 13 March 2010, 18:36 GMT
I'm using the ck-Kernel from the AUR (2.6.33-1, 2.6.32-7 before that) right now and I haven't experienced this problem for weeks... if that helps.
Comment by Daniele C. (legolas558) - Friday, 19 March 2010, 11:30 GMT
when using patch in http://bugs.freedesktop.org/show_bug.cgi?id=27187 and playing a video, I get a crash with the bug mentioned here
Comment by Daniele C. (legolas558) - Thursday, 08 April 2010, 22:04 GMT
the upstream kernel bug has been closed, so I think this should be closed also
Comment by Tomas M. (eldragon) - Friday, 09 April 2010, 00:35 GMT
yes, this has been long fixed. sorry for the delay, had forgotten all about the arch bug report
Comment by Colin Pitrat (LiFo2) - Saturday, 24 April 2010, 12:43 GMT
I still have the error on my side.

I have this in dmesg:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 3134 at 3128)

And nothing particular in /var/log/messages.log since I updated and put i915.powersave=0 on kernel commandline (before this, I had the "*ERROR* Execbuf while wedged" message).
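
To confirm that an option such as i915.powersave=0 really reached the running kernel, the command line can be read back from /proc/cmdline. A minimal sketch; the function name has_kernel_param and its optional file argument are made up for illustration and are not part of any package:

```shell
#!/bin/bash
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM (e.g. i915.powersave=0) appears as a whole word on the
# kernel command line. CMDLINE_FILE defaults to /proc/cmdline; the optional
# argument is only there so the check can be tried against a sample file.
has_kernel_param() {
    local param file
    param=$1
    file=${2:-/proc/cmdline}
    [ -r "$file" ] || return 1
    # Pad with spaces on both sides so that only whole words match.
    case " $(tr '\n' ' ' < "$file") " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_kernel_param i915.powersave=0 && echo set` prints `set` only when the kernel was booted with exactly that option.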
Comment by Colin Pitrat (LiFo2) - Saturday, 24 April 2010, 12:46 GMT
I forgot to specify the versions of packages I use:
xf86-video-intel 2.10.0-1
kernel26 2.6.33.2-1
libdrm 2.4.19-2
Comment by Colin Pitrat (LiFo2) - Tuesday, 11 May 2010, 18:18 GMT
Removing all screen power saving in KDE control center seems to help
Comment by Indan Zupancic (i3839) - Tuesday, 11 May 2010, 20:29 GMT
Colin, upgrade to libdrm 2.4.20 and intel 2.11 (or newer) before saying it still doesn't work.
Comment by Colin Pitrat (LiFo2) - Tuesday, 11 May 2010, 21:10 GMT
OK, as those packages are in testing, I didn't have them when I upgraded. I'll update them and tell you tomorrow whether it solves the problem. Thanks.

Strange, I got a stack this time:

NET: Registered protocol family 10
lo: Disabled Privacy Extensions
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 76535 at 76533)
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:490 i915_gem_set_tiling+0x241/0x250 [i915]()
Hardware name: 2371Y29
failed to reset object for tiling switch
Modules linked in: ipv6 fuse joydev hdaps input_polldev cpufreq_ondemand cpufreq_powersave cpufreq_conservative acpi_cpufreq freq_table pcmcia ndiswrapper snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device sdhci_pci sdhci mmc_core yenta_socket rsrc_nonstatic snd_intel8x0 snd_pcm_oss pcmcia_core snd_mixer_oss snd_ac97_codec ac97_bus ipw2200 snd_pcm ppdev snd_timer thinkpad_acpi iTCO_wdt iTCO_vendor_support irtty_sir nsc_ircc libipw snd led_class soundcore sir_dev shpchp uhci_hcd sr_mod nvram snd_page_alloc ehci_hcd lp battery ac parport_pc i2c_i801 pci_hotplug cdrom psmouse irda crc_ccitt sg processor cfg80211 parport usbcore rfkill thermal lib80211 serio_raw evdev e1000 rtc_cmos rtc_core rtc_lib i915 drm_kms_helper drm i2c_algo_bit button i2c_core video output intel_agp agpgart ext3 jbd mbcache sd_mod ata_piix ata_generic libata scsi_mod
Pid: 3530, comm: X Tainted: P 2.6.33-ARCH #1
Call Trace:
[<c1043b4d>] warn_slowpath_common+0x6d/0xa0
[<f82e7961>] ? i915_gem_set_tiling+0x241/0x250 [i915]
[<f82e7961>] ? i915_gem_set_tiling+0x241/0x250 [i915]
[<c1043bc6>] warn_slowpath_fmt+0x26/0x30
[<f82e7961>] i915_gem_set_tiling+0x241/0x250 [i915]
[<f813810c>] drm_ioctl+0x28c/0x410 [drm]
[<f82e7720>] ? i915_gem_set_tiling+0x0/0x250 [i915]
[<c100bd40>] ? restore_i387_fxsave+0x70/0x80
[<c1102604>] vfs_ioctl+0x34/0xa0
[<f8137e80>] ? drm_ioctl+0x0/0x410 [drm]
[<c1102db6>] do_vfs_ioctl+0x66/0x580
[<c100c7aa>] ? restore_i387_xstate+0x11a/0x250
[<c10f473d>] ? rw_verify_area+0x5d/0xd0
[<c1069cc7>] ? ktime_get_ts+0xf7/0x130
[<c1003149>] ? restore_sigcontext+0xb9/0xe0
[<c110332f>] sys_ioctl+0x5f/0x80
[<c100371f>] sysenter_do_call+0x12/0x28
---[ end trace 8e00df6a60a4e5ee ]---
Comment by Colin Pitrat (LiFo2) - Wednesday, 12 May 2010, 20:00 GMT
Sorry to re-open again, but I still have it with libdrm 2.4.20 and intel 2.11.
Comment by Tomas M. (eldragon) - Wednesday, 12 May 2010, 20:33 GMT
Colin, if you are still suffering from this, I suggest you take a look at www.intellinuxgraphics.org and find out how to report a proper bug upstream.

You should build and install one of the 2.6.34-rc kernels, which include extra debugging info for Intel GPUs.

Comment by Alphazo (alphazo) - Tuesday, 18 May 2010, 13:41 GMT
I'm having the same issue on a brand new install based upon an Intel 855GM Chipset.

May 17 14:23:00 slimdog kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 17 14:23:00 slimdog kernel: render error detected, EIR: 0x00000000
May 17 14:23:00 slimdog kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 35255 at 35254)
May 17 14:24:00 slimdog kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 17 14:24:00 slimdog kernel: render error detected, EIR: 0x00000000

I don't have testing repo enabled so I use:
- xf86-video-intel 2.10.0-1
- libdrm 2.4.19-2

Is there a simple way to just upgrade those two packages from testing without switching the rest of the system to testing?
Comment by Jan de Groot (JGC) - Tuesday, 18 May 2010, 13:47 GMT
The bug was marked as fixed in testing, so you'll need the Xorg packages from testing. To run Xorg from testing, you'll need to update libgl, mesa, intel-dri, xorg-server and any xf86-* driver package you have installed.
Comment by Colin Pitrat (LiFo2) - Tuesday, 18 May 2010, 14:06 GMT
Alphazo, using testing is very easy and not many packages will be updated. Just edit your /etc/pacman.conf and uncomment these two lines:
[testing]
Include = /etc/pacman.d/mirrorlist

then pacman -Suy

However, for me, using testing didn't help. I still have the freezes when I use KDE. I switched to windowmaker and I don't have any problem with it ...
With KDE, switching off all screen power saving in KDE control center seemed to help but didn't totally solve the problem (the freeze just takes longer to occur).
Comment by pete (drg006) - Monday, 31 May 2010, 19:10 GMT
The packages in testing didn't help me either. Is there a work-around?

00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 119996 at 119995)
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:332 i915_gem_set_tiling+0x1f5/0x200 [i915]()
Hardware name: Presario V2000 (PM064UA#ABA)
failed to reset object for tiling switch
Modules linked in: nls_cp437 vfat fat ipv6 fuse ext2 mbcache usbhid hid usb_storage pcmcia snd_seq_dummy snd_seq_oss sdhci_pci 8139too sdhci snd_seq_midi_event yenta_socket ipw2200 libipw mmc_core snd_intel8x0m snd_seq tifm_7xx1 firewire_ohci 8139cp rsrc_nonstatic cfg80211 mii tifm_core firewire_core led_class crc_itu_t snd_intel8x0 snd_seq_device pcmcia_core rfkill i915 drm_kms_helper snd_ac97_codec drm ac97_bus lib80211 intel_agp shpchp i2c_i801 iTCO_wdt snd_pcm_oss snd_mixer_oss joydev uhci_hcd i2c_algo_bit ehci_hcd iTCO_vendor_support snd_pcm i2c_core snd_timer pci_hotplug agpgart wmi sg ac video processor battery thermal usbcore button output pcspkr snd soundcore psmouse snd_page_alloc serio_raw evdev rtc_cmos rtc_core rtc_lib reiserfs sr_mod cdrom sd_mod pata_acpi ata_generic ata_piix libata scsi_mod
Pid: 1507, comm: X Not tainted 2.6.34-ARCH #1
Call Trace:
[<c10430fd>] warn_slowpath_common+0x6d/0xa0
[<def94015>] ? i915_gem_set_tiling+0x1f5/0x200 [i915]
[<def94015>] ? i915_gem_set_tiling+0x1f5/0x200 [i915]
[<c1043176>] warn_slowpath_fmt+0x26/0x30
[<def94015>] i915_gem_set_tiling+0x1f5/0x200 [i915]
[<de9a32ce>] drm_ioctl+0x1ae/0x410 [drm]
[<def93e20>] ? i915_gem_set_tiling+0x0/0x200 [i915]
[<c102a521>] ? ptep_set_access_flags+0x51/0x60
[<c10d338e>] ? do_wp_page+0x43e/0x7c0
[<c10f49cc>] ? do_sync_write+0x9c/0xd0
[<c1102834>] vfs_ioctl+0x34/0xa0
[<de9a3120>] ? drm_ioctl+0x0/0x410 [drm]
[<c1102f46>] do_vfs_ioctl+0x66/0x580
[<c1026cd0>] ? do_page_fault+0x0/0x3b0
[<c1026ea7>] ? do_page_fault+0x1d7/0x3b0
[<c1068a1f>] ? ktime_get_ts+0xff/0x130
[<c11034bf>] sys_ioctl+0x5f/0x80
[<c105e839>] ? sys_clock_gettime+0x69/0xa0
[<c100379f>] sysenter_do_call+0x12/0x28
---[ end trace cd1a8af9e152e063 ]---

Comment by Colin Pitrat (LiFo2) - Monday, 31 May 2010, 20:21 GMT
I dropped KDE and switched to windowmaker. It helps a lot, I only had the problem once since.
But of course, it's far from a perfect solution !
Comment by Alphazo (alphazo) - Monday, 31 May 2010, 20:24 GMT
Haven't switched to Testing but after disabling screensaver and power saving modes I haven't seen the problem after a week of activity.
Comment by Tomas M. (eldragon) - Monday, 31 May 2010, 20:34 GMT
there are several things to point out:

a) the original bug reported here has been fixed long ago.

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

could mean lots of things.

b) intel packages in testing suffer from other bugs concerning tiling.

c) xscreensaver glknots is known to kill the intel GPU too. (another different bug, which has been worked around in mesa with commit 8accf0a8 )

all these bugs report a hangcheck timer error
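
Since the hangcheck line alone does not tell these bugs apart, it may be worth comparing the EIR value logged next to it (0x00000000 in the original report). A rough sketch that counts hangcheck events and EIR values in a kernel log read from standard input; summarize_hangs is a hypothetical helper name, not an existing tool, and whether the EIR value actually separates the different bugs is an assumption:

```shell
#!/bin/bash
# summarize_hangs: read a kernel log on stdin and print
#   - the number of "Hangcheck timer elapsed" lines
#   - each distinct EIR value from "render error detected" lines, with a count
# The EIR lines are printed in no particular order.
summarize_hangs() {
    awk '
        /Hangcheck timer elapsed/ { hangs++ }
        /render error detected, EIR:/ {
            # The EIR value is the last field on the line.
            eir[$NF]++
        }
        END {
            printf "hangcheck events: %d\n", hangs
            for (v in eir) printf "EIR %s: %d\n", v, eir[v]
        }
    '
}
```

It can be fed either a log file (`summarize_hangs < /var/log/everything.log`) or the output of `dmesg`.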
Comment by Witold Czaplewski (Witi) - Wednesday, 16 June 2010, 08:03 GMT
Comment by Manuel Gaul (Inkaine) - Thursday, 17 June 2010, 22:17 GMT
No, it is not. This patch is already included in kernver 2.6.34, but the bug remains.

I'm currently testing kernel command line option "pci=nocrs" as proposed here: http://www.pubbs.net/201003/kernel/58672-regresion-2634-rc1-drmi915hangcheckelapsed-error-hangcheck-timer-elapsed-gpu-hung.html

EDIT: No change, even with "pci=nocrs". Strangely on my system rolling back to kernel 2.6.33.4 prevents the above crash.
Comment by Alois Nespor (anespor) - Tuesday, 06 July 2010, 07:56 GMT
Comment by Alois Nespor (anespor) - Monday, 26 July 2010, 11:54 GMT
GPU hangs on Gen3 graphics (like 945GM) should be fixed in 2.6.35-rc6.
Comment by Rawcut (rawcut) - Sunday, 01 August 2010, 18:32 GMT
With Gen4 hardware, I found that the bug appears after upgrading mesa and libdrm. I have no more issues when I roll back to mesa 7.7-1 and libdrm 2.4.19-2 (tested with kernel 2.6.32 and 2.6.34).
Comment by Alois Nespor (anespor) - Sunday, 08 August 2010, 13:44 GMT
I also have no more issues when I roll back to mesa 7.7 and libdrm 2.4.19 with kernel 2.6.35.
Comment by Alois Nespor (anespor) - Thursday, 09 September 2010, 18:43 GMT
Yes, on Gen4 with mesa 7.7 and libdrm 2.4.19 I have no issue. It is a Mesa bug.
Comment by Maxwell Draven (Ravenman) - Monday, 29 November 2010, 19:30 GMT
I have this issue too.

I have kernel26 2.6.36.1-3, xorg-server 1.9.2-2, xf86-video-intel 2.13.0-4 and intel-dri 7.9-1 installed in my system.

VGA card: Intel Corporation 82946GZ/GL Integrated Graphics Controller.
Comment by Balló György (City-busz) - Saturday, 25 December 2010, 17:22 GMT
Same here with Intel 845G, and I can't fix it. I've tried downgrading lots of things, but nothing helps.
The latest package versions, which I tried:
kernel26 2.6.36.2-1
libdrm 2.4.23-1
xf86-video-intel 2.13.902-1
xorg-server 1.9.3-1

I experience the following error:
- When GDM starts (or, with older intel drivers, some time later), the following error message is shown in kernel.log:
render error detected, EIR: 0x00000010
[drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
render error detected, EIR: 0x00000010

- After some time of usage, all graphics acceleration is turned off and some colors are displayed incorrectly. Then I get the following in kernel.log:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

- And in Xorg.0.log:
[ 823.803] (EE) intel(0): Detected a hung GPU, disabling acceleration.

This bug is independent from libgl/mesa/intel-dri, because I removed all of them, and this bug still happens.
I don't know which component is broken, but I think it's maybe related to kernel's Intel DRM module.
Comment by Karol Błażewicz (karol) - Saturday, 25 December 2010, 17:32 GMT
[karol@black ~]$ lspci | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)

[karol@black ~]$ pacman -Q kernel26 libdrm xf86-video-intel xorg-server
kernel26 2.6.36.2-1
libdrm 2.4.22-3
xf86-video-intel 2.13.0-4
xorg-server 1.9.2-2

No hangups, running just fine.
Comment by Maxwell Draven (Ravenman) - Friday, 14 January 2011, 01:50 GMT
I have this issue in one Toshiba Satellite Pro L450 too.

I have kernel 2.6.36.3-1, xorg-server 1.9.2-2, xf86-video-intel 2.13.0-4 and intel-dri 7.9.0.git20101207-2 installed in my system.

VGA card: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
Comment by Greg Bair (gregbair) - Sunday, 27 February 2011, 17:44 GMT
I also get this bug. When it happens, the screen goes wonky (kinda like looking at an old scrambled analog cable station). If I switch to TTY1 and then back, it's back to normal except that the colors on menu items are a bit off (non-disabled button text is grayed out). Also, the pointer cursor does not change: it stays the same cursor as when the bug occurred, and mouse movement gets slightly "jumpy".

From Xorg.0.log, I get this message just after it happens:

[ 10805.935] (EE) intel(0): Detected a hung GPU, disabling acceleration.

Then I get these messages repeatedly:

[ 11856.016] (WW) intel(0): intel_uxa_prepare_access: bo map failed: Input/output error
[ 11856.171] (EE) intel(0): failed to set cursor: Input/output error



[greg@coruscant ~]$ lspci | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
[greg@coruscant ~]$ pacman -Q kernel26 libdrm xf86-video-intel xorg-server
kernel26 2.6.37.1-1
libdrm 2.4.23-2
xf86-video-intel 2.14.0-2
xorg-server 1.9.4-1
Comment by Karol Błażewicz (karol) - Sunday, 27 February 2011, 18:32 GMT
@ Greg Bair (gregbair)
Can you try the setup I posted https://bugs.archlinux.org/task/17705#comment70157 ?
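
To see at a glance which packages differ from a known-good setup like the one linked above, two `pacman -Q` listings can be compared. A minimal sketch, assuming each listing was saved to a file with one "name version" pair per line; diff_versions is a hypothetical helper, not a pacman feature:

```shell
#!/bin/bash
# diff_versions GOOD_LIST CURRENT_LIST
# Both files hold "name version" lines as printed by `pacman -Q pkg...`.
# Prints "name current -> known-good" for every package in CURRENT_LIST
# whose version differs from the one recorded in GOOD_LIST.
# Assumes GOOD_LIST is not empty.
diff_versions() {
    awk '
        NR == FNR { good[$1] = $2; next }   # first file: remember good versions
        ($1 in good) && good[$1] != $2 { print $1, $2, "->", good[$1] }
    ' "$1" "$2"
}
```

With the two listings posted in this thread, the first line of output would be `kernel26 2.6.37.1-1 -> 2.6.36.2-1`.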
Comment by Maxwell Draven (Ravenman) - Tuesday, 15 March 2011, 15:19 GMT
This issue still continues in my upgraded system (kernel26 2.6.37.3-1, xorg-server 1.9.4.901-1, xf86-video-intel 2.14.0-3, intel-dri 7.10.1-1, libgl 7.10.1-1 and libdrm 2.4.23-2).

If I downgrade to xf86-video-intel 2.12.0-3, everything works fine. My computer has one VGA Intel Corporation 82946GZ/GL Integrated Graphics Controller.
Comment by Indan Zupancic (i3839) - Tuesday, 15 March 2011, 23:24 GMT
Can the people that get this hang capture debug/dri/0/i915_error_state?

Depending on username and where you mounted debugfs, modify the below to your needs and run it at boot:

[code]
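#!/bin/bash
# Poll i915_wedged; once the GPU is marked wedged, save debug state and reboot.
PATH="/bin:/usr/bin"

M=/sys/kernel/debug   # where debugfs is mounted
USER=root             # user who should receive the dump

# "mount $M" relies on a debugfs entry in /etc/fstab.
mount "$M"
cd /tmp/ || exit 1

# Bail out if the file is missing; otherwise the failing grep below
# would fall through to the dump-and-reboot branch.
[ -r "$M/dri/0/i915_wedged" ] || exit 1

while true; do
    if grep -q 0 "$M/dri/0/i915_wedged"; then
        sleep 1
    else
        mkdir dump
        dmesg > dump/dmesg
        cp /var/log/Xorg.0.log dump/
        cp -a "$M"/dri/0/* dump/
        tar czf dump.tgz dump
        rm -rf dump
        chown "$USER":users dump.tgz
        mv dump.tgz "/home/$USER"
        sync
        sleep 5
        reboot
        exit
    fi
done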
[/code]
---

Alternatively, you can probably get to a console by pressing alt + sysrq + r to get your keyboard back,
then alt + sysrq + k, v, e or i, followed by alt+F1/2/3 to get to the console.
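
For the manual route, once you are back at a console, only the error state itself needs to be copied somewhere safe before rebooting. A small sketch, assuming debugfs is mounted at /sys/kernel/debug as in the script above; save_error_state and its arguments are hypothetical names:

```shell
#!/bin/bash
# save_error_state [SRC] [DEST_DIR]
# Copy the i915 error state to a timestamped file and print the file name.
# SRC defaults to the debugfs path used by the script above (assumes debugfs
# is mounted at /sys/kernel/debug); DEST_DIR defaults to /tmp.
save_error_state() {
    local src dest_dir dest
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    dest_dir=${2:-/tmp}
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted? are you root?)" >&2
        return 1
    fi
    dest="$dest_dir/i915_error_state.$(date +%Y%m%d-%H%M%S)"
    cat "$src" > "$dest" || return 1
    echo "$dest"
}
```

Run as root, `save_error_state` with no arguments writes the copy to /tmp and prints its path, so the file can be attached to an upstream report.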
Comment by Greg Bair (gregbair) - Sunday, 17 April 2011, 16:12 GMT
I now get just a whole system freeze since the latest update of xf86-video-intel. I reinstalled Arch because I borked something unrelated horribly, so this is a fresh install with nothing besides base,base-devel,gnome, gnome-extras and firefox installed.

I noticed the freeze happens much sooner if using Firefox than if I use Epiphany.

kernel26 2.6.38.2-1
libdrm 2.4.25-1
xf86-video-intel 2.15.0-1
xorg-server 1.10.1-1

@Karol Błażewicz (karol):
I'm not sure how to downgrade to those packages.
Comment by Karol Błażewicz (karol) - Sunday, 17 April 2011, 16:18 GMT
@Greg Bair (gregbair)
Read the wiki and use ARM (the Arch Rollback Machine). You will need to downgrade a bunch of xorg-related packages, e.g.: xorg-server xorg-server-utils xf86-video-intel xf86-video-vesa xf86-input-evdev xf86-input-keyboard xf86-input-mouse intel-dri libgl libdrm mesa xorg-utils xorg-xinit xf86-video-fbdev.
Comment by JM (fijam) - Monday, 02 May 2011, 18:09 GMT
This may be related to: https://bugs.freedesktop.org/show_bug.cgi?id=36147 . Can the issue be resolved by turning relaxed fencing off?
Comment by Indan Zupancic (i3839) - Monday, 02 May 2011, 18:25 GMT
No, relaxed fencing is new, this bug is much older.

The problem is that a lot of hung GPUs look the same, though they can have very different causes. GPU problems usually show up as either a hung GPU or screen corruption, and sometimes both.

Either Intel graphic drivers, or Arch's version seem to be very unstable lately. I got an uptime of 20 days with the VESA driver, and just couldn't be bothered to downgrade all X related stuff that messed it up with the newest kernel yet. I guess I'll wait till there's a new xf86-video-intel version or something.
Comment by Jun Wu (quark) - Monday, 04 July 2011, 15:11 GMT
Arch x86_64 also has this bug.

xf86-video-intel 2.15.0-2
intel-dri 7.10.3-1
Comment by Jonathan De Nil (ulukai) - Friday, 12 August 2011, 20:18 GMT
I have a fully updated system, only stable repos and have experienced this problem multiple times today while playing Red Eclipse.

Aug 12 21:57:54 localhost kernel: [ 86.093936] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36131, limit 35000
Aug 12 21:58:04 localhost kernel: [ 96.076672] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36597, limit 35000
Aug 12 21:58:06 localhost kernel: [ 98.534936] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
Aug 12 21:58:19 localhost kernel: [ 111.050887] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 40023, limit 35000
Aug 12 21:58:24 localhost kernel: [ 116.042267] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36544, limit 35000
Aug 12 21:58:34 localhost kernel: [ 126.025064] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36898, limit 35000
Aug 12 21:58:44 localhost kernel: [ 136.007959] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37308, limit 35000
Aug 12 21:58:54 localhost kernel: [ 145.990642] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36478, limit 35000
Aug 12 21:59:09 localhost kernel: [ 160.964824] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 40067, limit 35000
Aug 12 21:59:14 localhost kernel: [ 165.956221] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36519, limit 35000
Aug 12 21:59:24 localhost kernel: [ 175.938990] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36632, limit 35000
Aug 12 21:59:34 localhost kernel: [ 185.921793] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36621, limit 35000
Aug 12 21:59:44 localhost kernel: [ 195.904541] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36940, limit 35000
Aug 12 21:59:54 localhost kernel: [ 205.887367] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36860, limit 35000
Aug 12 22:00:04 localhost kernel: [ 215.870128] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37190, limit 35000
Aug 12 22:00:14 localhost kernel: [ 225.852939] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37380, limit 35000
Aug 12 22:00:24 localhost kernel: [ 235.835725] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37523, limit 35000
Aug 12 22:00:34 localhost kernel: [ 245.818514] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37080, limit 35000
Aug 12 22:00:44 localhost kernel: [ 255.801301] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37439, limit 35000
Aug 12 22:00:54 localhost kernel: [ 265.784079] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37255, limit 35000
Aug 12 22:01:04 localhost kernel: [ 275.766871] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37172, limit 35000
Aug 12 22:01:14 localhost kernel: [ 285.749665] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37159, limit 35000
Aug 12 22:01:24 localhost kernel: [ 295.732446] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37221, limit 35000
Aug 12 22:01:34 localhost kernel: [ 305.715255] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37426, limit 35000
Aug 12 22:01:44 localhost kernel: [ 315.697989] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37275, limit 35000
Aug 12 22:01:54 localhost kernel: [ 325.680801] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37524, limit 35000
Aug 12 22:02:04 localhost kernel: [ 335.663561] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36981, limit 35000
Aug 12 22:02:14 localhost kernel: [ 345.646376] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37001, limit 35000
Aug 12 22:02:24 localhost kernel: [ 355.629161] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37703, limit 35000
Aug 12 22:02:34 localhost kernel: [ 365.611952] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 35339, limit 35000
Aug 12 22:02:44 localhost kernel: [ 375.594733] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36254, limit 35000
Aug 12 22:02:54 localhost kernel: [ 385.577521] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37380, limit 35000
Aug 12 22:03:04 localhost kernel: [ 395.560339] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 38159, limit 35000
Aug 12 22:03:14 localhost kernel: [ 405.543091] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37523, limit 35000
Aug 12 22:03:24 localhost kernel: [ 415.525879] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37236, limit 35000
Aug 12 22:03:29 localhost kernel: [ 420.517276] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 35474, limit 35000
Aug 12 22:03:44 localhost kernel: [ 435.491456] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37056, limit 35000
Aug 12 22:03:54 localhost kernel: [ 445.474219] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37961, limit 35000
Aug 12 22:04:04 localhost kernel: [ 455.457023] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36814, limit 35000
Aug 12 22:04:14 localhost kernel: [ 465.439774] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37870, limit 35000
Aug 12 22:04:24 localhost kernel: [ 475.422591] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37667, limit 35000
Aug 12 22:04:34 localhost kernel: [ 485.405351] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37930, limit 35000
Aug 12 22:04:44 localhost kernel: [ 495.388169] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37720, limit 35000
Aug 12 22:04:54 localhost kernel: [ 505.370954] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37387, limit 35000
Aug 12 22:05:04 localhost kernel: [ 515.353742] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37403, limit 35000
Aug 12 22:05:14 localhost kernel: [ 525.336559] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37632, limit 35000
Aug 12 22:05:24 localhost kernel: [ 535.319288] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37194, limit 35000
Aug 12 22:05:34 localhost kernel: [ 545.302120] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36855, limit 35000
Aug 12 22:05:44 localhost kernel: [ 555.284892] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37294, limit 35000
Aug 12 22:05:54 localhost kernel: [ 565.267677] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37222, limit 35000
Aug 12 22:06:04 localhost kernel: [ 575.250432] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37765, limit 35000
Aug 12 22:06:14 localhost kernel: [ 585.233248] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37669, limit 35000
Aug 12 22:06:24 localhost kernel: [ 595.216015] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37399, limit 35000
Aug 12 22:06:34 localhost kernel: [ 605.198820] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37482, limit 35000
Aug 12 22:06:44 localhost kernel: [ 615.181570] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37782, limit 35000
Aug 12 22:06:54 localhost kernel: [ 625.164395] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36969, limit 35000
Aug 12 22:07:04 localhost kernel: [ 635.147138] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37078, limit 35000
Aug 12 22:07:14 localhost kernel: [ 645.129940] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37901, limit 35000
Aug 12 22:07:24 localhost kernel: [ 655.112736] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37133, limit 35000
Aug 12 22:07:34 localhost kernel: [ 665.095561] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37591, limit 35000
Aug 12 22:07:44 localhost kernel: [ 675.078295] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37430, limit 35000
Aug 12 22:07:54 localhost kernel: [ 685.061113] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37503, limit 35000
Aug 12 22:08:04 localhost kernel: [ 695.043874] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 38221, limit 35000
Aug 12 22:08:09 localhost kernel: [ 700.035295] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 35099, limit 35000
Aug 12 22:08:24 localhost kernel: [ 715.009472] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37813, limit 35000
Aug 12 22:08:34 localhost kernel: [ 724.992254] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 36464, limit 35000
Aug 12 22:08:44 localhost kernel: [ 734.975071] intel ips 0000:00:1f.6: MCP limit exceeded: Avg power 37297, limit 35000
Aug 12 22:08:54 localhost kernel: [ 744.954533] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Aug 12 22:08:54 localhost kernel: [ 744.954543] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Aug 12 22:08:54 localhost kernel: [ 744.958002] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 363679 at 363672, next 363680)
Aug 12 22:08:56 localhost kernel: [ 746.765038] gnome-shell[1445]: segfault at 0 ip 00007f62a4945ca5 sp 00007fffbd864150 error 4 in i965_dri.so[7f62a48f6000+b4000]

Gnome-Shell crashes and it takes me back to the console login screen. When logging in again, the driver is switched to Gallium 0.4 on LLVM. The air that's coming out of the vent is really hot, but this would be normal after playing 3D games for a while I suppose. However, there is no dust in the vent and a good airflow is in place so overheating shouldn't be happening.

$ uname -a
Linux 3.0-ARCH #1 SMP PREEMPT Sat Aug 6 16:18:35 CEST 2011 x86_64 Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz GenuineIntel GNU/Linux

installed versions:
local/intel-dri 7.11-1 [1.46 MB]
Mesa DRI drivers for Intel
local/libgl 7.11-1 [16.68 MB]
Mesa 3-D graphics library and DRI software rasterizer
local/xf86-video-intel 2.15.0-2 [0.77 MB] (xorg-drivers xorg)
X.org Intel i810/i830/i915/945G/G965+ video drivers
