FS#34611 - [linux] 3.8.x - 3.11.x drm and i915 hangs the GPU on X11. (e.g. Lenovo X200, Lenovo T400 with

Attached to Project: Arch Linux
Opened by George Amanakis (gamanakis) - Thursday, 04 April 2013, 20:40 GMT
Last edited by Tobias Powalowski (tpowa) - Thursday, 10 October 2013, 10:45 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 9
Private No

Details

Description:
Since the 3.8.x kernels, drm and i915 hang the GPU on X11 (e.g. Lenovo X200, Lenovo T400 with Intel X4500).
See here: https://bbs.archlinux.org/viewtopic.php?id=160186

Steps to reproduce:
Just install the new repo kernels. Shortly afterwards, the mouse, screen and keyboard hang periodically.

Solution:
A git bisect from 3.7 through 3.8 on the official kernel git repo identifies the following commit as the culprit:
commit 69787f7da6b2adc4054357a661aaa1701a9ca76f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Oct 23 18:23:34 2012 +0000

drm: run the hpd irq event code directly

All drivers already have a work item to run the hpd code, so we don't
need to launch a new one in the helper code. Dave Airlie mentioned
that the cancel+re-queue might paper over DP related hpd ping-pongs,
hence why this is split out.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Reverting this commit on the 3.8 kernels resolves the issue.
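
The bisect and revert described above can be sketched roughly as follows. This is a minimal sketch, assuming a local clone of the mainline kernel tree that carries the v3.7 and v3.8 tags; the function names `start_bisect` and `revert_as_patch` and the example tree path are hypothetical and not part of the original report.

```shell
#!/bin/sh
# Hypothetical helpers; names and paths are illustrative only.

# Commit identified as the culprit in this report.
CULPRIT=69787f7da6b2adc4054357a661aaa1701a9ca76f

# Start a bisect between a known-good and a known-bad revision.
# $1 = path to a kernel git tree, $2 = good revision, $3 = bad revision
start_bisect() {
    git -C "$1" bisect start "$3" "$2"
}

# After building and booting each candidate, mark it from inside the tree:
#   git bisect good    # no GPU hang observed
#   git bisect bad     # hang reproduced

# Revert one commit on top of the current checkout without committing,
# and print the staged change as a patch on stdout.
# $1 = path to a git tree, $2 = commit to revert
revert_as_patch() {
    git -C "$1" revert --no-commit "$2" && git -C "$1" diff --cached
}

# Example (hypothetical path):
#   start_bisect ~/src/linux v3.7 v3.8
#   revert_as_patch ~/src/linux "$CULPRIT" > revert-hpd-irq.patch
```

Whether the revert applies cleanly depends on the kernel version that is checked out; the report only confirms it for the 3.8 kernels.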

Closed by  Tobias Powalowski (tpowa)
Thursday, 10 October 2013, 10:45 GMT
Reason for closing:  Upstream
Comment by Tobias Powalowski (tpowa) - Friday, 05 April 2013, 14:08 GMT
You need to report this upstream so that they can do something about it.
Comment by George Amanakis (gamanakis) - Friday, 05 April 2013, 22:22 GMT
Upstream bug filed.
Comment by cfr (cfr42) - Friday, 05 April 2013, 23:43 GMT
It would be helpful if people could provide links to upstream bugs when they file them - especially if they go to the trouble of noting the fact here already!
Comment by George Amanakis (gamanakis) - Sunday, 07 April 2013, 21:55 GMT
Comment by Tobias Powalowski (tpowa) - Thursday, 23 May 2013, 20:02 GMT
Status on 3.9?
Comment by jstjohn (jstjohn) - Wednesday, 29 May 2013, 17:14 GMT
I've been having similar problems that started around the 3.9 series. I'm currently running 3.9.4-1-ARCH, and I've had X crash several times already today. I believe it happened when I was using 3.9.3, but I didn't bother to investigate until it started happening frequently (today), so I can't say with certainty.

I'm using xf86-video-intel-2.21.8-1 and xorg-server-1.14.1-1. I'm using a Dell Inspiron 1764 laptop with an Intel Core i5 M430.

Here are the systemd journal logs from when it crashed:

May 29 18:42:35 hostname kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
May 29 18:42:35 hostname kernel: [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
May 29 18:42:37 hostname kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
May 29 18:42:37 hostname kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
May 29 18:42:37 hostname kernel: [drm:i915_reset] *ERROR* Failed to reset chip.
May 29 18:42:38 hostname polkitd[1279]: Unregistered Authentication Agent for unix-session:1 (system bus name :1.24, object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.utf8) (disconnected from bus)
May 29 18:42:38 hostname gnome-session[1174]: Gdk-WARNING: gnome-session: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
May 29 18:42:38 hostname colord[885]: device removed: xrandr-LVDS1
May 29 18:42:38 hostname colord[885]: Profile removed: icc-ff8f325d5d3d400361fed656c4e65e3c

I also have the output of `cat /sys/kernel/debug/dri/0/i915_error_state`. I decided to not post it right away because it's a 1.5 MB text file... If it's deemed necessary, I can provide it.

I have not tried the patch provided by George.
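
Since the error state dump is large (about 1.5 MB of text according to the comment above), compressing it before attaching is one option. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the command is run with sufficient privileges to read it; the function name `save_error_state` and the default output filename are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the i915 error state into a gzip archive.
# $1 = source file (defaults to the path named in the kernel log above)
# $2 = destination archive (illustrative default name)
save_error_state() {
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    dest=${2:-i915_error_state.gz}
    # gzip -c writes the compressed stream to stdout and leaves the source untouched
    gzip -c "$src" > "$dest"
}

# Example (reading debugfs typically requires root):
#   save_error_state
```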
Comment by jstjohn (jstjohn) - Friday, 07 June 2013, 15:52 GMT
Surprisingly, X hasn't crashed a single time since I posted my previous comment.
Comment by enrico stano (enricostn) - Friday, 14 June 2013, 10:20 GMT
I'm getting the same freezes and errors using:

kernel 3.9.4-1-ARCH
xf86-video-intel 2.21.8-1
xorg-server 1.14.1-1

It may be worth noting that these X freezes started when I installed GNOME 3.8; if I switch back to the xmonad WM, I don't see any X freezes.
Comment by Eugene Lipchansky (NSky) - Friday, 14 June 2013, 10:41 GMT
I have been experiencing this for a few months already.
Comment by Tobias Powalowski (tpowa) - Tuesday, 30 July 2013, 10:37 GMT
Status on 3.10.x?
Comment by Eugene Lipchansky (NSky) - Tuesday, 30 July 2013, 19:49 GMT
3.10.3-1-ARCH
the problem still exists
Comment by Vyacheslav Stetskevych (tskevy) - Friday, 23 August 2013, 15:41 GMT
+1
kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
This happens several times a day: the screen freezes, then after about 15 seconds it unfreezes again, with the above message in the logs.
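
To put a number on "several times a day", the hangcheck messages can be counted per day from the kernel log. A minimal sketch, assuming syslog-style lines as produced by `journalctl -k` (month, day, time, host, message); the function name `hangs_per_day` is hypothetical, and the pattern only covers the two message variants quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print "Month Day Count"
# for every day that has at least one i915 hangcheck error.
hangs_per_day() {
    awk '/i915_hangcheck_(hung|elapsed)/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Example:
#   journalctl -k | hangs_per_day
```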
Comment by Paul (shellclear) - Monday, 09 September 2013, 23:59 GMT
The problem still remains...
Linux Arch 3.10.10-1-ARCH #1 SMP PREEMPT Fri Aug 30 11:30:06 CEST 2013 x86_64 GNU/Linux
Sep 09 20:52:25 Arch kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

pacman -Qi xf86-video-intel
Name : xf86-video-intel
Version : 2.21.15-1
Description : X.org Intel i810/i830/i915/945G/G965+ video drivers
Architecture : x86_64
URL : http://xorg.freedesktop.org/
Licenses : custom
Groups : xorg-drivers xorg
Provides : xf86-video-intel-uxa xf86-video-intel-sna
Depends On : intel-dri libxvmc pixman xcb-util>=0.3.9
Optional Deps : None
Required By : None
Optional For : None
Conflicts With : xorg-server<1.14.0 X-ABI-VIDEODRV_VERSION<14 X-ABI-VIDEODRV_VERSION>=15 xf86-video-intel-sna xf86-video-intel-uxa xf86-video-i810 xf86-video-intel-legacy
Replaces : xf86-video-intel-uxa xf86-video-intel-sna
Installed Size : 1733.00 KiB
Packager : Laurent Carlier <lordheavym@gmail.com>
Build Date : Wed 21 Aug 2013 12:25:27 PM BRT
Install Date : Sat 24 Aug 2013 11:51:24 AM BRT
Install Reason : Explicitly installed
Install Script : Yes
Validated By : Signature

pacman -Qi xorg-server
Name : xorg-server
Version : 1.14.2-2
Description : Xorg X server
Architecture : x86_64
URL : http://xorg.freedesktop.org
Licenses : custom
Groups : xorg
Provides : X-ABI-VIDEODRV_VERSION=14 X-ABI-XINPUT_VERSION=19 X-ABI-EXTENSION_VERSION=7.0 x-server
Depends On : libxdmcp libxfont libpciaccess libdrm pixman>=0.28.0 libgcrypt libxau xorg-server-common xf86-input-evdev
Optional Deps : None
Required By : None
Optional For : None
Conflicts With : nvidia-utils<=290.10
Replaces : None
Installed Size : 3365.00 KiB
Packager : Jan de Groot <jgc@archlinux.org>
Build Date : Mon 01 Jul 2013 07:49:52 AM BRT
Install Date : Fri 09 Aug 2013 07:44:21 PM BRT
Install Reason : Explicitly installed
Install Script : No
Validated By : Signature

Comment by Tobias Powalowski (tpowa) - Tuesday, 17 September 2013, 09:59 GMT
Status on 3.11.1?
Comment by Paul (shellclear) - Wednesday, 18 September 2013, 00:03 GMT
Comment by Paul (shellclear) - Wednesday, 18 September 2013, 00:43 GMT
The problem still remains


Sep 17 21:43:09 archlinux kernel: [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
Sep 17 21:43:09 archlinux kernel: [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
Sep 17 21:43:09 archlinux kernel: [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x9c86000 ctx 1) at 0x9c861d8
