FS#41045 - [linux] opengl programs crash after/while suspend to ram with 3.16 kernel

Attached to Project: Arch Linux
Opened by Matthias Krüger (matthiaskrgr) - Tuesday, 01 July 2014, 12:18 GMT
Last edited by Tobias Powalowski (tpowa) - Tuesday, 07 October 2014, 16:53 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan de Groot (JGC)
Thomas Bächler (brain0)
Andreas Radke (AndyRTR)
Laurent Carlier (lordheavy)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 30
Private No

Details

When I run an OpenGL app and then suspend to RAM, the app crashes.
Also, after suspending to RAM, I cannot run any OpenGL apps.

Reproducing the problem in gdb:

(gdb) exec-file /usr/bin/glxgears
(gdb) run
Starting program: /usr/bin/glxgears
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff468c1ef in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
(gdb)


Closed by  Tobias Powalowski (tpowa)
Tuesday, 07 October 2014, 16:53 GMT
Reason for closing:  Fixed
Comment by Ukyoi D (ukyoi) - Friday, 04 July 2014, 14:46 GMT
I have the same problem with the Intel GM45 chipset (Thinkpad X200).
Comment by babu (babubabubabo) - Monday, 07 July 2014, 08:09 GMT
Same issue here

please also have a look at https://bbs.archlinux.org/viewtopic.php?id=183359
Comment by Michael Fuchs (mukl) - Wednesday, 06 August 2014, 14:33 GMT
Either it is not fixed or the bug I reported is a separate one: https://bugs.archlinux.org/task/41083
Comment by Anton Tsyganenko (anton-tsyganenko) - Wednesday, 06 August 2014, 15:19 GMT
The bug is not fixed yet. After upgrading the system a few days ago (to linux 3.16 or intel-dri 10.2.5-1), it started to happen every time I try to run OpenGL apps. https://bbs.archlinux.org/viewtopic.php?pid=1443930#p1443930
Comment by Frederic Bezies (fredbezies) - Thursday, 07 August 2014, 13:57 GMT
Can confirm this bug with the linux 3.16.0-2 kernel from testing. intel-dri is not at fault, because it works without problems for me on kernels 3.14 and 3.15.

Downgrading the kernel to the LTS version or to 3.15.8-1 fixes the bug.

The message "intel_do_flush_locked failed: Argument invalide" (French for "Invalid argument") appears twice in a log I generated with journalctl -b on my laptop, running the linux 3.16 kernel and trying to launch GNOME Shell:

août 07 15:41:44 fredo-arch-laptop gnome-session[353]: intel_do_flush_locked failed: Argument invalide
août 07 15:41:44 fredo-arch-laptop polkitd[260]: Unregistered Authentication Agent for unix-session:c1 (system bus name :1.24, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale fr_FR.UTF-8) (disconnected from bus)
août 07 15:41:44 fredo-arch-laptop gnome-session[353]: gnome-session[353]: WARNING: Child process 424 was already dead.
août 07 15:41:44 fredo-arch-laptop gnome-session[353]: WARNING: Child process 424 was already dead.
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: The XKEYBOARD keymap compiler (xkbcomp) reports:
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: > Warning: Compat map for group 2 redefined
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: > Using new definition
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: > Warning: Compat map for group 3 redefined
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: > Using new definition
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: > Warning: Compat map for group 4 redefined
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: > Using new definition
août 07 15:41:45 fredo-arch-laptop gdm-Xorg-:0[253]: Errors from xkbcomp are not fatal to the X server
août 07 15:41:46 fredo-arch-laptop polkitd[260]: Registered Authentication Agent for unix-session:c1 (system bus name :1.31 [gnome-shell --mode=gdm], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale fr_FR.UTF-8)
août 07 15:41:46 fredo-arch-laptop gnome-session[353]: Gjs-Message: JS LOG: No permission to trigger offline updates: Polkit.Error: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: Action org.freedesktop.packagekit.trigger-offline-update is not registered
août 07 15:41:46 fredo-arch-laptop gnome-session[353]: Gjs-Message: JS LOG: GNOME Shell started at Thu Aug 07 2014 15:41:46 GMT+0200 (CEST)
août 07 15:41:46 fredo-arch-laptop dbus[234]: [system] Activating via systemd: service name='org.freedesktop.GeoClue2' unit='geoclue.service'
août 07 15:41:46 fredo-arch-laptop systemd[1]: Starting Location Lookup Service...
août 07 15:41:46 fredo-arch-laptop dbus[234]: [system] Successfully activated service 'org.freedesktop.GeoClue2'
août 07 15:41:46 fredo-arch-laptop systemd[1]: Started Location Lookup Service.
août 07 15:41:46 fredo-arch-laptop gnome-session[353]: intel_do_flush_locked failed: Argument invalide
août 07 15:41:46 fredo-arch-laptop gnome-session[353]: WARNING: App 'gnome-shell.desktop' respawning too quickly
août 07 15:41:46 fredo-arch-laptop polkitd[260]: Unregistered Authentication Agent for unix-session:c1 (system bus name :1.31, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale fr_FR.UTF-8) (disconnected from bus)
août 07 15:41:46 fredo-arch-laptop gnome-session[353]: gnome-session[353]: WARNING: App 'gnome-shell.desktop' respawning too quickly
août 07 15:41:46 fredo-arch-laptop gnome-session[353]: Unrecoverable failure in required component gnome-shell.desktop
août 07 15:41:47 fredo-arch-laptop gnome-session[353]: (gnome-settings-daemon:375): GLib-GIO-CRITICAL **: g_dbus_proxy_call_internal: assertion 'G_IS_DBUS_PROXY (proxy)' failed
août 07 15:42:04 fredo-arch-laptop gdm-Xorg-:0[253]: (II) AIGLX: Suspending AIGLX clients for VT switch
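
For anyone who wants to check their own journal for the same symptom, here is a minimal Python sketch that lists every line carrying the intel_do_flush_locked message. The function name find_flush_errors is hypothetical and not part of any tool mentioned in this report; the pattern only assumes the message text quoted above and accepts any localized reason after the colon.

```python
import re

# Matches the error regardless of the locale of the trailing reason,
# e.g. "Invalid argument" or "Argument invalide".
FLUSH_ERROR = re.compile(r"intel_do_flush_locked failed: (?P<reason>.+)$")


def find_flush_errors(journal_text):
    """Return a list of (line_number, reason) for every matching line."""
    hits = []
    for number, line in enumerate(journal_text.splitlines(), start=1):
        match = FLUSH_ERROR.search(line)
        if match:
            hits.append((number, match.group("reason").strip()))
    return hits
```

One possible use is to save the output of journalctl -b to a file, read it into a string, and pass that string to find_flush_errors. An empty list would suggest this particular message was not logged during that boot.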
Comment by babu (babubabubabo) - Thursday, 07 August 2014, 14:12 GMT
I can confirm this error.

It started with linux 3.15.x: I could not run OpenGL applications after resume from suspend.

Since 3.16, I cannot even run OpenGL applications right after boot.
Comment by C. Simon (mentalis) - Monday, 11 August 2014, 08:52 GMT
I also experience the same issue on my GM45 Lenovo X200 laptop.
If I put the laptop to sleep and resume it, I get

intel_do_flush_locked failed: Invalid argument

when running OpenGL apps. mpv and Flash Player keep crashing too.
I have to reboot to keep going.

Running kernel 3.15.8 x86_64 here.
Comment by AMM (amish) - Thursday, 14 August 2014, 03:36 GMT
For me the issue did not occur in kernel 3.15 but started appearing in kernel 3.16.

It happens even after a system restart, i.e. without suspend/resume.

I am using KDE: kwin crashes with intel_do_flush_locked failed: invalid argument.

Since kwin crashes, I effectively cannot use the desktop at all because:

You cannot see window titles or window buttons (close, minimize)
You cannot use alt-tab to switch between multiple windows
etc.

Currently I work around this by disabling Desktop effects and OpenGL.
Comment by Frederic Bezies (fredbezies) - Thursday, 14 August 2014, 08:46 GMT
Since 3.16 is in core, I decided to uninstall Gnome and replace it by Mate Desktop. GDM is crashing a lot.

My lspci:

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller])
Subsystem: Toshiba America Info Systems Device ff67
Flags: bus master, fast devsel, latency 0, IRQ 46
Memory at d0000000 (64-bit, non-prefetchable) [size=4M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 5110 [size=8]
Expansion ROM at <unassigned> [disabled]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
Comment by Quentin Stievenart (acieroid) - Thursday, 14 August 2014, 09:11 GMT
You might want to switch to linux-lts to avoid this problem instead of changing your desktop environment.

Same problem happens here on a Lenovo X200, lspci follows.

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 20e4
Flags: bus master, fast devsel, latency 0, IRQ 48
Memory at f2000000 (64-bit, non-prefetchable) [size=4M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 1800 [size=8]
Expansion ROM at <unassigned> [disabled]
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 3
Kernel driver in use: i915
Kernel modules: i915

Trilby also posted a potential solution (switching to uxa): https://bbs.archlinux.org/viewtopic.php?pid=1446267#p1446267
I haven't tried this yet but it might be worth a try.
Comment by Frederic Bezies (fredbezies) - Thursday, 14 August 2014, 09:22 GMT
I wanted to test Mate Desktop anyway, so I took this opportunity. And I don't really like LTS kernels ;)

Will try uxa trick... With a little luck :)

Thanks for the forum thread.

Your "trick" is working. Looks like sna is far from being ready for prime time.
Comment by Anton Tsyganenko (anton-tsyganenko) - Thursday, 14 August 2014, 13:28 GMT
Does it affect x86 systems?
Comment by Matthias Krüger (matthiaskrgr) - Thursday, 14 August 2014, 13:30 GMT
This bug was fixed for me in previous kernel version(s) but as I upgraded to 3.16.0-2 it is present again.
x86_64-system here, lenovo t400
Comment by Diego Viola (diegoviola) - Friday, 15 August 2014, 08:32 GMT
Same problem here.

OpenGL is broken, Xv is also broken.

I get this in my dmesg:

[ 12.993393] [drm:i915_gem_init] *ERROR* Failed to initialize GPU, declaring it wedged

glxgears and other OpenGL apps return this: intel_do_flush_locked failed: Invalid argument

Arch Linux (x86_64)
Comment by Sergi (Cauerpi) - Friday, 15 August 2014, 10:19 GMT
Yes, I have the same problem here. I use kernel 3.16.1. Is there any solution?

I have x86_64.

Thanks.

Comment by Anton Tsyganenko (anton-tsyganenko) - Friday, 15 August 2014, 12:35 GMT
Has anyone tried to reproduce it on x86 systems? If it appears only on x86_64 systems, that could be very helpful to know, I think.

I have x86_64.
Comment by Francis Herne (FLHerne) - Saturday, 16 August 2014, 19:16 GMT
It probably can't be reproduced on an x86, since every single report (in this bug or the thread linked above) refers to a GM45/x4500 or doesn't provide enough information to tell. That chipset only works with Core2 which is x86_64.
Comment by Anton Tsyganenko (anton-tsyganenko) - Saturday, 16 August 2014, 19:27 GMT
I have an Intel Celeron processor.
Comment by Francis Herne (FLHerne) - Saturday, 16 August 2014, 20:20 GMT
Celerons are pretty much just rebadged versions of a huge range of other processor families - everything from Pentium IIs to recentish Sandy Bridge chips. Could you check the exact model?
Comment by Anton Tsyganenko (anton-tsyganenko) - Sunday, 17 August 2014, 09:31 GMT
Not sure if that's what you need, but here is the output of lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Model name: Intel(R) Celeron(R) CPU 743 @ 1.30GHz
Stepping: 10
CPU MHz: 1296.740
BogoMIPS: 2594.54
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
NUMA node0 CPU(s): 0
Comment by Francis Herne (FLHerne) - Sunday, 17 August 2014, 10:33 GMT
Yeah, that's a Penryn-based one - essentially a low-specced Core2 chip.
Certainly seems like this bug is specific to the x4500 IGPs.
Comment by Anton Tsyganenko (anton-tsyganenko) - Sunday, 17 August 2014, 12:07 GMT
A poor workaround for the problem: disable DRI, as described at https://wiki.archlinux.org/index.php/IntelGMA#X_freeze.2Fcrash_with_intel_driver

After that, OpenGL apps will run very slowly instead of crashing.

Comment by babu (babubabubabo) - Sunday, 17 August 2014, 13:03 GMT
For additional information:

I have an IGP of the x4500 family, too.
Comment by Laurent Carlier (lordheavy) - Tuesday, 19 August 2014, 21:11 GMT
upstream kernel bug, not yet fixed:
https://bugzilla.kernel.org/show_bug.cgi?id=82481
Comment by Francis Herne (FLHerne) - Tuesday, 19 August 2014, 23:42 GMT
The rename (and possibly upstream link) seems wrong. The failure after suspend as reported in this bug occurs since 3.15.

The upstream report is for a similar/the same bug occurring immediately on boot, which does seem to occur only in 3.16. It doesn't mention failure after suspend.
Comment by Diego Viola (diegoviola) - Wednesday, 20 August 2014, 06:28 GMT
They have submitted some patches here that apparently solve the problem:

https://bugs.freedesktop.org/show_bug.cgi?id=76554

Have you guys tried it?
Comment by Diego Viola (diegoviola) - Wednesday, 20 August 2014, 17:28 GMT
Comment by babu (babubabubabo) - Thursday, 21 August 2014, 15:06 GMT
Today I built 3.17-rc1 and it seems to be fixed.
I have done a few suspend-to-RAM/resume cycles.
As mentioned by Jiri Kosina in https://bugs.freedesktop.org/show_bug.cgi?id=76554#c84, the error can still appear, but not that often.
So we will see what the next resumes bring :-)
Comment by Ava Lemke (ava_lemke) - Saturday, 23 August 2014, 21:35 GMT
I can't believe this shit is still broken. WTF arch?
Comment by John (graysky) - Monday, 25 August 2014, 07:00 GMT
Some Intel integrated video hardware is rendered useless by this recent kernel bug. On my system, useless means X is a black screen on my Intel GMA X4500HD. Others have reported this bug with newer hardware as well.[1] Upstream has a patch which was submitted to the stable team on Aug 7th, but I find no trace of it in the stable queue.[2] The developer himself mentioned on Aug 7th that it will get backported to stable kernels eventually. I think we should consider it important enough to add to our kernel until it is accepted.[3]

Upstream report: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Upstream patch: https://bugs.freedesktop.org/attachment.cgi?id=104224

1. https://bbs.archlinux.org/viewtopic.php?id=185650
2. https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.16
3. https://bugs.freedesktop.org/show_bug.cgi?id=76554#c103

Additional info:
* package version(s) 3.16.1-1
* the error is reported in dmesg:

[drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 (valid? 1) head 00000298 tail 00000000 start 001d8000 [expected 001d8000]
[drm:i915_gem_init] *ERROR* Failed to initialize GPU, declaring it wedged

Steps to reproduce:
1. Run the affected hardware and boot the machine.
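
To make it easier to compare the ring initialization messages reported in this task, here is a minimal Python sketch that splits such a dmesg line into its fields. The function name parse_ring_error is hypothetical, and treating the register values as hexadecimal is an assumption based on the values quoted here (for example 0001f001), not something confirmed by the kernel documentation.

```python
import re

# Layout taken from the dmesg lines quoted in this report.
RING_ERROR = re.compile(
    r"\[drm:init_ring_common\] \*ERROR\* (?P<ring>\w+) ring initialization failed "
    r"ctl (?P<ctl>[0-9a-f]+) \(valid\? (?P<valid>\d)\) "
    r"head (?P<head>[0-9a-f]+) tail (?P<tail>[0-9a-f]+) "
    r"start (?P<start>[0-9a-f]+) \[expected (?P<expected>[0-9a-f]+)\]"
)


def parse_ring_error(line):
    """Return the fields of a ring initialization error as a dict, or None."""
    match = RING_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    parsed = {"ring": fields.pop("ring"), "valid": int(fields.pop("valid"))}
    # Assumption: the remaining register values are printed in hexadecimal.
    for name, value in fields.items():
        parsed[name] = int(value, 16)
    return parsed
```

For the line quoted above, the parsed start and expected values are equal while head is nonzero. The sketch only extracts the numbers and does not attempt to interpret them.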
Comment by Robert Orzanna (orschiro) - Tuesday, 26 August 2014, 11:18 GMT
Will the patch be backported?

I can confirm the issue on my X200T with an Intel X4500 and an x86_64 system. Switching to UXA mode does not help. The crash on start of xinitrc and on resume from suspend remains:

[drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 (valid? 1) head 00000298 tail 00000000 start 001d8000 [expected 001d8000]
[drm:i915_gem_init] *ERROR* Failed to initialize GPU, declaring it wedged
Comment by Jason Lenz (lenzj) - Friday, 29 August 2014, 14:21 GMT
I have a Lenovo T400 and am getting the following kernel error during boot.

[drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 (valid? 1) head 0000c020 tail 00000000 start 00042000 [expected 00042000]
[drm:i915_gem_init] *ERROR* Failed to initialize GPU, declaring it wedged

Kernel boot continues after a long pause, and the console works fine. However, when running Xorg and starting certain graphics applications, the program immediately crashes. One such application is Firefox with Flash (viewing streaming videos etc.).

This has been solved by switching from SNA mode to UXA mode using the Xorg config file below. Streaming videos are now working fine. The kernel boot error message obviously still persists though.

More info at -> https://wiki.archlinux.org/index.php/Intel_graphics#SNA_issues

----/etc/X11/xorg.conf.d/20-intel.conf--------
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    #Option "AccelMethod" "sna"
    Option "AccelMethod" "uxa"
    #Option "AccelMethod" "glamor"
EndSection
----End of file----
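
When several AccelMethod lines are toggled with comments like this, it is easy to lose track of which one is active. Here is a minimal Python sketch that reports the uncommented value in a config like the one above. The function name active_accel_method is hypothetical, and the parsing is a simplification: it ignores Section scoping and simply returns the last uncommented AccelMethod option it sees.

```python
import re

# Matches an Option line that is not commented out with "#".
ACCEL_OPTION = re.compile(
    r'^\s*Option\s+"AccelMethod"\s+"(?P<method>[^"]+)"', re.IGNORECASE
)


def active_accel_method(conf_text):
    """Return the last uncommented AccelMethod value, or None if unset."""
    method = None
    for line in conf_text.splitlines():
        match = ACCEL_OPTION.match(line)
        if match:
            method = match.group("method")
    return method
```

Called with the contents of the 20-intel.conf file shown above, this returns "uxa". A result of None means no AccelMethod option is set in that text, so the driver's default would apply.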
Comment by John (graysky) - Friday, 29 August 2014, 14:50 GMT
@lenzj - does the patch I referenced fix it for you? https://bugs.freedesktop.org/attachment.cgi?id=104224
Comment by Jason Lenz (lenzj) - Friday, 29 August 2014, 20:16 GMT
@graysky - I applied the patch. Everything seems to be running great now. No error message during kernel boot. Graphical applications run both in SNA and UXA mode. It's been a while since I've patched and compiled a custom kernel, so it took me a while but things are running better. Thank you. I would be in favor of adding this patch to the official Arch kernel build if it's going to be a while before the developer implements it.
Comment by C. Simon (mentalis) - Saturday, 30 August 2014, 19:31 GMT
@graysky I recompiled the 3.16.1 kernel two days ago with the patch and on my side it looks like all the issues I have been having with opengl apps crashing are fixed, I can say that SNA is working fine again here.

Running on an X200 with Intel X4500 (GM45).
Comment by Robert Orzanna (orschiro) - Saturday, 30 August 2014, 19:43 GMT
@mentalis

I am also running an X200. Mind sharing your PKGBUILD that you used to recompile the kernel?

Thanks ahead!
Comment by Alexis Viguié (Siphoné) - Saturday, 30 August 2014, 19:49 GMT
Having the same problem with a GM45. I hope the patch will be included in the next version of the package so I won't have to use the LTS kernel anymore.
Comment by C. Simon (mentalis) - Saturday, 30 August 2014, 20:15 GMT
Comment by Robert Orzanna (orschiro) - Sunday, 31 August 2014, 05:42 GMT
@mentalis

Thank you a lot! I successfully recompiled the kernel and the error is resolved!
Comment by Robert Orzanna (orschiro) - Tuesday, 02 September 2014, 08:10 GMT
I was too early with my conclusion. The error message is resolved, but Xorg still crashes if I (un)plug the power cable of my laptop, or sometimes on resume from hibernation. I also tried several Xorg config files, but none of them resolved the error completely. I used the Xorg files mentioned here: https://wiki.archlinux.org/index.php/Intel_graphics#SNA_issues

The last error log:

[ 8733.710] (II) intel: Driver for Intel(R) HD Graphics: 2000-6000
[ 8733.710] (II) intel: Driver for Intel(R) Iris(TM) Graphics: 5100, 6100
[ 8733.710] (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics: 5200, 6200, P6300
[ 8733.710] (++) using VT number 1

[ 8733.710] (--) controlling tty is VT number 1, auto-enabling KeepTty
[ 8733.710] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
[ 8733.711] (EE) No devices detected.
[ 8733.711] (EE)
Fatal server error:
[ 8733.711] (EE) no screens found(EE)
[ 8733.711] (EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
[ 8733.711] (EE) Please also check the log file at "/home/orschiro/.local/share/xorg/Xorg.0.log" for additional information.
[ 8733.711] (EE)
Comment by babu (babubabubabo) - Monday, 08 September 2014, 09:52 GMT
Just searched for the patch in the git repo of the 3.16.2 kernel. The patch is NOT included in the 3.16.2 version. So there is no need for us to try the new kernel which is in the testing repo right now.
Comment by Ukyoi D (ukyoi) - Wednesday, 10 September 2014, 15:14 GMT
Linux-ck 3.16.2 works just fine. Is that because @John (graysky) patched it? Thanks, Graysky.
Comment by John (graysky) - Wednesday, 10 September 2014, 19:43 GMT
Yes, I patch linux-ck with the upstream patch which is why your system works. Been doing it since 3.16.1-2: http://pkgbuild.com/git/aur-mirror.git/commit/linux-ck?id=2845e542e70065b20984ac3eff670ab700fd2cef
Comment by John (graysky) - Monday, 22 September 2014, 16:00 GMT
