FS#55629 : [linux] Intel i915 driver issue in kernel 4.13 requiring restart.

Attached to Project: Arch Linux
Opened by John Bennett (Lindows) - Thursday, 14 September 2017, 03:10 GMT
Last edited by Doug Newgard (Scimmia) - Sunday, 08 October 2017, 23:25 GMT

Task Type	Bug Report
Category	Kernel
Status	Closed
Assigned To	Tobias Powalowski (tpowa) Jan Alexander Steffens (heftig)
Architecture	x86_64
Severity	Critical
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	35 Adrien BRESSON (abresson) (2017-10-09) alexander sanoll (sonix07) (2017-10-05) Sören (soeren) (2017-10-05) Baptiste Grenier (gwarf) (2017-10-04) Vladimir Krivopalov (Argenet) (2017-10-04) Amy Wilson (PinkCathodeCat) (2017-10-04) Mike Appleby (appleby) (2017-10-03) Angus Gibson (angusg) (2017-10-03) Nikola (nikola3244) (2017-10-03) Eric Blau (eblau) (2017-10-03) Mike C (ggg377) (2017-10-03) Jonas Platte (jP_wanN) (2017-10-03) Joost Rijneveld (joostrijneveld) (2017-10-03) Pierre Durand (Pierrre) (2017-10-03) John Lindgren (jlindgren) (2017-09-30) Mikkel Oscar Lyderik (moscar) (2017-09-30) Antonio Corbi bellot (acorbi) (2017-09-30) Kilian Lackhove (crabman) (2017-09-30) Reto (Sc13ntist) (2017-09-30) Axel Kellermann (akellerm) (2017-09-29) sjung (tummychow) (2017-09-29) Vasyl Demin (zersaa) (2017-09-29) Simon Wydooghe (HyperBaton) (2017-09-29) Jon Gjengset (Jonhoo) (2017-09-29) treeshateorcs (budkin) (2017-09-28) Tom Vincent (tlvince) (2017-09-28) Ugo Pozo (ugopozo) (2017-09-28) xduugu (xduugu) (2017-09-25) Daniel Bermond (Bermond) (2017-09-21) Henri (Valta) Osmankäämi (cgx) (2017-09-19) Ronan (ronjouch) (2017-09-19) Harish (sitwano) (2017-09-16) Konrad Czechowski (fector) (2017-09-15) John Bennett (Lindows) (2017-09-14) c (c) (2017-09-14)
Private	No

Details

On boot, dmesg shows a "CPU pipe A FIFO underrun" error from the Intel i915 driver. Once the X server starts, the entire screen freezes and the machine locks up. Switching virtual terminals does not work, and the machine has to be powered off.

This seems to happen on the 4.13.x kernel series. I have not seen this bug in the 4.12.x or 4.9 series kernels.

Thinkpad T410
Architecture: x86_64
Model name: Intel(R) Core(TM) i5-540 M CPU @ 2.53GHz
Graphics: Intel Ironlake Mobile

Steps to reproduce:
-Boot machine and wait for error message which will be displayed as part of the dmesg on boot.
-Start X server and wait 30-40 seconds. Laptop will freeze and require a restart.

The following error message is displayed in dmesg and journal:
kernel:[drm:intel_cpu_fifo_underrun_irq_handler[i915]]*ERROR* CPU pipe A FIFO underrun.
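
To check whether a given boot hit this error, the kernel log can be filtered for the message quoted above. This is only a sketch: `count_underruns` is a made-up helper name, and the pattern matches nothing beyond the message text shown in this report.

```shell
# count_underruns: read kernel log text on stdin and print how many lines
# report an i915 CPU FIFO underrun. The helper name is illustrative only.
count_underruns() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'CPU pipe [A-Z] FIFO underrun' || true
}

# Typical use on an affected machine (needs permission to read the kernel log):
#   dmesg | count_underruns
#   journalctl -k -b | count_underruns
```

A result of 0 for the current boot does not prove the machine is unaffected; it only means the message was not logged.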

I have reverted back to kernel 4.9 LTS to avoid this problem.

Closed by Doug Newgard (Scimmia)
Sunday, 08 October 2017, 23:25 GMT
Reason for closing: Fixed
Additional comments about closing: linux 4.13.5-1

Comment by c (c) - Thursday, 14 September 2017, 08:38 GMT

Same problem here, and equally unreproducible with 4.9-lts, 4.12.2, or 4.12.2-ck.

Comment by loqs (loqs) - Thursday, 14 September 2017, 10:44 GMT

If it is not fixed by 4.13.2, I would suggest bisecting the 4.13 kernel to find the bad commit and reporting it upstream.

Comment by Nico Schottelius (telmich) - Thursday, 14 September 2017, 13:44 GMT

Not a problem on Intel Corporation HD Graphics 620 (rev 02) though - just a heads up.

Comment by loqs (loqs) - Thursday, 14 September 2017, 14:46 GMT

Does it happen with both the modesetting and intel DDX? Are any options being passed to the i915 module? If so, does removing them have any effect?
One line from dmesg is not very useful out of context; see https://01.org/linuxgraphics/documentation/how-report-bugs

Comment by John Bennett (Lindows) - Thursday, 14 September 2017, 15:02 GMT

Switching between SNA and UXA does not fix the problem. So yes, both the modesetting and intel DDX are affected.

The GUI I am using is Cinnamon, if that is of any help.

Comment by François Guerraz (kubrick) - Thursday, 14 September 2017, 15:28 GMT

The problem also happens on Wayland on Skylake. At least the machine doesn't crash with Wayland; it survives the FIFO underrun after a few seconds of glitches.

Comment by c (c) - Thursday, 14 September 2017, 21:41 GMT

For me it's on Sandybridge (gen6, gt2).

Comment by Gerrit Großkopf (kingcreole) - Friday, 15 September 2017, 07:02 GMT

Not sure if it is the same thing for me, but my laptop hangs the same way. The only things in the logs that seem to make some sense are these:

Sep 15 08:18:48 Marvin kernel: rtc_cmos: probe of 00:01 failed with error -16

Sep 15 08:18:48 Marvin kernel: pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
Sep 15 08:18:48 Marvin kernel: pci 0000:00:14.0: can't derive routing for PCI INT A
Sep 15 08:18:48 Marvin kernel: pci 0000:00:14.0: PCI INT A: not connected

I have a Broadwell processor with HD Graphics 5500 and an Optimus setup with an Nvidia 840M, so I get all the graphics driver problems xD

Comment by Konrad Czechowski (fector) - Friday, 15 September 2017, 10:53 GMT

Fixed it on my system by adding this kernel parameter: intel_iommu=igfx_off to /etc/default/grub. You could also use intel_iommu=off.
Problem is caused by this line:
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
in config.x86_64. Before the 4.13 kernels it was:
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set

At this moment only linux-hardened package uses this old config.
I don't know why IOMMU is enabled by default now. It always caused me trouble.
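
A minimal sketch of the edit to /etc/default/grub described above. The helper name `add_kernel_param` is hypothetical, and the sketch assumes GRUB_CMDLINE_LINUX_DEFAULT sits on one line with a double-quoted value and that the parameter contains no characters special to sed or grep.

```shell
# add_kernel_param FILE PARAM
# Append PARAM inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE,
# unless it is already listed. Hypothetical helper; back up FILE first.
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already present as a whole word.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"(.* )?${param}( .*)?\"" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Re-emit the line with PARAM added before the closing quote.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# As root, followed by regenerating the GRUB configuration:
#   add_kernel_param /etc/default/grub intel_iommu=igfx_off
#   grub-mkconfig -o /boot/grub/grub.cfg
```

The change only takes effect after the configuration is regenerated and the machine is rebooted. Other boot loaders keep their kernel command line elsewhere.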

Comment by c (c) - Friday, 15 September 2017, 15:09 GMT

@fector that's a nice find. I can confirm that a custom 4.13.2 with IOMMU completely
disabled does not show the symptoms. I've been running Xorg for 30 minutes and Wayland
for 2 hours, and there was no FIFO underrun error in dmesg at boot, when
the KMS switch happens during systemd init. Video playback and everything else
work, with no errors in dmesg so far.

Comment by John Bennett (Lindows) - Friday, 15 September 2017, 15:49 GMT

I disabled intel_iommu and it worked, I did not get an error on boot and nothing in the dmesg. Thanks for all your help!
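
After rebooting, it is worth confirming that the parameter actually reached the kernel. A small sketch, with `has_kernel_param` as an illustrative name, that checks a command-line string for an exact parameter:

```shell
# has_kernel_param CMDLINE PARAM
# Succeed if PARAM appears as a whole, space-separated word in CMDLINE.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the running system:
#   has_kernel_param "$(cat /proc/cmdline)" intel_iommu=off && echo "parameter is active"
```

The match is exact, so intel_iommu=off and intel_iommu=igfx_off are treated as different parameters.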

Comment by c (c) - Friday, 15 September 2017, 16:54 GMT

I don't think this should be closed, at least not without someone creating and linking
or finding and linking the relevant upstream bug report first. But if IOMMU is
generally unstable on Intel then I guess it can be closed with a plan to disable
IOMMU in 4.12.3-2, although I find that hard to believe.

Comment by c (c) - Friday, 15 September 2017, 17:39 GMT

Possibly related Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1550779.
I say "possibly" because the kernel versions mentioned there may not point at the same
regression, but the errors do, and at least one recent comment saw this happening with
4.13 and not earlier.

Comment by c (c) - Friday, 15 September 2017, 18:07 GMT

I mean the only reason we saw this in 4.13 is that IOMMU has been
enabled, and it makes sense that others have been seeing this on different
kernels before.

Found upstream bug https://bugs.freedesktop.org/show_bug.cgi?id=100219

Comment by c (c) - Friday, 15 September 2017, 18:09 GMT

Maybe if I disable VT-d in the BIOS this would prevent it even if
I boot a kernel with IOMMU enabled. Worth a test.

Comment by Konrad Czechowski (fector) - Friday, 15 September 2017, 18:20 GMT

Disabled Intel VT in BIOS Setup and removed the intel_iommu kernel parameter. Same thing, freeze after startx.

Comment by c (c) - Friday, 15 September 2017, 18:32 GMT

@kingcreole I don't think what you see is the same regression but if it
is fixed by disabling IOMMU it could be related.

Comment by c (c) - Friday, 15 September 2017, 18:33 GMT

@kingcreole there's a similar bug related to IOMMU and with dual gpus
but it's on AMD: https://bugs.archlinux.org/task/53609

Comment by c (c) - Friday, 15 September 2017, 18:35 GMT

@fector I didn't try changing kernel arguments; I only tested with
no change in the BIOS and a custom kernel that completely disables IOMMU
in the device drivers section. Can you test that?

Comment by c (c) - Friday, 15 September 2017, 18:59 GMT

I've been using VAAPI and all CPU cores for a few hours and there's no
error in dmesg. I'd say 4.13.2 without IOMMU is stable on my machine.

Comment by loqs (loqs) - Friday, 15 September 2017, 19:56 GMT

Could someone affected post a dmesg of an affected boot with the parameters drm.debug=0x1e log_buf_len=1M
to https://bugs.freedesktop.org/show_bug.cgi?id=100219 so upstream can confirm it is the same issue?
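
For anyone wondering what 0x1e selects: drm.debug is a bitmask of message categories. The sketch below decodes a mask using the bit assignments documented for the drm module parameter in kernels of this era (CORE 0x01, DRIVER 0x02, KMS 0x04, PRIME 0x08, ATOMIC 0x10, VBL 0x20); newer kernels may define further bits. The function name is illustrative.

```shell
# drm_debug_categories MASK
# Print the drm.debug message categories enabled by MASK (hex or decimal).
drm_debug_categories() {
    mask=$(( $1 ))
    names=""
    for pair in 1:CORE 2:DRIVER 4:KMS 8:PRIME 16:ATOMIC 32:VBL; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    # Drop the leading space before printing.
    echo "${names# }"
}

# After booting with drm.debug=0x1e log_buf_len=1M, save the log for upload:
#   dmesg > dmesg.txt
```

log_buf_len=1M enlarges the kernel ring buffer so that the extra debug output does not push the early boot messages out of dmesg.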

Comment by Gerrit Großkopf (kingcreole) - Friday, 15 September 2017, 20:13 GMT

Thanks @c, I disabled it and will test later :)
Update: it finally worked. It restarted and I was finally able to log in again :)

Comment by c (c) - Saturday, 16 September 2017, 00:58 GMT

I saw a GPU hang running Firefox, but I don't know if it's related,
even though I think there have been sync object patches in 4.13.

[33898.274495] drm/i915: Resetting chip after gpu hang
[33900.194189] asynchronous wait on fence i915:[global]:13057f timed out

Comment by Harish (sitwano) - Saturday, 16 September 2017, 06:58 GMT

The arch wiki should definitely have a notice to disable IOMMU somewhere, preferably in the https://wiki.archlinux.org/index.php/intel_graphics page.

Comment by c (c) - Saturday, 16 September 2017, 14:53 GMT

@sitwano I never used IOMMU on Intel before. Is it really broken 4 out of 5 times?
I thought many people use VT-d to dedicate a GPU or NIC to a VM and it's not
some experimental/buggy feature, if the mainboard+BIOS is fine.
I mean I don't mind editing the wiki, but I'm surprised.

Comment by Harish (sitwano) - Saturday, 16 September 2017, 17:07 GMT

@c I'm a total noob and I don't know what any of the formalities are. I've had this happen to me since kernel 4.12.something and I'm running a thinkpad x250 laptop. If this issue is happening to me on a thinkpad then I assume it is also happening to a large group of people. Just my guess! It's never a terrible idea to suggest to turn the IOMMU off in the wiki if a user seems to be running into this same problem. I mean I would never have come across this fix if it was not for this bug report :/

Comment by So As (Archilious) - Sunday, 17 September 2017, 01:37 GMT

I have kernel 4.13.2-1 and things work smoothly, but dmesg shows this error message:

[drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=46836 end=46837) time 447 us, min 763, max 767, scanline start 762, end 779
[10327.815484] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=48452 end=48453) time 385 us, min 763, max 767, scanline start 755, end 768
[10432.817354] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=54737 end=54738) time 481 us, min 763, max 767, scanline start 760, end 777
[10671.321949] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=69013 end=69014) time 369 us, min 763, max 767, scanline start 758, end 771
[11120.815604] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=95918 end=95919) time 452 us, min 763, max 767, scanline start 750, end 767
[11141.816256] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=97175 end=97176) time 181 us, min 763, max 767, scanline start 759, end 768
[11224.814967] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=102143 end=102144) time 201 us, min 763, max 767, scanline start 762, end 771
[11245.815758] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=103400 end=103401) time 202 us, min 763, max 767, scanline start 762, end 768
[11398.815188] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=112558 end=112559) time 457 us, min 763, max 767, scanline start 754, end 767
[11474.813774] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=117107 end=117108) time 281 us, min 763, max 767, scanline start 755, end 764
[12562.921421] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=182237 end=182238) time 340 us, min 763, max 767, scanline start 756, end 768
[13262.815423] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=224130 end=224131) time 421 us, min 763, max 767, scanline start 754, end 769
[13318.815706] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=227482 end=227483) time 207 us, min 763, max 767, scanline start 761, end 771
[13534.816570] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=240411 end=240412) time 204 us, min 763, max 767, scanline start 759, end 768
[13735.932096] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=252449 end=252450) time 396 us, min 763, max 767, scanline start 759, end 773
[14264.815898] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=284106 end=284107) time 349 us, min 763, max 767, scanline start 758, end 770

I'm on a Sony Vaio VPCEG series with an Intel Core i3-2310M and 2nd-gen Intel integrated graphics, using the Cinnamon desktop.

Comment by c (c) - Sunday, 17 September 2017, 12:26 GMT

I think the GPU hang is the same issue as in the two reports below, and it is triggered primarily by Firefox.
If I kill Firefox and stop using it, there are no more GPU hangs after the first one,
at least until I reboot.
https://bugs.freedesktop.org/show_bug.cgi?id=101237
https://bugs.freedesktop.org/show_bug.cgi?id=99720

Is there a kconfig option to disable sync objects? I didn't find it.

Comment by c (c) - Tuesday, 19 September 2017, 22:33 GMT

I don't have any 4.12 kernel other than 4.12-ck at hand to test, but I ran heavy tasks
for more than a day, including GPU use, and none of the issues, not even the GPU
hangs, happened with it. I'm thinking 4.13 DRM is in a bad state right now.
Looking through Greg's 4.13 stable queue, there are no DRM fixes so far, only a long
list of XFS patches.

Comment by loqs (loqs) - Wednesday, 20 September 2017, 09:35 GMT

4.12 is now EOL (https://lkml.org/lkml/2017/9/20/181), so the maintainer has to balance the impact of newly discovered issues that will not be fixed in 4.12 against the documented breakage in this bug report.
Also, https://bugs.archlinux.org/task/55629#comment161179 is still outstanding. Is everyone assuming it is the same issue?

Comment by Simon Wydooghe (HyperBaton) - Friday, 29 September 2017, 07:03 GMT

I seem to be experiencing something similar since upgrading to 4.13 yesterday. I'm running Wayland with GNOME. I can boot without a problem. For me it manifests as a sudden intermittent freezing. It will freeze for 30 seconds, then I have like 1 second to continue working. This pattern repeats a number of times until Wayland or gnome-shell crashes and I'm back at the GDM login screen. I think it's correlated to video playback. When I play a video in mpv, it seems to trigger the freezing. I've also turned off IOMMU now and so far no more freezes during video playback. Laptop is a Dell Latitude E5450 (CPU: Intel Core i5-5300U, graphics: Intel Corporation HD Graphics 5500 + NVIDIA Corporation GM108M [GeForce 830M]).

Comment by c (c) - Friday, 29 September 2017, 16:58 GMT

@HyperBaton the freezes are most likely GPU hangs which I've noticed
more frequently with 4.12 and 4.13. Even a completely IOMMU free
4.13.4 had occasional GPU freezes and I can confirm that I was using
VAAPI for a prolonged time while Firefox's GPU use triggered it
reliably. Quitting Firefox made it disappear but Firefox is just a
user of Mesa and DRM and can't be blamed. I think it's a combination
of Mesa and 4.12 or 4.13 DRM that provokes the bug.

You say you didn't see it without IOMMU, but I'm certain that IOMMU
helped increase chances of the bug and you're now merely less likely
to hit it.

The three bugzilla entries I posted above are all about this and it
seems the issue has only become more prominent with 4.12 and 4.13.

Comment by Tom Vincent (tlvince) - Friday, 29 September 2017, 17:04 GMT

Although X did not freeze entirely, I experienced "CPU pipe A FIFO underrun" resulting in intermittent screen flickering (i915/Intel 520) even with intel_iommu=off. Downgrading to 4.12 resolves the issue.

Comment by c (c) - Friday, 29 September 2017, 20:37 GMT

drm-tip 4.14 is no better. The same GPU hang error happens quickly just from running
Xorg and a VAAPI client. No Firefox necessary. Back to 4.9-lts for now because
4.12 is EOL.

Comment by alexander sanoll (sonix07) - Saturday, 30 September 2017, 10:56 GMT

Disabling VT-d is a workaround that fixed 2 of my computers, but no more KVM until this is fixed...
I think this is very likely related to ~~FS#55744~~ .

Comment by Simon Wydooghe (HyperBaton) - Saturday, 30 September 2017, 11:36 GMT

@c You might be correct; my entire graphics stack has had a tendency to crash at times before this, not very predictably.

Comment by John Lindgren (jlindgren) - Saturday, 30 September 2017, 17:19 GMT

Seems like the root cause of the issue(s) is still elusive, but for another data point, my Skylake system exhibited random screen flickering starting with 4.13.x (was fine with 4.12.10 and earlier). Booting with "intel_iommu=off" seems to solve the problem for me as well.

Comment by c (c) - Saturday, 30 September 2017, 21:17 GMT

Disabling VT-d was a great idea. I didn't test plain 4.13.4 yet, but linux-zen-4.13.4 hasn't
run into any GPU errors after two hours of concurrent VAAPI use and heavy CPU
utilization. It seems that merely disabling IOMMU in the kernel config isn't as effective
as disabling it in the BIOS.

Comment by c (c) - Saturday, 30 September 2017, 21:21 GMT

> but no more KVM until this is fixed...

@sonix07 you might know this but to be safe: VT-d is only needed for
KVM if you want to share your physical devices with a VM. The VMM
only needs VT-x. In /proc/cpuinfo it's the vmx flag.
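
A quick way to check for that flag, as a sketch. `has_cpu_flag` is an illustrative name; it reads cpuinfo-style text on stdin so it can be tried on saved output too.

```shell
# has_cpu_flag FLAG
# Read /proc/cpuinfo-style text on stdin; succeed if a "flags" line lists FLAG
# as a whole word.
has_cpu_flag() {
    grep '^flags' | grep -qw -- "$1"
}

# On the running system:
#   has_cpu_flag vmx < /proc/cpuinfo && echo "VT-x is exposed to the OS"
```

If the flag is missing on a CPU that supports VT-x, the feature is usually switched off in the BIOS, which is a separate setting from VT-d.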

Comment by c (c) - Sunday, 01 October 2017, 18:51 GMT

linux-zen-4.13.4 hasn't exhibited any GPU errors after almost a day of
heavy GPU and CPU utilization, including VAAPI.

I booted custom 4.13.4 vanilla (kconfig disabled IOMMU completely) and it didn't
take an hour before using VAAPI and browsers like Firefox and Chrome caused
GPU errors.

It seems that disabling IOMMU in the kernel isn't a good idea, but keeping
it on and having VT-d disabled in the BIOS works. This is naturally just
a stupid workaround because disabling IOMMU in the kernel should not
cause problems, especially when IOMMU isn't available (BIOS switch).

4.13 and current drm-tip are in pretty bad shape.

Firefox:
[drm] GPU HANG: ecode 6:0:0x80202f7b, in Compositor [2620], reason: Hang on rcs0, action: reset
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
drm/i915: Resetting chip after gpu hang

Chrome:
drm/i915: Resetting chip after gpu hang
asynchronous wait on fence i915:[global]:a4255 timed out
drm/i915: Resetting chip after gpu hang

As I said above, applications use Mesa and Xorg through the APIs they provide,
so this is a bug in the graphics stack. If Chrome or Firefox or mpv or
ffmpeg (both when using VAAPI) did something wrong, the API should
return an error rather than hang the GPU. If you can cause a GPU hang, then
this is a local DoS, locking up the desktop for seconds.
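
To see which process and engine each hang is attributed to, the "GPU HANG" lines can be reduced to a short summary. A sketch, assuming the message format shown in the logs in this thread; `gpu_hang_summary` is a made-up name.

```shell
# gpu_hang_summary: read kernel log text on stdin and print
# "<process> <pid> <engine>" for every i915 "GPU HANG" line.
gpu_hang_summary() {
    sed -n 's/.*GPU HANG: ecode [^,]*, in \(.*\) \[\([0-9]*\)], reason: Hang on \([a-z0-9]*\),.*/\1 \2 \3/p'
}

# On the running system:
#   journalctl -k | gpu_hang_summary | sort | uniq -c
```

The process named in the log is only the one whose work was on the GPU when the hang was detected, so it is a hint and not proof of where the bug is.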

Comment by c (c) - Sunday, 01 October 2017, 21:49 GMT

4.14 is the next stable kernel and so far drm-tip (which is 4.14 right now)
contains no remedies.

Has anyone used VT-d on Intel Sandybridge or newer with zero driver issues?
I never had a need and ask myself if this is a new string of regressions
or whether it has always been a lottery.

Comment by loqs (loqs) - Sunday, 01 October 2017, 22:51 GMT

@c On which upstream bug report are you posting information such as the issue still being present in drm-tip?
Have you opened separate reports for each of your issues?

Comment by c (c) - Monday, 02 October 2017, 01:02 GMT

I tested drm-tip just to be sure it's not fixed upstream yet, which could
have been useful to move this ticket forward. The bugs are those linked
in this ticket. Sorry I can't be more involved with the debugging process.

I understand why you might assume it's more than one issue, and it might be
multiple bugs in combination causing problems, but they all are related
to VT-d somehow and as a user of the graphics stack it's all the same bug,
if we exclude the FIFO underrun which is fixed by disabling VT-d. Which then
leaves us with 4.13+ being more likely to hang the GPU than previous kernels.
What is interesting is that, like I found, if you disable IOMMU in the
kernel and VT-d in the BIOS, then that kernel will still provoke hangs,
while a kernel with IOMMU activated but VT-d disabled does not. I find
that the most interesting result so far.

Comment by c (c) - Monday, 02 October 2017, 02:12 GMT

Related https://bugs.archlinux.org/task/55789

Comment by loqs (loqs) - Monday, 02 October 2017, 19:25 GMT

@c It is an interesting finding but how is reporting it here useful?
Issue has 21 votes so I assume 21 affected individuals but no comment is linked to an upstream report from an arch user.
Perhaps closing this bug report as an upstream issue would encourage reporting upstream instead.

Comment by Pierre Durand (Pierrre) - Tuesday, 03 October 2017, 08:08 GMT

I think my problem is the same.
My computer has been freezing randomly since the update to Linux 4.13.3.

Dell Latitude E5550
Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz

oct. 02 18:12:27 pierre-dell-latitude kernel: DMAR: DRHD: handling fault status reg 3
oct. 02 18:12:27 pierre-dell-latitude kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 19e000 [fault reason 23] Unknown
oct. 02 18:12:35 pierre-dell-latitude kernel: [drm] GPU HANG: ecode 8:0:0x85dffffb, in Xwayland [749], reason: Hang on rcs0, action: reset
oct. 02 18:12:35 pierre-dell-latitude kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
oct. 02 18:12:35 pierre-dell-latitude kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
oct. 02 18:12:35 pierre-dell-latitude kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
oct. 02 18:12:35 pierre-dell-latitude kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
oct. 02 18:12:35 pierre-dell-latitude kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
oct. 02 18:12:35 pierre-dell-latitude kernel: drm/i915: Resetting chip after gpu hang
oct. 02 18:12:46 pierre-dell-latitude sudo[18258]: pam_unix(sudo:session): session closed for user root
oct. 02 18:13:17 pierre-dell-latitude kernel: DMAR: DRHD: handling fault status reg 3
oct. 02 18:13:17 pierre-dell-latitude kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 47a7000 [fault reason 23] Unknown
oct. 02 18:13:25 pierre-dell-latitude kernel: drm/i915: Resetting chip after gpu hang
oct. 02 18:13:33 pierre-dell-latitude kernel: drm/i915: Resetting chip after gpu hang
oct. 02 18:13:36 pierre-dell-latitude kernel: asynchronous wait on fence i915:gnome-shell[725]/1:493a timed out
oct. 02 18:13:41 pierre-dell-latitude kernel: drm/i915: Resetting chip after gpu hang
oct. 02 18:13:44 pierre-dell-latitude kernel: asynchronous wait on fence i915:gnome-shell[725]/1:493b timed out
oct. 02 18:13:49 pierre-dell-latitude kernel: drm/i915: Resetting chip after gpu hang

Then my computer freezes.

I've attached the crash dump

crash.txt (34.6 KiB)
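
The DMAR fault lines in a log like the one above are the part that points at the IOMMU. A sketch that extracts the faulting device and address from each of them, assuming the message format shown here; `dmar_faults` is an illustrative name.

```shell
# dmar_faults: read kernel log text on stdin and print "<device> <fault addr>"
# for every DMAR DMA fault line.
dmar_faults() {
    sed -n 's/.*DMAR: \[DMA [A-Za-z]*] Request device \[\([0-9a-f:.]*\)] fault addr \([0-9a-f]*\) .*/\1 \2/p'
}

# On the running system:
#   journalctl -k | dmar_faults
# The crash dump mentioned in the log can be saved, as root, with:
#   cat /sys/class/drm/card0/error > crash.txt
```

On Intel systems the integrated GPU normally sits at PCI address 00:02.0, so faults for that device just before a GPU hang fit the IOMMU explanation discussed in this task.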

Comment by Jonas Platte (jP_wanN) - Tuesday, 03 October 2017, 11:25 GMT

Same issue here with i7-5500U / HD 5500 graphics. Doesn't seem to affect i7-5775C / Intel Iris Pro 6200 graphics. intel_iommu=igfx_off seems to fix it.

However, I'm a little bit confused by this line in the report:

-Start X server and wait 30-40 seconds. Laptop will freeze and require a restart.

This sounds like you get a graphical output for a short while. For me, it locks up immediately when I try to start X (unless, curiously, I had wayland running beforehand).

@c: Regarding the Firefox issues when just iommu is disabled and not VT-d: That might be even more hardware specific. I have Firefox (nightly) running almost all the time when my laptop is turned on but I haven't had any problems before the 4.13 upgrade, or after turning off iommu.

Comment by Mike C (ggg377) - Tuesday, 03 October 2017, 12:13 GMT

Same here with Intel i5-5200U HD5500. In my 2 years of using Arch I have not dealt with a broken kernel out of the box. I will apply the iommu fix (or keep using LTS) and hope this gets fixed soon. Also, I upgraded with pacman and the kernel crashed pacman, wiping clear a large part of my /usr/lib directory, causing systemd, dbus and a lot of other things to stop working. I was lucky I had backups as I had to do a clean reinstall. Unfortunately I can't provide more details as I wanted to get on with my work.

Comment by Eric Blau (eblau) - Tuesday, 03 October 2017, 13:25 GMT

I'm hitting this same issue since upgrading to 4.13.3-1. I've reported the bug upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=103076

The response from upstream was to disable iommu:

DMAR and death is nothing new, see bug 89360. Standard practice is to disable iommu, with intel_iommu=igfx_off.

Running with intel_iommu=igfx_off solves the problem for me. I get an almost immediate lockup in X without the option. With the option, my laptop runs normally.

Comment by Vladimir Krivopalov (Argenet) - Thursday, 05 October 2017, 04:42 GMT

I've run into the same issue, tried adding intel_iommu=off to grub but the issue still occurs, although it's become more intermittent.
Downgrading to 4.12 helps.

Comment by c (c) - Friday, 06 October 2017, 02:04 GMT

It took more than two hours of heavy CPU and GPU utilization, but I was able to trigger the
GPU errors discussed above with BIOS-IOMMU=off and intel_iommu=igfx_off on 4.13.5, which
validates my claim that IOMMU only makes it easier to trigger and there are bigger
bugs in 4.13.5.

An anecdote on my experience with intel-drm over the last two years:
Ever since atomic modesetting started in 4.2, the DRM stack has had more
regressions, which is funny since before that I never thought about intel-drm at all.
It all used to work, no errors, no tearing (tearing started with Sandybridge and
is solved only with native Wayland or the xf86-video-intel DDX in TearFree mode;
no, the generic modesetting driver, and glamor for that matter, isn't tear-free yet).
4.13 is wild with GPU hangs, fence timeouts and atomic-ms crashes :-).
One of the 4.13.5 GPU hangs today credited systemd-login, which I think means
it was mpv owned by logind, which owns the Xorg session. Something other than the
usual Firefox or Chrome compositor.

4.14 (maybe even 4.9?) will be extended-lts (4+ years) releases, by the way.

Comment by loqs (loqs) - Friday, 06 October 2017, 10:21 GMT

linux 4.13.5-1 reverts CONFIG_INTEL_IOMMU_DEFAULT_ON=y to # CONFIG_INTEL_IOMMU_DEFAULT_ON is not set.
This should fix the issue for most affected systems.
@c As this does not resolve your issue, please report it upstream.
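
To confirm which default a given kernel was built with, its config can be checked for that option. A sketch, assuming the running kernel exposes its config at /proc/config.gz; `iommu_default_state` is an illustrative name.

```shell
# iommu_default_state: read kernel config text on stdin and print "on" if
# CONFIG_INTEL_IOMMU_DEFAULT_ON=y is set, "off" if it is unset or absent.
iommu_default_state() {
    if grep -qx 'CONFIG_INTEL_IOMMU_DEFAULT_ON=y'; then
        echo on
    else
        echo off
    fi
}

# On the running system:
#   zcat /proc/config.gz | iommu_default_state
```

This reports only the build-time default. An intel_iommu= parameter on the kernel command line still overrides it either way.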

Comment by loqs (loqs) - Friday, 06 October 2017, 23:21 GMT

@Lindows can you confirm that with linux 4.13.5-1, currently in testing, behaviour has been restored to the same level as 4.12?
Then this FS can be closed, as what remains would no longer be a packaging and integration issue.

Comment by Sledge Sulaweyo (sulaweyo) - Sunday, 08 October 2017, 11:38 GMT

4.13.5-1 solves the issue for me as well

	Tasks related to this task (0)

Duplicate tasks of this task (2)
~~FS#55744 - [linux] 4.13 hard freezes the computer with xorg and nouveau~~
~~FS#55934 - [linux] Kernel crash when starting X (startx) with kernel 4.13.~~
