FS#53657 - [linux] 4.10.x periodically crashes on Intel Broadwell Graphics (Gen8, Intel HD Graphics 5500)

Attached to Project: Arch Linux
Opened by Vasily Khoruzhick (anarsoul) - Wednesday, 12 April 2017, 23:10 GMT
Last edited by Jan de Groot (JGC) - Friday, 31 May 2019, 06:49 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 9
Private No

Details

Description:

linux 4.10.x periodically crashes on hardware with Intel Broadwell Graphics with the following backtrace:

[71880.979021] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[71880.979077] IP: gen8_ppgtt_alloc_page_directories.isra.14+0x11f/0x270 [i915]
[71880.979103] PGD 0

[71880.979120] Oops: 0002 [#1] PREEMPT SMP
[71880.979135] Modules linked in: tun ctr ccm fuse ecryptfs cbc encrypted_keys cmac rfcomm hid_generic cdc_ether usbnet r8152 mii bnep snd_usb_audio usbhid snd_usbmidi_lib snd_rawmidi snd_seq_device hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev btusb btrtl btbcm media btintel bluetooth snd_hda_codec_hdmi lz4 lz4_compress joydev mousedev arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp mei_wdt kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel iwlmvm ghash_clmulni_intel pcbc mac80211 i915 snd_hda_codec_realtek snd_hda_codec_generic aesni_intel iwlwifi e1000e drm_kms_helper psmouse aes_x86_64 nls_iso8859_1 crypto_simd snd_hda_intel glue_helper snd_hda_codec nls_cp437 cryptd evdev intel_cstate input_leds
[71880.979412] intel_rapl_perf drm thinkpad_acpi cfg80211 vfat snd_hda_core mac_hid fat pcspkr snd_hwdep intel_gtt snd_pcm nvram syscopyarea intel_pch_thermal rfkill mei_me sysfillrect ptp snd_timer sysimgblt snd fb_sys_fops i2c_i801 i2c_algo_bit soundcore mei lpc_ich pps_core shpchp thermal led_class wmi tpm_tis intel_rst tpm_tis_core ac battery fjes video button sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) sg ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache sd_mod serio_raw atkbd libps2 ahci libahci libata xhci_pci ehci_pci xhci_hcd scsi_mod ehci_hcd usbcore usb_common i8042 serio trusted tpm
[71880.979630] CPU: 1 PID: 1450 Comm: chromium Tainted: G U O 4.10.8-1-ARCH #1
[71880.979659] Hardware name: LENOVO 20BS0032US/20BS0032US, BIOS N14ET37W (1.15 ) 09/06/2016
[71880.979688] task: ffff8801d5e79c80 task.stack: ffffc90003e14000
[71880.979721] RIP: 0010:gen8_ppgtt_alloc_page_directories.isra.14+0x11f/0x270 [i915]
[71880.979758] RSP: 0018:ffffc90003e17890 EFLAGS: 00010286
[71880.979779] RAX: ffff880102424000 RBX: 0000000000008000 RCX: 0000000000000003
[71880.979804] RDX: 0000000000000000 RSI: ffff8801f9b8c000 RDI: ffff880219370000
[71880.979830] RBP: ffffc90003e178e8 R08: 0000000000000000 R09: 0000000000000000
[71880.979855] R10: 0000000000000000 R11: 0000000000000040 R12: ffff8801d5f86000
[71880.979880] R13: ffff880010252f10 R14: 0000000000000003 R15: 00000000fff6f000
[71880.979906] FS: 00007f438936ea80(0000) GS:ffff88022dc40000(0000) knlGS:0000000000000000
[71880.979943] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71880.979964] CR2: 0000000000000018 CR3: 00000001d5f3c000 CR4: 00000000003406e0
[71880.979990] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[71880.980015] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[71880.980040] Call Trace:
[71880.980064] gen8_alloc_va_range_3lvl+0xf7/0x9c0 [i915]
[71880.980086] ? swiotlb_map_sg_attrs+0x53/0x120
[71880.980123] gen8_alloc_va_range+0x256/0x490 [i915]
[71880.980155] i915_vma_bind+0xab/0x1a0 [i915]
[71880.980182] __i915_vma_do_pin+0x2a5/0x450 [i915]
[71880.980210] i915_gem_execbuffer_reserve_vma.isra.8+0x144/0x1b0 [i915]
[71880.980243] i915_gem_execbuffer_reserve.isra.9+0x39e/0x3d0 [i915]
[71880.980274] i915_gem_do_execbuffer.isra.15+0x62e/0x1810 [i915]
[71880.980307] ? reservation_object_get_fences_rcu+0x119/0x290
[71880.980339] ? i915_gem_object_wait_reservation+0x200/0x2d0 [i915]
[71880.980371] i915_gem_execbuffer2+0xc5/0x240 [i915]
[71880.980395] drm_ioctl+0x21b/0x4c0 [drm]
[71880.980419] ? i915_gem_execbuffer+0x310/0x310 [i915]
[71880.980439] ? __seccomp_filter+0x67/0x2a0
[71880.980456] ? __vfs_read+0xe1/0x130
[71880.980481] do_vfs_ioctl+0xa3/0x5f0
[71880.980497] ? __fget+0x77/0xb0
[71880.980511] SyS_ioctl+0x79/0x90
[71880.980525] do_syscall_64+0x54/0xc0
[71880.980540] entry_SYSCALL64_slow_path+0x25/0x25
[71880.980558] RIP: 0033:0x7f437e9620d7
[71880.980572] RSP: 002b:00007fff9195f0f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[71880.980600] RAX: ffffffffffffffda RBX: 000026c06f06e000 RCX: 00007f437e9620d7
[71880.980625] RDX: 00007fff9195f140 RSI: 00000000c0406469 RDI: 000000000000009a
[71880.980655] RBP: 00007fff9195f140 R08: 0000000000000000 R09: 0000000000000000
[71880.980685] R10: 0000000000000038 R11: 0000000000000246 R12: 00000000c0406469
[71880.980711] R13: 000000000000009a R14: 0000000000000000 R15: 0000000000000000
[71880.980736] Code: 49 8b bc 24 d8 02 00 00 48 89 c6 48 89 45 c0 48 8b 52 08 48 83 ca 03 e8 70 e0 ff ff 48 8b 45 b0 48 8b 4d c8 48 8b 10 48 8b 45 c0 <48> 89 04 ca 48 8b 45 d0 48 0f ab 08 0f 1f 44 00 00 e9 4d ff ff
[71880.980841] RIP: gen8_ppgtt_alloc_page_directories.isra.14+0x11f/0x270 [i915] RSP: ffffc90003e17890
[71880.980876] CR2: 0000000000000018
[71880.989995] ---[ end trace 15df532a83aeed3d ]---

It appears to be upstream bug https://bugs.freedesktop.org/show_bug.cgi?id=99295, which is fixed in the drm-intel-next branch but hasn't been backported to any stable release.

Please consider applying the patch from https://bugs.freedesktop.org/show_bug.cgi?id=99295#c22 as a temporary fix.

Additional info:
* package version(s)
* config and/or log files etc.


Steps to reproduce:

There are no specific steps; it crashes about once a day.
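
Since there is no known trigger, one way to confirm that a machine is hitting this particular oops (rather than some other i915 crash) is to scan a saved kernel log for the signature shown in the backtrace above. The following is a minimal sketch, not an official tool: the function name `find_i915_ppgtt_oops` is hypothetical, and it assumes the log text comes from something like `dmesg` or `journalctl -k` and keeps the same line layout as the backtrace in this report.

```python
import re

# Patterns taken from the backtrace in this report (assumption: other
# kernel versions print the same wording for these two lines).
OOPS_RE = re.compile(r"BUG: unable to handle kernel NULL pointer dereference")
RIP_RE = re.compile(r"(?:IP|RIP):.*\bgen8_ppgtt_alloc_page_directories\b.*\[i915\]")
STAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_i915_ppgtt_oops(log_text):
    """Return the timestamps of oopses that match this report's signature.

    An oops counts as a match when a NULL pointer dereference line is
    followed, before the end of its trace, by an IP/RIP line that points
    into gen8_ppgtt_alloc_page_directories in the i915 module.
    """
    hits = []
    pending = None  # timestamp of the NULL-dereference line being examined
    for line in log_text.splitlines():
        if OOPS_RE.search(line):
            m = STAMP_RE.search(line)
            # Fall back to a placeholder if the log has no dmesg-style stamp.
            pending = m.group(1) if m else "?"
        elif pending is not None and RIP_RE.search(line):
            hits.append(pending)
            pending = None
        elif "---[ end trace" in line:
            # The oops ended without pointing at the function of interest.
            pending = None
    return hits
```

An empty list means the signature was not found in the supplied text; it does not rule out other i915 failures.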

Closed by  Jan de Groot (JGC)
Friday, 31 May 2019, 06:49 GMT
Reason for closing:  Fixed
Additional comments about closing:  By request:
As a Broadwell (XPS 9343) owner: in the past we had problems with some kernel versions. Now they seem to be gone. It is also closed upstream.
Comment by Jill (KokaKiwi) - Monday, 24 April 2017, 13:16 GMT
Hi, any update on this issue? I have the same problem on my computer, and this is actually the third time it has crashed today...
Comment by Vasily Khoruzhick (anarsoul) - Monday, 24 April 2017, 21:29 GMT
4.10.11-1 is also affected.
Comment by Manuel Mazzuola (originof) - Wednesday, 26 April 2017, 10:39 GMT
Same problem here on a DELL XPS 13 9350, Iris Graphics 540 (rev 0a) and kernel 4.10.10-1-ARCH.
Does anyone else have the pixel saver gnome extension installed and enabled?

https://pastebin.com/bYGigvyA
Comment by anotherbugmaster (anotherbugmaster) - Thursday, 27 April 2017, 08:34 GMT
Same thing here, Lenovo X250, kernel 4.10.11-1-ARCH
Comment by Eric Blau (eblau) - Thursday, 27 April 2017, 15:12 GMT
I'm hitting this often. linux-lts (4.9.x stream currently) doesn't hit this issue. I'm downgrading for now until the fix is in a release.
Comment by Mathieu Clabaut (mathieu.clabaut) - Friday, 28 April 2017, 07:50 GMT
Same problem on 4.10.11 after installing the pixel saver gnome extension
Comment by Daniel Playfair Cal (hedgepigdaniel) - Tuesday, 16 May 2017, 05:43 GMT
Happening to me on a Dell XPS 15 9560
- Kaby lake
- options i915 enable_fbc=1 enable_psr=1 disable_power_well=0
- Not sure what causes it, I was moving the mouse at the time
Comment by Daniel Playfair Cal (hedgepigdaniel) - Tuesday, 16 May 2017, 05:45 GMT
Comment by Bart Willems (hersenbeuker) - Tuesday, 06 June 2017, 08:12 GMT
Same problem here, kernel 4.11.3-1, Macbook Air 7,2
Comment by JD Bothma (jbothma) - Monday, 12 June 2017, 13:24 GMT
Ah this is killing me. It's now happening many times a day. 4.11.3-1-ARCH #1 SMP PREEMPT

Is this the patch to apply? https://bugs.freedesktop.org/attachment.cgi?id=131404

I can't find drivers/gpu/drm/i915 in the source downloaded according to https://wiki.archlinux.org/index.php/Kernels/Arch_Build_System. Should I follow https://wiki.archlinux.org/index.php/Compile_kernel_module instead?

In the meantime I'm trying out aur/linux-mainline 4.12rc5-1 (195) (4.23) - hope I'm not shooting myself in the foot! But I need to get work done :/
Comment by Vasily Khoruzhick (anarsoul) - Monday, 12 June 2017, 19:52 GMT
I'd suggest installing linux-lts from core as a workaround. It's 4.9 and isn't affected by this bug.
Comment by JD Bothma (jbothma) - Tuesday, 13 June 2017, 11:15 GMT
Thanks Vasily. Finally figured out how to update my boot loader to use the linux-lts kernel so hopefully I'll be ok now.
