FS#50619 - [linux] i915 Skylake driver freeze with kernel 4.7.2

Attached to Project: Arch Linux
Opened by Sebastien Bariteau (numkem) - Wednesday, 31 August 2016, 18:15 GMT
Last edited by Jan de Groot (JGC) - Friday, 20 October 2017, 09:22 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description: After upgrading to kernel 4.7.2, the moment I log in to my WM (i3), the whole screen freezes on my Dell XPS 13.

lspci:
00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Iris Graphics 540 (rev 0a)
00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 09)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Device 9d18 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
02:00.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
02:01.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
02:02.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
39:00.0 USB controller: Intel Corporation Device 15b5
3a:00.0 Network controller: Broadcom Corporation BCM4350 802.11ac Wireless Network Adapter (rev 08)
3b:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
3c:00.0 Non-Volatile memory controller: Toshiba America Info Systems Device 010f (rev 01)

Driver crash in journalctl:
kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
kernel: IP: [< (null)>] (null)
kernel: PGD 464b96067 PUD 464b95067 PMD 0
kernel: Oops: 0010 [#1] PREEMPT SMP
kernel: Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt
kernel: snd_compress dell_laptop dell_smbios dcdbas snd_pcm_dmaengine kvm ac97_bus brcmfmac brcmutil irqbypass snd_hda_intel rtsx_pci_ms crct10dif_pclmul crc32_pclmul cfg
kernel: acpi_pad mac_hid ac sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) ip_tables x_tables xfs libcrc32c crc32c_generic rtsx_pci_sdmmc mmc_cor
kernel: CPU: 3 PID: 930 Comm: Xorg Tainted: G U O 4.7.2-1-ARCH #1
kernel: Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.4 06/14/2016
kernel: task: ffff88046762db80 ti: ffff880464b80000 task.ti: ffff880464b80000
kernel: RIP: 0010:[<0000000000000000>] [< (null)>] (null)
kernel: RSP: 0018:ffff880464b83af0 EFLAGS: 00010282
kernel: RAX: ffff880464b83bb8 RBX: ffff8804675e0480 RCX: b787eed8fa6ceafd
kernel: RDX: 00000000fffff075 RSI: ffff880469d122d0 RDI: ffff8804675e0e40
kernel: RBP: ffff880464b83b78 R08: ffff880469d13578 R09: ffff8804675e0e40
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8804675e0e40
kernel: R13: 0000000000000000 R14: ffff880469d13578 R15: ffff880469d122d0
kernel: FS: 00007f8ea26a9940(0000) GS:ffff88047ed80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000004646cc000 CR4: 00000000003406e0
kernel: Stack:
kernel: ffffffffa01356fe ffff880469d10000 ffff880469d122d0 ffff880464b83bb8
kernel: ffff880452764240 00000001fffff075 ffff8804675e0480 ffff880456822f00
kernel: ffff880464b83b40 ffffffffa01385ad ffff880464b83b80 000000007c6e955c
kernel: Call Trace:
kernel: [<ffffffffa01356fe>] ? i915_gem_object_sync+0x1ae/0x330 [i915]
kernel: [<ffffffffa01385ad>] ? i915_gem_object_pin+0x2d/0x30 [i915]
kernel: [<ffffffffa0147ab0>] intel_execlists_submission+0x1d0/0x440 [i915]
kernel: [<ffffffffa0127422>] i915_gem_do_execbuffer.isra.14+0x892/0x12a0 [i915]
kernel: [<ffffffff81579e71>] ? unix_stream_read_generic+0x281/0x8a0
kernel: [<ffffffffa0128b48>] i915_gem_execbuffer2+0xe8/0x250 [i915]
kernel: [<ffffffffa0136399>] ? i915_gem_busy_ioctl+0xc9/0x100 [i915]
kernel: [<ffffffffa00209a2>] drm_ioctl+0x152/0x540 [drm]
kernel: [<ffffffffa0128a60>] ? i915_gem_execbuffer+0x330/0x330 [i915]
kernel: [<ffffffff81217e07>] ? __fget+0x77/0xb0
kernel: [<ffffffff8120cd72>] do_vfs_ioctl+0xa2/0x5d0
kernel: [<ffffffff81217e07>] ? __fget+0x77/0xb0
kernel: [<ffffffff8120d319>] SyS_ioctl+0x79/0x90
kernel: [<ffffffff815de7b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
kernel: Code: Bad RIP value.
kernel: RIP [< (null)>] (null)
kernel: RSP <ffff880464b83af0>
kernel: CR2: 0000000000000000
kernel: ---[ end trace bb0f04a09880d6a1 ]---


Steps to reproduce:
Upgrade to the latest kernel and kernel-headers (4.7.2) with the Intel driver. It crashes every time after the login screen (lightdm).

I'm using TLP with the default settings and no other special settings for the i915 driver itself.

Closed by  Jan de Groot (JGC)
Friday, 20 October 2017, 09:22 GMT
Reason for closing:  No response
Comment by Z.Shang (zshang) - Thursday, 01 September 2016, 02:40 GMT
Same here on ThinkPad X1 Yoga

I don't use a display manager; startx leads to exactly the same result as yours.
sudo startx says:
"modprobe: FATAL: Module i915 not found in directory /lib/modules/4.7.1-1-ARCH
modprobe: FATAL: Module fbcon not found in directory /lib/modules/4.7.1-1-ARCH"

but I only have 4.7.2-1-ARCH and extramodules-4.7-ARCH in /lib/modules

tried ln -s /lib/modules/4.7.2-1-ARCH /lib/modules/4.7.1-1-ARCH
and sudo startx says:
"ERROR: could not insert 'i915': Invalid argument"

also

battery is not found by i3status
dmesg | grep input shows only the keyboard
and nmcli / ip shows only the lo device
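
The modprobe errors above name /lib/modules/4.7.1-1-ARCH while only 4.7.2-1-ARCH is installed. One common cause of that pattern, offered here as an assumption rather than a diagnosis, is that the machine is still running the old kernel after the upgrade and has not been rebooted. A minimal sketch of that check follows; has_modules_for is a hypothetical helper, not an existing tool, and its second argument exists only so the directory can be overridden.

```shell
# Sketch: report whether a kernel release has a matching module directory.
# has_modules_for is a hypothetical helper name.
has_modules_for() {
    release="$1"
    modules_root="${2:-/lib/modules}"
    if [ -d "$modules_root/$release" ]; then
        echo "ok: $modules_root/$release exists"
    else
        echo "mismatch: no $modules_root/$release"
        return 1
    fi
}

# Usage on a live system:
#   has_modules_for "$(uname -r)"
```

If the check reports a mismatch for the running release, rebooting into the installed kernel is usually a safer fix than symlinking one module directory to another.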
Comment by Doug Newgard (Scimmia) - Thursday, 01 September 2016, 02:54 GMT
zshang, this is a completely different issue (and not a bug). Please don't hijack tickets; seek support on the forums, IRC, or mailing list.
Comment by Z.Shang (zshang) - Thursday, 01 September 2016, 03:22 GMT
Scimmia,
the problem appeared right after I upgraded my kernel to 4.7.2 and the behavior (screen freeze) is exactly the same.
Even if it is not the same issue / bug, I believe it is somewhat related to this one.
Comment by Doug Newgard (Scimmia) - Thursday, 01 September 2016, 03:32 GMT
It is not. Again, seek help elsewhere.
Comment by Victor Trac (victortrac) - Sunday, 04 September 2016, 18:43 GMT
Doug: I had this issue on my XPS 13 (9350) as well. I managed to fix it by:
* removing xf86-video-intel
* removing the intel xorg config in /etc/X11/xorg.conf.d/
* removing any i915 module options in /etc/modprobe.d/i915.conf (commented out all lines)
* removing any kernel params for i915 in /boot/
* rebuilding initramfs
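
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the exact commands that were run: the helper names are hypothetical, 20-intel.conf is an assumed name for the Intel Xorg snippet, and the `linux` preset is assumed for mkinitcpio.

```shell
# Sketch only: helper names and the Xorg snippet file name are assumptions.

# Prefix every non-empty, not-yet-commented line of a file with '#',
# leaving the original next to it with a .bak suffix.
comment_out_all() {
    sed -i.bak 's/^\([^#]\)/#\1/' "$1"
}

# Run as root. Nothing here executes until the function is called.
reset_i915_setup() {
    pacman -Rns xf86-video-intel || return 1
    # 20-intel.conf is an assumed name; use whichever file holds the Intel Device section.
    mv /etc/X11/xorg.conf.d/20-intel.conf /root/20-intel.conf.disabled || return 1
    comment_out_all /etc/modprobe.d/i915.conf || return 1
    # i915.* kernel parameters live in the boot loader configuration, whose
    # location depends on the boot loader, so remove those by hand.
    mkinitcpio -p linux
}
```

Commenting the options out instead of deleting the file keeps the old settings available if they need to be restored later.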
Comment by Martin Schmidt (Blind) - Tuesday, 14 February 2017, 19:14 GMT
I am seeing similar crashes with my Dell XPS 13 9350.
Kernel 4.9.8

I think this is a kernel driver issue with i915.

There is a bug report on bugs.freedesktop.org, bug #98528.

Log of the newest hang:
Feb 14 09:10:26 rough kernel: perf: interrupt took too long (4911 > 4898), lowering kernel.perf_event_max_sample_rate to 40500
Feb 14 11:09:11 rough kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [1423], reason: Hang on render ring, action: reset
Feb 14 11:09:11 rough kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 14 11:09:11 rough kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 14 11:09:11 rough kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 14 11:09:11 rough kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 14 11:09:11 rough kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 14 11:09:11 rough kernel: drm/i915: Resetting chip after gpu hang
Feb 14 11:09:11 rough kernel: [drm] RC6 on
Feb 14 11:09:11 rough kernel: [drm] GuC firmware load skipped
Feb 14 11:09:23 rough kernel: drm/i915: Resetting chip after gpu hang
Feb 14 11:09:23 rough kernel: [drm] RC6 on
Feb 14 11:09:24 rough kernel: [drm] GuC firmware load skipped
Feb 14 11:09:31 rough kernel: drm/i915: Resetting chip after gpu hang
Feb 14 11:09:32 rough kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
Feb 14 11:09:32 rough kernel: IP: [<ffffffffa06ee8a3>] reset_common_ring+0xc3/0x170 [i915]
Feb 14 11:09:32 rough kernel: PGD 0
Feb 14 11:09:32 rough kernel:
Feb 14 11:09:32 rough kernel: Oops: 0000 [#1] PREEMPT SMP
Feb 14 11:09:32 rough kernel: Modules linked in: cdc_mbim cdc_wdm snd_usb_audio snd_usbmidi_lib cdc_ncm snd_rawmidi usbnet mii snd_seq_device hid_generic veth msr fuse rfcomm ctr ccm ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc bnep snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat fat arc4 hid_multitouch iwlmvm mac80211 dell_led snd_hda_codec_realtek snd_hda_codec_generic snd_soc_skl iTCO_wdt iTCO_vendor_support snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core i2c_designware_platform i2c_designware_core iwlwifi snd_soc_sst_match snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus rtsx_pci_ms cfg80211 memstick intel_rapl dell_laptop x86_pkg_temp_thermal
Feb 14 11:09:32 rough kernel: intel_powerclamp coretemp dell_wmi dell_smbios dcdbas kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd intel_cstate intel_rapl_perf snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore i2c_i801 i2c_smbus idma64 shpchp i915 joydev input_leds evdev led_class mousedev mac_hid drm_kms_helper btusb btrtl drm hci_uart mei_me intel_gtt btbcm btqca syscopyarea sysfillrect mei btintel sysimgblt processor_thermal_device fb_sys_fops i2c_algo_bit intel_lpss_pci intel_pch_thermal intel_soc_dts_iosf fan i2c_hid bluetooth thermal wmi uvcvideo rfkill crc16 video intel_lpss_acpi videobuf2_vmalloc intel_lpss videobuf2_memops videobuf2_v4l2 intel_hid videobuf2_core int3403_thermal battery
Feb 14 11:09:32 rough kernel: int340x_thermal_zone sparse_keymap acpi_als int3400_thermal acpi_thermal_rel kfifo_buf videodev fjes industrialio media button tpm_tis tpm_tis_core ac acpi_pad tpm sch_fq_codel ip_tables x_tables hid_logitech_hidpp btrfs xor hid_logitech_dj usbhid hid raid6_pq rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crc32c_intel ahci xhci_pci libahci rtsx_pci xhci_hcd libata scsi_mod usbcore usb_common i8042 serio nvme nvme_core
Feb 14 11:09:32 rough kernel: CPU: 2 PID: 4906 Comm: kworker/2:2 Tainted: G U 4.9.8-1-ARCH #1
Feb 14 11:09:32 rough kernel: Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.13 12/28/2016
Feb 14 11:09:32 rough kernel: Workqueue: events_long i915_hangcheck_elapsed [i915]
Feb 14 11:09:32 rough kernel: task: ffff880362745880 task.stack: ffffc90003ca4000
Feb 14 11:09:32 rough kernel: RIP: 0010:[<ffffffffa06ee8a3>] [<ffffffffa06ee8a3>] reset_common_ring+0xc3/0x170 [i915]
Feb 14 11:09:32 rough kernel: RSP: 0018:ffffc90003ca7b50 EFLAGS: 00010286
Feb 14 11:09:32 rough kernel: RAX: 0000000000000000 RBX: ffff8801976d4fc0 RCX: 0000000000000001
Feb 14 11:09:32 rough kernel: RDX: 0000000000000004 RSI: 0000000000000206 RDI: 0000000000000206
Feb 14 11:09:32 rough kernel: RBP: ffffc90003ca7b70 R08: ffff880365fe0928 R09: ffff880365fe07a8
Feb 14 11:09:32 rough kernel: R10: ffffea000ca50c00 R11: 00000000000004f7 R12: ffff880365fe2968
Feb 14 11:09:32 rough kernel: R13: 0000000000000000 R14: ffff880365fe0000 R15: ffff880365fe2c10
Feb 14 11:09:32 rough kernel: FS: 0000000000000000(0000) GS:ffff88047ed00000(0000) knlGS:0000000000000000
Feb 14 11:09:32 rough kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 11:09:32 rough kernel: CR2: 0000000000000070 CR3: 0000000001a07000 CR4: 00000000003426e0
Feb 14 11:09:32 rough kernel: Stack:
Feb 14 11:09:32 rough kernel: ffff880365fe2968 ffff8801976d4fc0 ffff880365fe8958 ffff880365fe0000
Feb 14 11:09:32 rough kernel: ffffc90003ca7bb8 ffffffffa06d938a 00000000ffffff01 ffff880350a56400
Feb 14 11:09:32 rough kernel: ffff880365fe0000 ffff880365feaa18 ffffffff81606860 ffff880365feaa18
Feb 14 11:09:32 rough kernel: Call Trace:
Feb 14 11:09:32 rough kernel: [<ffffffffa06d938a>] i915_gem_reset+0x15a/0x280 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffff81606860>] ? __wait_on_bit_lock+0xc0/0xc0
Feb 14 11:09:32 rough kernel: [<ffffffffa069c18d>] i915_reset+0x8d/0xe0 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffffa069f42d>] i915_reset_and_wakeup+0xfd/0x180 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffffa06a53aa>] i915_handle_error+0x10a/0x5f0 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffffa06a5af1>] i915_hangcheck_elapsed+0x261/0x570 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffff810a6978>] ? finish_task_switch+0x78/0x1f0
Feb 14 11:09:32 rough kernel: [<ffffffff81098aa5>] process_one_work+0x1e5/0x470
Feb 14 11:09:32 rough kernel: [<ffffffff81098d78>] worker_thread+0x48/0x4e0
Feb 14 11:09:32 rough kernel: [<ffffffff81098d30>] ? process_one_work+0x470/0x470
Feb 14 11:09:32 rough kernel: [<ffffffff81098d30>] ? process_one_work+0x470/0x470
Feb 14 11:09:32 rough kernel: [<ffffffff8109e909>] kthread+0xd9/0xf0
Feb 14 11:09:32 rough kernel: [<ffffffff8102d9f2>] ? __switch_to+0x572/0x630
Feb 14 11:09:32 rough kernel: [<ffffffff8109e830>] ? kthread_park+0x60/0x60
Feb 14 11:09:32 rough kernel: [<ffffffff8160ab15>] ret_from_fork+0x25/0x30
Feb 14 11:09:32 rough kernel: Code: 41 5e 5d c3 41 8b 44 24 28 b9 01 00 00 00 ba 00 00 ff ff 4c 89 f7 8d b0 a0 03 00 00 41 ff 96 80 07 00 00 4d 8b ac 24 68 02 00 00 <49> 8b 45 70 48 39 43 70 74 51 4d 85 ed 74 14 48 c7 c0 50 e6 48
Feb 14 11:09:32 rough kernel: RIP [<ffffffffa06ee8a3>] reset_common_ring+0xc3/0x170 [i915]
Feb 14 11:09:32 rough kernel: RSP <ffffc90003ca7b50>
Feb 14 11:09:32 rough kernel: CR2: 0000000000000070
Feb 14 11:09:32 rough kernel: ---[ end trace 0c9eeeb99502cbd2 ]---
Feb 14 11:09:32 rough kernel: BUG: unable to handle kernel paging request at 000000004eec5500
Feb 14 11:09:32 rough kernel: IP: [<ffffffff810c3c4b>] __wake_up_common+0x2b/0x80

The recipe for retrieving /sys/class/drm/card0/error does not work, because the laptop has to be hard reset.
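
Since the crash dump is lost on a hard reset, the kernel messages of the previous boot may be the only record left. A minimal sketch of pulling the relevant lines out afterwards, assuming the journal is stored persistently; filter_gpu_hang is a hypothetical helper name and the patterns are simply taken from the log above.

```shell
# Sketch: keep only the kernel log lines that mark a GPU hang or a kernel oops.
# filter_gpu_hang is a hypothetical helper that reads log text on stdin.
filter_gpu_hang() {
    grep -E 'GPU HANG|Resetting chip after gpu hang|BUG: unable to handle kernel'
}

# Usage after the hard reset (requires a persistent journal):
#   journalctl -k -b -1 | filter_gpu_hang
```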

In an attempt to avoid these crashes, I have now set enable_rc6=0 both as a kernel boot parameter and in modprobe.conf. I will report back if I see similar hangs in the future.
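
For reference, a sketch of the two forms this setting usually takes and a way to confirm that the loaded driver picked it up. The file name /etc/modprobe.d/i915.conf and the i915_param helper are assumptions for illustration; reading the parameter generally requires root.

```shell
# The two places the setting can live (shown as comments, not executed):
#   kernel command line:         i915.enable_rc6=0
#   /etc/modprobe.d/i915.conf:   options i915 enable_rc6=0

# Sketch: print an i915 module parameter as the loaded driver sees it.
# i915_param is a hypothetical helper; the second argument only exists so
# the lookup directory can be overridden.
i915_param() {
    name="$1"
    params_dir="${2:-/sys/module/i915/parameters}"
    if [ -r "$params_dir/$name" ]; then
        cat "$params_dir/$name"
    else
        echo "cannot read $params_dir/$name (not root, or parameter absent)" >&2
        return 1
    fi
}

# Usage as root after a reboot:
#   i915_param enable_rc6
```

If i915 is loaded from the initramfs, the modprobe.d form probably only takes effect once the initramfs has been rebuilt.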

Comment by Martin Schmidt (Blind) - Saturday, 25 February 2017, 22:30 GMT
So far it appears that enable_rc6=0 helps; the error has not recurred.
Also, nobody cares on the kernel bug list :-) (or here)
Comment by mattia (nTia89) - Tuesday, 03 October 2017, 20:10 GMT
Is this issue still valid with a recent kernel?
I know many i915 parameters have been reworked...