FS#50619 - [linux] i915 Skylake driver freeze with kernel 4.7.2
Attached to Project:
Arch Linux
Opened by Sebastien Bariteau (numkem) - Wednesday, 31 August 2016, 18:15 GMT
Last edited by Jan de Groot (JGC) - Friday, 20 October 2017, 09:22 GMT
Details
Description: After upgrading to kernel 4.7.2, the moment I log in
to my WM (i3), I get a full-screen freeze on my
Dell XPS 13.
lspci:
00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Iris Graphics 540 (rev 0a)
00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 09)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Device 9d18 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
02:00.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
02:01.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
02:02.0 PCI bridge: Intel Corporation DSL5110 Thunderbolt Bridge [Falcon Ridge LP 2014]
39:00.0 USB controller: Intel Corporation Device 15b5
3a:00.0 Network controller: Broadcom Corporation BCM4350 802.11ac Wireless Network Adapter (rev 08)
3b:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
3c:00.0 Non-Volatile memory controller: Toshiba America Info Systems Device 010f (rev 01)

Driver crash in journalctl:
kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
kernel: IP: [< (null)>] (null)
kernel: PGD 464b96067 PUD 464b95067 PMD 0
kernel: Oops: 0010 [#1] PREEMPT SMP
kernel: Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt
kernel: snd_compress dell_laptop dell_smbios dcdbas snd_pcm_dmaengine kvm ac97_bus brcmfmac brcmutil irqbypass snd_hda_intel rtsx_pci_ms crct10dif_pclmul crc32_pclmul cfg
kernel: acpi_pad mac_hid ac sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) ip_tables x_tables xfs libcrc32c crc32c_generic rtsx_pci_sdmmc mmc_cor
kernel: CPU: 3 PID: 930 Comm: Xorg Tainted: G U O 4.7.2-1-ARCH #1
kernel: Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.4 06/14/2016
kernel: task: ffff88046762db80 ti: ffff880464b80000 task.ti: ffff880464b80000
kernel: RIP: 0010:[<0000000000000000>] [< (null)>] (null)
kernel: RSP: 0018:ffff880464b83af0 EFLAGS: 00010282
kernel: RAX: ffff880464b83bb8 RBX: ffff8804675e0480 RCX: b787eed8fa6ceafd
kernel: RDX: 00000000fffff075 RSI: ffff880469d122d0 RDI: ffff8804675e0e40
kernel: RBP: ffff880464b83b78 R08: ffff880469d13578 R09: ffff8804675e0e40
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8804675e0e40
kernel: R13: 0000000000000000 R14: ffff880469d13578 R15: ffff880469d122d0
kernel: FS: 00007f8ea26a9940(0000) GS:ffff88047ed80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000004646cc000 CR4: 00000000003406e0
kernel: Stack:
kernel: ffffffffa01356fe ffff880469d10000 ffff880469d122d0 ffff880464b83bb8
kernel: ffff880452764240 00000001fffff075 ffff8804675e0480 ffff880456822f00
kernel: ffff880464b83b40 ffffffffa01385ad ffff880464b83b80 000000007c6e955c
kernel: Call Trace:
kernel: [<ffffffffa01356fe>] ? i915_gem_object_sync+0x1ae/0x330 [i915]
kernel: [<ffffffffa01385ad>] ? i915_gem_object_pin+0x2d/0x30 [i915]
kernel: [<ffffffffa0147ab0>] intel_execlists_submission+0x1d0/0x440 [i915]
kernel: [<ffffffffa0127422>] i915_gem_do_execbuffer.isra.14+0x892/0x12a0 [i915]
kernel: [<ffffffff81579e71>] ? unix_stream_read_generic+0x281/0x8a0
kernel: [<ffffffffa0128b48>] i915_gem_execbuffer2+0xe8/0x250 [i915]
kernel: [<ffffffffa0136399>] ? i915_gem_busy_ioctl+0xc9/0x100 [i915]
kernel: [<ffffffffa00209a2>] drm_ioctl+0x152/0x540 [drm]
kernel: [<ffffffffa0128a60>] ? i915_gem_execbuffer+0x330/0x330 [i915]
kernel: [<ffffffff81217e07>] ? __fget+0x77/0xb0
kernel: [<ffffffff8120cd72>] do_vfs_ioctl+0xa2/0x5d0
kernel: [<ffffffff81217e07>] ? __fget+0x77/0xb0
kernel: [<ffffffff8120d319>] SyS_ioctl+0x79/0x90
kernel: [<ffffffff815de7b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
kernel: Code: Bad RIP value.
kernel: RIP [< (null)>] (null)
kernel: RSP <ffff880464b83af0>
kernel: CR2: 0000000000000000
kernel: ---[ end trace bb0f04a09880d6a1 ]---

Steps to reproduce: Upgrade to the latest kernel and kernel-headers (4.7.2) with the Intel driver. It crashes every time after the login screen (lightdm). I'm using TLP with the default settings and no other special settings for the i915 driver itself.
I don't use a display manager; startx leads to exactly the same result as yours.
sudo startx says:
"modprobe: FATAL: Module i915 not found in directory /lib/modules/4.7.1-1-ARCH
modprobe: FATAL: Module fbcon not found in directory /lib/modules/4.7.1-1-ARCH"
but I only have 4.7.2-1-ARCH and extramodules-4.7-ARCH in /lib/modules.
I tried ln -s /lib/modules/4.7.2-1-ARCH /lib/modules/4.7.1-1-ARCH
and then sudo startx says:
"ERROR: could not insert 'i915': Invalid argument"
Also:
the battery is not found by i3status,
dmesg | grep input shows only the keyboard,
and nmcli / ip show only the lo device.
The problem appeared right after I upgraded my kernel to 4.7.2, and the behavior (screen freeze) is exactly the same.
Even if it is not the same issue / bug, I believe it is somewhat related to this one.
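The modprobe errors above name 4.7.1-1-ARCH while only 4.7.2-1-ARCH is present in /lib/modules. One possible explanation (an assumption, not confirmed anywhere in this report) is that the machine is still running the old kernel image, so no module directory matches it. A minimal sketch of that check follows; the helper name missing_module_dir is made up for illustration.

```python
import os
import platform


def missing_module_dir(running_release, module_dirs):
    """Return True when no module directory matches the running kernel.

    running_release: the string `uname -r` would print.
    module_dirs: names of the entries found under /lib/modules.
    """
    return running_release not in module_dirs


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r`
    try:
        installed = sorted(os.listdir("/lib/modules"))
    except OSError:
        installed = []
    print("running kernel :", release)
    print("module dirs    :", ", ".join(installed) or "(none found)")
    if missing_module_dir(release, installed):
        print("No modules for the running kernel; modprobe cannot load i915.")
```

If the check reports a mismatch, symlinking the new directory under the old name, as tried above, would be expected to fail, because modules built for one kernel release are normally rejected by another.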
* removing xf86-video-intel
* removing the intel xorg config in /etc/X11/xorg.conf.d/
* removing any i915 module options in /etc/modprobe.d/i915.conf (commented out all lines)
* removing any kernel params for i915 in /boot/
* rebuilding initramfs
Kernel 4.9.8
I think this is a kernel driver issue with i915.
There is a bug report on bugs.freedesktop.org, bug # 98528
Log of the newest hang:
Feb 14 09:10:26 rough kernel: perf: interrupt took too long (4911 > 4898), lowering kernel.perf_event_max_sample_rate to 40500
Feb 14 11:09:11 rough kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [1423], reason: Hang on render ring, action: reset
Feb 14 11:09:11 rough kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 14 11:09:11 rough kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 14 11:09:11 rough kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 14 11:09:11 rough kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 14 11:09:11 rough kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 14 11:09:11 rough kernel: drm/i915: Resetting chip after gpu hang
Feb 14 11:09:11 rough kernel: [drm] RC6 on
Feb 14 11:09:11 rough kernel: [drm] GuC firmware load skipped
Feb 14 11:09:23 rough kernel: drm/i915: Resetting chip after gpu hang
Feb 14 11:09:23 rough kernel: [drm] RC6 on
Feb 14 11:09:24 rough kernel: [drm] GuC firmware load skipped
Feb 14 11:09:31 rough kernel: drm/i915: Resetting chip after gpu hang
Feb 14 11:09:32 rough kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
Feb 14 11:09:32 rough kernel: IP: [<ffffffffa06ee8a3>] reset_common_ring+0xc3/0x170 [i915]
Feb 14 11:09:32 rough kernel: PGD 0
Feb 14 11:09:32 rough kernel:
Feb 14 11:09:32 rough kernel: Oops: 0000 [#1] PREEMPT SMP
Feb 14 11:09:32 rough kernel: Modules linked in: cdc_mbim cdc_wdm snd_usb_audio snd_usbmidi_lib cdc_ncm snd_rawmidi usbnet mii snd_seq_device hid_generic veth msr fuse rfcomm ctr ccm ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc bnep snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat fat arc4 hid_multitouch iwlmvm mac80211 dell_led snd_hda_codec_realtek snd_hda_codec_generic snd_soc_skl iTCO_wdt iTCO_vendor_support snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core i2c_designware_platform i2c_designware_core iwlwifi snd_soc_sst_match snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus rtsx_pci_ms cfg80211 memstick intel_rapl dell_laptop x86_pkg_temp_thermal
Feb 14 11:09:32 rough kernel: intel_powerclamp coretemp dell_wmi dell_smbios dcdbas kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd intel_cstate intel_rapl_perf snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore i2c_i801 i2c_smbus idma64 shpchp i915 joydev input_leds evdev led_class mousedev mac_hid drm_kms_helper btusb btrtl drm hci_uart mei_me intel_gtt btbcm btqca syscopyarea sysfillrect mei btintel sysimgblt processor_thermal_device fb_sys_fops i2c_algo_bit intel_lpss_pci intel_pch_thermal intel_soc_dts_iosf fan i2c_hid bluetooth thermal wmi uvcvideo rfkill crc16 video intel_lpss_acpi videobuf2_vmalloc intel_lpss videobuf2_memops videobuf2_v4l2 intel_hid videobuf2_core int3403_thermal battery
Feb 14 11:09:32 rough kernel: int340x_thermal_zone sparse_keymap acpi_als int3400_thermal acpi_thermal_rel kfifo_buf videodev fjes industrialio media button tpm_tis tpm_tis_core ac acpi_pad tpm sch_fq_codel ip_tables x_tables hid_logitech_hidpp btrfs xor hid_logitech_dj usbhid hid raid6_pq rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crc32c_intel ahci xhci_pci libahci rtsx_pci xhci_hcd libata scsi_mod usbcore usb_common i8042 serio nvme nvme_core
Feb 14 11:09:32 rough kernel: CPU: 2 PID: 4906 Comm: kworker/2:2 Tainted: G U 4.9.8-1-ARCH #1
Feb 14 11:09:32 rough kernel: Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.13 12/28/2016
Feb 14 11:09:32 rough kernel: Workqueue: events_long i915_hangcheck_elapsed [i915]
Feb 14 11:09:32 rough kernel: task: ffff880362745880 task.stack: ffffc90003ca4000
Feb 14 11:09:32 rough kernel: RIP: 0010:[<ffffffffa06ee8a3>] [<ffffffffa06ee8a3>] reset_common_ring+0xc3/0x170 [i915]
Feb 14 11:09:32 rough kernel: RSP: 0018:ffffc90003ca7b50 EFLAGS: 00010286
Feb 14 11:09:32 rough kernel: RAX: 0000000000000000 RBX: ffff8801976d4fc0 RCX: 0000000000000001
Feb 14 11:09:32 rough kernel: RDX: 0000000000000004 RSI: 0000000000000206 RDI: 0000000000000206
Feb 14 11:09:32 rough kernel: RBP: ffffc90003ca7b70 R08: ffff880365fe0928 R09: ffff880365fe07a8
Feb 14 11:09:32 rough kernel: R10: ffffea000ca50c00 R11: 00000000000004f7 R12: ffff880365fe2968
Feb 14 11:09:32 rough kernel: R13: 0000000000000000 R14: ffff880365fe0000 R15: ffff880365fe2c10
Feb 14 11:09:32 rough kernel: FS: 0000000000000000(0000) GS:ffff88047ed00000(0000) knlGS:0000000000000000
Feb 14 11:09:32 rough kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 11:09:32 rough kernel: CR2: 0000000000000070 CR3: 0000000001a07000 CR4: 00000000003426e0
Feb 14 11:09:32 rough kernel: Stack:
Feb 14 11:09:32 rough kernel: ffff880365fe2968 ffff8801976d4fc0 ffff880365fe8958 ffff880365fe0000
Feb 14 11:09:32 rough kernel: ffffc90003ca7bb8 ffffffffa06d938a 00000000ffffff01 ffff880350a56400
Feb 14 11:09:32 rough kernel: ffff880365fe0000 ffff880365feaa18 ffffffff81606860 ffff880365feaa18
Feb 14 11:09:32 rough kernel: Call Trace:
Feb 14 11:09:32 rough kernel: [<ffffffffa06d938a>] i915_gem_reset+0x15a/0x280 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffff81606860>] ? __wait_on_bit_lock+0xc0/0xc0
Feb 14 11:09:32 rough kernel: [<ffffffffa069c18d>] i915_reset+0x8d/0xe0 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffffa069f42d>] i915_reset_and_wakeup+0xfd/0x180 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffffa06a53aa>] i915_handle_error+0x10a/0x5f0 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffffa06a5af1>] i915_hangcheck_elapsed+0x261/0x570 [i915]
Feb 14 11:09:32 rough kernel: [<ffffffff810a6978>] ? finish_task_switch+0x78/0x1f0
Feb 14 11:09:32 rough kernel: [<ffffffff81098aa5>] process_one_work+0x1e5/0x470
Feb 14 11:09:32 rough kernel: [<ffffffff81098d78>] worker_thread+0x48/0x4e0
Feb 14 11:09:32 rough kernel: [<ffffffff81098d30>] ? process_one_work+0x470/0x470
Feb 14 11:09:32 rough kernel: [<ffffffff81098d30>] ? process_one_work+0x470/0x470
Feb 14 11:09:32 rough kernel: [<ffffffff8109e909>] kthread+0xd9/0xf0
Feb 14 11:09:32 rough kernel: [<ffffffff8102d9f2>] ? __switch_to+0x572/0x630
Feb 14 11:09:32 rough kernel: [<ffffffff8109e830>] ? kthread_park+0x60/0x60
Feb 14 11:09:32 rough kernel: [<ffffffff8160ab15>] ret_from_fork+0x25/0x30
Feb 14 11:09:32 rough kernel: Code: 41 5e 5d c3 41 8b 44 24 28 b9 01 00 00 00 ba 00 00 ff ff 4c 89 f7 8d b0 a0 03 00 00 41 ff 96 80 07 00 00 4d 8b ac 24 68 02 00 00 <49> 8b 45 70 48 39 43 70 74 51 4d 85 ed 74 14 48 c7 c0 50 e6 48
Feb 14 11:09:32 rough kernel: RIP [<ffffffffa06ee8a3>] reset_common_ring+0xc3/0x170 [i915]
Feb 14 11:09:32 rough kernel: RSP <ffffc90003ca7b50>
Feb 14 11:09:32 rough kernel: CR2: 0000000000000070
Feb 14 11:09:32 rough kernel: ---[ end trace 0c9eeeb99502cbd2 ]---
Feb 14 11:09:32 rough kernel: BUG: unable to handle kernel paging request at 000000004eec5500
Feb 14 11:09:32 rough kernel: IP: [<ffffffff810c3c4b>] __wake_up_common+0x2b/0x80
The recipe for getting /sys/class/drm/card0/error does not work, because the laptop has to be hard reset.
In an attempt to avoid these crashes, I have now set enable_rc6=0 both in the kernel boot parameters and in modprobe.conf. I will report back if I see similar hangs in the future.
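For anyone wanting to confirm that the setting is actually in place, here is a rough sketch that looks for enable_rc6 in a kernel command line and in modprobe-style "options i915 ..." lines. It assumes the boot parameter is written in the usual module.parameter form (i915.enable_rc6=0); the function names are hypothetical, and comment handling is simplified to "everything after #".

```python
def rc6_from_cmdline(cmdline):
    """Return the value of i915.enable_rc6 on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("i915.enable_rc6="):
            return token.split("=", 1)[1]
    return None


def rc6_from_modprobe(conf_text):
    """Return enable_rc6 from 'options i915 ...' lines, or None.

    Text after '#' is treated as a comment (a simplification).
    If the option appears several times, the last one is returned.
    """
    value = None
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == "i915":
            for opt in fields[2:]:
                if opt.startswith("enable_rc6="):
                    value = opt.split("=", 1)[1]
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as fh:
            print("cmdline enable_rc6:", rc6_from_cmdline(fh.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

Feeding the contents of the modprobe configuration file to rc6_from_modprobe gives the value from that side; if both return None, the option is not set in either place.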
Also, nobody cares on the kernel bug list :-) (or here)
I know many i915 parameters have been reworked...