FS#49966 - [linux] Hard system freeze with 4.6.3-1-ARCH
Attached to Project:
Arch Linux
Opened by Nico Schottelius (telmich) - Wednesday, 06 July 2016, 06:41 GMT
Last edited by Doug Newgard (Scimmia) - Friday, 20 October 2017, 14:50 GMT
Details
Description:
Since upgrading to 4.6.3-1-ARCH my system freezes hard multiple times per day. If sound is being played at the time of the freeze, part of the last few seconds is repeated continuously. During a freeze no mouse or keyboard input is recognised, but the screen stays as it is (i.e. the Xorg screen remains completely unchanged). Switching to a console is not possible.
I have upgraded linux (4.5.4-1 -> 4.6.3-1); afair 4.5.4 did not have this issue.
Additional info:
* package version(s)
* config and/or log files etc.
System running is a Lenovo X1 Carbon (2015):
[15:36] wurzel:~% lspci
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09)
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)
00:14.0 USB controller: Intel Corporation Wildcat Point-LP USB xHCI Controller (rev 03)
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-V (rev 03)
00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #2 (rev e3)
00:1c.1 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 (rev e3)
00:1c.5 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
00:1d.0 USB controller: Intel Corporation Wildcat Point-LP USB EHCI Controller (rev 03)
00:1f.0 ISA bridge: Intel Corporation Wildcat Point-LP LPC Controller (rev 03)
00:1f.3 SMBus: Intel Corporation Wildcat Point-LP SMBus Controller (rev 03)
00:1f.6 Signal processing controller: Intel Corporation Wildcat Point-LP Thermal Management Controller (rev 03)
04:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
0a:00.0 SATA controller: Samsung Electronics Co Ltd Device a801 (rev 01)
[15:37] wurzel:~%
[15:38] wurzel:~% lsmod
Module Size Used by
fuse 94208 3
hmac 16384 1
drbg 32768 1
ansi_cprng 16384 0
ctr 16384 2
ccm 20480 2
ipt_MASQUERADE 16384 1
nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE
iptable_nat 16384 1
nf_conntrack_ipv4 16384 2
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nf_nat_ipv4 16384 1 iptable_nat
xt_addrtype 16384 2
iptable_filter 16384 1
xt_conntrack 16384 1
nf_nat 20480 2 nf_nat_ipv4,nf_nat_masquerade_ipv4
nf_conntrack 90112 5 nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4
br_netfilter 24576 0
bridge 122880 1 br_netfilter
stp 16384 1 bridge
llc 16384 2 stp,bridge
dm_thin_pool 61440 1
dm_persistent_data 57344 1 dm_thin_pool
dm_bio_prison 16384 1 dm_thin_pool
dm_bufio 24576 1 dm_persistent_data
loop 28672 4
uvcvideo 86016 0
videobuf2_vmalloc 16384 1 uvcvideo
videobuf2_memops 16384 1 videobuf2_vmalloc
videobuf2_v4l2 20480 1 uvcvideo
videobuf2_core 36864 2 uvcvideo,videobuf2_v4l2
videodev 151552 3 uvcvideo,videobuf2_core,videobuf2_v4l2
media 32768 2 uvcvideo,videodev
btusb 40960 0
btrtl 16384 1 btusb
btbcm 16384 1 btusb
btintel 16384 1 btusb
bluetooth 454656 5 btbcm,btrtl,btusb,btintel
joydev 20480 0
mousedev 20480 0
arc4 16384 2
sha256_ssse3 32768 3
sha256_generic 24576 1 sha256_ssse3
nls_iso8859_1 16384 1
nls_cp437 20480 1
vfat 20480 1
fat 65536 1 vfat
iwlmvm 262144 0
mac80211 655360 1 iwlmvm
iwlwifi 184320 1 iwlmvm
mei_wdt 16384 0
iTCO_wdt 16384 0
iTCO_vendor_support 16384 1 iTCO_wdt
cfg80211 495616 3 iwlwifi,mac80211,iwlmvm
msr 16384 0
intel_rapl 20480 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
coretemp 16384 0
kvm_intel 184320 0
kvm 499712 1 kvm_intel
irqbypass 16384 1 kvm
pcspkr 16384 0
input_leds 16384 0
psmouse 122880 0
serio_raw 16384 0
intel_pch_thermal 16384 0
i2c_i801 20480 0
lpc_ich 24576 0
shpchp 32768 0
wmi 20480 0
battery 20480 0
thinkpad_acpi 77824 1
ac 16384 0
nvram 16384 1 thinkpad_acpi
led_class 16384 3 iwlmvm,thinkpad_acpi,input_leds
rfkill 20480 6 cfg80211,thinkpad_acpi,bluetooth
snd_hda_codec_hdmi 45056 1
snd_hda_codec_realtek 69632 1
snd_hda_codec_generic 69632 1 snd_hda_codec_realtek
fjes 28672 0
mei_me 32768 0
e1000e 217088 0
snd_hda_intel 32768 2
snd_hda_codec 106496 4 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_intel
mei 81920 3 mei_wdt,mei_me
tpm_tis 20480 0
tpm 36864 1 tpm_tis
ptp 20480 1 e1000e
pps_core 20480 1 ptp
snd_hda_core 61440 5 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec,snd_hda_intel
snd_hwdep 16384 1 snd_hda_codec
thermal 20480 0
evdev 24576 24
mac_hid 16384 0
processor 32768 0
sch_fq_codel 20480 6
snd_pcm_oss 45056 0
snd_mixer_oss 24576 1 snd_pcm_oss
snd_pcm 86016 6 snd_pcm_oss,snd_hda_codec_hdmi,snd_hda_codec,snd_hda_intel,snd_hda_core
snd_timer 28672 1 snd_pcm
snd 65536 14 snd_hda_codec_realtek,snd_pcm_oss,snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_pcm,snd_hda_codec_generic,snd_hda_codec,snd_hda_intel,thinkpad_acpi,snd_mixer_oss
soundcore 16384 1 snd
drbd 356352 0
lru_cache 16384 1 drbd
libcrc32c 16384 2 drbd,dm_persistent_data
crc32c_generic 16384 0
ip_tables 28672 2 iptable_filter,iptable_nat
x_tables 28672 5 ip_tables,ipt_MASQUERADE,xt_conntrack,iptable_filter,xt_addrtype
ext4 520192 1
crc16 16384 2 ext4,bluetooth
jbd2 90112 1 ext4
mbcache 16384 2 ext4
algif_skcipher 20480 0
af_alg 16384 1 algif_skcipher
dm_crypt 28672 2
dm_mod 102400 10 dm_persistent_data,dm_bufio,dm_crypt,dm_thin_pool
sd_mod 36864 4
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
atkbd 24576 0
libps2 16384 2 atkbd,psmouse
crc32c_intel 24576 1
ghash_clmulni_intel 16384 0
xhci_pci 16384 0
aesni_intel 167936 11
ehci_pci 16384 0
ahci 36864 3
libahci 28672 1 ahci
aes_x86_64 20480 1 aesni_intel
lrw 16384 1 aesni_intel
ehci_hcd 69632 1 ehci_pci
gf128mul 16384 1 lrw
xhci_hcd 159744 1 xhci_pci
glue_helper 16384 1 aesni_intel
ablk_helper 16384 1 aesni_intel
cryptd 20480 5 ghash_clmulni_intel,aesni_intel,ablk_helper
libata 196608 2 ahci,libahci
scsi_mod 155648 2 libata,sd_mod
usbcore 200704 6 btusb,uvcvideo,ehci_hcd,ehci_pci,xhci_hcd,xhci_pci
usb_common 16384 1 usbcore
i8042 24576 1 libps2
serio 20480 7 serio_raw,atkbd,i8042,psmouse
i915 1204224 5
video 36864 2 i915,thinkpad_acpi
button 16384 1 i915
intel_gtt 20480 1 i915
i2c_algo_bit 16384 1 i915
drm_kms_helper 114688 1 i915
syscopyarea 16384 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
drm 294912 6 i915,drm_kms_helper
[15:38] wurzel:~%
Steps to reproduce:
Closed by Doug Newgard (Scimmia)
Friday, 20 October 2017, 14:50 GMT
Reason for closing: Fixed
Additional comments about closing: OP reports it no longer happens
@Nico Schottelius
I think you should attach a log from the system journal. Grep for "(soft lockup|hard lockup|stalls)" in journalctl and see what you find.
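A minimal sketch of that search, assuming a persistent journal so that messages from the boot that froze are still available. `lockup_filter` is a hypothetical helper name, and the pattern is widened from "stalls" to "stall" and made case-insensitive as an assumption about how the messages may be worded:

```shell
#!/bin/sh
# Keep only lines that look like lockup or stall reports.
# Pattern follows the suggestion above; "stall" also matches "stalls".
lockup_filter() {
    grep -Ei 'soft lockup|hard lockup|stall'
}

# Typical use: kernel messages (-k) from the previous boot (-b -1).
#   journalctl -k -b -1 --no-pager | lockup_filter
```

If the filter prints nothing, that only means no such message reached the journal; a hard freeze can stop the machine before anything is written.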
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
Intel(R) Core(TM) i7-3537U CPU @ 2.00GHz
The system freezes under X11, for example when switching back and forth between desktops with compositor effects on Plasma.
Mouse lagging when moving on the screen.
Downgraded to 4.6.2 no problem so far: lags seem to have disappeared.
Conversely, if anyone experiences this problem while booting in Legacy BIOS, not UEFI, mode, please mention it here.
http://cirrus.openshells.org/logs.html | grep -i LOCKUP
i had same experience using linux-ck
since upgrading to 4.6.4-1 and/or 4.6.4-1-ck all is well once more.
(edit) booting with BIOS/MBR
X58 Chipset
ATI open dri
I was convinced this was related to ye olde x58 chipset bug https://www.novell.com/support/kb/doc.php?id=7014344 ofc im no expert.
sometimes i would hard lockup where only hitting power button would get it back up, other occasions i'd lose only ethernet and a simple sudo ip link set dev enp5s0 down && sudo ip link set dev enp5s0 up would suffice.
But maybe we are all speaking about 2 different unrelated bugs.
VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 740] (rev a1)
AMD Athlon(tm) II X4 640 Processor
After reverting back to 4.5.4 the system has not yet suffered any freezes. Current uptime is 9 hours and counting.
lsmod (3.6 KiB)
I suspect that it is somehow related to audio. If my memory doesn't betray me, I never had my system freeze when not playing audio. But it crashed anywhere between a couple of minutes and hours when using one of mpd, cmus, spotify or playing videos in firefox or chromium.
alsa-info.sh yields
cat: /sys/module/snd_soc_sst_broadwell/parameters/*: No such file or directory
Output can be found here
http://www.alsa-project.org/db/?f=5ba0985657ed63f7ffe72f2fe8f93d8258cd3018
Attached are my lspci, lsmod, and alsa-info.sh outputs on 4.6.5-3-ck.
Edit: Additionally, I was unable to find any useful information being logged from freezes on 4.7.4-1-ck. I checked journalctl, old /var/log/messages.log.1, /var/log/Xorg.0.log.old, and coredumpctl.
lsmod.txt (7.7 KiB)
alsa-info.txt.BNSighzeO2 (59.4 KiB)
lspci.txt (1.2 KiB)
alsa-info.txt.kBlh7xFRgN (12.9 KiB)
packages.txt (8.2 KiB)
However, my system would freeze on boot with no error message displayed; it's just stuck forever.
i5 cpu amd7xxxm vga opendrivers.
https://bugzilla.kernel.org/show_bug.cgi?id=141741
Keep 4.8.13 in backup before you upgrade to 4.9
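Several comments here mention downgrading or keeping an older kernel as a fallback. A rough sketch of how that might be done from pacman's package cache, assuming the old package file is still in the default cache directory /var/cache/pacman/pkg and has not been cleaned out. `cached_versions` is a hypothetical helper, not a pacman command, and the exact file name of the package is left as a placeholder:

```shell
#!/bin/sh
# List cached package files for one package name.
# $1 = package name (e.g. linux), $2 = cache directory (optional).
cached_versions() {
    pkg=$1
    dir=${2:-/var/cache/pacman/pkg}
    # Require a digit after "<pkg>-" so linux-lts, linux-headers etc. are skipped;
    # drop detached signature files.
    ls "$dir"/"$pkg"-[0-9]*.pkg.tar.* 2>/dev/null | grep -v '\.sig$'
}

# Typical use:
#   cached_versions linux
#   sudo pacman -U /var/cache/pacman/pkg/linux-<version>-x86_64.pkg.tar.xz
# To keep a later "pacman -Syu" from replacing it, add
#   IgnorePkg = linux
# to the [options] section of /etc/pacman.conf.
```

Holding back the kernel this way leaves the system partially upgraded, so it is best treated as a temporary workaround while testing.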
I am noticing the same or a similar issue. Running the current kernel ('linux' package) I get this same freeze. Audio loops (roughly 1 sec). The keyboard is not responsive, although I can still adjust the keyboard backlight; from what I understand, that operates at a lower level, and the OS does not see keyboard events for a backlight brightness change on this particular laptop. It has happened when no audio is playing, but that is much rarer. It makes me think this could be a pulseaudio-related issue in relation to the kernel.
I am currently on kernel 4.9.6-1-ARCH. My machine does not go longer than roughly 3 days before it encounters a freeze. Most of the time it will freeze once a day, most of the time when audio is playing. When I run the LTS kernel ('linux-lts' package), I do not encounter any freezes but I do have other issues such as wifi problems (I believe driver related) and I can't run a low JACK buffer size for some reason. I'm going to see what I can do to get the LTS kernel working for me as it seems to be the only real solution at this stage.
More potentially relevant info:
I am using TLP with all of the default settings (from what I can remember).
I run pulseaudio all of the time.
I have "threadirqs" and "rw" kernel parameters set.
I am using legacy boot - BIOS.
I even went as far as replacing the RAM in this laptop as I thought that could be the issue. I had an issue with bad RAM in a previous desktop of mine that resulted in a similar issue.
Don't know if it's helpful, but I attach a list of the programs (according to ps_mem) that were running at the time.
Trying out linux-4.10.1-1 now.
I wonder if it could be related to the way Arch configures the kernel? Are there reports of this sort of behaviour from other distributions? I tried a search but didn't find all that much.
I want to try out Debian Testing at some stage to see if this same issue presents itself.
$ uname -a
Linux tiny 4.4.53-rt66-1-rt-lts #1 SMP PREEMPT RT Fri Mar 31 16:51:29 CST 2017 x
$ uptime
19:33:29 up 13 days, 5:57, 6 users, load average: 0.72, 0.55, 0.45
This is with about 5 to 10 hours of use (not suspended) time each day.
uname -a
Linux homu 4.6.5-3-ck #1 SMP PREEMPT Fri Aug 5 18:07:16 EDT 2016 x86_64 GNU/Linux
uptime
18:26:12 up 98 days, 18:41, 13 users, load average: 10.52, 10.94, 10.80
After some digging I found help at https://wiki.archlinux.org/index.php/Intel_graphics#X_freeze.2Fcrash_with_intel_driver
I added 'Option "DRI" "False"' to /etc/X11/xorg.conf.d/20-intel.conf which solved the problem. Hope that helps.
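For anyone wanting to try the same workaround, a sketch of what the file might contain, assuming the xf86-video-intel ("intel") Xorg driver is in use. The Identifier string is arbitrary and `intel_no_dri_conf` is a hypothetical helper that only prints the snippet:

```shell
#!/bin/sh
# Print an Xorg Device section that disables DRI for the intel driver.
# The Identifier value is an arbitrary label chosen for this example.
intel_no_dri_conf() {
    cat <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "DRI" "False"
EndSection
EOF
}

# Typical use (path as given in the comment above):
#   intel_no_dri_conf | sudo tee /etc/X11/xorg.conf.d/20-intel.conf
```

This turns off 3D acceleration for the X session, and per the later edit in this comment it did not fully stop the freezes, so it is a diagnostic step rather than a fix.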
insights:
* Hardware: ThinkPad T540p
* current kernel: 4.4.67-1-lts44
* output of lspci | grep VGA is:
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
p.s. I found this bug thread a bit confusing. Many people with different hardware wrote here and gave their insights, and some already said that it worked for them, but the bug thread is still open. A new bug thread for each case with a 'similar to' link would probably be more helpful.
[EDIT 27.06.17] Did not help, see comment below.
It freezes randomly on 4.9, after a couple of hours to a day or two. Sometimes I see a kernel panic "kernel NULL pointer dereference".
It was working perfectly fine on 4.4.62. I recently upgraded to 4.4.73 (machine rebooted after failure of one HDD) and it's showing the same behaviour as 4.9. So I went back to 4.1.41 and it seems OK so far.
This machine is used as a home server so it runs 24/7 with light workload. The integrated GPU is still activated and the X server is started even though it's not used.
I did not try to disable HyperThreading. As far as I remember, I did try using a dedicated graphics card and the issue was the same.
Can someone confirm it broke again between 4.4.62 and 4.4.73?
I still use the 4.4.71 (AUR linux-lts44). Unfortunately I had some freezes, so the described trick with DRI seems not to be the solution. But the number of system freezes decreased.
I have a Haswell processor with activated hyperthreading (Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz). So I guess that I am not part of this bug with Skylake/Kaby Lake - https://lists.debian.org/debian-devel/2017/06/msg00308.html
The only stable branch for me right now is LTS 4.1.
I'm going to try running on 4.4.68.
CPU: AMD A10-7850K
GPU: AMD Radeon HD 6950
See attached output for lspci, lsmod, alsa-info and pacman -Q.
lsmod.txt (5.5 KiB)
lspci.txt (2.7 KiB)
packages.txt (15.5 KiB)
Nico,
Just to make sure I understood you correctly. Have you been experiencing crashes on your hardware before, and with the latest kernel those are no longer happening?
What hardware are you running on (and which version of the kernel)?
Cheers /u
I am running 4.11.9-1 on the affected machine and it is stable now for some months - sorry for the late update!
not so unlikely... https://bugzilla.kernel.org/show_bug.cgi?id=118051#c15