FS#59483 - [linux] Random full freezes on linux-4.17.10-1
Attached to Project:
Arch Linux
Opened by f (bakgwailo) - Monday, 30 July 2018, 04:54 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 01 March 2022, 21:12 GMT
Details
Description:
I am getting fairly frequent full system lockups/freezes on the latest stable kernel. The latest LTS kernel seems to be OK. X/Plasma locks up completely and keyboard input no longer works (i.e. I can't switch to virtual terminals, and can't even toggle num-lock). The system is currently up to date with stable (testing repos not enabled). I have attached my journalctl output.

Additional info:
* Ryzen 2700x with a GTX-1070 using the latest drivers (396.45-1)

Steps to reproduce:
Boot the computer and wait a few minutes.
Closed by Andreas Radke (AndyRTR)
Tuesday, 01 March 2022, 21:12 GMT
Reason for closing: Fixed
Additional comments about closing: Fixed upstream.
I have nothing but problems with the stable kernel.
The computer no longer shuts down, and I also get these messages from systemd-udevd:
https://imgur.com/a/2erltyr
@HeinzDo57 and @bakgwailo what was the last version of the linux package without the issue, and the first version with the issue?
Jul 30 00:26:26 desktop kernel: PGD 3a7660067 P4D 3a7660067 PUD 0
Jul 30 00:26:26 desktop kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jul 30 00:26:26 desktop kernel: Modules linked in: snd_hda_codec_hdmi nct6775 hwmon_vid usblp arc4 nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) iwlmvm mac80211 nls_iso8859_1 nls_cp437 vfat fat kvm iwlwifi drm_kms_helper snd_hda_codec_realtek btusb btrtl btbcm snd_hda_codec_generic btintel uvcvideo raid10 irqbypass crct10dif_pclmul videobuf2_vmalloc videobuf2_memops crc32_pclmul snd_hda_intel videobuf2_v4l2 md_mod snd_usb_audio wmi_bmof mxm_wmi cfg80211 bluetooth ghash_clmulni_intel drm snd_hda_codec videobuf2_common snd_usbmidi_lib pcbc videodev snd_rawmidi snd_hda_core snd_seq_device snd_hwdep igb agpgart input_leds media ipmi_devintf snd_pcm led_class ipmi_msghandler ecdh_generic joydev syscopyarea sysfillrect mousedev aesni_intel i2c_algo_bit dca snd_timer aes_x86_64 crypto_simd rfkill cryptd snd glue_helper sysimgblt
Jul 30 00:26:26 desktop kernel: fb_sys_fops ccp(+) sp5100_tco soundcore rng_core i2c_piix4 k10temp pcspkr shpchp evdev rtc_cmos wmi mac_hid pinctrl_amd gpio_amdpt pcc_cpufreq acpi_cpufreq crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod hid_roccat_konepure hid_roccat hid_roccat_common hid_generic usbhid hid ahci xhci_pci crc32c_intel libahci xhci_hcd libata usbcore scsi_mod usb_common
Jul 30 00:26:26 desktop kernel: CPU: 2 PID: 627 Comm: Xorg Tainted: P O 4.17.10-1-ARCH #1
Jul 30 00:26:26 desktop kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470 Taichi, BIOS P1.50 07/03/2018
Jul 30 00:26:26 desktop kernel: RIP: 0010:select_idle_sibling+0x38d/0x460
Jul 30 00:26:26 desktop kernel: RSP: 0018:ffffa98583defa08 EFLAGS: 00010006
Jul 30 00:26:26 desktop kernel: RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
Jul 30 00:26:26 desktop kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff96520cb91538
Jul 30 00:26:26 desktop kernel: RBP: 0000000000000047 R08: 000000cac3c9a9b5 R09: 0000000000000002
Jul 30 00:26:26 desktop kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff96520cb91538
Jul 30 00:26:26 desktop kernel: R13: ffff9652c92f8368 R14: ffff96520cb92e00 R15: 0000000000000001
Jul 30 00:26:26 desktop kernel: FS: 00007fc3e37ece00(0000) GS:ffff96521ec80000(0000) knlGS:0000000000000000
Jul 30 00:26:26 desktop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 30 00:26:26 desktop kernel: CR2: ffff9652c92f8368 CR3: 00000003ef586000 CR4: 00000000003406e0
Jul 30 00:26:26 desktop kernel: Call Trace:
Jul 30 00:26:26 desktop kernel: select_task_rq_fair+0xbcb/0xc20
Jul 30 00:26:26 desktop kernel: ? preempt_count_add+0x49/0xa0
Jul 30 00:26:26 desktop kernel: ? memcg_kmem_get_cache+0x8c/0x1b0
Jul 30 00:26:26 desktop kernel: ? preempt_count_add+0x49/0xa0
Jul 30 00:26:26 desktop kernel: ? memcg_kmem_put_cache+0x3f/0x70
Jul 30 00:26:26 desktop kernel: ? __kmalloc_node_track_caller+0x210/0x2b0
Jul 30 00:26:26 desktop kernel: ? __alloc_skb+0x82/0x1d0
Jul 30 00:26:26 desktop kernel: try_to_wake_up+0x13a/0x490
Jul 30 00:26:26 desktop kernel: pollwake+0x74/0x90
Jul 30 00:26:26 desktop kernel: ? wake_up_q+0x70/0x70
Jul 30 00:26:26 desktop kernel: __wake_up_common+0x77/0x140
Jul 30 00:26:26 desktop kernel: __wake_up_common_lock+0x7c/0xc0
Jul 30 00:26:26 desktop kernel: sock_def_readable+0x41/0x80
Jul 30 00:26:26 desktop kernel: unix_stream_sendmsg+0x1b5/0x3c0
Jul 30 00:26:26 desktop kernel: sock_sendmsg+0x33/0x40
Jul 30 00:26:26 desktop kernel: sock_write_iter+0x8f/0xf0
Jul 30 00:26:26 desktop kernel: do_iter_readv_writev+0x12b/0x190
Jul 30 00:26:26 desktop kernel: do_iter_write+0x80/0x190
Jul 30 00:26:26 desktop kernel: vfs_writev+0x84/0xf0
Jul 30 00:26:26 desktop kernel: ? __vfs_read+0x36/0x170
Jul 30 00:26:26 desktop kernel: do_writev+0x5c/0xf0
Jul 30 00:26:26 desktop kernel: do_syscall_64+0x5b/0x170
Jul 30 00:26:26 desktop kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 30 00:26:26 desktop kernel: RIP: 0033:0x7fc3e3351744
Jul 30 00:26:26 desktop kernel: RSP: 002b:00007ffea1f88ba8 EFLAGS: 00003246 ORIG_RAX: 0000000000000014
Jul 30 00:26:26 desktop kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3e3351744
Jul 30 00:26:26 desktop kernel: RDX: 0000000000000001 RSI: 00007ffea1f88e80 RDI: 0000000000000041
Jul 30 00:26:26 desktop kernel: RBP: 0000561e8c8b0490 R08: 0000000000000001 R09: 0000000000000007
Jul 30 00:26:26 desktop kernel: R10: 0000000000000001 R11: 0000000000003246 R12: 0000000000000001
Jul 30 00:26:26 desktop kernel: R13: 00007ffea1f88e80 R14: 0000000000000020 R15: 0000561e8c93f500
Jul 30 00:26:26 desktop kernel: Code: 44 24 08 e8 66 1e 67 00 41 89 c7 3d 3f 01 00 00 77 48 48 8b 04 24 4c 8d a8 68 03 00 00 eb 09 83 ed 01 0f 84 db fe ff ff 44 89 f8 <49> 0f a3 45 00 73 0c 44 89 ff e8 34 82 ff ff 85 c0 75 4f 44 89
Jul 30 00:26:26 desktop kernel: RIP: select_idle_sibling+0x38d/0x460 RSP: ffffa98583defa08
Jul 30 00:26:26 desktop kernel: CR2: ffff9652c92f8368
Jul 30 00:26:26 desktop kernel: ---[ end trace ead38b84905aaf33 ]---
Normally I would suggest reporting it upstream, but upstream does not support issues produced on tainted kernels. Can you reproduce it without the nvidia modules?
You might also try testing 4.18-rc7.
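For anyone unsure what "Tainted: P O" in the oops above refers to: the kernel keeps a taint bitmask in /proc/sys/kernel/tainted, and each letter corresponds to one bit. Below is a rough Python sketch for decoding it. The function names are my own, and the table is only a partial list of the flags as I understand them from the kernel's tainted-kernels documentation, so treat it as an illustration rather than a complete decoder.

```python
# Partial table of kernel taint bits: bit index -> (flag letter, meaning).
# Assumption: taken from the kernel's tainted-kernels documentation; not complete.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force-loaded"),
    7: ("D", "kernel has already oopsed once"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(mask):
    """Return (letter, meaning) pairs for every known bit set in mask."""
    return [flag for bit, flag in sorted(TAINT_FLAGS.items()) if (mask >> bit) & 1]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the running kernel's taint mask, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mask = read_taint()
    if mask is None:
        print("taint mask not available on this system")
    elif mask == 0:
        print("kernel is not tainted")
    else:
        for letter, meaning in decode_taint(mask):
            print(f"{letter}: {meaning}")
```

With the nvidia modules loaded, I would expect at least the P and O bits to be set, which matches the "(PO)" markers next to the nvidia modules in the module list.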
[ 64.610838] systemd-udevd[361]: seq 2656 '/devices/pci0000:00/0000:00:07.1/0000:31:00.2' is taking a long time
[ 64.610846] systemd-udevd[361]: seq 2776 '/devices/system/cpu/cpu0' is taking a long time
[...]
[ 184.608688] systemd-udevd[361]: seq 2656 '/devices/pci0000:00/0000:00:07.1/0000:31:00.2' killed
[ 184.608756] systemd-udevd[361]: seq 2761 '/devices/system/cpu/cpu0' killed
[...]
[ 246.764303] INFO: task systemd-udevd:375 blocked for more than 120 seconds.
[ 246.764305] Not tainted 4.18.0-rc8-g8efcf34a2639 #1
[ 246.764306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
After I flashed the old BIOS (7B79vA3, 2018-05-10), the problems were gone.
I then contacted MSI.
They looked into it and thought the new AGESA code was responsible.
No more freezes, but I'm still getting udev output similar to that posted by schnilch (forum thread: https://bbs.archlinux.org/viewtopic.php?id=239539). I'm not completely sure, but I don't think I got that output before updating the BIOS.
Oh, in addition to the freezes I was also somehow missing one CPU core on the old BIOS. I had to use Ryzen Master on Windows to re-enable it.
EDIT: I'm blind, I read "downgrade" as "upgrade". I will try that, thanks.
https://bugzilla.redhat.com/show_bug.cgi?id=1608242#c11
http://forum.asrock.com/forum_posts.asp?TID=9179&title=new-asrock-x470-taichi-uefi-150
This basically looks like a bug in the latest BIOS. The workarounds are to roll back to a BIOS with pre-1.0.0.4a AGESA code, or to use the LTS kernel (or really any pre-4.16 kernel), since 4.16 is when code was added that interacts with the PSP, which is buggy in the latest BIOS.
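If you want to check quickly whether the kernel you are running falls on the affected side of that cutoff, something like the following Python sketch should do. It only compares the major.minor part of the release string against 4.16, which is the version named in this comment and not something I have verified independently; the function names are made up for the example.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as '4.17.10-1-ARCH'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_touch_psp(release, first_affected=(4, 16)):
    """True if the release is at or after the version said to add the PSP code.

    Assumption: first_affected=(4, 16) comes from the comment above.
    """
    return kernel_version(release) >= first_affected


if __name__ == "__main__":
    release = platform.release()
    state = "may be affected" if may_touch_psp(release) else "predates the PSP code"
    print(f"{release}: {state}")
```

This says nothing about the BIOS side, so a result of "may be affected" only matters if the board is also running one of the newer AGESA versions.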
But after booting, the CPU seems to work well.
CPU: Ryzen 5 1600
Mobo: Gigabyte AB350M-Gaming3
The issue is also documented here:
https://forum.level1techs.com/t/aorus-x399-gaming-7-new-bios-update/129389/5
It seems that the 'ccp' module is the culprit; the problem can be fixed by compiling the kernel with CONFIG_CRYPTO_DEV_SP_PSP=n.
X470 Aorus ultra gaming
Ryzen 2700x
F3 BIOS with AGESA 1.0.0.4
Recompiling the Arch kernel with CONFIG_CRYPTO_DEV_SP_PSP=n solves the problem.
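To confirm that a rebuilt kernel really has the option turned off, the running kernel's config can be inspected. Here is a small Python sketch, assuming the kernel exposes its config at /proc/config.gz (I believe the stock Arch kernels do, but a custom build may not); the helper names are hypothetical.

```python
import gzip


def config_value(config_text, option):
    """Return the value of option in kernel config text.

    Gives 'y' or 'm' (or another literal value) when the option is set,
    'n' when it is listed as "is not set", and None when it is absent.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def running_kernel_config(path="/proc/config.gz"):
    """Return the running kernel's config as text, or None if unavailable."""
    try:
        with gzip.open(path, "rt") as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = running_kernel_config()
    if text is None:
        print("/proc/config.gz is not available on this kernel")
    else:
        print("CONFIG_CRYPTO_DEV_SP_PSP =", config_value(text, "CONFIG_CRYPTO_DEV_SP_PSP"))
```

A result of "n" (or None, if the option does not appear at all) would indicate that the PSP support was not built in.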