FS#74297 - [linux-zen][amdgpu] Kernel panic after upgrade to 5.17.1

Attached to Project: Arch Linux
Opened by Infanta Xavier (xavier83) - Friday, 01 April 2022, 11:24 GMT
Last edited by Jelle van der Waa (jelly) - Thursday, 14 September 2023, 17:52 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Jan Alexander Steffens (heftig), David Runge (dvzrv)
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

Description:
After upgrading linux-zen from 5.16.16 to 5.17.1, the system hangs and becomes unresponsive.
The system monitor shows minimal CPU usage, but dmesg contains kernel crash logs.


Additional info:
* linux-zen 5.17.1
* attached dmesg with crash stacktrace.
[drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
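For anyone triaging a similar log, here is a minimal sketch of how the failure line above could be picked out and its return code decoded. It assumes the usual kernel convention of returning negated errno values, so -22 would correspond to EINVAL. The helper name `decode_late_init` and the regular expression are illustrative only and are not part of any amdgpu tooling.

```python
import errno
import re

# Hypothetical pattern for the late_init failure line quoted in this report.
LATE_INIT_RE = re.compile(
    r"late_init of IP block <(?P<block>[^>]+)> failed (?P<code>-?\d+)"
)


def decode_late_init(line):
    """Return (ip_block, errno_name) for a late_init failure line, else None."""
    match = LATE_INIT_RE.search(line)
    if match is None:
        return None
    # Assumption: the driver reports a negated errno value, e.g. -22.
    code = abs(int(match.group("code")))
    return match.group("block"), errno.errorcode.get(code, "UNKNOWN")


line = (
    "[drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* "
    "late_init of IP block <si_dpm> failed -22"
)
print(decode_late_init(line))
```

The same function can be run over every line of a saved dmesg file to list which IP blocks failed and with which error name.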

Steps to reproduce:
Upgrade linux-zen to 5.17.1. The kernel panics once the new kernel is running.

Closed by  Jelle van der Waa (jelly)
Thursday, 14 September 2023, 17:52 GMT
Reason for closing:  Deferred
Additional comments about closing:  Old kernel, please retry with the latest
Comment by Infanta Xavier (xavier83) - Friday, 01 April 2022, 11:34 GMT
I just downgraded to linux-zen 5.16.16 and was surprised to see the same kernel module crash stack trace, but without the message
[drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
On 5.16.16 it doesn't hang the system the way it does on 5.17.1, so it probably went unnoticed before.
Could the unresponsiveness be due to a change in logging behaviour in the new kernel?
Comment by Radu Pantiru (hex72a2) - Saturday, 02 April 2022, 15:37 GMT
Not sure if this is related, but after upgrading to 5.17.1 Xorg failed to start with amdgpu, giving the error `(EE) Failed to load module "ati" (module does not exist, 0)`
Downgrading to 5.16 fixed the issue.
Comment by Infanta Xavier (xavier83) - Sunday, 26 June 2022, 09:09 GMT
Issue persists on kernel 5.18.6 as well.

# dmesg
.
.
.
[ 37.509068] ------------[ cut here ]------------
[ 37.509078] WARNING: CPU: 3 PID: 500 at drivers/gpu/drm/ttm/ttm_bo.c:411 ttm_bo_release+0x3ef/0x420 [ttm]
[ 37.509104] Modules linked in: ccm rfcomm exfat xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc overlay tun cmac algif_hash algif_skcipher af_alg bnep edac_mce_amd kvm_amd rtsx_usb_ms memstick kvm rtl8723be irqbypass btcoexist crct10dif_pclmul rtl8723_common uvcvideo crc32_pclmul rtl_pci btusb ghash_clmulni_intel btrtl aesni_intel rtlwifi videobuf2_vmalloc btbcm crypto_simd videobuf2_memops mac80211 cryptd videobuf2_v4l2 r8169 mousedev btintel snd_usb_audio videobuf2_common btmtk snd_usbmidi_lib videodev libarc4 snd_rawmidi bluetooth snd_seq_device joydev ideapad_laptop snd_hda_codec_conexant vfat ecdh_generic snd_hda_codec_generic fat snd_hda_codec_hdmi ledtrig_audio psmouse sparse_keymap realtek mc cfg80211 mdio_devres sp5100_tco pcspkr snd_hda_intel platform_profile fam15h_power snd_intel_dspcfg k10temp wmi i2c_piix4 ext4
[ 37.509220] snd_intel_sdw_acpi libphy crc16 mbcache rfkill snd_hda_codec jbd2 snd_hda_core snd_hwdep snd_pcm ccp snd_timer mac_hid video snd soundcore acpi_cpufreq rng_core ipmi_devintf ipmi_msghandler crypto_user fuse bpf_preload ip_tables x_tables xfs libcrc32c crc32c_generic usbhid dm_mod rtsx_usb_sdmmc sdhci_pci serio_raw cqhci atkbd sdhci libps2 vivaldi_fmap mmc_core rtsx_usb crc32c_intel xhci_pci xhci_pci_renesas i8042 serio radeon amdgpu gpu_sched drm_ttm_helper ttm drm_dp_helper
[ 37.509287] CPU: 3 PID: 500 Comm: Xorg Not tainted 5.18.6-zen1-1-zen #1 31c889efa9fd05489a9f6ae80ff3555cce57e8ee
[ 37.509295] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016
[ 37.509299] RIP: 0010:ttm_bo_release+0x3ef/0x420 [ttm]
[ 37.509315] Code: 00 e8 a5 ff 78 e7 48 8b 43 e8 eb a7 be 03 00 00 00 e8 d5 60 4b e7 e9 f8 fc ff ff e8 2b d3 78 e7 e9 ee fc ff ff 4c 89 e0 eb 89 <0f> 0b e9 44 fc ff ff e8 15 d3 78 e7 e9 c8 fe ff ff be 03 00 00 00
[ 37.509321] RSP: 0018:ffffada800adfdc0 EFLAGS: 00010202
[ 37.509325] RAX: 0000000000000000 RBX: ffff982c148171b8 RCX: 00000000820001fb
[ 37.509329] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff982c148171b8
[ 37.509332] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 37.509335] R10: ffff982c0dccb900 R11: 0000000000000000 R12: ffff982bca9e5280
[ 37.509338] R13: ffff982c14817058 R14: ffff982c0dd8f840 R15: 0000000000000000
[ 37.509341] FS: 00007f88177e0100(0000) GS:ffff982edfd80000(0000) knlGS:0000000000000000
[ 37.509345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.509349] CR2: 000000000033f1ca CR3: 0000000107f9a000 CR4: 00000000000406e0
[ 37.509353] Call Trace:
[ 37.509358] <TASK>
[ 37.509365] amdgpu_bo_unref+0x1e/0x30 [amdgpu 4b9057c4028e0aaaa36548fda1a042a9a1fdf60b]
[ 37.510169] amdgpu_gem_object_free+0x34/0x50 [amdgpu 4b9057c4028e0aaaa36548fda1a042a9a1fdf60b]
[ 37.510954] drm_gem_dmabuf_release+0x3a/0x50
[ 37.510966] dma_buf_release+0x46/0xa0
[ 37.510973] __dentry_kill+0x102/0x240
[ 37.510982] __fput+0xe6/0x250
[ 37.510990] task_work_run+0x60/0x90
[ 37.510997] exit_to_user_mode_prepare+0x11b/0x140
[ 37.511003] syscall_exit_to_user_mode+0x26/0x50
[ 37.511011] do_syscall_64+0x6b/0x90
[ 37.511018] ? do_syscall_64+0x6b/0x90
[ 37.511023] ? syscall_exit_to_user_mode+0x26/0x50
[ 37.511028] ? do_syscall_64+0x6b/0x90
[ 37.511034] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 37.511041] RIP: 0033:0x7f88181077af
[ 37.511047] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 37.511052] RSP: 002b:00007ffd3ed95ca0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 37.511059] RAX: 0000000000000000 RBX: 0000564cc06c44e0 RCX: 00007f88181077af
[ 37.511062] RDX: 00007ffd3ed95d30 RSI: 0000000040086409 RDI: 0000000000000015
[ 37.511065] RBP: 00007ffd3ed95d30 R08: 0000564cc03a97f0 R09: 0000000000000000
[ 37.511067] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000040086409
[ 37.511070] R13: 0000000000000015 R14: 0000564cbf636724 R15: 0000564cc03aa260
[ 37.511075] </TASK>
[ 37.511078] ---[ end trace 0000000000000000 ]---
[ 56.601699] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 66.890625] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 77.014704] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 87.407513] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 97.609264] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 107.852470] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 117.995666] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 128.338435] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 138.429784] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 148.744717] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 159.002118] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 169.219300] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 179.558185] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
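In the excerpt above, the "PCIE GART of 256M enabled" message recurs roughly every ten seconds, which might mean the device is being re-initialised or resumed repeatedly (an assumption; the report does not confirm the cause). A minimal sketch for measuring those gaps from saved dmesg output follows. The function name `gart_intervals` and the timestamp pattern are illustrative, not part of any existing tool.

```python
import re

# Matches a dmesg line of the form "[ 56.601699] message text".
TS_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def gart_intervals(lines, needle="PCIE GART of 256M enabled"):
    """Return the gaps in seconds between consecutive lines containing needle."""
    stamps = []
    for line in lines:
        match = TS_RE.match(line.strip())
        if match and needle in match.group("msg"):
            stamps.append(float(match.group("ts")))
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


# First three occurrences copied from the log in this comment.
sample = [
    "[ 56.601699] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).",
    "[ 66.890625] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).",
    "[ 77.014704] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).",
]
print(gart_intervals(sample))
```

A steady interval like this would be worth mentioning in an upstream report, since it distinguishes a periodic re-initialisation from a one-off warning at startup.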
