Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#74556 - Kernel Oops in 5.17.4 with AMD GPU Reset workaround
Attached to Project:
Arch Linux
Opened by James King (Randomized) - Monday, 25 April 2022, 15:40 GMT
Last edited by Toolybird (Toolybird) - Sunday, 16 October 2022, 20:58 GMT
Details
Description:
I pass through an AMD graphics card that requires a workaround to reset it (IOMMU Group 38 45:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)).

After upgrading from the last 5.16 kernel to the latest 5.17 kernel (5.17.4), the script I use to trigger the workaround by putting the machine to sleep generated a kernel oops. I have since installed linux-lts and can run the script successfully, so the problem appears to be specific to the 5.17.x series of kernels.

The script is:
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:45\:00.0/remove
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:45\:00.1/remove
systemctl suspend

The oops follows:
Apr 25 08:53:19 tux-master kernel: BUG: kernel NULL pointer dereference, address: 000000000000006c
Apr 25 08:53:19 tux-master kernel: #PF: supervisor read access in kernel mode
Apr 25 08:53:19 tux-master kernel: #PF: error_code(0x0000) - not-present page
Apr 25 08:53:19 tux-master kernel: PGD 0 P4D 0
Apr 25 08:53:19 tux-master kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Apr 25 08:53:19 tux-master kernel: CPU: 13 PID: 992 Comm: tee Tainted: G W 5.17.4-arch1-1 #1 bba05afeab01638bf5119bbe9f3f1f1452c88ff1
Apr 25 08:53:19 tux-master kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.30 08/14/2018
Apr 25 08:53:19 tux-master kernel: RIP: 0010:pcie_capability_read_dword+0x1c/0xb0
Apr 25 08:53:19 tux-master kernel: Code: eb a9 41 be 86 00 00 00 eb e3 0f 1f 40 00 0f 1f 44 00 00 41 56 41 89 f6 41 55 41 54 55 53 c7 02 00 00 00 00 41 83 e6 03 75 3e <44> 0f b6 6f 6c 48 89 fd 45 84 ed 74 25 89 f3 49 89 d4 e8 5d fe ff
Apr 25 08:53:19 tux-master kernel: RSP: 0018:ffffbfde92dc3c10 EFLAGS: 00010246
Apr 25 08:53:19 tux-master kernel: RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000064
Apr 25 08:53:19 tux-master kernel: RDX: ffffbfde92dc3c4c RSI: 000000000000000c RDI: 0000000000000000
Apr 25 08:53:19 tux-master kernel: RBP: ffffa02044335d80 R08: 0000000000000004 R09: ffffbfde92dc3bf4
Apr 25 08:53:19 tux-master kernel: R10: 0000000000000000 R11: 0000000000000044 R12: 0000000000000000
Apr 25 08:53:19 tux-master kernel: R13: 0000000000000040 R14: 0000000000000000 R15: 0000000000000000
Apr 25 08:53:19 tux-master kernel: FS: 00007f9b55145740(0000) GS:ffffa02ffef40000(0000) knlGS:0000000000000000
Apr 25 08:53:19 tux-master kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 08:53:19 tux-master kernel: CR2: 000000000000006c CR3: 0000002072658000 CR4: 00000000003506e0
Apr 25 08:53:19 tux-master kernel: Call Trace:
Apr 25 08:53:19 tux-master kernel: <TASK>
Apr 25 08:53:19 tux-master kernel: pcie_aspm_check_latency.isra.0+0x104/0x210
Apr 25 08:53:19 tux-master kernel: pcie_update_aspm_capable+0xb0/0xe0
Apr 25 08:53:19 tux-master kernel: pcie_aspm_pm_state_change+0x3d/0xa0
Apr 25 08:53:19 tux-master kernel: pci_raw_set_power_state+0x169/0x210
Apr 25 08:53:19 tux-master kernel: pci_set_power_state+0xf8/0x1a0
Apr 25 08:53:19 tux-master kernel: vfio_pci_remove+0x15/0x30 [vfio_pci 4504ca667961aa5b56c0d2e5ce76a10c76fa6bc6]
Apr 25 08:53:19 tux-master kernel: pci_device_remove+0x36/0xa0
Apr 25 08:53:19 tux-master kernel: __device_release_driver+0x17a/0x250
Apr 25 08:53:19 tux-master kernel: device_release_driver+0x24/0x30
Apr 25 08:53:19 tux-master kernel: pci_stop_bus_device+0x68/0x90
Apr 25 08:53:19 tux-master kernel: pci_stop_and_remove_bus_device_locked+0x16/0x30
Apr 25 08:53:19 tux-master kernel: remove_store+0x7d/0x90
Apr 25 08:53:19 tux-master kernel: kernfs_fop_write_iter+0x11c/0x1b0
Apr 25 08:53:19 tux-master kernel: new_sync_write+0x15c/0x1f0
Apr 25 08:53:19 tux-master kernel: vfs_write+0x1eb/0x280
Apr 25 08:53:19 tux-master kernel: ksys_write+0x67/0xe0
Apr 25 08:53:19 tux-master kernel: do_syscall_64+0x5c/0x80
Apr 25 08:53:19 tux-master kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Apr 25 08:53:19 tux-master kernel: RIP: 0033:0x7f9b5524a257
Apr 25 08:53:19 tux-master kernel: Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
Apr 25 08:53:19 tux-master kernel: RSP: 002b:00007ffd0ca5a778 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Apr 25 08:53:19 tux-master kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f9b5524a257
Apr 25 08:53:19 tux-master kernel: RDX: 0000000000000002 RSI: 00007ffd0ca5a8d0 RDI: 0000000000000003
Apr 25 08:53:19 tux-master kernel: RBP: 00007ffd0ca5a8d0 R08: 0000000000001004 R09: 0000000000000001
Apr 25 08:53:19 tux-master kernel: R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000002
Apr 25 08:53:19 tux-master kernel: R13: 00005570dc0684a0 R14: 0000000000000002 R15: 00007f9b553437a0
Apr 25 08:53:19 tux-master kernel: </TASK>
Apr 25 08:53:19 tux-master kernel: Modules linked in: hid_microsoft ff_memless mousedev joydev dm_mod nct6775 hwmon_vid iwlmvm snd_usb_audio snd_usbmidi_lib snd_rawmidi mac80211 snd_seq_device mc intel_rapl_msr mxm_wmi wmi_bmof snd_hda_codec_realtek snd>
Apr 25 08:53:19 tux-master kernel: xhci_pci_renesas vfio_pci vfio_pci_core irqbypass vfio_virqfd vfio_iommu_type1 vfio
Apr 25 08:53:19 tux-master kernel: CR2: 000000000000006c
Apr 25 08:53:19 tux-master kernel: ---[ end trace 0000000000000000 ]---
Apr 25 08:53:19 tux-master kernel: RIP: 0010:pcie_capability_read_dword+0x1c/0xb0
Apr 25 08:53:19 tux-master kernel: Code: eb a9 41 be 86 00 00 00 eb e3 0f 1f 40 00 0f 1f 44 00 00 41 56 41 89 f6 41 55 41 54 55 53 c7 02 00 00 00 00 41 83 e6 03 75 3e <44> 0f b6 6f 6c 48 89 fd 45 84 ed 74 25 89 f3 49 89 d4 e8 5d fe ff
Apr 25 08:53:19 tux-master kernel: RSP: 0018:ffffbfde92dc3c10 EFLAGS: 00010246
Apr 25 08:53:19 tux-master kernel: RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000064
Apr 25 08:53:19 tux-master kernel: RDX: ffffbfde92dc3c4c RSI: 000000000000000c RDI: 0000000000000000
Apr 25 08:53:19 tux-master kernel: RBP: ffffa02044335d80 R08: 0000000000000004 R09: ffffbfde92dc3bf4
Apr 25 08:53:19 tux-master kernel: R10: 0000000000000000 R11: 0000000000000044 R12: 0000000000000000
Apr 25 08:53:19 tux-master kernel: R13: 0000000000000040 R14: 0000000000000000 R15: 0000000000000000
Apr 25 08:53:19 tux-master kernel: FS: 00007f9b55145740(0000) GS:ffffa02ffef40000(0000) knlGS:0000000000000000
Apr 25 08:53:19 tux-master kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 08:53:19 tux-master kernel: CR2: 000000000000006c CR3: 0000002072658000 CR4: 00000000003506e0
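For readers who want to reproduce the steps, the script from the description could be written along the following lines. This is only a sketch, not the reporter's script: the function name remove_pci_function and the SYSFS_PCI variable are hypothetical, introduced here so the logic can be dry-run against a scratch directory, and the existence check is an addition. It writes with a plain redirect rather than sudo tee, so it would need to run as root on a real system.

```shell
#!/bin/sh
# Sketch of the reset workaround from the report, with an existence check added.
# SYSFS_PCI is a hypothetical override (not in the original script); on a real
# system it defaults to the sysfs PCI device tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Write "1" to <device>/remove, which asks the kernel to detach that function.
remove_pci_function() {
    node="$SYSFS_PCI/$1/remove"
    if [ ! -e "$node" ]; then
        echo "skip: $1 has no remove node" >&2
        return 1
    fi
    echo 1 > "$node"
}

# Addresses taken from the report (functions .0 and .1 of the passed-through
# card). Left commented out so that sourcing this file has no side effects:
#   remove_pci_function 0000:45:00.0
#   remove_pci_function 0000:45:00.1
#   systemctl suspend
```

On the affected kernels the oops in the log is raised during the first write to a remove node (the trace shows Comm: tee going through remove_store), so the check above does not avoid the bug; it only makes a wrong address fail visibly.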
Closed by Toolybird (Toolybird)
Sunday, 16 October 2022, 20:58 GMT
Reason for closing: Fixed
Additional comments about closing: linux 6.0.2.arch1-1
Device: 0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)
Oops: 0000 [#1] PREEMPT SMP NOPTI
Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F36a 02/16/2022
RIP: 0010:vfio_pci_core_unregister_device+0xd/0xa0 [vfio_pci_core]
BUG: kernel NULL pointer dereference, address: 000000000000006c
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 21 PID: 957 Comm: tee Tainted: G W 5.18.3-arch1-1 #1 2090c6f1d9d20f39bd14c0acb6fa89ddb994d43f
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.30 08/14/2018
RIP: 0010:pcie_capability_reg_implemented+0x7/0xd0
Code: 03 00 00 00 48 c7 c7 70 8b d4 b1 5b e9 22 2e b1 ff 0f 0b eb d3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 31 d2 <80> 7f 6c 00 89 f1 74 3e>
RSP: 0018:ffffbb3d926dfbe8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000064
RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000004 R09: ffffbb3d926dfbd4
R10: 0000000000000044 R11: ffffffffb0b35990 R12: ffffbb3d926dfc24
R13: 0000000000000000 R14: 0000000000001388 R15: 0000000000000000
FS: 00007f7c0b225740(0000) GS:ffff9f8dfe540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000006c CR3: 00000010795ba000 CR4: 00000000003506e0
Call Trace:
<TASK>
pcie_capability_read_dword+0x2b/0xb0
pcie_aspm_check_latency.isra.0+0x10b/0x210
pcie_update_aspm_capable+0xb1/0xe0
pcie_aspm_pm_state_change+0x41/0xa0
pci_raw_set_power_state+0x137/0x210
vfio_pci_remove+0x19/0x30 [vfio_pci 71a74ce0c543b84b41207595c4fc0aba2b32864c]
pci_device_remove+0x3a/0xa0
device_release_driver_internal+0x1b3/0x210
pci_stop_bus_device+0x69/0x90
pci_stop_and_remove_bus_device_locked+0x1a/0x30
remove_store+0x82/0xa0
kernfs_fop_write_iter+0x11f/0x1f0
new_sync_write+0x13d/0x1c0
vfs_write+0x1ec/0x270
ksys_write+0x6f/0xf0
do_syscall_64+0x5f/0x90
? syscall_exit_to_user_mode+0x26/0x50
? do_syscall_64+0x6b/0x90
? do_syscall_64+0x6b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7c0b101c27
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51>
RSP: 002b:00007ffdbd3677b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7c0b101c27
RDX: 0000000000000002 RSI: 00007ffdbd367910 RDI: 0000000000000003
RBP: 00007ffdbd367910 R08: 0000000000001004 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000002
R13: 000055eb94cd64a0 R14: 0000000000000002 R15: 00007f7c0b1f9940
</TASK>
Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_microsoft snd_seq_device mc ff_memless mousedev joydev intel_rapl_msr intel_rapl_common amd6>
nvme_core aacraid xhci_pci_renesas vfio_pci vfio_pci_core irqbypass vfio_virqfd vfio_iommu_type1 vfio
CR2: 000000000000006c
---[ end trace 0000000000000000 ]---
RIP: 0010:pcie_capability_reg_implemented+0x7/0xd0
Code: 03 00 00 00 48 c7 c7 70 8b d4 b1 5b e9 22 2e b1 ff 0f 0b eb d3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 31 d2 <80> 7f 6c 00 89 f1 74 3e>
RSP: 0018:ffffbb3d926dfbe8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000064
RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000004 R09: ffffbb3d926dfbd4
R10: 0000000000000044 R11: ffffffffb0b35990 R12: ffffbb3d926dfc24
R13: 0000000000000000 R14: 0000000000001388 R15: 0000000000000000
FS: 00007f7c0b225740(0000) GS:ffff9f8dfe540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000006c CR3: 00000010795ba000 CR4: 00000000003506e0