FS#72092 - [linux-zen] Unable to detach GPU with libvirt

Attached to Project: Arch Linux
Opened by Michele Pappalardo (Wiichele) - Friday, 10 September 2021, 14:05 GMT
Last edited by Jonas Witschel (diabonas) - Monday, 22 November 2021, 09:37 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
With kernel 5.14.2.zen1-2, libvirt fails to detach the dGPU from the host (single GPU passthrough) and therefore cannot start my VM. The GPU is left in an unusable state.

Using ssh I was able to read these errors from the journal:
kernel: amdgpu 0000:09:00.0: amdgpu: Fail to disable thermal alert!
kernel: BUG: unable to handle page fault for address: ffffad8fe09e6000
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page

With kernel 5.13.13-zen1-1-zen everything works, apart from the usual error:
kernel: [drm:amdgpu_pci_remove [amdgpu]] *ERROR* Hotplug removal is not supported
This error does not prevent the VM from starting and running normally.

Loading or removing the vendor-reset module (https://aur.archlinux.org/packages/vendor-reset-dkms-git) makes no difference.

System info:
CPU: AMD Ryzen 5 3600 (12) @ 3.600GHz
GPU: AMD Radeon RX 5600 XT
Motherboard: TUF GAMING B550-PLUS

Steps to reproduce:
1. Create or use a libvirt VM configuration with single GPU passthrough setup (like https://github.com/Wiichele/vfio)
2. Start the VM with virsh and wait for the GPU to crash
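
For anyone trying to reproduce the detach step in isolation, here is a minimal sketch. It assumes the GPU is detached with `virsh nodedev-detach`, which the report does not confirm (the linked hook scripts may do it differently). The helper names `pci_to_nodedev` and `detach_command` are hypothetical; the PCI address is the one shown in the journal output above. The script only builds the command and does not run it.

```python
import re

# Matches a full PCI address: domain:bus:device.function, e.g. 0000:09:00.0
PCI_ADDRESS = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")


def pci_to_nodedev(address: str) -> str:
    """Convert a PCI address to the node device name libvirt uses for it."""
    if not PCI_ADDRESS.fullmatch(address):
        raise ValueError(f"not a full PCI address: {address!r}")
    # libvirt names PCI node devices pci_<domain>_<bus>_<slot>_<function>
    return "pci_" + address.lower().replace(":", "_").replace(".", "_")


def detach_command(address: str) -> list[str]:
    """Build (but do not run) the virsh command that detaches the device."""
    return ["virsh", "nodedev-detach", pci_to_nodedev(address)]


if __name__ == "__main__":
    # Address taken from the amdgpu line in the journal; adjust for other systems.
    print(" ".join(detach_command("0000:09:00.0")))
```

Running the printed command as root on an affected kernel should be enough to trigger the detach, assuming the failure does not depend on the rest of the VM startup.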

Closed by Jonas Witschel (diabonas)
Monday, 22 November 2021, 09:37 GMT
Reason for closing: Fixed
Additional comments about closing: linux-zen 5.15.3.zen1-1
Comment by Michele Pappalardo (Wiichele) - Monday, 13 September 2021, 11:01 GMT
Comment by Michele Pappalardo (Wiichele) - Friday, 17 September 2021, 10:39 GMT
Issue resolved here: https://gitlab.freedesktop.org/drm/amd/-/issues/1081

Waiting for the patches to land in the stable kernel.
