FS#59515 - [qemu] 2.12.0-1 -> 2.12.0-2 breaks Windows 10 guest

Attached to Project: Arch Linux
Opened by Jimi Bove (Jimi-James) - Friday, 03 August 2018, 04:36 GMT
Last edited by freswa (frederik) - Monday, 14 September 2020, 01:48 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Evangelos Foutras (foutrelis)
Anatol Pomozov (anatolik)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Relevant note:
I'm using the TianoCore BIOS for OVMF and passing thru a PCI USB card. I also passthru a GPU, but I've confirmed this issue happens without the GPU. I plan to test it without *any* PCI passthru (no GPU, no USB card) tomorrow.

Description:
Trying to boot my Windows 10 guest results in a hard freeze on the guest at some point while the Windows logo is loading. If I shut the VM down during said freeze, there's a good chance (not every single time) that it'll hard freeze my entire system with it. I tried restoring a backup of the VM's hard drive from 4 months ago. Same behavior. Downgrading qemu fixed it.

Steps to reproduce:
Upgrade qemu to 2.12.0-2

Closed by freswa (frederik)
Monday, 14 September 2020, 01:48 GMT
Reason for closing: Works for me
Comment by Jimi Bove (Jimi-James) - Friday, 03 August 2018, 16:59 GMT
Alright, this is getting super weird. Up until this moment, this issue has been 100% consistent and predictable over multiple tests. 2.12.0-2 would freeze the guest, and 2.12.0-1 would not. When I tested it without any PCI passthru (not the USB card either), 2.12.0-2 worked. So, I thought, OK, this is a PCI passthru issue, which makes sense because a commit related to that was between 2.12.0-1 and 2.12.0-2. But then I added the USB card back and tried again, with 2.12.0-2, and suddenly for the first time in days on 2.12.0-2, Windows booted fine. So now I don't know what to think. Maybe this issue only affects the first time the guest boots since the last time the host booted? I'll reboot and try again to confirm.
Comment by Anatol Pomozov (anatolik) - Friday, 03 August 2018, 17:06 GMT
Yeah, it is hard to debug such flaky issues. I would suggest posting your experience to the qemu-devel list; they might have a better explanation of what is going on.
Comment by Jimi Bove (Jimi-James) - Friday, 03 August 2018, 17:27 GMT
It seems the issue has completely disappeared now. Unless anything comes up later, I'm just going to assume that my guest needed to boot *after* the changes in 2.12.0-2, *without* any PCI passthru devices, just once, to permanently acclimate something in the Windows system itself.
Comment by Jimi Bove (Jimi-James) - Friday, 03 August 2018, 17:35 GMT
AHA! Never mind! I found what's going on. The confusion came from the fact that I've simultaneously been fixing an issue with a new AMD card where my system randomly freezes, and one of the things I've had to do to solve that was disable MSI interrupts. The reason it worked with the USB card on 2.12.0-2 just now is that, at the same time, I switched my kernel parameters from pci=nomsi to amdgpu.msi=0, i.e., I made it so that only my GPU, and no longer the USB card as well, was avoiding MSI interrupts. Now the VM has the same issue--only working on 2.12.0-1--with *just* the GPU, instead of both the GPU and the USB card.
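
For anyone trying to tell these two configurations apart on their own host, here is a minimal Python sketch that checks a kernel command line for the two parameters mentioned above (pci=nomsi and amdgpu.msi=0). The function names and the returned dictionary keys are made up for illustration; only the two parameter strings come from this report. It only inspects the command line text, so it says nothing about options set elsewhere (for example in modprobe configuration).

```python
def msi_overrides(cmdline: str) -> dict:
    """Report which MSI-disabling options appear on a kernel command line.

    Only the two options discussed in this report are checked.
    """
    tokens = cmdline.split()

    # pci= takes a comma-separated list, e.g. "pci=noaer,nomsi".
    pci_opts = []
    for tok in tokens:
        if tok.startswith("pci="):
            pci_opts.extend(tok[len("pci="):].split(","))

    return {
        # MSI disabled for every PCI device (GPU and USB card alike).
        "global_nomsi": "nomsi" in pci_opts,
        # MSI disabled for the amdgpu driver only.
        "amdgpu_msi_off": "amdgpu.msi=0" in tokens,
    }


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    print(msi_overrides(read_cmdline()))
```

With pci=nomsi both flags of interest differ from the amdgpu.msi=0 case, which matches the observation above: the first setup kept the USB card off MSI too, the second only the GPU.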

So here are the exact 3 steps to reproduce this bug in 2.12.0-1 -> 2.12.0-2:
1. Have a Windows (10?) guest with a PCI card passed thru to it
2. Disable MSI interrupts for the PCI card in question on the host
3. Upgrade qemu to 2.12.0-2

Which means the more precise description is, "[qemu] 2.12.0-1 -> 2.12.0-2 breaks PCI passthru Windows 10 guest for PCI cards that aren't using MSI interrupts"
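
Step 2 of the reproduction is easy to get wrong, so here is a rough Python sketch for checking whether a given PCI device currently has MSI/MSI-X vectors allocated on the host. It assumes the usual Linux sysfs layout, where an allocated device has a non-empty msi_irqs directory under /sys/bus/pci/devices/<address>/. The PCI address in the example is a placeholder, not a value from this report, and a device bound to vfio-pci may only show vectors while the guest is running, so treat the result as a hint rather than proof.

```python
import os


def uses_msi(pci_address: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """Return True if the device has at least one MSI/MSI-X vector allocated.

    Depending on the kernel version, the msi_irqs directory is either
    missing or empty when no vectors are allocated; both count as False.
    """
    msi_dir = os.path.join(sysfs_root, pci_address, "msi_irqs")
    return os.path.isdir(msi_dir) and len(os.listdir(msi_dir)) > 0


if __name__ == "__main__":
    # Placeholder address; substitute the one lspci reports for your card.
    address = "0000:03:00.0"
    state = "is" if uses_msi(address) else "is not"
    print(f"{address} {state} using MSI/MSI-X")
```

If this reports that the passed-thru card is not using MSI and the guest freezes only on 2.12.0-2, the setup matches the three steps above.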
Comment by Jimi Bove (Jimi-James) - Sunday, 13 September 2020, 23:55 GMT
Kill Arch Bugs day notice:

I don't know whether this bug is still going on, as I stopped avoiding MSI interrupts quite a while ago. At the same time, I can't confirm whether it's still happening, because my Windows VM has been broken (freezes entire system instead of booting) specifically when trying to pass thru my GPU (the same GPU) for months, for some different, probably unrelated reason that I have yet to get to the bottom of.

Perhaps it would be best to close this until someone else voices their own trouble.
