FS#63730 - [linux] 5.2.11+ (at least through 5.2.14) on host locks up KVM VM's using > nproc/2 virtCPUs
Attached to Project:
Arch Linux
Opened by James Harvey (jamespharvey20) - Thursday, 12 September 2019, 03:21 GMT
Last edited by Antonio Rojas (arojas) - Saturday, 21 September 2019, 09:07 GMT
Details
Description: With the host running linux 5.2.10 or earlier, VMs
boot successfully. With the host running 5.2.11-5.2.14 with
hyperthreading, a VM using more than the host's nproc/2 virtual
CPUs hangs in the early boot stage. Booting UEFI grub/systemd shows
"Loading Initial Ramdisk..." and hangs. Booting UEFI ISO
goes to a black screen. Host shows 100% CPU usage * number
of virtual CPUs. So, for example, my 16 physical core system
with hyperthreading shows nproc of 32. 5.2.11+ allows a VM
with up to 16 virtual CPUs to boot, but 18 or more hangs
forever. A race condition is probably involved, because
about 5% of boot attempts expected to hang succeed.
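The threshold described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this report, not a diagnostic tool: the function names `max_safe_vcpus` and `expected_to_hang` are made up here, and the sketch assumes `os.cpu_count()` matches what `nproc` prints on the host (it may differ if CPU affinity is restricted).

```python
import os


def max_safe_vcpus(host_logical_cpus=None):
    """Return the largest vCPU count that is <= nproc/2 on the host.

    host_logical_cpus is the host's logical CPU count (what `nproc`
    reports). If omitted, os.cpu_count() is used as an approximation.
    """
    if host_logical_cpus is None:
        host_logical_cpus = os.cpu_count() or 1
    # Integer division: a 32-thread host gives 16.
    return max(1, host_logical_cpus // 2)


def expected_to_hang(vcpus, host_logical_cpus=None):
    """True if a VM with this many vCPUs exceeds nproc/2 on the host.

    Per the report, such VMs hang on affected kernels in most,
    but not all, boot attempts.
    """
    return vcpus > max_safe_vcpus(host_logical_cpus)


if __name__ == "__main__":
    # The reporter's system: 16 physical cores, nproc of 32.
    print(max_safe_vcpus(32))
    print(expected_to_hang(16, 32))
    print(expected_to_hang(18, 32))
```

Passing the host's CPU count explicitly makes it possible to check a VM definition from a machine other than the host.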
Additional info:
* linux 5.2.11-5.2.14. This is a KVM bug. The version of QEMU doesn't seem to matter: 4.0.x and 4.1.x behave identically.
* See https://www.spinics.net/lists/kvm/msg195171.html
* Unknown whether other hypervisors using KVM will run into this

Workarounds:
* Use linux 5.2.10, or a custom build of 5.2.11+ with commit 2ad350fb4c reverted
* Temporarily decrease the number of virtual CPUs given to each affected VM to <= nproc/2 on the host

Arch maintainers: Probably nothing to do here but wait for upstream to release a fix and then mark this closed. Reverting commit 2ad350fb4c for Arch could be considered, but upstream says that isn't viable because it would come at the expense of a fix for a "regression with device assignment" related to removing memslots. Also, 5.2.11 was released almost 2 weeks ago and I haven't seen others reporting this issue.
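To check quickly whether a host kernel falls in the range tested in this report, something like the following Python sketch could be used. The helper names are hypothetical, and the upper bound is only the last version tested here (5.2.14), not a confirmed end of the regression; later kernels may or may not be affected.

```python
import platform
import re

# Range taken from this report. 5.2.14 is the last version tested,
# NOT a confirmed upper bound for the bug.
REPORTED_FIRST_BAD = (5, 2, 11)
REPORTED_LAST_TESTED_BAD = (5, 2, 14)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a string like '5.2.11-arch1-1-ARCH'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError("unrecognized kernel release: %r" % release)
    # A missing patch level (e.g. '5.3-rc1') is treated as 0.
    return (int(match.group(1)), int(match.group(2)), int(match.group(3) or 0))


def in_reported_range(release=None):
    """True if the kernel release is within 5.2.11 to 5.2.14 inclusive."""
    if release is None:
        release = platform.release()  # same string as `uname -r`
    version = parse_kernel_release(release)
    return REPORTED_FIRST_BAD <= version <= REPORTED_LAST_TESTED_BAD


if __name__ == "__main__":
    print(platform.release(), in_reported_range())
```

A `False` result for a kernel newer than 5.2.14 only means it is outside the tested range, not that it contains a fix.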
I confirmed it fixes the problem for me.