FS#74716 - [qemu-system-x86] SIGABRT with KVM - kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' fa

Attached to Project: Arch Linux
Opened by Frantisek Sumsal (mrc0mmand) - Thursday, 12 May 2022, 09:55 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 27 July 2022, 07:47 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: No-one
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:
Latest qemu-system-x86 crashes with SIGABRT on startup with KVM enabled:

```
# qemu-system-x86_64 -machine accel=kvm -enable-kvm -cpu host /boot/initramfs-linux.img
WARNING: Image format was not specified for '/boot/initramfs-linux.img' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
Specify the 'raw' format explicitly to remove the restrictions.
qemu-system-x86_64: error: failed to set MSR 0xc0000104 to 0x100000000
qemu-system-x86_64: ../qemu-7.0.0/target/i386/kvm/kvm.c:2996: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)

```
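
For context on the value in the error message: MSR 0xc0000104 is the AMD TSC ratio MSR (`MSR_AMD64_TSC_RATIO` in the Linux kernel sources), and its value is an 8.32 fixed-point number, so `0x100000000` should be the default ratio of 1.0. A minimal Python sketch of that decoding follows; the helper name `decode_tsc_ratio` is made up for illustration and is not part of QEMU or the kernel.

```python
# Hypothetical helper: decode an AMD TSC ratio MSR value (8.32 fixed point).
FRACTION_BITS = 32   # bits 31:0 hold the fractional part
INTEGER_MASK = 0xFF  # bits 39:32 hold the integer part


def decode_tsc_ratio(value: int) -> float:
    """Return the TSC scaling ratio encoded in an 8.32 fixed-point value."""
    integer = (value >> FRACTION_BITS) & INTEGER_MASK
    fraction = value & ((1 << FRACTION_BITS) - 1)
    return integer + fraction / (1 << FRACTION_BITS)


if __name__ == "__main__":
    # Value taken from the "failed to set MSR" line above.
    print(decode_tsc_ratio(0x100000000))  # 1.0
```

In other words, QEMU appears to be writing the "no scaling" default here rather than an unusual value.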

Additional info:
* package version(s)
qemu-system-x86 7.0.0-9

Steps to reproduce:
# qemu-system-x86_64 -machine accel=kvm -enable-kvm -cpu host /boot/initramfs-linux.img

Coredump info:
PID: 843 (qemu-system-x86)
UID: 0 (root)
GID: 0 (root)
Signal: 6 (ABRT)
Timestamp: Thu 2022-05-12 09:53:13 UTC (14s ago)
Command Line: qemu-system-x86_64 -machine accel=kvm -enable-kvm -cpu host /boot/initramfs-linux.img
Executable: /usr/bin/qemu-system-x86_64
Control Group: /user.slice/user-1000.slice/session-4.scope
Unit: session-4.scope
Slice: user-1000.slice
Session: 4
Owner UID: 1000 (vagrant)
Boot ID: 720d165b59734239a839778c4f0b6ec5
Machine ID: 10f52eb518164e1691a8b304a5bf3a43
Hostname: arch.localdomain
Storage: /var/lib/systemd/coredump/core.qemu-system-x86.0.720d165b59734239a839778c4f0b6ec5.843.1652349193000000.zst (present)
Disk Size: 1.1M
Message: Process 843 (qemu-system-x86) of user 0 dumped core.

Module linux-vdso.so.1 with build-id ae8518a0710c2679844504efb751b01abf13cef5
Module libcap-ng.so.0 with build-id 09690c43af29ef92bbec2e53e29101b2b8e9c48c
Module libblkid.so.1 with build-id 140694a62d8d4d07c6c320a501f948dd1b389d73
Module libcrypto.so.1.1 with build-id f94a24f9ce8f3f394c3df23f7d436796797d4459
Module liblzma.so.5 with build-id 28b40c7af8098a66af6ee093b6986b91cad7694d
Module libbrotlicommon.so.1 with build-id a4ba3f4b4571c8272343b621da812a6e24a202a7
Module libpthread.so.0 with build-id 7fa8b52fae071a370ba4ca32bf9490a30aff31c4
Module libaudit.so.1 with build-id 27ca9470fd239e2f61c83e293f24f266789485b6
Module libelf.so.1 with build-id 4cf96cb4785e1ca233693ae17fa0d62971ee09c2
Module libpcre.so.1 with build-id 845483dd0acba86de9f0313102bebbaf3ce52767
Module libffi.so.8 with build-id f0a9586cf0f42d2b9971bd1065ca3a6b19f4a2c2
Module libmount.so.1 with build-id 4436aeea0cd8c01b5a77969e0531184f8b3513ce
Module libkmod.so.2 with build-id 447e6072ef09d5e282332034705f86420c34e54e
Module libuuid.so.1 with build-id 032a21acd159ee3902605e9911be5f86a7df7df9
Module libdl.so.2 with build-id bb9bd2657bfba9f60bd34d2050cc63a7eb024bc4
Module libgmp.so.10 with build-id e58d34ab389d1b649c24195c2d145e3ff2e58290
Module libhogweed.so.6 with build-id 2d70cff7b1841b4d9ca4e8e7726cd4b944c07fdc
Module libnettle.so.8 with build-id 9a878e513c02007598fcf1e2e286c2203f13536e
Module libtasn1.so.6 with build-id ee3429ca5e94718aea4fe5249fc859e0cd88e4e9
Module libunistring.so.2 with build-id 015ac6d6bcb60b7d8bea31a80d1941b06e8636ab
Module libidn2.so.0 with build-id 1ce2b50ad9f9821c2c629b521cf5a3c99593d332
Module libbrotlidec.so.1 with build-id 45defc036e918e0140a72f1fbce6e7692d38241d
Module libbrotlienc.so.1 with build-id 81a4bdfe7d85b8daa2297869b1e9b35c28fe189e
Module libp11-kit.so.0 with build-id cc372ea3c28c4d3dfc633b4d2e933c8584d2af16
Module libstdc++.so.6 with build-id 88ad4eff81a00c684abfe0f863e87434123d8943
Module ld-linux-x86-64.so.2 with build-id c09c6f50f6bcec73c64a0b4be77eadb8f7202410
Module libc.so.6 with build-id 85766e9d8458b16e9c7ce6e07c712c02b8471dbc
Module libgcc_s.so.1 with build-id 5d817452a709ca3a213341555ddcf446ecee37fa
Module libm.so.6 with build-id 596b63a006a4386dcab30912d2b54a7a61827b07
Module libaio.so.1 with build-id f41b69db8468baa07d466cc240c0794b5ff52b92
Module libfuse3.so.3 with build-id e859cc0cfbe1388b71174fba0701ac7bef5ed62c
Module libpam.so.0 with build-id bb11b2685fe89555938ffd330ea44d82b0f8701c
Module libgmodule-2.0.so.0 with build-id 5d0db204364cefb16d6d80f9e40df7c3d86023b3
Module liburing.so.2 with build-id c7f5471ffaddf14493c661e39976cf4f43aa43a1
Module libbpf.so.0 with build-id 2cd05f37adf35ebab500ff2fa6f5eda457d608b2
Module libudev.so.1 with build-id 7dc938362569112855b6086de066cd6a18d1b978
Module libvdeplug.so.3 with build-id f5692d20d0c82bba981746e991ea525fdea7b9b2
Module libslirp.so.0 with build-id a7ecc447cfee5680a9308021e994ade25c3c9da3
Module libzstd.so.1 with build-id 3bccb8fe08e48d5ea135b1d0f99de0d771dd752f
Module libglib-2.0.so.0 with build-id d6c7c03d71a1b71f59e10016323136de55f43266
Module libgobject-2.0.so.0 with build-id f5126c30685462884948f1048f2039305c67f5c5
Module libgio-2.0.so.0 with build-id 3f16bee59e25c8bfbb70c4e78a3c90ee79ba4469
Module libnuma.so.1 with build-id ecf6af9807840e498f8027d31fe97fff1aa5afaf
Module libfdt.so.1 with build-id 7089f0e5cd72e16ad74053fe689ef4b0e87e95b7
Module libseccomp.so.2 with build-id 54179323d84e1b713b7547ba0b3f8310e65eec93
Module libdaxctl.so.1 with build-id 203da370da341b7890e2cafaa2b0f416def38974
Module libsasl2.so.3 with build-id 626ba9e8e877a809393c4d5a48ef6bdd8d30f817
Module libgnutls.so.30 with build-id 5b2955e99a56f895cb70144748d096b5c4f7bf83
Module libjpeg.so.8 with build-id 324d9d66f01707241e31af5cc104db3c9122f4c4
Module libpng16.so.16 with build-id 2dc0bce07f199bf983c07a05fb95a6f4af83a9b3
Module libz.so.1 with build-id 1fb800ce60ddb605ebe23f9702adcd341c7c8970
Module liblzo2.so.2 with build-id ed8e33ba505954ca344aea58d10c7b8a37fd2f39
Module libsnappy.so.1 with build-id 36e3fb247a476fe2f755162644ebcd8ebd5d92cb
Module libpixman-1.so.0 with build-id 341f793dcada3a48a306a793d265a517e3f2e7d6
Module qemu-system-x86_64 with build-id 5638dd4a047239c9384135ce89a908a844b30dd5
Stack trace of thread 848:
#0 0x00007f46e7ad034c __pthread_kill_implementation (libc.so.6 + 0x8f34c)
#1 0x00007f46e7a834b8 raise (libc.so.6 + 0x424b8)
#2 0x00007f46e7a6d534 abort (libc.so.6 + 0x2c534)
#3 0x00007f46e7a6d45c __assert_fail_base.cold (libc.so.6 + 0x2c45c)
#4 0x00007f46e7a7c116 __assert_fail (libc.so.6 + 0x3b116)
#5 0x000055f8676358d5 kvm_buf_set_msrs (qemu-system-x86_64 + 0x6ac8d5)
#6 0x000055f86763a81a kvm_arch_put_registers (qemu-system-x86_64 + 0x6b181a)
#7 0x000055f8677d5902 do_kvm_cpu_synchronize_post_init (qemu-system-x86_64 + 0x84c902)
#8 0x000055f8673d9091 process_queued_cpu_work (qemu-system-x86_64 + 0x450091)
#9 0x000055f8677d73d8 kvm_vcpu_thread_fn (qemu-system-x86_64 + 0x84e3d8)
#10 0x000055f8679f00c7 qemu_thread_start (qemu-system-x86_64 + 0xa670c7)
#11 0x00007f46e7ace5c2 start_thread (libc.so.6 + 0x8d5c2)
#12 0x00007f46e7b53584 __clone (libc.so.6 + 0x112584)

Stack trace of thread 844:
#0 0x00007f46e7b19a55 clock_nanosleep@GLIBC_2.2.5 (libc.so.6 + 0xd8a55)
#1 0x00007f46e7b1e717 __nanosleep (libc.so.6 + 0xdd717)
#2 0x00007f46e7f93289 g_usleep (libglib-2.0.so.0 + 0x80289)
#3 0x000055f8679ff653 call_rcu_thread (qemu-system-x86_64 + 0xa76653)
#4 0x000055f8679f00c7 qemu_thread_start (qemu-system-x86_64 + 0xa670c7)
#5 0x00007f46e7ace5c2 start_thread (libc.so.6 + 0x8d5c2)
#6 0x00007f46e7b53584 __clone (libc.so.6 + 0x112584)

Stack trace of thread 843:
#0 0x00007f46e7acb15a __futex_abstimed_wait_common (libc.so.6 + 0x8a15a)
#1 0x00007f46e7acd960 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8c960)
#2 0x000055f8679f4760 qemu_cond_wait_impl (qemu-system-x86_64 + 0xa6b760)
#3 0x000055f8673d8df8 do_run_on_cpu (qemu-system-x86_64 + 0x44fdf8)
#4 0x000055f86740eb60 cpu_synchronize_all_post_init (qemu-system-x86_64 + 0x485b60)
#5 0x000055f8674ec39f qdev_machine_creation_done (qemu-system-x86_64 + 0x56339f)
#6 0x000055f86742768e qmp_x_exit_preconfig.part.0 (qemu-system-x86_64 + 0x49e68e)
#7 0x000055f867428efc qemu_init (qemu-system-x86_64 + 0x49fefc)
#8 0x000055f8673cda8d main (qemu-system-x86_64 + 0x444a8d)
#9 0x00007f46e7a6e310 __libc_start_call_main (libc.so.6 + 0x2d310)
#10 0x00007f46e7a6e3c1 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2d3c1)
#11 0x000055f8673d02e5 _start (qemu-system-x86_64 + 0x4472e5)

Stack trace of thread 845:
#0 0x00007f46e7acb15a __futex_abstimed_wait_common (libc.so.6 + 0x8a15a)
#1 0x00007f46e7ad62f3 __new_sem_wait_slow64.constprop.0 (libc.so.6 + 0x952f3)
#2 0x000055f8679efb31 qemu_sem_timedwait (qemu-system-x86_64 + 0xa66b31)
#3 0x000055f867a1bc44 worker_thread (qemu-system-x86_64 + 0xa92c44)
#4 0x000055f8679f00c7 qemu_thread_start (qemu-system-x86_64 + 0xa670c7)
#5 0x00007f46e7ace5c2 start_thread (libc.so.6 + 0x8d5c2)
#6 0x00007f46e7b53584 __clone (libc.so.6 + 0x112584)
ELF object binary architecture: AMD x86-64

Closed by Toolybird (Toolybird)
Wednesday, 27 July 2022, 07:47 GMT
Reason for closing: Fixed
Additional comments about closing: Reporter says "Fixed in the current kernel (linux) packages."
Comment by Frantisek Sumsal (mrc0mmand) - Thursday, 12 May 2022, 12:40 GMT
I forgot to mention that I've reproduced this with _nested_ KVM (I don't have any machine at hand right now to test it with plain KVM).
Comment by loqs (loqs) - Thursday, 12 May 2022, 13:35 GMT
Comment by Frantisek Sumsal (mrc0mmand) - Tuesday, 24 May 2022, 12:39 GMT
That does look like the same issue, and the minimal reproducer is indeed the same as well:

```
# qemu-system-x86_64 -enable-kvm
qemu-system-x86_64: error: failed to set MSR 0xc0000104 to 0x100000000
qemu-system-x86_64: ../qemu-7.0.0/target/i386/kvm/kvm.c:2996: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)
```
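
When collecting this failure from several machines, it can help to pull the MSR index and value out of QEMU's stderr automatically. The following is a minimal Python sketch that matches the error line exactly as it appears in the output above; the function name `find_failed_msr` is hypothetical, and the pattern assumes the message wording does not differ between QEMU builds.

```python
import re

# Pattern for the "failed to set MSR" line as it appears in the output above.
MSR_ERROR = re.compile(
    r"error: failed to set MSR (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)"
)


def find_failed_msr(stderr_text: str):
    """Return (msr_index, value) as ints, or None if no such line is present."""
    match = MSR_ERROR.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


if __name__ == "__main__":
    sample = (
        "qemu-system-x86_64: error: failed to set MSR 0xc0000104 "
        "to 0x100000000\n"
    )
    index, value = find_failed_msr(sample)
    print(hex(index), hex(value))  # 0xc0000104 0x100000000
```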

So far I've managed to reproduce it only on certain setups, like:

CentOS 8 Stream (bare metal, kernel-4.18.0-383.el8.x86_64) -> Arch Linux (KVM, linux 5.17.9.arch1-1) -> Arch Linux (attempted nested KVM with `qemu-system-x86_64 -enable-kvm`)

Snippet from `/proc/cpuinfo`:
```
processor : 7
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD Opteron 63xx class CPU
stepping : 0
microcode : 0x1000065
cpu MHz : 2000.070
cache size : 512 KB
physical id : 7
siblings : 1
core id : 0
cpu cores : 1
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm perfctr_core ssbd ibpb vmmcall tsc_adjust bmi1 virt_ssbd arat npt nrip_save arch_capabilities
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4001.81
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:
```
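
To compare affected and unaffected machines, the relevant flags can be extracted from `/proc/cpuinfo` with a short script. This Python sketch checks for `svm` (AMD virtualization), `hypervisor` (the system is itself a guest, i.e. any KVM use is nested), and `tsc_scale` (the kernel's name for AMD TSC scaling support), which is absent from the flags listed above. That the missing `tsc_scale` flag is related to the failed MSR write is an assumption, not something confirmed in this thread; the function names are made up for illustration.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def describe(flags: set) -> dict:
    """Summarise the flags of interest for this report."""
    return {
        "amd_virtualization": "svm" in flags,
        "running_as_guest": "hypervisor" in flags,
        # Assumption: tsc_scale may matter for the TSC ratio MSR write.
        "tsc_scale": "tsc_scale" in flags,
    }


if __name__ == "__main__":
    # Abbreviated sample based on the snippet above; on a real system,
    # read the text from /proc/cpuinfo instead.
    sample = "processor : 7\nflags : fpu svm hypervisor npt nrip_save\n"
    print(describe(cpu_flags(sample)))
```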
Comment by loqs (loqs) - Tuesday, 24 May 2022, 14:13 GMT
I think it is also the same as https://bugzilla.kernel.org/show_bug.cgi?id=216017 which needs more information.
Comment by Frantisek Sumsal (mrc0mmand) - Tuesday, 24 May 2022, 14:48 GMT
I went through the Debian report linked in the LKML thread you mentioned, and the issue appears to be resolved in kernel 5.18.0 (at least from what I've tried on a bare-metal Fedora Rawhide machine that I have at hand):

1) kernel-5.18.0-0.rc7.20220519gitf993aed406ea.56.fc37.x86_64 [0]

```
[root@dell-pem605-01 ~]# uname -r
5.18.0-0.rc7.20220519gitf993aed406ea.56.fc37.x86_64
[root@dell-pem605-01 ~]# qemu-system-x86_64 -enable-kvm -nographic
SeaBIOS (version 1.16.0-1.fc37)


iPXE (https://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+07F8C230+07ECC230 CA00
Press Ctrl-B to configure iPXE (PCI 00:03.0)...

<...snip...>
```

2) kernel-5.17.9-300.fc36.x86_64 [1]

```
[root@dell-pem605-01 ~]# uname -r
5.17.9-300.fc36.x86_64
[root@dell-pem605-01 ~]# qemu-system-x86_64 -enable-kvm
qemu-system-x86_64: error: failed to set MSR 0xc0000104 to 0x100000000
qemu-system-x86_64: ../target/i386/kvm/kvm.c:2996: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)
```

3) kernel-5.16.12-200.fc35.x86_64 [2]

```
[root@dell-pem605-01 ~]# uname -r
5.16.12-200.fc35.x86_64
[root@dell-pem605-01 ~]# qemu-system-x86_64 -enable-kvm
qemu-system-x86_64: error: failed to set MSR 0xc0000104 to 0x100000000
qemu-system-x86_64: ../target/i386/kvm/kvm.c:2996: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)
```


Given the results and the fact that we have 5.17.9.arch1-1 in Arch, I'd say (or more like hope) that this should get resolved automagically once Arch moves to kernel 5.18.0.
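
The three results above can be captured in a small lookup keyed on the kernel's major.minor version, which makes it easy to check where another machine falls. This is a Python sketch only: the table holds just the versions tested in this comment, anything else is reported as untested rather than guessed, and the names `kernel_version` and `observed_result` are hypothetical.

```python
import re

# Results observed in this comment, keyed by (major, minor).
OBSERVED = {
    (5, 16): "assertion failure",
    (5, 17): "assertion failure",
    (5, 18): "boots",
}


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def observed_result(release: str) -> str:
    """Look up what this report observed for the given kernel release."""
    return OBSERVED.get(kernel_version(release), "not tested in this report")


if __name__ == "__main__":
    for release in (
        "5.18.0-0.rc7.20220519gitf993aed406ea.56.fc37.x86_64",
        "5.17.9-300.fc36.x86_64",
        "5.16.12-200.fc35.x86_64",
    ):
        print(release, "->", observed_result(release))
```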


[0] https://koji.fedoraproject.org/koji/buildinfo?buildID=1968573
[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1968153
[2] https://koji.fedoraproject.org/koji/buildinfo?buildID=1927881
Comment by Frantisek Sumsal (mrc0mmand) - Tuesday, 24 May 2022, 15:55 GMT
I went through the kernel changelog between 5.17 and 5.18 and this commit looks related: https://github.com/torvalds/linux/commit/5a1bde46f98b893cda6122b00e94c0c40a6ead3c. I'll do a couple more tests to, hopefully, confirm this.
Comment by Toolybird (Toolybird) - Wednesday, 27 July 2022, 05:21 GMT
What's the status here? I'm not seeing other reports, so hopefully this was fixed in 5.18.x? Please let us know.
Comment by Frantisek Sumsal (mrc0mmand) - Wednesday, 27 July 2022, 07:31 GMT
Indeed, I haven't seen this for a while, so it seems to be fixed by one of the 5.18 minor releases.
