FS#79384 - latest Zen Inception fixes breaks nested kvm virtualization on AMD

Attached to Project: Arch Linux
Opened by Oliver (theob) - Tuesday, 15 August 2023, 15:29 GMT
Last edited by Toolybird (Toolybird) - Tuesday, 15 August 2023, 21:22 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Today I updated to 6.4.10-arch1-1 on Arch Linux. This broke my setup, which runs nested KVM virtualization inside a KVM VM. The problem seems to be related to the kernel update rather than distribution specific, since others report the same issue on a totally different setup: https://forum.proxmox.com/threads/amd-incpetion-fixes-cause-qemu-kvm-memory-leak.132057/#post-581207

Steps to reproduce:
1. Update to Arch kernel 6.4.10-arch1-1.
2. Start a KVM VM ("hostVM").
3. Within that hostVM, start a nestedVM.
4. Memory consumption of the QEMU process within the hostVM grows beyond the available memory. The nestedVM then gets OOM killed before it has even finished starting.
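
To observe the memory growth described in the last step, the resident set size of the QEMU process can be sampled inside the hostVM. The following is only a minimal sketch, not part of the original report: the helper names `vm_rss_kib` and `read_rss_kib` are hypothetical, and it assumes the standard `VmRSS:` line in `/proc/<pid>/status`.

```python
import re
from pathlib import Path


def vm_rss_kib(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None when no VmRSS line is present (e.g. for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Hypothetical helper: read the current VmRSS of a running process."""
    return vm_rss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_rss_kib(pid)` repeatedly with the PID of the QEMU process while the nestedVM starts should show whether the value keeps climbing toward the hostVM's memory limit.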

I tried to set up fresh nestedVMs with no luck; the problem is the same.

Reverting to an earlier kernel (6.4.7 on Arch Linux) makes everything work again.

Arch Linux host kernel: 6.4.10-arch1-1 (this introduces the problem; everything else was unchanged)
hostVM kernel: 5.15.107+truenas
nestedVM kernel: 5.15.0-78-generic

I created an upstream kernel bug report since I think this is kernel related: https://bugzilla.kernel.org/show_bug.cgi?id=217796

Logs from the hostVM when OOM happens:

Aug 15 10:59:41 truenas kernel: CPU 0/KVM invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
Aug 15 10:59:42 truenas kernel: CPU: 9 PID: 7079 Comm: CPU 0/KVM Tainted: P OE 5.15.107+truenas #1
Aug 15 10:59:43 truenas kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
Aug 15 10:59:43 truenas kernel: Call Trace:
Aug 15 10:59:43 truenas kernel: <TASK>
Aug 15 10:59:43 truenas kernel: dump_stack_lvl+0x46/0x5e
Aug 15 10:59:43 truenas kernel: dump_header+0x4a/0x1f4
Aug 15 10:59:43 truenas kernel: oom_kill_process.cold+0xb/0x10
Aug 15 10:59:43 truenas kernel: out_of_memory+0x1bd/0x4f0
Aug 15 10:59:43 truenas kernel: __alloc_pages_slowpath.constprop.0+0xc30/0xd00
Aug 15 10:59:44 truenas kernel: __alloc_pages+0x1e9/0x220
Aug 15 10:59:44 truenas kernel: __get_free_pages+0xd/0x40
Aug 15 10:59:44 truenas kernel: kvm_mmu_topup_memory_cache+0x56/0x80 [kvm]
Aug 15 10:59:44 truenas kernel: mmu_topup_memory_caches+0x39/0x70 [kvm]
Aug 15 10:59:44 truenas kernel: direct_page_fault+0x3d9/0xbb0 [kvm]
Aug 15 10:59:44 truenas kernel: ? kvm_mtrr_check_gfn_range_consistency+0x61/0x120 [kvm]
Aug 15 10:59:44 truenas kernel: kvm_mmu_page_fault+0x7a/0x730 [kvm]
Aug 15 10:59:44 truenas kernel: ? ktime_get+0x38/0xa0
Aug 15 10:59:44 truenas kernel: ? lock_timer_base+0x61/0x80
Aug 15 10:59:44 truenas kernel: ? __svm_vcpu_run+0x5f/0xf0 [kvm_amd]
Aug 15 10:59:44 truenas kernel: ? __svm_vcpu_run+0x59/0xf0 [kvm_amd]
Aug 15 10:59:44 truenas kernel: ? __svm_vcpu_run+0xaa/0xf0 [kvm_amd]
Aug 15 10:59:44 truenas kernel: ? load_fixmap_gdt+0x22/0x30
Aug 15 10:59:44 truenas kernel: ? native_load_tr_desc+0x67/0x70
Aug 15 10:59:44 truenas kernel: ? x86_virt_spec_ctrl+0x43/0xb0
Aug 15 10:59:44 truenas kernel: kvm_arch_vcpu_ioctl_run+0xbff/0x1750 [kvm]
Aug 15 10:59:44 truenas kernel: kvm_vcpu_ioctl+0x278/0x660 [kvm]
Aug 15 10:59:44 truenas kernel: ? __seccomp_filter+0x385/0x5c0
Aug 15 10:59:44 truenas kernel: __x64_sys_ioctl+0x8b/0xc0
Aug 15 10:59:44 truenas kernel: do_syscall_64+0x3b/0xc0
Aug 15 10:59:44 truenas kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Aug 15 10:59:44 truenas kernel: RIP: 0033:0x7f29eee166b7
Aug 15 10:59:45 truenas kernel: Code: Unable to access opcode bytes at RIP 0x7f29eee1668d.
Aug 15 10:59:45 truenas kernel: RSP: 002b:00007f27f35fd4c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 15 10:59:45 truenas kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f29eee166b7
Aug 15 10:59:45 truenas kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
Aug 15 10:59:45 truenas kernel: RBP: 00005558a87d3f00 R08: 00005558a7e52848 R09: 00005558a827c580
Aug 15 10:59:45 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Aug 15 10:59:45 truenas kernel: R13: 00005558a8298bc0 R14: 00007f27f35fd780 R15: 0000000000802000
Aug 15 10:59:45 truenas kernel: </TASK>
Aug 15 10:59:45 truenas kernel: Mem-Info:


Closed by  Toolybird (Toolybird)
Tuesday, 15 August 2023, 21:22 GMT
Reason for closing:  Upstream
Additional comments about closing:  See comments
Comment by loqs (loqs) - Tuesday, 15 August 2023, 16:53 GMT
Does adjusting the value of the spec_rstack_overflow kernel parameter have any effect?
spec_rstack_overflow=off
spec_rstack_overflow=safe-ret
spec_rstack_overflow=ibpb
spec_rstack_overflow=ibpb-vmexit

Does applying [1] or all five commits currently in the x86/urgent branch of tip [2] resolve the issue?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=ba5ca5e5e6a1d55923e88b4a83da452166f5560e
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/urgent
Comment by Oliver (theob) - Tuesday, 15 August 2023, 17:48 GMT
I reinstalled 6.4.10-arch1-1, booted with spec_rstack_overflow=off, and it is working now (thanks for this workaround for now)!
So it is related to the mitigations...
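
When testing the different spec_rstack_overflow values, it helps to confirm which one the running kernel actually booted with. The sketch below is illustrative only: the function name `srso_param` is hypothetical, and reporting the last occurrence when the parameter is repeated is an assumption. It parses `/proc/cmdline` and, where available, prints the status the kernel reports under `/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow`.

```python
from pathlib import Path


def srso_param(cmdline):
    """Return the value of spec_rstack_overflow= on a kernel command line.

    Returns None if the parameter is absent. Assumption: when the parameter
    is given more than once, the last occurrence is reported.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "spec_rstack_overflow" and sep:
            value = val
    return value


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        print("boot parameter:", srso_param(cmdline_path.read_text()))
    status_path = Path("/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow")
    if status_path.exists():
        print("kernel-reported status:", status_path.read_text().strip())
```

Running this on the Arch host after each reboot shows whether the intended setting took effect before the nested VM is started.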

I haven't had time to try [1] yet. I need to set up a similar environment on my test box first.

Comment by Toolybird (Toolybird) - Tuesday, 15 August 2023, 21:22 GMT
> not distribution specific since others report same issue on a totally different setup

Well, clearly not an Arch packaging bug then.