FS#67649 - [linux] CPU soft lockup in smp_call_function_many_cond during boot with SMP
Attached to Project:
Arch Linux
Opened by Frantisek Sumsal (mrc0mmand) - Thursday, 20 August 2020, 10:14 GMT
Last edited by freswa (frederik) - Sunday, 13 September 2020, 15:48 GMT
Opened by Frantisek Sumsal (mrc0mmand) - Thursday, 20 August 2020, 10:14 GMT
Last edited by freswa (frederik) - Sunday, 13 September 2020, 15:48 GMT
|
Details
Description:
Hello, After upgrading our systemd CI KVM machines from kernel 5.7.12.arch1-1 to 5.8.1.arch1-1 they now experience random (and quite frequent) CPU soft lockups when trying to online non-boot CPUs during boot. Both the host and the guest are using the same package set. [ 3.995230] smp: Bringing up secondary CPUs ... [ 3.998969] x86: Booting SMP configuration: [ 4.001485] .... node #0, CPUs: #1 [ 1.308122] kvm-clock: cpu 1, msr 15201041, secondary cpu clock [ 1.308122] smpboot: CPU 1 Converting physical 0 to logical die 1 [ 4.368244] KVM setup async PF for cpu 1 [ 4.371435] kvm-stealtime: cpu 1, msr 1e46e080 [ 4.379139] #2 [ 1.308122] kvm-clock: cpu 2, msr 15201081, secondary cpu clock [ 1.308122] smpboot: CPU 2 Converting physical 0 to logical die 2 [ 4.664906] KVM setup async PF for cpu 2 [ 4.668102] kvm-stealtime: cpu 2, msr 1e4ae080 [ 4.675603] #3 [ 1.308122] kvm-clock: cpu 3, msr 152010c1, secondary cpu clock [ 1.308122] smpboot: CPU 3 Converting physical 0 to logical die 3 [ 4.700019] KVM setup async PF for cpu 3 [ 4.701435] kvm-stealtime: cpu 3, msr 1e4ee080 [ 4.712238] #4 [ 1.308122] kvm-clock: cpu 4, msr 15201101, secondary cpu clock [ 1.308122] smpboot: CPU 4 Converting physical 0 to logical die 4 [ 4.734249] KVM setup async PF for cpu 4 [ 4.734769] kvm-stealtime: cpu 4, msr 1e52e080 [ 4.745775] #5 [ 1.308122] kvm-clock: cpu 5, msr 15201141, secondary cpu clock [ 1.308122] smpboot: CPU 5 Converting physical 0 to logical die 5 [ 4.914477] KVM setup async PF for cpu 5 [ 4.914769] kvm-stealtime: cpu 5, msr 1e56e080 [ 4.929062] #6 [ 1.308122] kvm-clock: cpu 6, msr 15201181, secondary cpu clock [ 1.308122] smpboot: CPU 6 Converting physical 0 to logical die 6 [ 4.951806] KVM setup async PF for cpu 6 [ 4.954769] kvm-stealtime: cpu 6, msr 1e5ae080 [ 4.965788] #7 [ 1.308122] kvm-clock: cpu 7, msr 152011c1, secondary cpu clock [ 1.308122] smpboot: CPU 7 Converting physical 0 to logical die 7 [ 5.860012] KVM setup async PF for cpu 7 [ 5.861435] kvm-stealtime: cpu 7, msr 1e5ee080 [ 31.991435] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] [ 31.991435] Modules linked in: [ 31.991435] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.1-arch1-1 #1 [ 31.991435] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [ 31.991435] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0 [ 31.991435] Code: 73 62 3c 00 3b 05 51 6c 82 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 40 49 02 94 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 20 91 56 94 48 89 ee 44 89 ff [ 31.991435] RSP: 0018:ffff988ac0013da8 EFLAGS: 00000202 [ 31.991435] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000006 [ 31.991435] RDX: ffff96365e5b20a0 RSI: 0000000000000000 RDI: 0000000000000006 [ 31.991435] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000006 [ 31.991435] R10: 0000000000000005 R11: 0000000000000005 R12: 0000000000000000 [ 31.991435] R13: ffff96365e42d300 R14: 0000000000000140 R15: ffff96365e42d308 [ 31.991435] FS: 0000000000000000(0000) GS:ffff96365e400000(0000) knlGS:0000000000000000 [ 31.991435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 31.991435] CR2: ffff963655401000 CR3: 0000000014a0a000 CR4: 00000000000406f0 [ 31.991435] Call Trace: [ 31.991435] ? _raw_spin_unlock+0x16/0x30 [ 31.991435] ? text_poke_loc_init+0x160/0x160 [ 31.991435] ? text_poke_loc_init+0x160/0x160 [ 31.991435] on_each_cpu+0x43/0xb0 [ 31.991435] text_poke_bp_batch+0x9f/0x190 [ 31.991435] text_poke_finish+0x1b/0x26 [ 31.991435] arch_jump_label_transform_apply+0x16/0x30 [ 31.991435] static_key_enable_cpuslocked+0x57/0x90 [ 31.991435] static_key_enable+0x16/0x20 [ 31.991435] _cpu_up+0xc6/0x140 [ 31.991435] cpu_up+0x7e/0xc0 [ 31.991435] bringup_nonboot_cpus+0x48/0x50 [ 31.991435] smp_init+0x26/0x73 [ 31.991435] kernel_init_freeable+0xd0/0x214 [ 31.991435] ? rest_init+0xbf/0xbf [ 31.991435] kernel_init+0xa/0x101 [ 31.991435] ret_from_fork+0x22/0x30 [ 59.991435] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] [ 59.991435] Modules linked in: [ 59.991435] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G L 5.8.1-arch1-1 #1 [ 59.991435] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [ 59.991435] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0 [ 59.991435] Code: 73 62 3c 00 3b 05 51 6c 82 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 40 49 02 94 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 20 91 56 94 48 89 ee 44 89 ff [ 59.991435] RSP: 0018:ffff988ac0013da8 EFLAGS: 00000202 [ 59.991435] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000006 [ 59.991435] RDX: ffff96365e5b20a0 RSI: 0000000000000000 RDI: 0000000000000006 [ 59.991435] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000006 [ 59.991435] R10: 0000000000000005 R11: 0000000000000005 R12: 0000000000000000 [ 59.991435] R13: ffff96365e42d300 R14: 0000000000000140 R15: ffff96365e42d308 [ 59.991435] FS: 0000000000000000(0000) GS:ffff96365e400000(0000) knlGS:0000000000000000 [ 59.991435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.991435] CR2: ffff963655401000 CR3: 0000000014a0a000 CR4: 00000000000406f0 [ 59.991435] Call Trace: [ 59.991435] ? _raw_spin_unlock+0x16/0x30 [ 59.991435] ? text_poke_loc_init+0x160/0x160 [ 59.991435] ? text_poke_loc_init+0x160/0x160 [ 59.991435] on_each_cpu+0x43/0xb0 [ 59.991435] text_poke_bp_batch+0x9f/0x190 [ 59.991435] text_poke_finish+0x1b/0x26 [ 59.991435] arch_jump_label_transform_apply+0x16/0x30 [ 59.991435] static_key_enable_cpuslocked+0x57/0x90 [ 59.991435] static_key_enable+0x16/0x20 [ 59.991435] _cpu_up+0xc6/0x140 [ 59.991435] cpu_up+0x7e/0xc0 [ 59.991435] bringup_nonboot_cpus+0x48/0x50 [ 59.991435] smp_init+0x26/0x73 [ 59.991435] kernel_init_freeable+0xd0/0x214 [ 59.991435] ? rest_init+0xbf/0xbf [ 59.991435] kernel_init+0xa/0x101 [ 59.991435] ret_from_fork+0x22/0x30 [ 87.991435] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] [ 87.991435] Modules linked in: See the 'dmesg.log' attachment for a full log. Additional info: * package version(s) linux 5.8.1.arch1-1 qemu 5.1.0-1 See the 'package-set.txt' attachment for a full package set. Steps to reproduce: In our case we boot an Arch VM with KVM acceleration and SMP set to 8. Example command line from one of the failing tests: /sbin/qemu-system-x86_64 -smp 8 -net none -m 512M -nographic -kernel /boot/vmlinuz-linux -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.d2K1go/basic.img -initrd /var/tmp/initrd-testsuite-X4x.img -machine accel=kvm -enable-kvm -cpu host -append ' root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/usr/lib/systemd/systemd console=ttyS0 selinux=0 SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-01.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-01.service systemd.wants=end.service debug systemd.log_level=debug systemd.log_target=console ' |
This task depends upon
Closed by freswa (frederik)
Sunday, 13 September 2020, 15:48 GMT
Reason for closing: Upstream
Additional comments about closing: This is not a packaging issue. Please report this upstream. Thank you
Sunday, 13 September 2020, 15:48 GMT
Reason for closing: Upstream
Additional comments about closing: This is not a packaging issue. Please report this upstream. Thank you
dmesg.log