FS#67649 - [linux] CPU soft lockup in smp_call_function_many_cond during boot with SMP

Attached to Project: Arch Linux
Opened by Frantisek Sumsal (mrc0mmand) - Thursday, 20 August 2020, 10:14 GMT
Last edited by freswa (frederik) - Sunday, 13 September 2020, 15:48 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Levente Polyak (anthraxx)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
Hello,

After upgrading our systemd CI KVM machines from kernel 5.7.12.arch1-1 to 5.8.1.arch1-1 they now experience random (and quite frequent) CPU soft lockups when trying to online non-boot CPUs during boot. Both the host and the guest are using the same package set.

[ 3.995230] smp: Bringing up secondary CPUs ...
[ 3.998969] x86: Booting SMP configuration:
[ 4.001485] .... node #0, CPUs: #1
[ 1.308122] kvm-clock: cpu 1, msr 15201041, secondary cpu clock
[ 1.308122] smpboot: CPU 1 Converting physical 0 to logical die 1
[ 4.368244] KVM setup async PF for cpu 1
[ 4.371435] kvm-stealtime: cpu 1, msr 1e46e080
[ 4.379139] #2
[ 1.308122] kvm-clock: cpu 2, msr 15201081, secondary cpu clock
[ 1.308122] smpboot: CPU 2 Converting physical 0 to logical die 2
[ 4.664906] KVM setup async PF for cpu 2
[ 4.668102] kvm-stealtime: cpu 2, msr 1e4ae080
[ 4.675603] #3
[ 1.308122] kvm-clock: cpu 3, msr 152010c1, secondary cpu clock
[ 1.308122] smpboot: CPU 3 Converting physical 0 to logical die 3
[ 4.700019] KVM setup async PF for cpu 3
[ 4.701435] kvm-stealtime: cpu 3, msr 1e4ee080
[ 4.712238] #4
[ 1.308122] kvm-clock: cpu 4, msr 15201101, secondary cpu clock
[ 1.308122] smpboot: CPU 4 Converting physical 0 to logical die 4
[ 4.734249] KVM setup async PF for cpu 4
[ 4.734769] kvm-stealtime: cpu 4, msr 1e52e080
[ 4.745775] #5
[ 1.308122] kvm-clock: cpu 5, msr 15201141, secondary cpu clock
[ 1.308122] smpboot: CPU 5 Converting physical 0 to logical die 5
[ 4.914477] KVM setup async PF for cpu 5
[ 4.914769] kvm-stealtime: cpu 5, msr 1e56e080
[ 4.929062] #6
[ 1.308122] kvm-clock: cpu 6, msr 15201181, secondary cpu clock
[ 1.308122] smpboot: CPU 6 Converting physical 0 to logical die 6
[ 4.951806] KVM setup async PF for cpu 6
[ 4.954769] kvm-stealtime: cpu 6, msr 1e5ae080
[ 4.965788] #7
[ 1.308122] kvm-clock: cpu 7, msr 152011c1, secondary cpu clock
[ 1.308122] smpboot: CPU 7 Converting physical 0 to logical die 7
[ 5.860012] KVM setup async PF for cpu 7
[ 5.861435] kvm-stealtime: cpu 7, msr 1e5ee080
[ 31.991435] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 31.991435] Modules linked in:
[ 31.991435] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.1-arch1-1 #1
[ 31.991435] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[ 31.991435] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0
[ 31.991435] Code: 73 62 3c 00 3b 05 51 6c 82 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 40 49 02 94 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 20 91 56 94 48 89 ee 44 89 ff
[ 31.991435] RSP: 0018:ffff988ac0013da8 EFLAGS: 00000202
[ 31.991435] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000006
[ 31.991435] RDX: ffff96365e5b20a0 RSI: 0000000000000000 RDI: 0000000000000006
[ 31.991435] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000006
[ 31.991435] R10: 0000000000000005 R11: 0000000000000005 R12: 0000000000000000
[ 31.991435] R13: ffff96365e42d300 R14: 0000000000000140 R15: ffff96365e42d308
[ 31.991435] FS: 0000000000000000(0000) GS:ffff96365e400000(0000) knlGS:0000000000000000
[ 31.991435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.991435] CR2: ffff963655401000 CR3: 0000000014a0a000 CR4: 00000000000406f0
[ 31.991435] Call Trace:
[ 31.991435] ? _raw_spin_unlock+0x16/0x30
[ 31.991435] ? text_poke_loc_init+0x160/0x160
[ 31.991435] ? text_poke_loc_init+0x160/0x160
[ 31.991435] on_each_cpu+0x43/0xb0
[ 31.991435] text_poke_bp_batch+0x9f/0x190
[ 31.991435] text_poke_finish+0x1b/0x26
[ 31.991435] arch_jump_label_transform_apply+0x16/0x30
[ 31.991435] static_key_enable_cpuslocked+0x57/0x90
[ 31.991435] static_key_enable+0x16/0x20
[ 31.991435] _cpu_up+0xc6/0x140
[ 31.991435] cpu_up+0x7e/0xc0
[ 31.991435] bringup_nonboot_cpus+0x48/0x50
[ 31.991435] smp_init+0x26/0x73
[ 31.991435] kernel_init_freeable+0xd0/0x214
[ 31.991435] ? rest_init+0xbf/0xbf
[ 31.991435] kernel_init+0xa/0x101
[ 31.991435] ret_from_fork+0x22/0x30
[ 59.991435] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 59.991435] Modules linked in:
[ 59.991435] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G L 5.8.1-arch1-1 #1
[ 59.991435] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[ 59.991435] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0
[ 59.991435] Code: 73 62 3c 00 3b 05 51 6c 82 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 40 49 02 94 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 20 91 56 94 48 89 ee 44 89 ff
[ 59.991435] RSP: 0018:ffff988ac0013da8 EFLAGS: 00000202
[ 59.991435] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000006
[ 59.991435] RDX: ffff96365e5b20a0 RSI: 0000000000000000 RDI: 0000000000000006
[ 59.991435] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000006
[ 59.991435] R10: 0000000000000005 R11: 0000000000000005 R12: 0000000000000000
[ 59.991435] R13: ffff96365e42d300 R14: 0000000000000140 R15: ffff96365e42d308
[ 59.991435] FS: 0000000000000000(0000) GS:ffff96365e400000(0000) knlGS:0000000000000000
[ 59.991435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.991435] CR2: ffff963655401000 CR3: 0000000014a0a000 CR4: 00000000000406f0
[ 59.991435] Call Trace:
[ 59.991435] ? _raw_spin_unlock+0x16/0x30
[ 59.991435] ? text_poke_loc_init+0x160/0x160
[ 59.991435] ? text_poke_loc_init+0x160/0x160
[ 59.991435] on_each_cpu+0x43/0xb0
[ 59.991435] text_poke_bp_batch+0x9f/0x190
[ 59.991435] text_poke_finish+0x1b/0x26
[ 59.991435] arch_jump_label_transform_apply+0x16/0x30
[ 59.991435] static_key_enable_cpuslocked+0x57/0x90
[ 59.991435] static_key_enable+0x16/0x20
[ 59.991435] _cpu_up+0xc6/0x140
[ 59.991435] cpu_up+0x7e/0xc0
[ 59.991435] bringup_nonboot_cpus+0x48/0x50
[ 59.991435] smp_init+0x26/0x73
[ 59.991435] kernel_init_freeable+0xd0/0x214
[ 59.991435] ? rest_init+0xbf/0xbf
[ 59.991435] kernel_init+0xa/0x101
[ 59.991435] ret_from_fork+0x22/0x30
[ 87.991435] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 87.991435] Modules linked in:

See the 'dmesg.log' attachment for a full log.

Additional info:
* package version(s)
linux 5.8.1.arch1-1
qemu 5.1.0-1

See the 'package-set.txt' attachment for a full package set.

Steps to reproduce:
In our case we boot an Arch VM with KVM acceleration and SMP set to 8. Example command line from one of the failing tests:
/sbin/qemu-system-x86_64 -smp 8 -net none -m 512M -nographic -kernel /boot/vmlinuz-linux -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.d2K1go/basic.img -initrd /var/tmp/initrd-testsuite-X4x.img -machine accel=kvm -enable-kvm -cpu host -append ' root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/usr/lib/systemd/systemd console=ttyS0 selinux=0 SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-01.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-01.service systemd.wants=end.service debug systemd.log_level=debug systemd.log_target=console '
This task depends upon

Closed by  freswa (frederik)
Sunday, 13 September 2020, 15:48 GMT
Reason for closing:  Upstream
Additional comments about closing:  This is not a packaging issue. Please report this upstream. Thank you

Loading...