FS#54483 - [jemalloc] Crashes QEMU
Attached to Project:
Arch Linux
Opened by rainer (raneon) - Friday, 16 June 2017, 21:32 GMT
Last edited by Christian Hesse (eworm) - Wednesday, 21 June 2017, 07:09 GMT
Details
Description: Jemalloc 5 crashes libvirt (qemu) server -> segfault

Additional info: Server is fully updated. I tried rebooting, but the result is the same: I cannot keep my VM server online for more than 15 minutes.

pacman.log:
[2017-06-16 22:02] [ALPM] upgraded jemalloc (4.5.0-1 -> 5.0.0-2)

Steps to reproduce:
1. Start virtual machine with libvirt (qemu-system)
2. Wait some minutes to get a segfault

journalctl -f
Jun 16 23:19:38 host1 systemd-machined[915]: New machine qemu-2-VM1.
Jun 16 23:19:38 host1 systemd[1]: Started Virtual Machine qemu-2-VM1.
Jun 16 23:19:38 host1 systemd-timesyncd[333]: Synchronized to time server 213.209.109.45:123 (2.arch.pool.ntp.org).
Jun 16 23:19:38 host1 kernel: qemu-system-x86: sending ioctl 5326 to a partition!
Jun 16 23:19:39 host1 systemd-networkd[383]: vnet0: Gained IPv6LL
Jun 16 23:19:39 host1 systemd-timesyncd[333]: Network configuration changed, trying to establish connection.
Jun 16 23:19:39 host1 systemd-timesyncd[333]: Synchronized to time server 213.209.109.45:123 (2.arch.pool.ntp.org).
Jun 16 23:19:52 host1 systemd-networkd[383]: vnet0: Configured
Jun 16 23:19:52 host1 systemd-timesyncd[333]: Network configuration changed, trying to establish connection.
Jun 16 23:19:52 host1 systemd-timesyncd[333]: Synchronized to time server 213.209.109.45:123 (2.arch.pool.ntp.org).
Jun 16 23:24:15 host1 kernel: worker[1692]: segfault at 370 ip 00007f67bcebc1d1 sp 00007f6699cb8d50 error 6 in libjemalloc.so.2[7f67bce64000+6a000]
Jun 16 23:24:15 host1 systemd[1]: Started Process Core Dump (PID 1693/UID 0).
Jun 16 23:24:15 host1 systemd-coredump[1694]: Resource limits disable core dumping for process 1537 (qemu-system-x86).
Jun 16 23:24:15 host1 systemd-coredump[1694]: Process 1537 (qemu-system-x86) of user 99 dumped core.
Jun 16 23:24:16 host1 libvirtd[391]: 2017-06-16 21:24:15.999+0000: 391: error : qemuMonitorIO:699 : internal error: End of file from qemu monitor
Jun 16 23:24:16 host1 systemd-networkd[383]: vnet0: Lost carrier
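The kernel segfault line gives the faulting instruction pointer (ip) and the library's load base in brackets, so the offset of the crash inside libjemalloc.so.2 is simply ip minus base. A minimal sketch of that arithmetic follows; `lib_offset` is a hypothetical helper name, not an existing tool, and the values are copied from the two kernel lines quoted in this report.

```shell
#!/bin/sh
# lib_offset IP BASE
# Prints the offset of a faulting instruction inside a mapped library.
# Both arguments are hex strings without a 0x prefix, as printed by the kernel.
# (lib_offset is a hypothetical helper name used only for this sketch.)
lib_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# Values from "segfault at 370 ip 00007f67bcebc1d1 ... [7f67bce64000+6a000]"
lib_offset 00007f67bcebc1d1 7f67bce64000

# Values from "segfault at 7f9828b230d8 ip 00007f9a64f291bc ... [7f9a64ed1000+6a000]"
lib_offset 00007f9a64f291bc 7f9a64ed1000
```

The two offsets come out 0x15 bytes apart, which hints that both crashes hit the same region of the library, though only a backtrace from an unstripped build would confirm that.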
This task depends upon
Closed by Christian Hesse (eworm)
Wednesday, 21 June 2017, 07:09 GMT
Reason for closing: Fixed
Additional comments about closing: jemalloc 1:5.0.0-3 in [testing]
Same problem: a segfault in libjemalloc caused qemu to abort.
Jun 17 08:02:39 T530-laptop kernel: worker[1505]: segfault at 7f9828b230d8 ip 00007f9a64f291bc sp 00007f982b326d50 error 4 in libjemalloc.so.2[7f9a64ed1000+6a000]
Jun 17 08:02:39 T530-laptop systemd[1]: Started Process Core Dump (PID 1506/UID 0).
Jun 17 08:02:39 T530-laptop systemd-coredump[1507]: Resource limits disable core dumping for process 1401 (qemu-system-x86).
Jun 17 08:02:39 T530-laptop systemd-coredump[1507]: Process 1401 (qemu-system-x86) of user 99 dumped core.
Downgrading jemalloc per raneon's suggestion has resolved the issue as a temporary fix:
pacman -U /var/cache/pacman/pkg/jemalloc-4.5.0-1-x86_64.pkg.tar.xz
jemalloc 5.0.0-2 introduced a patch to handle it.
It seems to be working after a short qemu test.
I'll do a longer test to see if it really solves it.
Downgrading seems to be the only working fix so far.
Instead of downgrading jemalloc I just recompiled qemu, and it seems to work well for now. I would suggest a dummy qemu update that just rebuilds the package, as there might simply have been an ABI change in jemalloc.
Are there other applications running jemalloc?
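One rough way to answer this on a running system is to scan `/proc/<pid>/maps` for processes that currently have libjemalloc mapped. The sketch below assumes a Linux `/proc` filesystem; `maps_has_jemalloc` and `list_jemalloc_processes` are hypothetical helper names. Run it as root, since the maps files of other users' processes are not readable otherwise.

```shell
#!/bin/sh
# Succeeds if the given /proc/<pid>/maps style file mentions libjemalloc.
# (hypothetical helper name, used only for this sketch)
maps_has_jemalloc() {
    grep -q 'libjemalloc\.so' "$1" 2>/dev/null
}

# Prints "<pid> <command name>" for every process with libjemalloc mapped.
# Unreadable or vanished maps files are skipped silently.
list_jemalloc_processes() {
    for maps in /proc/[0-9]*/maps; do
        if maps_has_jemalloc "$maps"; then
            pid=${maps#/proc/}
            pid=${pid%/maps}
            printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
        fi
    done
}

list_jemalloc_processes
```

This only shows processes that are running at the moment and that load jemalloc as a shared library; it does not list installed packages that depend on it.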
https://bbs.archlinux.org/viewtopic.php?id=227344
Could you submit the backtraces you have upstream to see what upstream makes of the issue?
I take back my claim that rebuilding fixes the issue: it turns out I recompiled without jemalloc, so no wonder that helped... I will rebuild both jemalloc without stripping and qemu with jemalloc to check that when I can.
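A quick check for this kind of mistake is to look at the rebuilt binary's shared-library dependencies with `ldd`. The sketch below is only an illustration: the binary path `/usr/bin/qemu-system-x86_64` is an assumption (the logs only show the truncated name qemu-system-x86), and `links_jemalloc` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `ldd` output on stdin; succeeds if libjemalloc is among the libraries.
# (hypothetical helper name, used only for this sketch)
links_jemalloc() {
    grep -q 'libjemalloc\.so'
}

# Assumed path; pass the emulator binary actually in use as the first argument.
bin=${1:-/usr/bin/qemu-system-x86_64}

if [ -x "$bin" ] && command -v ldd >/dev/null 2>&1; then
    if ldd "$bin" | links_jemalloc; then
        echo "$bin is linked against jemalloc"
    else
        echo "$bin is NOT linked against jemalloc"
    fi
else
    echo "cannot inspect $bin (missing binary or ldd)"
fi
```

A build that reports no jemalloc linkage cannot tell us anything about whether a rebuild against jemalloc 5 fixes the crash.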
If someone else with the issue wants to try, revert 627f69 (back to 4.5.0 patch) and apply http://asmadeus.notk.org/patches/arch/jemalloc-5.0.0-3.patch to a temporary 5.0.0-3 with that extra patch.
I'll post here again once I am confident the issue didn't just get sneakier.
For what it's worth, https://github.com/jemalloc/jemalloc/commit/d35c037e03e1450794dcf595e49a1e1f97f87ac4 is likely worth integrating into the Arch package as well if we're going to roll out a 5.0.0-x version again. This does not affect qemu, but I am sure we have some packages using jemalloc that do fork...
Letting arch folks handle it from there (a bit broader testing can't hurt), that was a quick fix . . . One day before this bug report was opened :D