FS#54483 - [jemalloc] Crashes QEMU

Attached to Project: Arch Linux
Opened by rainer (raneon) - Friday, 16 June 2017, 21:32 GMT
Last edited by Christian Hesse (eworm) - Wednesday, 21 June 2017, 07:09 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Christian Hesse (eworm)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description: Jemalloc 5 crashes libvirt (qemu) server -> segfault


Additional info:
Server is fully updated; I tried rebooting but the result is the same. I cannot keep my VM server online for more than 15 minutes.
pacman.log:
[2017-06-16 22:02] [ALPM] upgraded jemalloc (4.5.0-1 -> 5.0.0-2)


Steps to reproduce:
1. Start virtual machine with libvirt (qemu-system)
2. Wait some minutes to get a segfault


journalctl -f
Jun 16 23:19:38 host1 systemd-machined[915]: New machine qemu-2-VM1.
Jun 16 23:19:38 host1 systemd[1]: Started Virtual Machine qemu-2-VM1.
Jun 16 23:19:38 host1 systemd-timesyncd[333]: Synchronized to time server 213.209.109.45:123 (2.arch.pool.ntp.org).
Jun 16 23:19:38 host1 kernel: qemu-system-x86: sending ioctl 5326 to a partition!
Jun 16 23:19:39 host1 systemd-networkd[383]: vnet0: Gained IPv6LL
Jun 16 23:19:39 host1 systemd-timesyncd[333]: Network configuration changed, trying to establish connection.
Jun 16 23:19:39 host1 systemd-timesyncd[333]: Synchronized to time server 213.209.109.45:123 (2.arch.pool.ntp.org).
Jun 16 23:19:52 host1 systemd-networkd[383]: vnet0: Configured
Jun 16 23:19:52 host1 systemd-timesyncd[333]: Network configuration changed, trying to establish connection.
Jun 16 23:19:52 host1 systemd-timesyncd[333]: Synchronized to time server 213.209.109.45:123 (2.arch.pool.ntp.org).
Jun 16 23:24:15 host1 kernel: worker[1692]: segfault at 370 ip 00007f67bcebc1d1 sp 00007f6699cb8d50 error 6 in libjemalloc.so.2[7f67bce64000+6a000]
Jun 16 23:24:15 host1 systemd[1]: Started Process Core Dump (PID 1693/UID 0).
Jun 16 23:24:15 host1 systemd-coredump[1694]: Resource limits disable core dumping for process 1537 (qemu-system-x86).
Jun 16 23:24:15 host1 systemd-coredump[1694]: Process 1537 (qemu-system-x86) of user 99 dumped core.
Jun 16 23:24:16 host1 libvirtd[391]: 2017-06-16 21:24:15.999+0000: 391: error : qemuMonitorIO:699 : internal error: End of file from qemu monitor
Jun 16 23:24:16 host1 systemd-networkd[383]: vnet0: Lost carrier

Closed by  Christian Hesse (eworm)
Wednesday, 21 June 2017, 07:09 GMT
Reason for closing:  Fixed
Additional comments about closing:  jemalloc 1:5.0.0-3 in [testing]
Comment by rainer (raneon) - Friday, 16 June 2017, 21:41 GMT
Sorry, I tried to update the title/summary but I don't know how...
Comment by Adam (adam900710) - Saturday, 17 June 2017, 06:55 GMT
Confirmed here.
Same problem: a segfault in libjemalloc caused qemu to abort.

Comment by rainer (raneon) - Saturday, 17 June 2017, 07:03 GMT
Downgrading to jemalloc 4.5.0-1 helped, and my virtual machine has been running well again for 9 hours. For now I have added jemalloc to IgnorePkg in pacman.conf. My systems are all x86_64.
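
For anyone who wants to double-check that the IgnorePkg entry mentioned above is actually in effect, here is a minimal Python sketch. The function names and the sample config text are made up for illustration. It assumes IgnorePkg takes a space-separated list of glob patterns, and it deliberately ignores Include directives and section headers, so treat it as a rough check rather than a reimplementation of pacman's parser.

```python
from fnmatch import fnmatchcase

# Hypothetical sample, not copied from any real system.
SAMPLE_CONF = """\
[options]
#IgnorePkg   =
IgnorePkg   = jemalloc
"""

def ignored_packages(conf_text):
    """Collect every pattern found on IgnorePkg lines, skipping comments."""
    patterns = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "IgnorePkg":
            patterns.extend(value.split())       # space-separated patterns
    return patterns

def is_ignored(pkg, conf_text):
    """True if pkg matches any IgnorePkg pattern (glob-style match)."""
    return any(fnmatchcase(pkg, pat) for pat in ignored_packages(conf_text))

print(is_ignored("jemalloc", SAMPLE_CONF))
```

On a real system the text would come from reading /etc/pacman.conf instead of SAMPLE_CONF.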
Comment by Jason Lenz (lenzj) - Saturday, 17 June 2017, 13:48 GMT
Confirmed here as well. Segfault in libjemalloc running libvirt/qemu:
Jun 17 08:02:39 T530-laptop kernel: worker[1505]: segfault at 7f9828b230d8 ip 00007f9a64f291bc sp 00007f982b326d50 error 4 in libjemalloc.so.2[7f9a64ed1000+6a000]
Jun 17 08:02:39 T530-laptop systemd[1]: Started Process Core Dump (PID 1506/UID 0).
Jun 17 08:02:39 T530-laptop systemd-coredump[1507]: Resource limits disable core dumping for process 1401 (qemu-system-x86).
Jun 17 08:02:39 T530-laptop systemd-coredump[1507]: Process 1401 (qemu-system-x86) of user 99 dumped core.
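
To compare the crashes across machines, the kernel segfault lines can be reduced to an offset inside libjemalloc.so.2 (instruction pointer minus the base address in the brackets). The Python sketch below does that for the two lines quoted in this report. The regex and function name are my own, and it assumes the bracketed mapping starts at the beginning of the library file, which may not hold on every system. The error code is decoded using the x86 page-fault bits (bit 0: page present, bit 1: write, bit 2: user mode).

```python
import re

# The two kernel lines quoted in this report (timestamp and host prefix dropped).
REPORTED_LINES = [
    "worker[1692]: segfault at 370 ip 00007f67bcebc1d1 sp 00007f6699cb8d50 "
    "error 6 in libjemalloc.so.2[7f67bce64000+6a000]",
    "worker[1505]: segfault at 7f9828b230d8 ip 00007f9a64f291bc sp 00007f982b326d50 "
    "error 4 in libjemalloc.so.2[7f9a64ed1000+6a000]",
]

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Return library name, ip offset into the mapping, and decoded error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"])
    return {
        "lib": m["lib"],
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

for line in REPORTED_LINES:
    info = parse_segfault(line)
    print(info["lib"], hex(info["offset"]), "write" if info["write"] else "read")
```

For the two lines here this gives offsets 0x581d1 and 0x581bc, only a few bytes apart. That would be consistent with both crashes hitting the same piece of code, though I have not confirmed it against symbols. With an unstripped build of the library, the offsets could be fed to a symbolizer such as addr2line.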

Downgrading jemalloc per raneon's suggestion has resolved the issue as a temporary fix:
pacman -U /var/cache/pacman/pkg/jemalloc-4.5.0-1-x86_64.pkg.tar.xz
Comment by Adam (adam900710) - Sunday, 18 June 2017, 01:33 GMT
https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/jemalloc&id=31d307fa0bfbaa43aace048fe45d6c52d67f5615

jemalloc 5.0.0-2 introduced a patch to handle it.
It seems to be working after a short qemu test.

I'll do a longer test to see if it really solves it.
Comment by Adam (adam900710) - Sunday, 18 June 2017, 01:42 GMT
Well, 5.0.0-2 is still causing the problem.

Downgrading seems to be the only thing that works so far.
Comment by Dominique Martinet (asmadeus) - Sunday, 18 June 2017, 15:40 GMT
I confirm I still have the problem with 5.0.0-2; qemu crashes after 30-40 minutes of running. I've got two backtraces, both in thread-local storage memory alloc or dealloc.

Instead of downgrading jemalloc I just recompiled qemu, and it seems to work well for now. I would suggest a dummy qemu update that just rebuilds, as there might simply have been an ABI change with jemalloc.
Are there other applications running jemalloc?
Comment by Doug Newgard (Scimmia) - Sunday, 18 June 2017, 15:45 GMT
Looks like it's causing problems with mariadb, too.

https://bbs.archlinux.org/viewtopic.php?id=227344
Comment by loqs (loqs) - Sunday, 18 June 2017, 20:15 GMT
@asmadeus jemalloc 5.0 seems to be fully backward compatible with 4.5; see https://abi-laboratory.pro/tracker/timeline/jemalloc/
Could you submit the backtraces you have upstream to see what upstream makes of the issue?
Comment by Dominique Martinet (asmadeus) - Monday, 19 June 2017, 06:04 GMT
Posted https://github.com/jemalloc/jemalloc/issues/915
I take back my claim that rebuilding fixes the issue: it turns out I recompiled without jemalloc, so no wonder that helped... I will rebuild both jemalloc (without stripping) and qemu (with jemalloc) to check this when I can.
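
One way to catch a rebuild that silently dropped jemalloc is to check whether the running qemu process actually has the library mapped. A rough Python sketch follows. The function names are hypothetical, and it assumes a Linux procfs where /proc/<pid>/maps lists one mapping per line with the file path as the sixth field.

```python
def mapped_libraries(maps_text, needle="libjemalloc"):
    """Return the distinct mapped file paths whose name contains `needle`."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [path]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            found.add(fields[5].strip())
    return sorted(found)

def process_uses(pid, needle="libjemalloc"):
    """True if process `pid` currently maps a file matching `needle`."""
    with open(f"/proc/{pid}/maps") as fh:
        return bool(mapped_libraries(fh.read(), needle))
```

Calling process_uses(1537) with the pid of the qemu-system-x86 process (1537 is the pid from the log in the report) would return False for a build without jemalloc. Reading another user's process maps usually needs root.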
Comment by Dominique Martinet (asmadeus) - Tuesday, 20 June 2017, 19:36 GMT
Rebuilt with the upstream patch suggested in the GitHub issue; it seems to hold up OK.

If someone else with the issue wants to try: revert 627f69 (the "back to 4.5.0" patch) and apply http://asmadeus.notk.org/patches/arch/jemalloc-5.0.0-3.patch to get a temporary 5.0.0-3 with that extra patch.
I'll post here again once I am confident the issue didn't just get sneakier.


For what it's worth, https://github.com/jemalloc/jemalloc/commit/d35c037e03e1450794dcf595e49a1e1f97f87ac4 is likely worth integrating into the Arch package as well if we're going to roll out a 5.0.0-x version again. This does not affect qemu, but I am sure we have some packages using jemalloc that do fork...
Comment by Dominique Martinet (asmadeus) - Wednesday, 21 June 2017, 06:23 GMT
Still looks good with https://github.com/jemalloc/jemalloc/commit/9b1befabbb7a7105501d27843873d14e1c2de54b on top of the old 5.0.0-2 package; I've closed the upstream issue.

I'll let the Arch folks handle it from here (a bit of broader testing can't hurt). That was a quick fix... one day before this bug report was opened :D
