FS#43010 - [glibc] Enabling lock elision in glibc causes illegal instruction crashes on non-Haswell Intel CPUs
Attached to Project: Arch Linux
Opened by David Anderson (danderson) - Thursday, 04 December 2014, 21:40 GMT
Last edited by Doug Newgard (Scimmia) - Friday, 05 December 2014, 00:36 GMT
Details
Description:
When compiled with --enable-lock-elision, glibc 2.20 unconditionally issues the 'xend' instruction in pthread_mutex_unlock. This causes programs to crash with SIGILL on non-Haswell Intel CPUs, because they do not implement the TSX instruction set extension that defines 'xend'.

The fix for glibc itself should be made upstream. I don't see any relevant bugs in their tracker, so I'm going to file one after this. In the meantime, Arch could remove --enable-lock-elision from the glibc PKGBUILD to work around the issue, at the cost of degraded performance on Haswell CPUs.

Fedora is also tracking this bug in their tracker, though they don't seem to be working on an upstream fix; they just disabled lock elision. See https://bugzilla.redhat.com/show_bug.cgi?id=1146967 and https://bugzilla.redhat.com/show_bug.cgi?id=1144794

Steps to reproduce:
The only reproduction I have so far is cumbersome: build Ceph using my PKGBUILD at https://github.com/danderson/packages-archlinux/tree/master/aur/ceph , then run `ceph -s`. I'm working on a short C reproduction and will post it when I have it.
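For anyone who wants to check whether a given machine implements the extension mentioned in the description, here is a minimal sketch that queries CPUID for the RTM feature bit (the part of TSX that defines 'xbegin' and 'xend'). It assumes an x86 target built with GCC or Clang, which provide `<cpuid.h>`. The function name `cpu_has_rtm` is made up for illustration, and this is not the detection code glibc itself uses.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helpers for the x86 CPUID instruction */
#endif

/* Hypothetical helper: returns true if CPUID reports RTM support.
 * Assumption: on non-x86 targets there is no TSX, so report false. */
bool cpu_has_rtm(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 has to exist before it can be queried. */
    if (__get_cpuid_max(0, 0) < 7)
        return false;

    /* CPUID.(EAX=7,ECX=0):EBX bit 11 is the RTM feature flag. */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx & (1u << 11)) != 0;
#else
    return false;
#endif
}
```

Calling `cpu_has_rtm()` from a small `main` and printing the result shows whether the CPU advertises RTM. Note that a microcode update can clear this bit on hardware that originally shipped with TSX.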
Closed by Doug Newgard (Scimmia)
Friday, 05 December 2014, 00:36 GMT
Reason for closing: Not a bug
Additional comments about closing: User requested: Invalid: Ceph is invoking undefined pthread behavior which glibc devs have decided to not make less crashy.
And indeed, I'm unable to reproduce the SIGILL with a minimal reproduction case (attached) which correctly sequences the unlock. Incorrect sequencing (also attached) does trigger the SIGILL, but as explained in https://sourceware.org/bugzilla/show_bug.cgi?id=17561 , glibc devs decided it was undefined behavior not worth correcting.
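To make the distinction concrete, here is a minimal sketch of correctly sequenced locking. It is not the attached file: it uses a mutex (as named in the description) rather than an rwlock, the names `counter_lock` and `increment_counter` are made up for illustration, and it assumes that "incorrect sequencing" means releasing a lock that is not currently held, which POSIX leaves undefined for default mutexes.

```c
#include <pthread.h>

/* Hypothetical shared state for illustration; not taken from the attachment. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Increments the counter under the mutex and returns the new value,
 * or -1 if locking or unlocking failed. */
int increment_counter(void)
{
    if (pthread_mutex_lock(&counter_lock) != 0)
        return -1;              /* lock is not held here, so do not unlock */

    int value = ++counter;

    /* Exactly one unlock, issued by the thread that took the lock,
     * after a successful lock. Unlocking a mutex that is not held
     * is the undefined case. */
    if (pthread_mutex_unlock(&counter_lock) != 0)
        return -1;

    return value;
}
```

Every call to `pthread_mutex_unlock` here is paired with a preceding successful `pthread_mutex_lock` on the same thread, so the unlock path never runs on a lock that is not held.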
And the Intel erratum for TSX was "fixed" by a microcode update that turned off TSX at the source, so it's probably fine to keep lock elision enabled for glibc, unless you feel that it's unfair to require users to keep their microcode up to date to avoid crashes.
rwlock_correct_ordering.c (0.5 KiB)