FS#39631 - [glibc] --enable-lock-elision breaks applications on Haswell
Attached to Project:
Arch Linux
Opened by Thomas Bächler (brain0) - Wednesday, 26 March 2014, 15:10 GMT
Last edited by Allan McRae (Allan) - Friday, 05 September 2014, 11:25 GMT
Details
With the current glibc 2.19-3 on a Haswell i7-4600U, some
applications show subtle, hard-to-reproduce failures.
In particular, running Maple 17 and performing certain computations causes Maple to abort after some time with the message "GC Thread signalAbort 0x7fb49679f700 Execution stopped: Stack limit reached." Because Maple ships without sources or debug symbols, the backtrace is rather useless. I built glibc with --enable-lock-elision=no, put the resulting libpthread.so.0 into its own directory, and started the application with LD_LIBRARY_PATH set accordingly. This fixes the problem.
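The workaround described above (start the application with a directory holding the rebuilt libpthread.so.0 placed first in LD_LIBRARY_PATH) could be scripted roughly as follows. This is only a sketch: the helper name `env_with_library_dir` and the directory `/opt/glibc-noelision` are hypothetical and not taken from the report.

```python
import os


def env_with_library_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the dynamic linker searches lib_dir before the system paths.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env


# Hypothetical usage (directory and command are assumptions):
# import subprocess
# subprocess.run(["maple"], env=env_with_library_dir("/opt/glibc-noelision"))
```

Only the child process sees the modified search path, so the system-wide glibc stays untouched.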
This task depends upon
Closed by Allan McRae (Allan)
Friday, 05 September 2014, 11:25 GMT
Reason for closing: Not a bug
Additional comments about closing: Not a glibc issue.
Core was generated by `/opt/maple17/bin.X86_64_LINUX/mserver -kport 51580 -O C --env-setup'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007eff57382389 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007eff57382389 in raise () from /usr/lib/libc.so.6
#1 0x00007eff57383788 in abort () from /usr/lib/libc.so.6
#2 0x00007eff57a23333 in ?? () from /opt/maple17/bin.X86_64_LINUX/libmaple.so
#3 0x00007eff57a2201f in ?? () from /opt/maple17/bin.X86_64_LINUX/libmaple.so
#4 0x00007eff56c2f0a2 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007eff57432d1d in clone () from /usr/lib/libc.so.6
I'd at least like to determine who behaves incorrectly here.
I suggest getting an old glibc package, or rebuilding the current one, and using LD_PRELOAD to load it.
http://anandtech.com/show/8376/intel-disables-tsx-instructions-erratum-found-in-haswell-haswelleep-broadwell
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf
It is quite rare to find a bug in hardware, and I am also impressed that someone on the Arch Linux bug tracker was able to find it months before the news became public.
As for myself, I don't really care; I can work around it.
The only hardware that supports this extension at the moment is hardware that is bugged. Updating the microcode (through the BIOS or through the microcode interface in Linux) will just disable the extension, meaning that no CPU will provide support for this feature.
I would suggest turning this off: why would you use an extension that isn't available on any hardware? Also note that it isn't enabled by default, but specifically enabled with a --enable-flag, so we should turn it off.
My Haswells (i7-4770R stepping 1 microcode 0xe and i7-4750HQ stepping 1 microcode 0x10) both have it disabled.
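For anyone who wants to check their own machine: on Linux, a CPU that still advertises TSX lists the `hle` and `rtm` flags in /proc/cpuinfo. The following is a minimal sketch under that assumption; the function name `tsx_flags` is made up for illustration.

```python
def tsx_flags(cpuinfo_text):
    """Return the subset of {'hle', 'rtm'} listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return {"hle", "rtm"} & set(value.split())
    return set()


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = tsx_flags(f.read())
    except OSError:
        # Not Linux, or /proc is not available.
        found = set()
    print("TSX flags advertised:", sorted(found) or "none")
```

An empty result after a microcode update would be consistent with the extension having been disabled.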
I'm also for rebuilding glibc without lock elision.
http://techreport.com/news/26911/errata-prompts-intel-to-disable-tsx-in-haswell-early-broadwell-cpus
An Intel spokesperson has provided TR with a brief statement on the TSX erratum, confirming that Intel has "addressed the issue" and "disabled the TSX feature on affected products."
Seriously, there is no hardware with a working microcode for that instruction as far as I can tell. It is going to create random issues that people won't be able to debug, and when they finally pin it down (if they have the time to go that deep), it will be to figure out that Arch Linux is providing a faulty glibc.
I guess I'd better go to glibc and try to figure out a patch there.
Microcode updates appear to have been released by Intel.
Lock elision is enabled by Arch, Debian (jessie), Fedora, openSUSE, ...
I see no reason to disable it. glibc is not the issue.