FS#45295 - [glibc] Lots of programs segfault when they quit.

Attached to Project: Arch Linux
Opened by Madper Xie (Madper) - Thursday, 11 June 2015, 09:34 GMT
Last edited by Allan McRae (Allan) - Wednesday, 05 August 2015, 13:42 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Allan McRae (Allan)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description:
I'm using a T450p. Here is my CPU info:
Model name: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz

When I quit a program, I get something like this:
~ % emacs -Q
[1] 4872 segmentation fault (core dumped) emacs -Q

With GDB:
(gdb) r -Q
Starting program: /usr/bin/emacs -Q
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffe6a00700 (LWP 4917)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff17bc080 in __lll_unlock_elision () from /usr/lib/libpthread.so.0

While searching, I found other users who hit the same issue and fixed it by updating the microcode. That doesn't work for me:
~ % cat /proc/cmdline
root=/dev/sda2 rw initrd=/EFI/arch/intel-ucode.img initrd=/EFI/arch/initramfs-linux.img
~ % dmesg | grep microcode
[ 0.673257] microcode: CPU0 sig=0x306d4, pf=0x40, revision=0x1a
[ 0.673271] microcode: CPU1 sig=0x306d4, pf=0x40, revision=0x1a
[ 0.673286] microcode: CPU2 sig=0x306d4, pf=0x40, revision=0x1a
[ 0.673303] microcode: CPU3 sig=0x306d4, pf=0x40, revision=0x1a
[ 0.673396] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
The issue persists.
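A quick way to confirm the boot setup shown above is to read the initrd= entries from /proc/cmdline in order. Arch's microcode documentation asks for the microcode image to be passed as the first initrd, and the command line above already does that. The sketch below assumes the microcode image can be recognised by "ucode" in its file name (true for intel-ucode.img); that naming heuristic and the helper names are hypothetical, not part of any tool.

```python
from __future__ import annotations


def initrd_entries(cmdline: str) -> list[str]:
    """Return the initrd= values from a kernel command line, in order."""
    return [tok.split("=", 1)[1] for tok in cmdline.split() if tok.startswith("initrd=")]


def ucode_loaded_first(cmdline: str) -> bool:
    """True if the first initrd= entry looks like a microcode image.

    Assumption: microcode images have "ucode" in their file name.
    """
    entries = initrd_entries(cmdline)
    return bool(entries) and "ucode" in entries[0].rsplit("/", 1)[-1]


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        line = f.read()
    print(initrd_entries(line), ucode_loaded_first(line))
```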

Additional info:
* package version(s)
core/glibc 2.21-4 (base)
extra/intel-ucode 20150121-1
local/iucode-tool (null)
core/linux 4.0.5-1 (base)

* config and/or log files etc.


Steps to reproduce:

Closed by Allan McRae (Allan)
Wednesday, 05 August 2015, 13:42 GMT
Reason for closing: Not a bug
Additional comments about closing: Not a glibc issue - file bugs against the segfaulting packages.
Comment by Jan de Groot (JGC) - Thursday, 11 June 2015, 12:17 GMT
Can you attach a full copy of dmesg?
Comment by Madper Xie (Madper) - Thursday, 11 June 2015, 12:52 GMT
Hi @JGC,
The dmesg output is attached.
Thanks a lot in advance!
   dmesg.out (67.3 KiB)
Comment by Jan de Groot (JGC) - Thursday, 11 June 2015, 13:39 GMT
Looks like your BIOS has a newer microcode than we have in intel-ucode.

Your dmesg indicates you have BIOS 1.10. Looking at Lenovo's changelogs, I see this in changelog for 1.13:
- Fixed an BSOD issue related to CPU microcode.

If microcode is buggy enough to BSOD Windows, it probably also causes issues on Linux.
Comment by Madper Xie (Madper) - Thursday, 11 June 2015, 13:44 GMT
I see, thanks a lot. I'll try updating my BIOS and will post the result here.
Comment by Madper Xie (Madper) - Friday, 12 June 2015, 05:30 GMT
Hi @JGC,
I updated the BIOS to 1.14 and double-checked it in the BIOS setup screen:
~ % dmesg | grep -e "1.14"
[ 0.000000] DMI: LENOVO 20BWZ0CJUS/20BWZ0CJUS, BIOS JBET49WW (1.14 ) 05/21/2015

I also removed intel-ucode.img from my kernel command line:
~ % cat /proc/cmdline
root=/dev/sda2 rw initrd=/EFI/arch/initramfs-linux.img

~ % dmesg | grep micro
[ 0.335127] microcode: CPU0 sig=0x306d4, pf=0x40, revision=0x1d
[ 0.335131] microcode: CPU1 sig=0x306d4, pf=0x40, revision=0x1d
[ 0.335137] microcode: CPU2 sig=0x306d4, pf=0x40, revision=0x1d
[ 0.335143] microcode: CPU3 sig=0x306d4, pf=0x40, revision=0x1d
[ 0.335178] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
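To compare the two microcode dumps in this report (revision 0x1a with the old BIOS plus intel-ucode, 0x1d with BIOS 1.14), the per-CPU revision can be pulled out of the dmesg text. This is a minimal sketch that only assumes the "microcode: CPUn sig=..., pf=..., revision=..." line format seen here; the function name is hypothetical.

```python
import re

# Matches lines like "microcode: CPU0 sig=0x306d4, pf=0x40, revision=0x1a".
MICROCODE_RE = re.compile(
    r"microcode: CPU(\d+) sig=(0x[0-9a-f]+), pf=(0x[0-9a-f]+), revision=(0x[0-9a-f]+)"
)


def microcode_revisions(dmesg_text):
    """Map CPU number -> loaded microcode revision (as an int)."""
    revs = {}
    for m in MICROCODE_RE.finditer(dmesg_text):
        revs[int(m.group(1))] = int(m.group(4), 16)
    return revs
```

Feeding it `dmesg` output before and after a BIOS or intel-ucode change shows at a glance whether every core picked up the newer revision.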

However, the issue still exists...
Comment by Doug Newgard (Scimmia) - Saturday, 13 June 2015, 05:09 GMT
This processor is new enough that it shouldn't be affected by the TSX bug, so I'm thinking that the microcode path may be a dead end.
Comment by Madper Xie (Madper) - Saturday, 13 June 2015, 15:16 GMT
So maybe it's another CPU bug? This is a freshly installed Arch system, and many Arch users on the same glibc version don't hit this issue, so is it likely a hardware-related issue?
Comment by Madper Xie (Madper) - Thursday, 25 June 2015, 12:48 GMT
I found the following lines in my dmesg:

[ 20.685971] traps: sogou-qimpanel[971] general protection ip:7ffff756e080 sp:7fffffffdf38 error:0 in libpthread-2.21.so[7ffff755c000+18000]
[ 20.747772] traps: sogou-qimpanel[940] general protection ip:7ffff756e080 sp:7fffffffdf38 error:0 in libpthread-2.21.so[7ffff755c000+18000]
[ 20.848911] traps: sogou-qimpanel-[1050] general protection ip:7ffff51b0080 sp:7fffffffe018 error:0 in libpthread-2.21.so[7ffff519e000+18000]
[ 20.851554] traps: sogou-qimpanel-[1051] general protection ip:7ffff51b0080 sp:7fffffffde28 error:0 in libpthread-2.21.so[7ffff519e000+18000]
[ 20.883135] traps: sogou-qimpanel-[994] general protection ip:7ffff51b0080 sp:7fffffffe018 error:0 in libpthread-2.21.so[7ffff519e000+18000]
[ 131.755433] traps: pavucontrol[3535] general protection ip:7ffff272e080 sp:7fffffffe0c8 error:0 in libpthread-2.21.so[7ffff271c000+18000]
[ 1754.130471] traps: plugin-containe[2402] general protection ip:7ffff7bd0080 sp:7fffffffde68 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 1802.060843] traps: plugin-containe[32035] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 1946.744722] traps: plugin-containe[2605] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2039.886797] traps: plugin-containe[4515] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2102.240091] traps: plugin-containe[5794] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2210.843161] traps: plugin-containe[7826] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2325.822724] traps: plugin-containe[10224] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2669.675635] traps: plugin-containe[16649] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2713.276303] traps: plugin-containe[18152] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2790.147593] traps: plugin-containe[19860] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2894.831930] traps: plugin-containe[21549] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 2959.272602] traps: plugin-containe[23031] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 3031.701294] traps: plugin-containe[24529] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 3190.851857] traps: plugin-containe[27643] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 3213.029789] traps: eog[28806] general protection ip:7ffff655b080 sp:7fffffffdd38 error:0 in libpthread-2.21.so[7ffff6549000+18000]
[ 3304.322909] traps: plugin-containe[29732] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 3521.254837] traps: plugin-containe[2084] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 3694.576240] traps: firefox[6293] general protection ip:7ffff7bd0080 sp:7fffffffe068 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 3848.025320] traps: plugin-containe[8833] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 4112.211409] traps: plugin-containe[10475] general protection ip:7ffff7bd0080 sp:7fffffffdef8 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 4353.433972] traps: plugin-containe[18950] general protection ip:7ffff7bd0080 sp:7fffffffdf18 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
[ 4401.486186] traps: firefox[20577] general protection ip:7ffff7bd0080 sp:7fffffffe048 error:0 in libpthread-2.21.so[7ffff7bbe000+18000]
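Subtracting each mapping base from the faulting ip in these lines gives the same offset, 0x12080, into libpthread-2.21.so for every process listed (sogou-qimpanel, pavucontrol, plugin-containe, eog, firefox). That is consistent with the same instruction faulting each time, and the gdb traces in this report also stop in __lll_unlock_elision at an address ending in ...080. The sketch below groups "traps:" lines by (library, offset) and collects the program names, which also yields the list of crashing programs. It assumes the line format shown here; the helper names are hypothetical.

```python
import re
from collections import Counter

# Matches kernel lines such as
# "traps: eog[28806] general protection ip:7ffff655b080 sp:7fffffffdd38 error:0
#  in libpthread-2.21.so[7ffff6549000+18000]"
TRAP_RE = re.compile(
    r"traps: (?P<comm>.+?)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ in "
    r"(?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


def crash_sites(dmesg_text):
    """Count (library, offset into library) pairs and list the faulting programs."""
    sites = Counter()
    programs = set()
    for m in TRAP_RE.finditer(dmesg_text):
        # Offset of the faulting instruction relative to where the library is mapped.
        offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
        sites[(m.group("lib"), hex(offset))] += 1
        programs.add(m.group("comm"))
    return sites, sorted(programs)
```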
Comment by Doug Newgard (Scimmia) - Friday, 26 June 2015, 04:26 GMT
Everything references libpthread, and the original report specifies unlock_elision, so I'm going to assign this to glibc (probably should have done that a while ago).
Comment by Madper Xie (Madper) - Friday, 26 June 2015, 08:13 GMT
Hi Allan,
I found someone else who hit the same issue: https://bbs.archlinux.org/viewtopic.php?id=197100
But upgrading the microcode didn't fix it for me. :-(
Comment by Allan McRae (Allan) - Friday, 26 June 2015, 10:09 GMT
Can we get an actual backtrace?
Comment by Madper Xie (Madper) - Friday, 26 June 2015, 10:10 GMT
Hi Allan,
Can you tell me how to get the backtrace?
Thanks in advance.
Comment by Allan McRae (Allan) - Friday, 26 June 2015, 10:27 GMT
Install "gdb" and run "gdb program", then start it with "r" and quit the program to trigger the segfault. Type "bt full" and copy the output.
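The same steps can be run non-interactively with gdb's -batch, -ex and --args options, so the backtrace lands on stdout once the program crashes on exit. A minimal sketch, assuming gdb is on PATH; the wrapper and its name are hypothetical and only build and run the gdb command line.

```python
import shlex
import subprocess
import sys


def gdb_backtrace_cmd(program, *args):
    """Build a gdb invocation that runs `program` and prints `bt full` when it stops."""
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt full", "--args", program, *args]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: bt.py PROGRAM [ARGS...]")  # e.g. bt.py emacs -Q
    cmd = gdb_backtrace_cmd(*sys.argv[1:])
    print(shlex.join(cmd))
    subprocess.run(cmd)
```

Running it as `python bt.py emacs -Q`, then quitting Emacs normally, should reproduce the interactive session in this report.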
Comment by Madper Xie (Madper) - Friday, 26 June 2015, 11:12 GMT
~ % gdb emacs
...
(gdb) r -Q
Starting program: /usr/bin/emacs -Q
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Xlib: extension "GLX" missing on display ":0.0".
[New Thread 0x7fffe5ff0700 (LWP 2435)]
[New Thread 0x7fffe57ef700 (LWP 2437)]
[New Thread 0x7fffe69fe700 (LWP 2433)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff17ba080 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
(gdb) bt full
#0 0x00007ffff17ba080 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
No symbol table info available.
#1 0x00007fffee2c326c in ?? () from /usr/lib/libEGL.so.1
No symbol table info available.
#2 0x00007fffee253a22 in ?? () from /usr/lib/libEGL.so.1
No symbol table info available.
#3 0x00007fffffffd1e0 in ?? ()
No symbol table info available.
#4 0x00007fffee2d7ea1 in ?? () from /usr/lib/libEGL.so.1
No symbol table info available.
#5 0x00007fffffffd1e0 in ?? ()
No symbol table info available.
#6 0x00007ffff7dea6f5 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
Comment by Allan McRae (Allan) - Monday, 27 July 2015, 05:43 GMT
Did you fix this? The backtrace looks exactly like the TSX issues that are disabled in intel-ucode.
Comment by Doug Newgard (Scimmia) - Monday, 27 July 2015, 06:07 GMT
The reporter's CPU is a Broadwell-U, so it does NOT have the TSX instructions disabled by newer microcode. These processors were not supposed to be affected. Either Intel didn't fix it like they thought they did, or there's something going on with glibc's implementation.
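Whether a machine still exposes TSX can be read from the CPU flags the kernel reports: "hle" and "rtm" appear in /proc/cpuinfo when the CPU advertises them, and they disappear once a microcode update disables TSX. A minimal sketch for checking this; the helper name is hypothetical, and the sample flag lines in the test are made up for illustration.

```python
def tsx_flags(cpuinfo_text):
    """Return the TSX-related flags ("hle", "rtm") listed on the first flags line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return {f for f in ("hle", "rtm") if f in flags}
    return set()


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(sorted(tsx_flags(f.read())))
```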
Comment by Allan McRae (Allan) - Monday, 27 July 2015, 06:30 GMT
Then I am labelling these as genuine issues in the software that crashes. These issues have been largely hidden by the disabling of TSX, but are abundant. The TSX issues should not cause such reliable crashes...

Can you provide a list of all packages that you see segfault? I am assuming it is the same software crashing at the same point repeatedly.
