FS#37617 - [kdenlive] Issue with glibc lock elision implementation for Haswell CPU

Attached to Project: Community Packages
Opened by Alphazo (alphazo) - Sunday, 03 November 2013, 10:17 GMT
Last edited by Sergej Pupykin (sergej) - Monday, 24 February 2014, 08:50 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Sergej Pupykin (sergej)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

I just installed ArchLinux on a brand new Intel i7 4770S (Haswell) because my old laptop was unable to handle HD content properly under Kdenlive. I have the exact same package versions on both machines, but when I disable a simple effect in Kdenlive I get a segfault on the new machine.

You can see my bug report on Kdenlive bug tracker here: http://www.kdenlive.org/mantis/view.php?id=3186

I then looked at the trace and found references to lock elision, which seems to be hardware-specific to Haswell CPUs.

http://lwn.net/Articles/534758/
http://halobates.de/adding-lock-elision-to-linux.pdf
http://www.phoronix.com/scan.php?page=n … px=MTQzNDk
http://www.anandtech.com/show/6290/maki … tensions/2

I looked at the PKGBUILD for the current glibc 2.18-9 and it has the "--enable-lock-elision" option, so I used ABS to recompile glibc without it to see whether that would make Kdenlive work again. It does: the problem is gone.
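
For anyone who wants to repeat that experiment, the PKGBUILD edit can be sketched roughly as below. This is only an illustration, not the exact procedure used for this report: the helper name `strip_lock_elision` is hypothetical, and it assumes the flag appears in the file spelled exactly as "--enable-lock-elision".

```shell
#!/bin/sh
# Hypothetical helper (not part of ABS or makepkg): print a copy of a
# PKGBUILD with the --enable-lock-elision configure flag removed.
# Assumption: the flag is spelled exactly like this, with no "=value".
strip_lock_elision() {
    # $1: path of the PKGBUILD to read; the edited text goes to stdout.
    # Any whitespace directly in front of the flag is removed with it,
    # so a trailing line-continuation backslash is left in place.
    sed 's/[[:space:]]*--enable-lock-elision//' "$1"
}
```

A possible use is `strip_lock_elision PKGBUILD > PKGBUILD.new`, followed by a manual review of the difference before rebuilding the package with makepkg.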

Is this a problem with the Kdenlive code or with the lock elision implementation in the current glibc when run on Haswell processors? I don't know if this is related, but I also had some issues with Digikam when using the default glibc.

Closed by  Sergej Pupykin (sergej)
Monday, 24 February 2014, 08:50 GMT
Reason for closing:  Fixed
Additional comments about closing:  patch applied
Comment by Allan McRae (Allan) - Sunday, 03 November 2013, 10:34 GMT
This seems to be the relevant part of the backtrace in your kdenlive bug report:

Thread 1 (Thread 0x7ffff7f957c0 (LWP 8517)):
#0 0x00007ffff48dd1b8 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
#1 0x00000000006cec6b in ?? ()
#2 0x000000000058355c in ?? ()
#3 0x000000000046ea4f in _start ()

Can you rebuild kdenlive without stripping debug symbols - options=('!strip') - and post the backtrace again?

I'll also need clarification on whether the digikam issue is related at all...
Comment by Alphazo (alphazo) - Sunday, 03 November 2013, 10:46 GMT
Please find attached a more detailed backtrace. I hope it helps.
Comment by Allan McRae (Allan) - Monday, 04 November 2013, 11:29 GMT
Can you give the output of a "bt full" on the segfaulting thread? I can not tell where the issue is yet.
Comment by Alphazo (alphazo) - Monday, 04 November 2013, 11:31 GMT
I used "thread apply all bt" when the crash occurred. Do you want me to use just "bt full"?
Comment by Alphazo (alphazo) - Monday, 04 November 2013, 11:53 GMT
Here are the traces for both options, "bt full" and "thread apply all bt full".
Comment by Alphazo (alphazo) - Monday, 04 November 2013, 11:54 GMT
Here is the one for "thread apply all bt full".
Comment by Alphazo (alphazo) - Tuesday, 05 November 2013, 21:45 GMT
Do you need any more traces from my side?
Comment by Allan McRae (Allan) - Tuesday, 05 November 2013, 23:36 GMT
#0 0x00007ffff48dd1b8 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
No symbol table info available.

We don't strip that library so why is there no information here?

Anyway - I don't know how to take this further. Hopefully the bug report I see you opened on the glibc tracker gets some traction.
Comment by Allan McRae (Allan) - Wednesday, 06 November 2013, 04:21 GMT
Comment by Allan McRae (Allan) - Friday, 13 December 2013, 01:44 GMT
Do you still have these issues? I cannot find a reference to this anywhere else.
Comment by Alphazo (alphazo) - Saturday, 14 December 2013, 16:04 GMT
Yep. I just updated my system (Intel i7 4770S) this morning and forgot to ignore the glibc package. The result was the exact same crash when using kdenlive. After recompiling glibc without "--enable-lock-elision" the problem was fixed.
Comment by Allan McRae (Allan) - Sunday, 05 January 2014, 10:55 GMT
Any chance you could try the prerelease package I have made? Glibc is in code freeze now, so this is not far off the next release.

pacman -U http://allanmcrae.com/tmp/glibc-2.18.90.20140105-1-x86_64.pkg.tar.xz

Comment by Alphazo (alphazo) - Sunday, 05 January 2014, 12:15 GMT
I tried the proposed package glibc-2.18.90.20140105-1-x86_64.pkg.tar.xz but it crashes the same way. When I revert to my old glibc-2.18-9 compiled without "--enable-lock-elision", the problem is gone. I'm really surprised to be the only one to experience such an issue.
Comment by Allan McRae (Allan) - Sunday, 05 January 2014, 12:47 GMT
Do you only see this issue in kdenlive? Or were other applications also affected?
Comment by Alphazo (alphazo) - Sunday, 05 January 2014, 15:48 GMT
I can reproduce it instantly with Kdenlive; I'm not sure about Digikam. I have now installed Fedora 20 in a VM running on ArchLinux. Fedora 20 ships with glibc 2.18 (and apparently with lock elision enabled). I tried Kdenlive there and, surprisingly, it doesn't crash.
Comment by Allan McRae (Allan) - Sunday, 05 January 2014, 22:20 GMT
Can you run using the repo glibc for a while when you don't intend to use kdenlive? Looking at the __lll_unlock_elision function, I'd guess this is a use-after-free situation, so it should be quite application specific.
Comment by Allan McRae (Allan) - Tuesday, 14 January 2014, 03:11 GMT
Hrm - Fedora 20 is generating a few backtraces that seem related...

https://retrace.fedoraproject.org/faf/problems/1450092/
https://retrace.fedoraproject.org/faf/problems/1469298/
https://retrace.fedoraproject.org/faf/problems/1446340/

Did you manage to test if it was just kdenlive having issues on your system?
Comment by Alphazo (alphazo) - Wednesday, 15 January 2014, 11:12 GMT
I played around with a bunch of KDE applications including DigiKam, KMail, Krita... and didn't experience any crash. I also installed a fresh ArchLinux on a VirtualBox VM using the Antergos jump-start DVD, then installed Kdenlive, and got no crash using the exact same glibc/kdenlive versions. I did the same test with Manjaro and Fedora 20, again with no crash. However, I don't know how a virtual machine behaves with the lock elision mechanism.
Comment by Allan McRae (Allan) - Wednesday, 15 January 2014, 11:23 GMT
OK - seems a VM is not the way to test... Also, it appears restricted to Kdenlive. The Fedora reports point to a common issue in kdelibs, so this may not be a glibc issue at all, but rather glibc exposes the issue.
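
A quick way to see whether a given system, VM or bare metal, even advertises the CPU feature involved: glibc's lock elision relies on Intel TSX, which Linux lists as the "rtm" (and "hle") flags in /proc/cpuinfo. The sketch below is only illustrative and the helper name `has_tsx_flags` is hypothetical. If the flags are missing inside a guest, the elided code path would presumably never run there, which would be consistent with the crash-free VM results; that explanation is an assumption, not something confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed if cpuinfo-style text on stdin has a
# "flags" line that lists the Intel TSX flag "rtm" or "hle".
has_tsx_flags() {
    # Match only whole flag names on the "flags : ..." line, so that
    # other fields or longer flag names do not count.
    grep -qE '^flags[[:space:]]*:.*[[:space:]](rtm|hle)([[:space:]]|$)'
}
```

It could be run as `has_tsx_flags < /proc/cpuinfo && echo "TSX advertised"` on both the host and the guest to compare them.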
Comment by Allan McRae (Allan) - Thursday, 30 January 2014, 07:53 GMT
  • Field changed: Summary (Issue with glibc lock elision implementation for Haswell CPU → [kdenlive] Issue with glibc lock elision implementation for Haswell CPU)
Tagging as a kdenlive issue after discussion on glibc irc.
Comment by Balló György (City-busz) - Friday, 21 February 2014, 09:46 GMT
Apparently upstream fixed the problem. Please rebuild kdenlive with the following patch and give feedback:
http://quickgit.kde.org/?p=kdenlive.git&a=commitdiff&h=d049b327afc02b499266b5c895b13e438490b7c0&o=plain
Comment by Alphazo (alphazo) - Friday, 21 February 2014, 20:26 GMT
Applying the above patch to 0.9.6-2 from ABS fixed the problem. Thanks a lot.
