FS#38907 - [valgrind] kernel CONFIG_MEM_SOFT_DIRTY causes pthread_attr_getstack bug

Attached to Project: Arch Linux
Opened by sergio (sergio) - Thursday, 13 February 2014, 23:53 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 19 March 2014, 16:05 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0), Anatol Pomozov (anatolik)
Architecture: x86_64
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

Description:

Valgrind 3.9 seems broken on Arch Linux; the same test works fine on other distros.

Compile the attached program:
g++ -g main.cpp -lpthread

valgrind ./a.out

The expected result would be like:
Stack size: 8388608; success= 0

But our broken valgrind gives:
Stack size: 8192; success= 0

This makes it impossible to run any WebKit program under valgrind...
Max stack size is usually 8 MB, as you can see by running the snippet without valgrind, or by running it under an unbroken version of valgrind.

Edit: this is caused by CONFIG_MEM_SOFT_DIRTY in the Linux kernel
   main.cpp (0.6 KiB)

Closed by Tobias Powalowski (tpowa) - Wednesday, 19 March 2014, 16:05 GMT
Reason for closing: Fixed
Comment by sergio (sergio) - Friday, 14 February 2014, 13:08 GMT
Improved test-case
   main.cpp (0.6 KiB)
Comment by Allan McRae (Allan) - Saturday, 15 February 2014, 01:23 GMT
Did this start with the upgrade to glibc-2.19?
Comment by Allan McRae (Allan) - Saturday, 15 February 2014, 03:07 GMT
It is an issue with valgrind-svn too.
Comment by sergio (sergio) - Saturday, 15 February 2014, 03:10 GMT
I don't remember if it happened with 2.18...

I tested valgrind-3.7 and it still happens, so it's something else that Arch Linux does... not sure what.
Comment by Allan McRae (Allan) - Saturday, 15 February 2014, 03:54 GMT
Just tested with an old install. It happened there too. So this is not a recent thing.
Comment by sergio (sergio) - Saturday, 15 February 2014, 12:37 GMT
Which glibc did you test?
Comment by Allan McRae (Allan) - Saturday, 15 February 2014, 12:42 GMT
glibc-2.18
Comment by sergio (sergio) - Saturday, 15 February 2014, 12:53 GMT
valgrind 3.8.1, openSUSE 13.1, glibc 2.18
Stack size: 8384512; success= 0

Maybe it's not glibc but something else we do differently.
Comment by Allan McRae (Allan) - Saturday, 15 February 2014, 12:54 GMT
Yes - I just cannot figure out what... I tried rebuilding valgrind without any of our CFLAGS/LDFLAGS and that makes no difference.
Comment by Allan McRae (Allan) - Saturday, 15 February 2014, 12:55 GMT
Also tried a completely unstripped glibc.
Comment by sergio (sergio) - Saturday, 15 February 2014, 18:24 GMT
I just tried setting up a SUSE chroot inside Arch Linux, and I can reproduce the problem there with an old version of valgrind/glibc.

But testing it natively in SUSE (without a chroot), I can't reproduce it.

So this has nothing to do with the valgrind/glibc version; maybe it's some sysctl setting.

Comment by sergio (sergio) - Saturday, 15 February 2014, 18:53 GMT
"Fixed" by using linux-lts kernel! (3.10.29-1-lts)


Comment by Allan McRae (Allan) - Saturday, 15 February 2014, 23:58 GMT
Ah - good catch. Any chance you feel like bisecting the kernel to figure out what change caused this? I won't have time over the next week.
Comment by Dave Reisner (falconindy) - Sunday, 16 February 2014, 00:02 GMT
It's likely to be a difference found in the config. I don't see this bug on my own kernel.

By chance does this work if you run valgrind as root?
Comment by sergio (sergio) - Sunday, 16 February 2014, 00:04 GMT
@Allan, I won't have time to narrow it down further either.

@Dave, I can reproduce the bug as root
Comment by Allan McRae (Allan) - Sunday, 16 February 2014, 00:51 GMT
@Dave: Any chance you see a smoking gun in the differences between your config and the Arch one?
Comment by Dave Reisner (falconindy) - Sunday, 16 February 2014, 00:59 GMT
Nothing stands out. Here's the (rather noisy) diff:

https://paste.xinu.at/NN3opq/
Comment by Allan McRae (Allan) - Sunday, 16 February 2014, 01:14 GMT
Confirmed working on Fedora with 3.12.10.


Looking at Dave's diff, everything he "added" is present as a module in the Arch build. So it must be something set by Arch that is breaking this.

There are 713 things set by Arch that are not set by Dave...
Comment by Allan McRae (Allan) - Sunday, 16 February 2014, 02:07 GMT
I can not find a difference between the linux-lts and linux package kernels that is not present in any of Dave's, Fedora's or openSUSE's kernels...
Comment by Allan McRae (Allan) - Sunday, 16 February 2014, 04:04 GMT
I was mistaken. I can replicate it in Fedora 20.
Comment by Allan McRae (Allan) - Sunday, 16 February 2014, 04:58 GMT
The culprit is CONFIG_MEM_SOFT_DIRTY.
Comment by Allan McRae (Allan) - Sunday, 16 February 2014, 05:00 GMT
Adding in the kernel guys for comments on this.
Comment by Sergio Correia (sergio.correia) - Monday, 24 February 2014, 19:45 GMT
Any news on this from the kernel guys?
Comment by Jocelyn Turcotte (jturcotte) - Tuesday, 18 March 2014, 10:10 GMT
I think that this bug is also breaking QtQuick2 applications in valgrind.

The result is that any QML code fails to evaluate and the application shows a blank window.

The Qt JavaScript engine tries to throw a "Maximum call stack size exceeded." exception, however the engine checks the stack again while trying to print the exception, and the only thing that shows up on the console is:

<Unknown File>:
<Unknown File>:
...

getStackLimit() in qtdeclarative/src/qml/jsruntime/qv4engine.cpp calls pthread_attr_getstack, which returns an unexpectedly small stack size of 8k. A safety range of 256k is then added at the end of the function; since that is larger than the reported stack, the stack check fails every time.
Comment by Anatol Pomozov (anatolik) - Tuesday, 18 March 2014, 15:01 GMT
Interesting, but I cannot reproduce the problem with the original test case. I use the default kernel and valgrind.

$ uname -a
Linux archie 3.13.6-1-ARCH #1 SMP PREEMPT Fri Mar 7 22:47:48 CET 2014 x86_64 GNU/Linux
$ zgrep CONFIG_MEM_SOFT_DIRTY /proc/config.gz
CONFIG_MEM_SOFT_DIRTY=y
$ g++ -g main.cpp -lpthread
$ valgrind ./a.out
==46867== Memcheck, a memory error detector
==46867== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==46867== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==46867== Command: ./a.out
==46867==
Stack size: 8388608; success= 0
==46867==
==46867== HEAP SUMMARY:
==46867== in use at exit: 0 bytes in 0 blocks
==46867== total heap usage: 4 allocs, 4 frees, 960 bytes allocated
==46867==
==46867== All heap blocks were freed -- no leaks are possible
==46867==
==46867== For counts of detected and suppressed errors, rerun with: -v
==46867== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
Comment by Anatol Pomozov (anatolik) - Tuesday, 18 March 2014, 15:15 GMT
Allan, how did you reproduce the problem?

I use a virtual machine with the [testing] repo, all packages up-to-date. I do not see the issue...
Comment by Allan McRae (Allan) - Tuesday, 18 March 2014, 22:21 GMT
Works here now too...
Comment by Jocelyn Turcotte (jturcotte) - Wednesday, 19 March 2014, 14:43 GMT
After updating and rebooting I can't reproduce the issue with Qt anymore either, sorry for the noise.
