Arch Linux

FS#23214 - [kernel26] CPU/memory maxing-out by simply ls'ing (find'ing, os.path.walk()ing, etc.) a directory

Attached to Project: Arch Linux
Opened by Phil Bordelon (Sunfall) - Thursday, 10 March 2011, 04:23 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 15 February 2012, 08:13 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0)
Architecture: x86_64
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

With the latest Arch Linux kernel (and perhaps older ones), I can repeatably cause every byte of the 8G of RAM in my system to get filled by 'ls' by simply trying to list a particular directory on an NFSv3 mount. It's not just ls; find will hang, as will anything else that tries to list the directory (from Java on to Python's os.path.walk()).

I strongly suspect this is coming from some horrid interplay between glibc and NFS. There's a bug somewhere, and it chews up all the RAM. I've played around with wsize, rsize, and a few other NFS options to no avail.

Interestingly, creating a directory with a four-character name ('zzzz', 'test', etc.) inside it causes ls/find/etc. to work again. Moving a couple of directories /out/ makes them work again too. It seems to be something specific about the size of the packets it's getting.

I've tested doing an ls on an Ubuntu 10.10 machine against the exact same mount and it worked fine.

Additional info:

This is a 64-bit install.

Relevant package installs:
kernel26 2.6.37.2-1
glibc 2.13-4
nfs-utils 1.2.2-6

Steps to reproduce:

* Mount the NFS filesystem
* Attempt to do anything that pokes at the directory (ls, find, etc.)
* Be prepared to kill -9 the process before it hoses your system
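
For anyone repeating these steps, the runaway process can be boxed in ahead of time rather than killed by hand. The following is only a sketch, not part of the original report: it assumes the memory growth happens in the listing process itself (as the strace suggests), and the helper name `run_capped`, the 512 MiB cap, and the 30-second timeout are all made-up illustration values.

```python
import resource
import subprocess


def run_capped(cmd, mem_bytes=512 * 1024 * 1024, timeout_s=30):
    """Run cmd with an address-space cap and a wall-clock timeout.

    Returns the exit status (negative if the child died from a signal),
    or None if the timeout expired and the child was killed.
    """
    def limit_memory():
        # Runs in the child just before exec: cap its virtual address space
        # so runaway allocations fail instead of exhausting RAM and swap.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    try:
        proc = subprocess.run(
            cmd,
            preexec_fn=limit_memory,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode
```

Usage would be something like `run_capped(["ls", "/mnt/nfs/problem-dir"])`, where the path is a placeholder for the affected mount. A process stuck in an uninterruptible NFS wait may not die promptly even when the timeout fires.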

I'm attaching the first 200 lines or so of an strace of 'ls' against the directory. It's pretty boring. You'll see near the bottom that it just starts allocating more and more and more memory as it tries to ... do whatever the heck it is it's doing.
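
To put numbers on the "more and more memory" pattern in a trace like this, the allocation calls can be tallied. This is a rough sketch that assumes typical strace line formats for `brk`, anonymous `mmap`, and `getdents`; the attached file is not reproduced here, so the exact format is an assumption, and `summarize_allocations` is a hypothetical helper name.

```python
import re

# Assumed strace formats, e.g.:
#   brk(0x1021000)                          = 0x1021000
#   mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f...
BRK_RE = re.compile(r"^brk\((?:0|NULL|0x[0-9a-f]+)\)\s*=\s*(0x[0-9a-f]+)")
MMAP_RE = re.compile(
    r"^mmap\([^,]+,\s*(\d+),[^)]*MAP_ANONYMOUS[^)]*\)\s*=\s*0x[0-9a-f]+"
)


def summarize_allocations(lines):
    """Tally heap growth, anonymous mmap bytes, and getdents calls."""
    brk_values = []
    mmap_bytes = 0
    getdents_calls = 0
    for line in lines:
        line = line.strip()
        m = BRK_RE.match(line)
        if m:
            # Record the program break returned by each successful brk call.
            brk_values.append(int(m.group(1), 16))
            continue
        m = MMAP_RE.match(line)
        if m:
            # Only successful anonymous mappings count as allocations.
            mmap_bytes += int(m.group(1))
            continue
        if line.startswith("getdents"):
            getdents_calls += 1
    heap_growth = brk_values[-1] - brk_values[0] if len(brk_values) > 1 else 0
    return {
        "heap_growth": heap_growth,
        "mmap_bytes": mmap_bytes,
        "getdents_calls": getdents_calls,
    }
```

Feeding it the lines of the trace (for example `summarize_allocations(open("strace.out"))`) would show whether the directory-read calls keep coming while the heap keeps growing.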

I'm happy to do whatever kernel testing/package version testing/etc. you folks guide me to try. Just let me know.
Attachment: strace.out (410.3 KiB)

Closed by Tobias Powalowski (tpowa) - Wednesday, 15 February 2012, 08:13 GMT
Reason for closing: Fixed
Comment by Phil Bordelon (Sunfall) - Thursday, 10 March 2011, 05:00 GMT
I'm also attaching a script that will recreate all of the directories. I've verified that, at least on my install, doing this on a fresh directory and then ls'ing/etc. recreates the hang. Hopefully this will help with troubleshooting.
Comment by Phil Bordelon (Sunfall) - Thursday, 10 March 2011, 22:48 GMT
I just tested this with the kernel released today (2.6.37.3-1) and it still causes a hang.
Comment by Yuri Bushmelev (jay7) - Saturday, 12 March 2011, 12:22 GMT
I confirm this issue. Running ls on a large NFSv3 share (about 3000 files and directories at the top level) steadily fills all RAM and swap until the process is killed by the OOM killer. Unmounting and remounting helps for a while, but then it starts eating RAM again.

This issue occurred on my host system (today's Arch Linux) and inside my Debian Squeeze LXC container, which worked fine about a week ago.

$ uname -a
Linux mozart 2.6.37-ARCH #1 SMP PREEMPT Tue Mar 8 08:34:35 CET 2011 x86_64 AMD Phenom(tm) II X6 1055T Processor AuthenticAMD GNU/Linux
Comment by Yuri Bushmelev (jay7) - Saturday, 12 March 2011, 13:48 GMT
I have deleted about 1000 files (about 2000 are left now) and things are fine. All operations on the share now work.
Comment by Yuri Bushmelev (jay7) - Saturday, 12 March 2011, 15:54 GMT
No, it's still failing, but ls/find no longer seem to be affected. I now see the issue only when running svn update on a gcc-4.5 tree that lives on the NFS share.
Comment by Tobias Powalowski (tpowa) - Friday, 19 August 2011, 10:47 GMT
Any update on this?
Comment by Tommy (mhm) - Thursday, 08 September 2011, 14:20 GMT
Ran into this issue with the lts-kernel (2.6.32.44-1).
Was able to reproduce it on multiple servers running that kernel.

Might be due to the bug reported here:
https://bugzilla.novell.com/show_bug.cgi?id=678123

Installing a newer kernel solved it for me.
Comment by Phil Bordelon (Sunfall) - Friday, 09 September 2011, 14:51 GMT
This was fixed with kernel 3.0.x at a minimum. It prints an error in dmesg, drops the clashing dentries, and continues along its merry way.
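
To check whether a given kernel still misbehaves without risking the machine, a bounded listing could be used instead of a plain ls. The sketch below assumes the runaway shows up in user space as a listing that repeats names or never ends, which is an assumption and not something confirmed in this report; on an affected kernel the call could still stall inside the kernel. The name `probe_listing` and the entry limit are made up for illustration.

```python
import os


def probe_listing(path, max_entries=100_000):
    """List path lazily, stopping early if the listing looks runaway.

    Returns (status, count), where status is "ok", "duplicate" (a name
    came back twice), or "limit" (max_entries names were read).
    """
    seen = set()
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.name in seen:
                # A repeated name suggests the directory stream is looping.
                return ("duplicate", len(seen))
            seen.add(entry.name)
            if len(seen) >= max_entries:
                # Give up rather than keep accumulating names in memory.
                return ("limit", len(seen))
    return ("ok", len(seen))
```

A healthy directory returns "ok" with its entry count; either of the other statuses, or a matching error in dmesg, would point at the directory-read path.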
