FS#24397 : [kernel26] softlockup with kernel 2.6.39

Attached to Project: Arch Linux
Opened by Hussam Al-Tayeb (hussam) - Monday, 23 May 2011, 02:15 GMT
Last edited by Tobias Powalowski (tpowa) - Thursday, 16 February 2012, 17:57 GMT

Task Type	Bug Report
Category	Upstream Bugs
Status	Closed
Assigned To	Tobias Powalowski (tpowa) Thomas Bächler (brain0)
Architecture	All
Severity	Critical
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	0
Private	No

Details

After upgrading to kernel 2.6.39, I started having soft lockups triggered by disk activity. Anything more than light disk activity would cause a problem in an application.
dmesg would print something like:
[ 1920.307498] INFO: task java:25665 blocked for more than 120 seconds.
[ 1920.307499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1920.307500] java D f036df98 0 25665 25393 0x00000000
[ 1920.307503] f036dee0 00000086 c15899c8 f036df98 e1c09590 00000001 00cbde28 00000170
[ 1920.307507] f036de98 00000064 f036de60 f036de60 f036de68 f036de68 e1c09590 c14e1440
[ 1920.307511] 081c6000 c14e1440 f5506440 e1c09590 e1c08450 00000000 ffffffff c15899c8
[ 1920.307515] Call Trace:
[ 1920.307518] [<c1073c8d>] ? get_futex_key+0x6d/0x1d0
[ 1920.307520] [<c10742c5>] ? futex_wake+0xe5/0x100
[ 1920.307522] [<c132fd65>] rwsem_down_failed_common+0x95/0xe0
[ 1920.307525] [<c1027640>] ? vmalloc_sync_all+0x120/0x120
[ 1920.307527] [<c132fde2>] rwsem_down_read_failed+0x12/0x14
[ 1920.307529] [<c132fe1f>] call_rwsem_down_read_failed+0x7/0xc
[ 1920.307531] [<c132f69d>] ? down_read+0xd/0x10
[ 1920.307534] [<c1027787>] do_page_fault+0x147/0x420
[ 1920.307536] [<c10760e4>] ? sys_futex+0xc4/0x130
[ 1920.307538] [<c1027640>] ? vmalloc_sync_all+0x120/0x120
[ 1920.307540] [<c1330c4b>] error_code+0x67/0x6c
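
For anyone checking whether they are hitting the same symptom, the hung-task reports in a saved dmesg log can be picked out mechanically. The following is a minimal Python sketch and not part of the original report; the function name find_hung_tasks and the regular expression are illustrative assumptions inferred only from the message format quoted above.

```python
import re

# Matches lines such as:
#   [ 1920.307498] INFO: task java:25665 blocked for more than 120 seconds.
# The pattern is inferred from the message quoted in this report (assumption),
# so other kernel versions may word the line differently.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (timestamp, task name, pid, seconds) tuples."""
    results = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            results.append((
                float(match.group("time")),
                match.group("name"),
                int(match.group("pid")),
                int(match.group("secs")),
            ))
    return results
```

To use it, save the output of dmesg to a file, read the file into a string and pass that string to find_hung_tasks; each returned tuple corresponds to one "blocked for more than N seconds" report.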

One application (let's call it A) would then become unable to read from or write to the disk. Other running applications could still read from and write to the disk without trouble.
I could even copy application A's data to another folder or delete it.
This isn't a hard lockup, and I could still continue to use the computer, but it would then hang at shutdown.
At first I thought the disk (which I bought 12 days ago) was bad, so I ran badblocks -vs and didn't find a single bad block. I also ran a smartctl long test and the disk is fine. It started to feel like an ext4 regression.

I downgraded to kernel 2.6.38.6 and performed a disk-intensive action, namely recompiling libreoffice. This worked without a problem.
I also tried application A, which had been giving problems earlier, and could not reproduce the problem. So I compiled libreoffice again to check and didn't get any lockups.

Closed by Tobias Powalowski (tpowa)
Thursday, 16 February 2012, 17:57 GMT
Reason for closing: Upstream

Comment by Tom Gundersen (tomegun) - Monday, 23 May 2011, 08:03 GMT

This is almost certainly an upstream issue, so should probably be reported at: <https://bugzilla.kernel.org/>.

Comment by Hussam Al-Tayeb (hussam) - Monday, 23 May 2011, 08:15 GMT

OK, I reported an upstream bug: https://bugzilla.kernel.org/show_bug.cgi?id=35662

In the meantime, is it possible to have an update in core to 2.6.38.7 while 2.6.39 is still in testing?

Comment by Jens Adam (byte) - Tuesday, 24 May 2011, 10:42 GMT

I have had those "hung_task" messages throughout at least the whole 2.6.38 release series.
Mostly while dd'ing disk images onto USB sticks, md5summing CD-RWs or similar.
The first hint was always Firefox being completely frozen.
But when the long-running task was completed, all the mouse and keyboard input I had entered in the meantime got fed into Firefox and everything was back to normal.

Comment by Hussam Al-Tayeb (hussam) - Tuesday, 24 May 2011, 16:41 GMT

Andrew Morton seems to suggest it is because of luks encryption in my case.

Comment by sergio (asgarth) - Wednesday, 06 July 2011, 16:53 GMT

Same problem here, without using encryption on any partition. The problem appears only after intensive CPU or disk usage, usually 4 or more hours after system startup.

Comment by robert r (crobe) - Saturday, 09 July 2011, 10:40 GMT

I also experienced this and I'm using LUKS with XFS.
Running the "sync" command made disk writing continue for a while, so as a short-term fix I made something like "while true; do sync; done", which is not the best fix :)
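
The shell loop quoted above calls sync as fast as it can. A slightly gentler variant of the same workaround, sketched here in Python, sleeps between flushes. It is only an illustration: the name periodic_sync, its parameters and the 5-second default are hypothetical and were not tested by anyone in this thread, and like the original loop it works around the symptom rather than fixing the kernel issue.

```python
import os
import time


def periodic_sync(interval_secs=5.0, iterations=None):
    """Call os.sync() repeatedly, sleeping between calls.

    iterations=None loops forever, like the shell loop; a number stops
    after that many flushes. Returns the number of flushes performed.
    """
    count = 0
    while iterations is None or count < iterations:
        os.sync()  # ask the kernel to write dirty buffers to disk (Unix only)
        count += 1
        # Sleep only if another flush is going to follow.
        if iterations is None or count < iterations:
            time.sleep(interval_secs)
    return count
```

Calling periodic_sync() with no arguments runs until interrupted; passing iterations limits the run, which is handy for trying it out.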
