FS#41200 : [linux] 3.15.x btrfs hangups with different oops in run_delalloc

FS#41200 - [linux] 3.15.x btrfs hangups with different oops in run_delalloc_range

Attached to Project: Arch Linux
Opened by Daniele C. (legolas558) - Monday, 14 July 2014, 20:10 GMT
Last edited by Tobias Powalowski (tpowa) - Monday, 06 October 2014, 14:19 GMT

Task Type	Bug Report
Category	Kernel
Status	Closed
Assigned To	Tobias Powalowski (tpowa) Thomas Bächler (brain0)
Architecture	All
Severity	High
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	5 beta990 (beta990) (2014-08-12) Javier Viñal (fjvinal) (2014-07-29) Thomas (thomasbk) (2014-07-28) Felix Seidel (Sh4rk) (2014-07-19) Daniele C. (legolas558) (2014-07-14)
Private	No

Details

I first noticed this issue in Firefox: it would hang, and the process could then never be terminated (not even on reboot/shutdown).
Because the hung processes cannot be terminated, partitions are not unmounted cleanly and pending disk writes are lost (this can't be any good: it's data loss).

Symptoms:
- firefox/thunderbird (and I assume it could also happen with other disk-intensive applications) will hang indefinitely; the rest of the system works as expected, and unless you check the tail of journalctl you will never notice a thing (risky!)

Most crashes happen on run_delalloc_range
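
Since the hang is otherwise silent, a small filter over the kernel log can help spot it. The following is a minimal sketch, not part of the original report: the helper name `find_suspect_lines` is hypothetical, and the second pattern assumes the kernel's usual hung-task wording ("blocked for more than N seconds"), which may not appear in every affected log.

```python
import re

# Hypothetical helper; the patterns are illustrative assumptions.
# "run_delalloc_range" is the function named in this report; the second
# alternative assumes the kernel's usual hung-task message wording.
TRACE_PATTERN = re.compile(r"run_delalloc_range|blocked for more than \d+ seconds")


def find_suspect_lines(log_text):
    """Return the log lines that mention run_delalloc_range or a hung task."""
    return [line for line in log_text.splitlines() if TRACE_PATTERN.search(line)]
```

One way to use it would be to save the kernel messages to a file (for example with `journalctl -k > kernel.log`) and pass the file's contents to `find_suspect_lines`.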

I cannot report any data corruption/loss other than that caused by unclean unmounting; I am using LUKS and btrfs.

See attachment for some of the crashes in kworker/thunderbird (the others for firefox are all alike).

terrible btrfs 3.5.15.log (18.7 KiB)

This task depends upon

Closed by Tobias Powalowski (tpowa)
Monday, 06 October 2014, 14:19 GMT
Reason for closing: Fixed

Comment by Daniele C. (legolas558) - Monday, 14 July 2014, 20:12 GMT

Sorry, wrong category: it should be core (linux). The latest version (3.15.5) is affected.

I have downgraded to 3.14.6-1 for now as a workaround.

Comment by Tobias Powalowski (tpowa) - Tuesday, 15 July 2014, 07:55 GMT

Please get in contact with the btrfs developers on their IRC channel and mailing list.
We cannot help you here.

Comment by Carl George (cgtx) - Saturday, 19 July 2014, 19:56 GMT

I have a similar issue when plexmediaserver is transcoding video.

https://gist.github.com/cgtx/49d001f72e03e2e3083e

Daniele, were you able to make contact with any btrfs developers?

Comment by Felix Seidel (Sh4rk) - Saturday, 19 July 2014, 20:37 GMT

This bug occurs on my machine as well. Running stock 3.15.5 x64 btrfs RAID1 with LUKS.

Everyone who complained was also running LUKS/dm-crypt – see the thread on the linux-btrfs mailing list. Maybe this isn't even related to btrfs?
-> http://www.spinics.net/lists/linux-btrfs/msg34586.html

I reverted to 3.14.6 for now, let's see if that helps.

Comment by Carl George (cgtx) - Saturday, 19 July 2014, 21:13 GMT

No LUKS/dm-crypt in my case. I'm using a btrfs RAID-10 across four 3TB drives.

Comment by Felix Seidel (Sh4rk) - Saturday, 19 July 2014, 22:30 GMT

Downgrading to 3.14.6 didn't help either; I just got a deadlock while using rsync. Other processes were still running completely normally, though.
I'd really like to know what's going on there...

Comment by Daniele C. (legolas558) - Friday, 25 July 2014, 08:34 GMT

@Sh4rk are you sure that you got the same deadlock? Please attach the journalctl log; near the bottom of each oops there is a "Not tainted 3.1xxxx" string that reports the running kernel version.

I have no such messages anymore after downgrading to 3.14.6-1

@tpowa I think there should be major news about this bug and possibly 3.15 should be retired

@cgtx perhaps LUKS just increases the likelihood of triggering the bug, wouldn't be the first time..
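
The "Not tainted" check requested above can be sketched as follows. This is an illustration only: the helper name `kernel_versions_in_log` is hypothetical, and the regular expression assumes the release string directly follows "Not tainted" (or the flags after "Tainted:") in the oops header, as in the sample lines in the comments.

```python
import re

# Assumed oops header shapes (illustrative, not taken from the attached logs):
#   CPU: 2 PID: 4711 Comm: kworker/u8:3 Not tainted 3.14.6-1-ARCH #1
#   CPU: 0 PID: 99 Comm: firefox Tainted: G           O 3.15.5-2-ARCH #1
VERSION_PATTERN = re.compile(r"(?:Not tainted|Tainted: .+?)\s+(\d+\.\d+\S*)")


def kernel_versions_in_log(log_text):
    """Return the distinct kernel releases named in oops headers, in order seen."""
    versions = []
    for line in log_text.splitlines():
        match = VERSION_PATTERN.search(line)
        if match and match.group(1) not in versions:
            versions.append(match.group(1))
    return versions
```

If the list contains only a 3.14 release, the trace really was produced after the downgrade rather than being left over from an earlier boot.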

Comment by Felix Seidel (Sh4rk) - Friday, 25 July 2014, 08:46 GMT

See attached file. It's not exactly the same stack trace, but it's definitely related to btrfs.
Btw, on linux-btrfs there was a hint about disabling LZO. Got no deadlock since I disabled compression (but I'm on mainline 3.16-rc6 for now).

3.14 btrfs hang.log (2.2 KiB)

Comment by Tobias Powalowski (tpowa) - Friday, 25 July 2014, 10:50 GMT

If you use an experimental filesystem you should know what you are doing. Breakage can happen all the time, you need to stay in touch with the btrfs developers.

Comment by Thomas (thomasbk) - Monday, 28 July 2014, 02:20 GMT

I had the same issue (also btrfs on LUKS with LZO compression enabled), and it made my backups (from ext4 to btrfs) fail silently.

I disabled LZO compression in the mount options as Felix mentioned above, and that seems to have fixed the issue for now on 3.15.5-2 (or rather, worked around it).

Comment by Javier Viñal (fjvinal) - Wednesday, 30 July 2014, 10:17 GMT

I also had the same bug on a single disk containing big multimedia files. Removing "compress=lzo" from the mount options solved the problem, which started with the 3.15 kernel.
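
To see whether a system is exposed to the LZO-related hang described in the comments above, one could list the btrfs mounts that currently use LZO compression. The sketch below is an assumption-laden illustration: the helper name `btrfs_mounts_with_lzo` is hypothetical, it expects text in the `/proc/mounts` layout (device, mount point, filesystem type, options), and it treats both `compress=lzo` and `compress-force=lzo` as LZO-enabled.

```python
def btrfs_mounts_with_lzo(mounts_text):
    """Return mount points of btrfs filesystems mounted with LZO compression.

    mounts_text is expected to follow the /proc/mounts layout:
    device, mount point, filesystem type, comma-separated options, ...
    """
    lzo_options = ("compress=lzo", "compress-force=lzo")
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3].split(",")
        if fs_type == "btrfs" and any(opt in lzo_options for opt in options):
            hits.append(mount_point)
    return hits
```

It could be fed the contents of `/proc/mounts`, for example via `open("/proc/mounts").read()`. Any mount point it reports is a candidate for the workaround of removing the compression option from the mount options.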

Comment by Tobias Powalowski (tpowa) - Wednesday, 13 August 2014, 07:18 GMT

Status on 3.16?

Comment by Felix Seidel (Sh4rk) - Thursday, 14 August 2014, 20:16 GMT

Still present. See http://bit.ly/Y9SHZW

Comment by Tobias Powalowski (tpowa) - Monday, 08 September 2014, 04:33 GMT

This should be fixed in 3.16.2, can you confirm?

Comment by Adriano Moura (MaMuS) - Saturday, 13 September 2014, 00:18 GMT

My uptime so far is almost 3.5 days since I've updated. Also, no errors in dmesg. It's looking good.

	Tasks related to this task (0)

Duplicate tasks of this task (0)
