FS#41200 - [linux] 3.15.x btrfs hangs with various oopses in run_delalloc_range

Attached to Project: Arch Linux
Opened by Daniele C. (legolas558) - Monday, 14 July 2014, 20:10 GMT
Last edited by Tobias Powalowski (tpowa) - Monday, 06 October 2014, 14:19 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

I noticed this issue in Firefox because it would hang, and the process could then never be terminated (not even on reboot/shutdown).
Because these processes cannot be terminated, partitions are not unmounted cleanly and pending disk writes are lost (this can't be good: it's data loss).

Symptoms:
- firefox/thunderbird (and, I assume, other disk-intensive applications) will hang indefinitely; the rest of the system works as expected, and unless you check the tail of journalctl you will never notice a thing (risky!)

Most crashes happen in run_delalloc_range.
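Since the hangs are only visible in the kernel log, a small filter can help spot them. The sketch below is a hypothetical helper (not an Arch or kernel tool): it assumes the standard hung-task detector message ("INFO: task NAME:PID blocked for more than N seconds") and treats every following line, up to the next such message, as that task's trace.

```python
import re
from typing import Dict, List

# Header printed by the kernel's hung-task detector, e.g.
# "INFO: task firefox:1234 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def hung_task_reports(log_lines: List[str]) -> List[Dict]:
    """Group kernel log lines into hung-task reports (hypothetical helper).

    Each report records the task name, PID, timeout, and whether any line
    after its header (until the next header) mentions run_delalloc_range.
    """
    reports: List[Dict] = []
    for line in log_lines:
        m = HUNG_TASK.search(line)
        if m:
            reports.append({
                "comm": m.group(1),
                "pid": int(m.group(2)),
                "seconds": int(m.group(3)),
                "delalloc": False,
            })
        elif reports and "run_delalloc_range" in line:
            # Attribute the frame to the most recent hung-task report.
            reports[-1]["delalloc"] = True
    return reports
```

For example, the lines could come from the output of `journalctl -k` split on newlines.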

I have not seen any data corruption/loss other than that caused by unclean unmounting; I am using LUKS and btrfs.

See the attachment for some of the crashes in kworker/thunderbird (the Firefox ones are all alike).

Closed by Tobias Powalowski (tpowa)
Monday, 06 October 2014, 14:19 GMT
Reason for closing: Fixed
Comment by Daniele C. (legolas558) - Monday, 14 July 2014, 20:12 GMT
Sorry, wrong category: it should be core (linux). The latest version (3.15.5) is affected.

I have currently downgraded to 3.14.6-1 as a workaround
Comment by Tobias Powalowski (tpowa) - Tuesday, 15 July 2014, 07:55 GMT
Please get in contact with btrfs developers on their IRC channel and mailinglist.
We cannot help you here.
Comment by Carl George (cgtx) - Saturday, 19 July 2014, 19:56 GMT
I have a similar issue when plexmediaserver is transcoding video.

https://gist.github.com/cgtx/49d001f72e03e2e3083e

Daniele, were you able to make contact with any btrfs developers?
Comment by Felix Seidel (Sh4rk) - Saturday, 19 July 2014, 20:37 GMT
This bug occurs on my machine as well. Running stock 3.15.5 x64 btrfs RAID1 with LUKS.

Everyone who complained was also running LUKS/dm-crypt; see the thread on the linux-btrfs mailing list. Maybe this isn't even related to btrfs?
-> http://www.spinics.net/lists/linux-btrfs/msg34586.html

I reverted to 3.14.6 for now, let's see if that helps.
Comment by Carl George (cgtx) - Saturday, 19 July 2014, 21:13 GMT
No LUKS/dm-crypt in my case. I'm using a btrfs RAID-10 across four 3TB drives.
Comment by Felix Seidel (Sh4rk) - Saturday, 19 July 2014, 22:30 GMT
Downgrading to 3.14.6 didn't help either; I just got a deadlock while using rsync. Other processes were still running completely normally, though.
I'd really like to know what's going on there...
Comment by Daniele C. (legolas558) - Friday, 25 July 2014, 08:34 GMT
@Sh4rk are you sure that you got the same deadlock? Please attach the journalctl log; the bottom of the oops says "Not tainted 3.1xxxx", which reports the running kernel version.
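To check which kernel produced a given trace, one can pull the release string out of the trace header line. This is a hypothetical helper, assuming the usual header shape "CPU: N PID: N Comm: NAME Not tainted RELEASE #BUILD" (or "Tainted: FLAGS RELEASE #BUILD" when the kernel is tainted); the example release strings are illustrative.

```python
import re
from typing import Optional

# Matches "Not tainted 3.15.5-2-ARCH" or "Tainted: G           O 3.15.5-2-ARCH"
# and captures the kernel release string.
KERNEL_VERSION = re.compile(r"(?:Not tainted|Tainted:[A-Z ]*?)\s+(\d+\.\d+\S*)")


def oops_kernel_version(line: str) -> Optional[str]:
    """Return the kernel release from an oops header line, or None."""
    m = KERNEL_VERSION.search(line)
    return m.group(1) if m else None
```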

I have no such messages anymore after downgrading to 3.14.6-1

@tpowa I think there should be a prominent news item about this bug, and possibly 3.15 should be retired.

@cgtx perhaps LUKS just increases the likelihood of triggering the bug; it wouldn't be the first time.
Comment by Felix Seidel (Sh4rk) - Friday, 25 July 2014, 08:46 GMT
See attached file. It's not exactly the same stack trace, but it's definitely related to btrfs.
Btw, on linux-btrfs there was a hint about disabling LZO. I haven't had a deadlock since I disabled compression (but I'm on mainline 3.16-rc6 for now).
Comment by Tobias Powalowski (tpowa) - Friday, 25 July 2014, 10:50 GMT
If you use an experimental filesystem you should know what you are doing. Breakage can happen all the time, you need to stay in touch with the btrfs developers.
Comment by Thomas (thomasbk) - Monday, 28 July 2014, 02:20 GMT
I had the same issue (also btrfs on LUKS with LZO compression enabled), and it made my backups (from ext4 to btrfs) fail silently.

I disabled LZO compression in the mount options as Felix mentioned above, and that seems to have fixed the issue for now on 3.15.5-2 (or rather, worked around it).
Comment by Javier Viñal (fjvinal) - Wednesday, 30 July 2014, 10:17 GMT
I also had the same bug on a single disk containing big multimedia files. Removing "compress=lzo" from the mount options solved the problem, which started with the 3.15 kernel.
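The workaround reported in this thread is to drop LZO compression from the btrfs mount options. The sketch below is a hypothetical helper (not part of any btrfs tool) that lists btrfs mount points still using LZO; it assumes /proc/mounts format (device, mount point, fstype, options, dump, pass) and does not decode escaped characters such as "\040" in mount point names.

```python
from typing import List


def lzo_btrfs_mounts(mounts_text: str) -> List[str]:
    """Return mount points of btrfs filesystems mounted with LZO compression."""
    hits: List[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and non-btrfs filesystems.
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        options = fields[3].split(",")
        if any(opt in ("compress=lzo", "compress-force=lzo") for opt in options):
            hits.append(fields[1])
    return hits
```

Usage would be along the lines of `lzo_btrfs_mounts(open("/proc/mounts").read())`; any mount point it returns still has the option that the commenters removed from their mount options.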
Comment by Tobias Powalowski (tpowa) - Wednesday, 13 August 2014, 07:18 GMT
Status on 3.16?
Comment by Felix Seidel (Sh4rk) - Thursday, 14 August 2014, 20:16 GMT
Still present. See http://bit.ly/Y9SHZW
Comment by Tobias Powalowski (tpowa) - Monday, 08 September 2014, 04:33 GMT
This should be fixed in 3.16.2, can you confirm?
Comment by Adriano Moura (MaMuS) - Saturday, 13 September 2014, 00:18 GMT
Since updating, my uptime is almost 3.5 days so far, with no errors in dmesg. It's looking good.
