FS#63733 - [linux] BTRFS dev recommends not yet running 5.2 or 5.3
Attached to Project:
Arch Linux
Opened by James Harvey (jamespharvey20) - Thursday, 12 September 2019, 08:26 GMT
Last edited by Jan Alexander Steffens (heftig) - Saturday, 14 September 2019, 12:03 GMT
Details
BTRFS strikes again.
BTRFS dev Filipe Manana (SUSE): "So we definitely have a serious regression. Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3."

Description:
A BTRFS regression in 5.2 can cause either:
1. A system hang, which doesn't risk corruption.
2. A BTRFS transaction being committed even though required btree nodes have not been written, which leads to "parent transid verify failed on ..." messages that are often volume-fatal.

I have run into effect #1 (a system hang) in a VM about 10 times under heavy I/O load. I had been tracking it down, initially thinking it was a QEMU bug.

Additional info:
* linux 5.2.x/5.3rc to date
* https://marc.info/?l=linux-btrfs&m=156827465218288&w=2

Steps to reproduce:
1. Use BTRFS.
2. Use linux 5.2.x/5.3rc to date; it even looks like git master to date is affected.
3. Get unlucky.

In the linked mailing list thread, I've asked for recommendations for distros and users regarding backporting vs. downgrading.
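To check whether a given kernel falls in the range this report describes, one could compare its version string against what the report says: 5.2.x and 5.3 release candidates are affected, and the closing comment names linux 5.2.14.arch2-1 as the fixed Arch package. The following is only a minimal sketch under stated assumptions: the helper name `is_affected`, the assumed Arch version layout `MAJOR.MINOR[.PATCH].archN-PKGREL`, and the assumption that every later 5.2 package keeps the fix are all illustrative, not something the report or Arch tooling defines. Anything the report does not cover (e.g. 5.1, 5.3 final) returns `None` rather than a guess.

```python
import re
from typing import Optional

# Assumption: Arch "linux" package versions look like "5.2.14.arch2-1"
# (pkgver "5.2.14.arch2", pkgrel "1"); a ".0" patch level may be omitted,
# as in "5.2.arch1-1".
_ARCH_PKG = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?\.arch(\d+)-(\d+)$")

# Assumed 5.3 release-candidate spellings, e.g. "5.3rc8" or "5.3.0-rc8".
_RC_5_3 = re.compile(r"^5\.3(?:\.0)?-?rc\d*")

# Fixed package named in this report's closing comment: 5.2.14.arch2-1,
# stored as (patch, arch release, pkgrel) for tuple comparison.
_FIXED_5_2 = (14, 2, 1)


def is_affected(version: str) -> Optional[bool]:
    """Hypothetical helper.

    True  -> affected according to this report
    False -> at or after the fixed 5.2 package (assumes later 5.2 builds keep the fix)
    None  -> outside what the report covers
    """
    if _RC_5_3.match(version):
        return True  # the report lists 5.3 release candidates as affected
    m = _ARCH_PKG.match(version)
    if m is None:
        return None  # unrecognized format: don't guess
    if (int(m.group(1)), int(m.group(2))) != (5, 2):
        return None  # not a 5.2 kernel; the report says nothing definite
    key = (int(m.group(3) or 0), int(m.group(4)), int(m.group(5)))
    return key < _FIXED_5_2


if __name__ == "__main__":
    for v in ("5.2.arch1-1", "5.2.14.arch1-1", "5.2.14.arch2-1", "5.3rc8", "5.1.16.arch1-1"):
        print(v, is_affected(v))
```

Tuples are compared lexicographically, so `(14, 1, 1) < (14, 2, 1)` marks 5.2.14.arch1-1 as affected while 5.2.14.arch2-1 itself is not.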
Closed by Jan Alexander Steffens (heftig)
Saturday, 14 September 2019, 12:03 GMT
Reason for closing: Fixed
Additional comments about closing: linux 5.2.14.arch2-1
* on Packages: Core
* critical
Some background and a brief description from the btrfs mailing list. Approximately in late summer, one user started a discussion about btrfs data corruption after updating to 5.2. In late June another user reported data corruption after running 5.2 for some time. After fixing his problem, the second user continued running 5.2 without issues; his message "I am running 5.2 and everything currently is OK" was sent in late August, so this issue seemed to be resolved. Afterwards the discussion switched to a relatively separate issue: spurious space cache warnings after running 5.2. Several users said that they received such space cache warnings (I also found such a message in my journal log). One additional user reported data corruption. Some days ago, one developer (not a very prolific contributor) claimed that there is a critical regression and proposed a patch.
Please note that there are currently several reported cases of data corruption after switching to 5.2 or running it for some time. In addition, there are several messages about space cache warnings, which are harmless. Kernel 5.2 was released some time ago, so the number of btrfs users running 5.2 without any issue is far higher than three. The circumstances and conditions that trigger the data corruption are not understood. The proposed patch has not been reviewed yet.
I also wish the proposed patch had received comments from other btrfs devs by now. Granted, it's only been 1.5 days. I can only say two things. 1) I've been running the proposed patch for quite a few hours now without a lockup. With intermittent issues you never know for sure, but it absolutely appears to have fixed the problems I've been having. 2) I'm glad I'm not the one deciding whether to include this patch, as it could always be premature.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18dfa7117a3f379862dcd3f67cadd678013bb9dd
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=d5039ae9d07e5df61cde9d2b5db1e6803a583374