FS#74233 - [linux] EXT4 commits fail with fast_commit +- fscrypt

Attached to Project: Arch Linux
Opened by Angus (gusnz) - Friday, 25 March 2022, 09:23 GMT
Last edited by Toolybird (Toolybird) - Thursday, 14 September 2023, 07:08 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
David Runge (dvzrv)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

This affects kernels 5.16.15-arch1 and 5.16.16-arch1 at least.

While running 5.16.16 with my home directory encrypted with fscrypt (automatically unlocked on login), I enabled EXT4 fast_commit on my root drive with:

sudo tune2fs -O fast_commit /dev/nvme0n1p2

( Other filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg encrypt sparse_super large_file huge_file dir_nlink extra_isize metadata_csum )

Not too long after that my system wouldn't suspend or shutdown, with repeated episodes of this in the log:

1 Mar 23 22:19:15 hostname kernel: INFO: task IndexedDB #19:3350 blocked for more than 122 seconds.
1 Mar 23 22:19:15 hostname kernel: Not tainted 5.16.16-arch1-1 #1
2 Mar 23 22:19:15 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
3 Mar 23 22:19:15 hostname kernel: task:IndexedDB #19 state:D stack: 0 pid: 3350 ppid: 1393 flags:0x00000002
4 Mar 23 22:19:15 hostname kernel: Call Trace:
5 Mar 23 22:19:15 hostname kernel: <TASK>
6 Mar 23 22:19:15 hostname kernel: __schedule+0x2f6/0xf80
7 Mar 23 22:19:15 hostname kernel: ? touch_atime+0x44/0x190
8 Mar 23 22:19:15 hostname kernel: schedule+0x4b/0xc0
9 Mar 23 22:19:15 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
10 Mar 23 22:19:15 hostname kernel: ? var_wake_function+0x20/0x20
11 Mar 23 22:19:15 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
12 Mar 23 22:19:15 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
13 Mar 23 22:19:15 hostname kernel: new_sync_write+0x15c/0x1f0
14 Mar 23 22:19:15 hostname kernel: vfs_write+0x1eb/0x280
15 Mar 23 22:19:15 hostname kernel: ksys_write+0x67/0xe0
16 Mar 23 22:19:15 hostname kernel: do_syscall_64+0x5c/0x80
17 Mar 23 22:19:15 hostname kernel: ? do_syscall_64+0x69/0x80
18 Mar 23 22:19:15 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
19 Mar 23 22:19:15 hostname kernel: ? do_syscall_64+0x69/0x80
20 Mar 23 22:19:15 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
21 Mar 23 22:19:15 hostname kernel: RIP: 0033:0x7f5c0222928f
22 Mar 23 22:19:15 hostname kernel: RSP: 002b:00007f5becaac970 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
23 Mar 23 22:19:15 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5bbcfcffa0 RCX: 00007f5c0222928f
24 Mar 23 22:19:15 hostname kernel: RDX: 0000000000001000 RSI: 00007f5bb8e89000 RDI: 0000000000000038
25 Mar 23 22:19:15 hostname kernel: RBP: 0000000000000038 R08: 0000000000000000 R09: 0000000000000001
26 Mar 23 22:19:15 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5bb8e89000
27 Mar 23 22:19:15 hostname kernel: R13: 0000000000001000 R14: 00007f5bbcfcffa0 R15: 0000000000012000
28 Mar 23 22:19:15 hostname kernel: </TASK>

Shutdown was held up by Firefox/Thunderbird processes hanging, after an expired timeout I had to forcefully reboot with Magic SysRq key. At one point the system wouldn't restart with disk errors and I had to fsck.ext4 in the initrd, luckily managed to get booting again (but errors persisted). Log from the latter part of an attempted suspend:

Mar 24 00:41:16 hostname kernel: PM: suspend entry (s2idle)
Mar 24 00:41:16 hostname kernel: Filesystems sync: 0.009 seconds
Mar 24 00:41:36 hostname kernel: Freezing user space processes ...
Mar 24 00:41:36 hostname kernel: Freezing of tasks failed after 20.009 seconds (3 tasks refusing to freeze, wq_busy=0):
Mar 24 00:41:36 hostname kernel: task:IndexedDB #19 state:D stack: 0 pid: 3350 ppid: 1393 flags:0x00000006
Mar 24 00:41:36 hostname kernel: Call Trace:
Mar 24 00:41:36 hostname kernel: <TASK>
Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80
Mar 24 00:41:36 hostname kernel: ? touch_atime+0x44/0x190
Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0
Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20
Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0
Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280
Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0
Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f
Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5becaac970 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5bbcfcffa0 RCX: 00007f5c0222928f
Mar 24 00:41:36 hostname kernel: RDX: 0000000000001000 RSI: 00007f5bb8e89000 RDI: 0000000000000038
Mar 24 00:41:36 hostname kernel: RBP: 0000000000000038 R08: 0000000000000000 R09: 0000000000000001
Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5bb8e89000
Mar 24 00:41:36 hostname kernel: R13: 0000000000001000 R14: 00007f5bbcfcffa0 R15: 0000000000012000
Mar 24 00:41:36 hostname kernel: </TASK>
Mar 24 00:41:36 hostname kernel: task:Indexed~ Mnt #1 state:D stack: 0 pid:64999 ppid: 1393 flags:0x00000006
Mar 24 00:41:36 hostname kernel: Call Trace:
Mar 24 00:41:36 hostname kernel: <TASK>
Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80
Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0
Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20
Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0
Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280
Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0
Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80
Mar 24 00:41:36 hostname kernel: ? __fget_light+0x8f/0x110
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f
Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5bbc0ebcf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5ba7fa5608 RCX: 00007f5c0222928f
Mar 24 00:41:36 hostname kernel: RDX: 0000000000000018 RSI: 00007f5bbc0ebeb0 RDI: 00000000000000ce
Mar 24 00:41:36 hostname kernel: RBP: 00000000000000ce R08: 0000000000000000 R09: 0000000000000000
Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5bbc0ebeb0
Mar 24 00:41:36 hostname kernel: R13: 0000000000000018 R14: 00007f5ba7fa5608 R15: 0000000000058860
Mar 24 00:41:36 hostname kernel: </TASK>
Mar 24 00:41:36 hostname kernel: task:Indexed~ Mnt #2 state:D stack: 0 pid:65000 ppid: 1393 flags:0x00000006
Mar 24 00:41:36 hostname kernel: Call Trace:
Mar 24 00:41:36 hostname kernel: <TASK>
Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80
Mar 24 00:41:36 hostname kernel: ? atime_needs_update+0x82/0x100
Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0
Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20
Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0
Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280
Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0
Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? native_flush_tlb_local+0x31/0x40
Mar 24 00:41:36 hostname kernel: ? flush_tlb_func+0xc8/0x1d0
Mar 24 00:41:36 hostname kernel: ? sched_clock_cpu+0x9/0xb0
Mar 24 00:41:36 hostname kernel: ? irqtime_account_irq+0x38/0xb0
Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f
Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5baf502130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5b98511da0 RCX: 00007f5c0222928f
Mar 24 00:41:36 hostname kernel: RDX: 0000000000001000 RSI: 00007f5baaf4c000 RDI: 00000000000000a2
Mar 24 00:41:36 hostname kernel: RBP: 00000000000000a2 R08: 0000000000000000 R09: 0000000000000001
Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5baaf4c000
Mar 24 00:41:36 hostname kernel: R13: 0000000000001000 R14: 00007f5b98511da0 R15: 0000000000007000
Mar 24 00:41:36 hostname kernel: </TASK>
Mar 24 00:41:36 hostname kernel:
Mar 24 00:41:36 hostname kernel: OOM killer enabled.
Mar 24 00:41:36 hostname kernel: Restarting tasks ... done.
Mar 24 00:41:36 hostname kernel: PM: suspend exit
Mar 24 00:41:36 hostname systemd-sleep[65047]: Failed to put system to sleep. System resumed again: Device or resource busy
Mar 24 00:41:36 hostname systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
Mar 24 00:41:36 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Mar 24 00:41:36 hostname kernel: audit: type=1130 audit(1648035696.209:166): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Mar 24 00:41:36 hostname systemd[1]: systemd-suspend.service: Failed with result 'exit-code'.
Mar 24 00:41:36 hostname systemd[1]: Failed to start System Suspend.
Mar 24 00:41:36 hostname systemd[1]: Dependency failed for Suspend.
Mar 24 00:41:36 hostname systemd[1]: suspend.target: Job suspend.target/start failed with result 'dependency'.
Mar 24 00:41:36 hostname systemd[1]: Stopped target Sleep.
Mar 24 00:41:36 hostname systemd-logind[595]: Operation 'sleep' finished.

I tried both kernels with the same issue. Then to remove it I ran:

sudo tune2fs -O ^fast_commit /dev/nvme0n1p2

which still resulted in errors until I rebooted and appended "fsck.mode=force" to the Linux boot cmdline, which fixed it and the system now runs OK.

This might be a fscrypt bug but filing against the kernel as userspace shouldn't break filesystems this badly with dataloss. I suspect it's a fscrypt/fast_commit interaction?
This task depends upon

Closed by  Toolybird (Toolybird)
Thursday, 14 September 2023, 07:08 GMT
Reason for closing:  Upstream
Additional comments about closing:  Clearly not an Arch packaging issue. If still happening with latest kernels, please continue to liaise with upstream.
Comment by jb (jb.1234abcd) - Friday, 25 March 2022, 18:52 GMT
There is a similar/identical kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=211819
You may report it there.
Comment by Angus (gusnz) - Friday, 25 March 2022, 22:40 GMT
Thanks, held off posting there initially as their bugzilla said don't report distro kernels, but have added to that report.

Loading...