FS#74233 - [linux] EXT4 commits fail with fast_commit +- fscrypt
Attached to Project:
Arch Linux
Opened by Angus (gusnz) - Friday, 25 March 2022, 09:23 GMT
Last edited by Toolybird (Toolybird) - Thursday, 14 September 2023, 07:08 GMT
Opened by Angus (gusnz) - Friday, 25 March 2022, 09:23 GMT
Last edited by Toolybird (Toolybird) - Thursday, 14 September 2023, 07:08 GMT
|
Details
Description:
This affects kernels 5.16.15-arch1 and 5.16.16-arch1 at least. While running 5.16.16 with my home directory encrypted with fscrypt (automatically unlocked on login), I enabled EXT4 fast_commit on my root drive with: sudo tune2fs -O fast_commit /dev/nvme0n1p2 ( Other filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg encrypt sparse_super large_file huge_file dir_nlink extra_isize metadata_csum ) Not too long after that my system wouldn't suspend or shutdown, with repeated episodes of this in the log: 1 Mar 23 22:19:15 hostname kernel: INFO: task IndexedDB #19:3350 blocked for more than 122 seconds. 1 Mar 23 22:19:15 hostname kernel: Not tainted 5.16.16-arch1-1 #1 2 Mar 23 22:19:15 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 3 Mar 23 22:19:15 hostname kernel: task:IndexedDB #19 state:D stack: 0 pid: 3350 ppid: 1393 flags:0x00000002 4 Mar 23 22:19:15 hostname kernel: Call Trace: 5 Mar 23 22:19:15 hostname kernel: <TASK> 6 Mar 23 22:19:15 hostname kernel: __schedule+0x2f6/0xf80 7 Mar 23 22:19:15 hostname kernel: ? touch_atime+0x44/0x190 8 Mar 23 22:19:15 hostname kernel: schedule+0x4b/0xc0 9 Mar 23 22:19:15 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] 10 Mar 23 22:19:15 hostname kernel: ? var_wake_function+0x20/0x20 11 Mar 23 22:19:15 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] 12 Mar 23 22:19:15 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] 13 Mar 23 22:19:15 hostname kernel: new_sync_write+0x15c/0x1f0 14 Mar 23 22:19:15 hostname kernel: vfs_write+0x1eb/0x280 15 Mar 23 22:19:15 hostname kernel: ksys_write+0x67/0xe0 16 Mar 23 22:19:15 hostname kernel: do_syscall_64+0x5c/0x80 17 Mar 23 22:19:15 hostname kernel: ? do_syscall_64+0x69/0x80 18 Mar 23 22:19:15 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40 19 Mar 23 22:19:15 hostname kernel: ? do_syscall_64+0x69/0x80 20 Mar 23 22:19:15 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae 21 Mar 23 22:19:15 hostname kernel: RIP: 0033:0x7f5c0222928f 22 Mar 23 22:19:15 hostname kernel: RSP: 002b:00007f5becaac970 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 23 Mar 23 22:19:15 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5bbcfcffa0 RCX: 00007f5c0222928f 24 Mar 23 22:19:15 hostname kernel: RDX: 0000000000001000 RSI: 00007f5bb8e89000 RDI: 0000000000000038 25 Mar 23 22:19:15 hostname kernel: RBP: 0000000000000038 R08: 0000000000000000 R09: 0000000000000001 26 Mar 23 22:19:15 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5bb8e89000 27 Mar 23 22:19:15 hostname kernel: R13: 0000000000001000 R14: 00007f5bbcfcffa0 R15: 0000000000012000 28 Mar 23 22:19:15 hostname kernel: </TASK> Shutdown was held up by Firefox/Thunderbird processes hanging, after an expired timeout I had to forcefully reboot with Magic SysRq key. At one point the system wouldn't restart with disk errors and I had to fsck.ext4 in the initrd, luckily managed to get booting again (but errors persisted). Log from the latter part of an attempted suspend: Mar 24 00:41:16 hostname kernel: PM: suspend entry (s2idle) Mar 24 00:41:16 hostname kernel: Filesystems sync: 0.009 seconds Mar 24 00:41:36 hostname kernel: Freezing user space processes ... Mar 24 00:41:36 hostname kernel: Freezing of tasks failed after 20.009 seconds (3 tasks refusing to freeze, wq_busy=0): Mar 24 00:41:36 hostname kernel: task:IndexedDB #19 state:D stack: 0 pid: 3350 ppid: 1393 flags:0x00000006 Mar 24 00:41:36 hostname kernel: Call Trace: Mar 24 00:41:36 hostname kernel: <TASK> Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80 Mar 24 00:41:36 hostname kernel: ? touch_atime+0x44/0x190 Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0 Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20 Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0 Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280 Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0 Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5becaac970 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5bbcfcffa0 RCX: 00007f5c0222928f Mar 24 00:41:36 hostname kernel: RDX: 0000000000001000 RSI: 00007f5bb8e89000 RDI: 0000000000000038 Mar 24 00:41:36 hostname kernel: RBP: 0000000000000038 R08: 0000000000000000 R09: 0000000000000001 Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5bb8e89000 Mar 24 00:41:36 hostname kernel: R13: 0000000000001000 R14: 00007f5bbcfcffa0 R15: 0000000000012000 Mar 24 00:41:36 hostname kernel: </TASK> Mar 24 00:41:36 hostname kernel: task:Indexed~ Mnt #1 state:D stack: 0 pid:64999 ppid: 1393 flags:0x00000006 Mar 24 00:41:36 hostname kernel: Call Trace: Mar 24 00:41:36 hostname kernel: <TASK> Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80 Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0 Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20 Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0 Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280 Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0 Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80 Mar 24 00:41:36 hostname kernel: ? __fget_light+0x8f/0x110 Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5bbc0ebcf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5ba7fa5608 RCX: 00007f5c0222928f Mar 24 00:41:36 hostname kernel: RDX: 0000000000000018 RSI: 00007f5bbc0ebeb0 RDI: 00000000000000ce Mar 24 00:41:36 hostname kernel: RBP: 00000000000000ce R08: 0000000000000000 R09: 0000000000000000 Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5bbc0ebeb0 Mar 24 00:41:36 hostname kernel: R13: 0000000000000018 R14: 00007f5ba7fa5608 R15: 0000000000058860 Mar 24 00:41:36 hostname kernel: </TASK> Mar 24 00:41:36 hostname kernel: task:Indexed~ Mnt #2 state:D stack: 0 pid:65000 ppid: 1393 flags:0x00000006 Mar 24 00:41:36 hostname kernel: Call Trace: Mar 24 00:41:36 hostname kernel: <TASK> Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80 Mar 24 00:41:36 hostname kernel: ? atime_needs_update+0x82/0x100 Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0 Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20 Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0] Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0 Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280 Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0 Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80 Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40 Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80 Mar 24 00:41:36 hostname kernel: ? native_flush_tlb_local+0x31/0x40 Mar 24 00:41:36 hostname kernel: ? flush_tlb_func+0xc8/0x1d0 Mar 24 00:41:36 hostname kernel: ? sched_clock_cpu+0x9/0xb0 Mar 24 00:41:36 hostname kernel: ? irqtime_account_irq+0x38/0xb0 Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5baf502130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5b98511da0 RCX: 00007f5c0222928f Mar 24 00:41:36 hostname kernel: RDX: 0000000000001000 RSI: 00007f5baaf4c000 RDI: 00000000000000a2 Mar 24 00:41:36 hostname kernel: RBP: 00000000000000a2 R08: 0000000000000000 R09: 0000000000000001 Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293 R12: 00007f5baaf4c000 Mar 24 00:41:36 hostname kernel: R13: 0000000000001000 R14: 00007f5b98511da0 R15: 0000000000007000 Mar 24 00:41:36 hostname kernel: </TASK> Mar 24 00:41:36 hostname kernel: Mar 24 00:41:36 hostname kernel: OOM killer enabled. Mar 24 00:41:36 hostname kernel: Restarting tasks ... done. Mar 24 00:41:36 hostname kernel: PM: suspend exit Mar 24 00:41:36 hostname systemd-sleep[65047]: Failed to put system to sleep. System resumed again: Device or resource busy Mar 24 00:41:36 hostname systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE Mar 24 00:41:36 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' Mar 24 00:41:36 hostname kernel: audit: type=1130 audit(1648035696.209:166): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' Mar 24 00:41:36 hostname systemd[1]: systemd-suspend.service: Failed with result 'exit-code'. Mar 24 00:41:36 hostname systemd[1]: Failed to start System Suspend. Mar 24 00:41:36 hostname systemd[1]: Dependency failed for Suspend. Mar 24 00:41:36 hostname systemd[1]: suspend.target: Job suspend.target/start failed with result 'dependency'. Mar 24 00:41:36 hostname systemd[1]: Stopped target Sleep. Mar 24 00:41:36 hostname systemd-logind[595]: Operation 'sleep' finished. I tried both kernels with the same issue. Then to remove it I ran: sudo tune2fs -O ^fast_commit /dev/nvme0n1p2 which still resulted in errors until I rebooted and appended "fsck.mode=force" to the Linux boot cmdline, which fixed it and the system now runs OK. This might be a fscrypt bug but filing against the kernel as userspace shouldn't break filesystems this badly with dataloss. I suspect it's a fscrypt/fast_commit interaction? |
This task depends upon
Closed by Toolybird (Toolybird)
Thursday, 14 September 2023, 07:08 GMT
Reason for closing: Upstream
Additional comments about closing: Clearly not an Arch packaging issue. If still happening with latest kernels, please continue to liaise with upstream.
Thursday, 14 September 2023, 07:08 GMT
Reason for closing: Upstream
Additional comments about closing: Clearly not an Arch packaging issue. If still happening with latest kernels, please continue to liaise with upstream.
https://bugzilla.kernel.org/show_bug.cgi?id=211819
You may report it there.