FS#53663 - [linux] frequent f2fs lockups

Attached to Project: Arch Linux
Opened by Rulatir (Rulatir) - Thursday, 13 April 2017, 13:12 GMT
Last edited by Toolybird (Toolybird) - Saturday, 27 May 2023, 08:05 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 4
Private No

Details

Description:


Additional info:
* package version(s)
* config and/or log files etc.


Steps to reproduce:

Programs that use the f2fs partition a lot (including systemd-journal) get randomly stuck in Disk Sleep forever and remain unkillable until reboot. Examples from dmesg:

[ 9340.673596] INFO: task systemd-journal:18653 blocked for more than 120 seconds.
[ 9340.673604] Tainted: P O 4.10.9-1-ARCH #1
[ 9340.673607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9340.673611] systemd-journal D 0 18653 1 0x00000104
[ 9340.673615] Call Trace:
[ 9340.673626] __schedule+0x22f/0x700
[ 9340.673631] ? resched_curr+0xc3/0xd0
[ 9340.673635] schedule+0x3d/0x90
[ 9340.673639] schedule_timeout+0x243/0x3d0
[ 9340.673642] ? ttwu_do_activate+0x6f/0x80
[ 9340.673646] ? try_to_wake_up+0x18d/0x3c0
[ 9340.673650] wait_for_common+0xbe/0x180
[ 9340.673653] ? wake_up_q+0x80/0x80
[ 9340.673657] wait_for_completion+0x1d/0x20
[ 9340.673670] f2fs_issue_flush+0x160/0x1b0 [f2fs]
[ 9340.673679] f2fs_do_sync_file+0x46f/0x730 [f2fs]
[ 9340.673688] f2fs_sync_file+0x11/0x20 [f2fs]
[ 9340.673692] vfs_fsync_range+0x4b/0xb0
[ 9340.673695] do_fsync+0x3d/0x70
[ 9340.673698] SyS_fsync+0x10/0x20
[ 9340.673701] do_syscall_64+0x54/0xc0
[ 9340.673704] entry_SYSCALL64_slow_path+0x25/0x25
[ 9340.673706] RIP: 0033:0x7f6b338b54cd
[ 9340.673708] RSP: 002b:00007f6b30d82e50 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[ 9340.673711] RAX: ffffffffffffffda RBX: 0000555d2e0d6060 RCX: 00007f6b338b54cd
[ 9340.673713] RDX: 00007f6b31584000 RSI: 0000000000000000 RDI: 0000000000000027
[ 9340.673715] RBP: 00007f6b33e26b40 R08: 00007f6b30d83700 R09: 00007f6b30d83700
[ 9340.673716] R10: 00007f6b30d839d0 R11: 0000000000000293 R12: 0000000000000002
[ 9340.673718] R13: 00007ffed627b30f R14: 0000555d2e0c6880 R15: 00007f6b30d83700
[ 9340.673771] INFO: task firefox:2654 blocked for more than 120 seconds.
[ 9340.673775] Tainted: P O 4.10.9-1-ARCH #1
[ 9340.673777] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9340.673780] firefox D 0 2654 1012 0x00000004
[ 9340.673783] Call Trace:
[ 9340.673787] __schedule+0x22f/0x700
[ 9340.673792] ? native_smp_send_reschedule+0x27/0x40
[ 9340.673795] schedule+0x3d/0x90
[ 9340.673799] schedule_timeout+0x243/0x3d0
[ 9340.673802] ? ttwu_do_activate+0x6f/0x80
[ 9340.673806] ? try_to_wake_up+0x18d/0x3c0
[ 9340.673810] wait_for_common+0xbe/0x180
[ 9340.673813] ? wake_up_q+0x80/0x80
[ 9340.673817] wait_for_completion+0x1d/0x20
[ 9340.673826] f2fs_issue_flush+0x160/0x1b0 [f2fs]
[ 9340.673835] f2fs_do_sync_file+0x46f/0x730 [f2fs]
[ 9340.673844] f2fs_sync_file+0x11/0x20 [f2fs]
[ 9340.673848] vfs_fsync_range+0x4b/0xb0
[ 9340.673850] do_fsync+0x3d/0x70
[ 9340.673853] SyS_fsync+0x10/0x20
[ 9340.673856] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 9340.673858] RIP: 0033:0x7f41c1d0d4cd
[ 9340.673860] RSP: 002b:00007fff2e816a30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[ 9340.673862] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 00007f41c1d0d4cd
[ 9340.673864] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000000000b3
[ 9340.673866] RBP: 00007f41a8fc60c0 R08: 0000000000000000 R09: 00007f41c0d00300
[ 9340.673867] R10: 00007f41c0d00300 R11: 0000000000000293 R12: 00007fff2e8168b0
[ 9340.673869] R13: 0000000000000034 R14: 00007fff2e816930 R15: 00007f41b41f8ff4

In case it's relevant, I think this started when I started running two vagrant boxes simultaneously, but these are host processes that get stuck.
This task depends upon

Closed by  Toolybird (Toolybird)
Saturday, 27 May 2023, 08:05 GMT
Reason for closing:  No response
Additional comments about closing:  Plus it's old and stale. If still an issue, please report upstream.
Comment by Rulatir (Rulatir) - Thursday, 13 April 2017, 13:13 GMT
Sorry for bad formatting but I just wanted it sent quickly before Chrome locks up too and I can't figure out how to edit it.
Comment by Rulatir (Rulatir) - Sunday, 16 April 2017, 00:37 GMT
It seems that switching journald storage to volatile alleviates the problem. I haven't had a lockup during the over 24h uptime since I made that change.
Comment by Rulatir (Rulatir) - Tuesday, 30 May 2017, 22:10 GMT
Unfortunately I must BUMP this one. The problem didn't go away, just hid for a while and became less predictable. It tends to happen at midnight when some periodic tasks kick in, and when it happens it is actually worse than previously because the failure quickly cascades: not only the "guilty" processes get stuck in Disk Sleep, but many innocent processes eventually do too, until everything stops working and I must hard reset.
Comment by Jared Sutton (jpsutton) - Friday, 02 June 2017, 14:03 GMT
I may also be observing this problem. Thunderbird was hanging frequently and unkillable. I traced the issue to the process being in deep sleep and hanging at f2fs_issue_flush. I've switched to the arch lts kernel and haven't seen a recurrence yet (cross fingers). If there's anything I can do to provide more debugging information, please let me know.
Comment by mwohah (mwohah) - Saturday, 03 June 2017, 07:57 GMT
Seth from the forums pointed me here. Looks like there are more occurrences of this or a very similar issue. See also [1].

I don't experience any actual complete lock-ups, but when some processes that experience this issue seem to become zombie processes and can't be gotten rid of without a reboot. In my case, an often recurrent and noticable victim of this seems to be dockerd, which just freezes up randomly during various operations (see also the forum).

[1] https://bbs.archlinux.org/viewtopic.php?pid=1715404
Comment by mwohah (mwohah) - Saturday, 03 June 2017, 08:41 GMT
@jpsutton Thanks for the tip of switching back to lts, I've just done the same and it already fixed one of the issues I was experiencing: I've been trying to get Atom packages to update for weeks now and they usually just hung up whilst trying to update (even running apm directly froze up). strace showed that it was waiting for some kind of read. Now that I've switched to linux-lts, I all of a sudden can upgrade all my packages without any issues.

I'm curious to see if this will fix my other issues as well, although that doesn't mean that this issue is any less problematic as sooner or later lts will be bumped to a newer version that includes this issue if it isn't fixed before that.
Comment by mwohah (mwohah) - Sunday, 04 June 2017, 19:19 GMT Comment by Roman Beslik (beroal) - Monday, 31 July 2017, 12:15 GMT
I experience this issue with Thunderbird too. The Thunderbird process is in "uninterruptible sleep" in "f2fs_issue_flush". I use F2FS for the root and home file systems. "linux" 4.11.9-1. I found this using a manual http://blog.tanelpoder.com/2013/02/21/peeking-into-linux-kernel-land-using-proc-filesystem-for-quickndirty-troubleshooting/ .
Comment by Emma (flurem) - Monday, 07 August 2017, 17:13 GMT
It seems that I also faced this problem. Linux 4.12.4-1, F2FS root. Some processes have become unkillable, in particular systemd-journal, gnome-keyring-daemon and Totem.
Comment by mattia (nTia89) - Sunday, 27 February 2022, 14:21 GMT
I cannot reproduce the issue. Is it still valid?

Loading...