FS#48163 - [nftables] systemd-hostnamed and nftables deadlock

Attached to Project: Arch Linux
Opened by Miroslav Koškár (mkoskar) - Saturday, 13 February 2016, 12:58 GMT
Last edited by Sébastien Luttringer (seblu) - Saturday, 12 March 2016, 18:19 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Sébastien Luttringer (seblu)
Architecture All
Severity Low
Priority Low
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Given that 'nftables.service' is enabled then system initialization ends up in
deadlock (not always but often, say with 50/50 chance). I've tracked it down
to interference of 'nftables.service' and 'systemd-hostnamed'.

Investigating call traces below it seems that whereas 'nftables.service'
triggers loading of 'nf_conntract_ip4' module (which is quite logical),
'systemd-hostnamed' because of its 'PrivateNetwork=yes' creates new network
namespace which in turns somehow leads to deadlock.

As a workaround I've added 'nf_conntrack_ipv4' to '/etc/modules-load.d/'.

Additional info:
* package version(s)
* config and/or log files etc.

Name : systemd
Version : 228-4

Name : nftables
Version : 1:0.5-2

PID PPID SESS PGID USER PRI NI VSZ RSS STAT %CPU %MEM STIME TIME TTY WCHAN UNIT CMD
1 0 1 1 root 19 0 114788 5304 Ss 0.1 0.0 10:41 00:00:00 ? ep_poll init.scope /sbin/init
2 0 0 0 root 19 0 0 0 S 0.0 0.0 10:41 00:00:00 ? kthreadd - [kthreadd]
176 2 0 0 root 19 0 0 0 S 0.0 0.0 10:41 00:00:00 ? wait - [kworker/u16:4]
752 1 752 752 root 19 0 16328 1936 Ds 0.0 0.0 10:41 00:00:00 ? call_usermodehelper_exec nftables.service /usr/bin/nft -f /etc/nftables.conf
803 176 0 0 root 19 0 21228 3800 D 0.0 0.0 10:41 00:00:00 ? register_pernet_subsys - /sbin/modprobe -q -- nf_conntrack-2
807 1 807 807 root 19 0 32860 3400 Ds 0.0 0.0 10:41 00:00:00 ? nfnl_lock systemd-hostnamed.service (ostnamed)

Feb 13 10:47:38.223896 mirci kernel: INFO: task modprobe:803 blocked for more than 120 seconds.
Feb 13 10:47:38.223996 mirci kernel: Tainted: G O 4.4.1-2-ARCH #1
Feb 13 10:47:38.224040 mirci kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 13 10:47:38.224074 mirci kernel: modprobe D ffff880401cffba8 0 803 176 0x00000000
Feb 13 10:47:38.224108 mirci kernel: ffff880401cffba8 0000000000000000 ffff88040c272940 ffff880401cc8dc0
Feb 13 10:47:38.224142 mirci kernel: ffff880401d00000 ffffffff818b5624 ffff880401cc8dc0 00000000ffffffff
Feb 13 10:47:38.224172 mirci kernel: ffffffff818b5628 ffff880401cffbc0 ffffffff8158e03c ffffffff818b5620
Feb 13 10:47:38.224201 mirci kernel: Call Trace:
Feb 13 10:47:38.224231 mirci kernel: [<ffffffff8158e03c>] schedule+0x3c/0x90
Feb 13 10:47:38.224262 mirci kernel: [<ffffffff8158e415>] schedule_preempt_disabled+0x15/0x20
Feb 13 10:47:38.224291 mirci kernel: [<ffffffff8158f8be>] __mutex_lock_slowpath+0xce/0x140
Feb 13 10:47:38.224319 mirci kernel: [<ffffffffa0a4c000>] ? 0xffffffffa0a4c000
Feb 13 10:47:38.224335 mirci kernel: [<ffffffff8158f947>] mutex_lock+0x17/0x30
Feb 13 10:47:38.224350 mirci kernel: [<ffffffff81483239>] register_pernet_subsys+0x19/0x40
Feb 13 10:47:38.224370 mirci kernel: [<ffffffffa0a4c040>] nf_conntrack_l3proto_ipv4_init+0x40/0x1000 [nf_conntrack_ipv4]
Feb 13 10:47:38.224386 mirci kernel: [<ffffffff81002123>] do_one_initcall+0xb3/0x200
Feb 13 10:47:38.224402 mirci kernel: [<ffffffff811619d7>] do_init_module+0x5f/0x1e8
Feb 13 10:47:38.224417 mirci kernel: [<ffffffff810fbf7f>] load_module+0x219f/0x27e0
Feb 13 10:47:38.224432 mirci kernel: [<ffffffff810f8e70>] ? symbol_put_addr+0x50/0x50
Feb 13 10:47:38.224449 mirci kernel: [<ffffffff810fc70e>] SyS_init_module+0x14e/0x190
Feb 13 10:47:38.224464 mirci kernel: [<ffffffff81591b2e>] entry_SYSCALL_64_fastpath+0x12/0x71

Feb 13 10:47:38.224480 mirci kernel: INFO: task (ostnamed):807 blocked for more than 120 seconds.
Feb 13 10:47:38.224498 mirci kernel: Tainted: G O 4.4.1-2-ARCH #1
Feb 13 10:47:38.224512 mirci kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 13 10:47:38.224527 mirci kernel: (ostnamed) D ffff880402b0fcf8 0 807 1 0x00000004
Feb 13 10:47:38.224542 mirci kernel: ffff880402b0fcf8 0000000000000000 ffff88040c2244c0 ffff8800c90da940
Feb 13 10:47:38.224557 mirci kernel: ffff880402b10000 ffffffffa080a5a4 ffff8800c90da940 00000000ffffffff
Feb 13 10:47:38.224574 mirci kernel: ffffffffa080a5a8 ffff880402b0fd10 ffffffff8158e03c ffffffffa080a5a0
Feb 13 10:47:38.224588 mirci kernel: Call Trace:
Feb 13 10:47:38.224603 mirci kernel: [<ffffffff8158e03c>] schedule+0x3c/0x90
Feb 13 10:47:38.224618 mirci kernel: [<ffffffff8158e415>] schedule_preempt_disabled+0x15/0x20
Feb 13 10:47:38.224638 mirci kernel: [<ffffffff8158f8be>] __mutex_lock_slowpath+0xce/0x140
Feb 13 10:47:38.224658 mirci kernel: [<ffffffff8158f947>] mutex_lock+0x17/0x30
Feb 13 10:47:38.224673 mirci kernel: [<ffffffffa0808021>] nfnl_lock+0x21/0x30 [nfnetlink]
Feb 13 10:47:38.224688 mirci kernel: [<ffffffffa0a147e8>] nft_register_afinfo+0x28/0x60 [nf_tables]
Feb 13 10:47:38.224703 mirci kernel: [<ffffffffa09e31d1>] nf_tables_ipv4_init_net+0xf1/0x120 [nf_tables_ipv4]
Feb 13 10:47:38.224719 mirci kernel: [<ffffffff81482614>] ops_init+0xa4/0x120
Feb 13 10:47:38.224737 mirci kernel: [<ffffffff81482fa0>] setup_net+0x90/0x140
Feb 13 10:47:38.224750 mirci kernel: [<ffffffff8148387d>] copy_net_ns+0x6d/0xf0
Feb 13 10:47:38.224768 mirci kernel: [<ffffffff810948ca>] create_new_namespaces+0xfa/0x190
Feb 13 10:47:38.224783 mirci kernel: [<ffffffff81094ae1>] unshare_nsproxy_namespaces+0x61/0xa0
Feb 13 10:47:38.224798 mirci kernel: [<ffffffff81075e6a>] SyS_unshare+0x1ba/0x310
Feb 13 10:47:38.224813 mirci kernel: [<ffffffff81591b2e>] entry_SYSCALL_64_fastpath+0x12/0x71
This task depends upon

Closed by  Sébastien Luttringer (seblu)
Saturday, 12 March 2016, 18:19 GMT
Reason for closing:  Upstream
Comment by Miroslav Koškár (mkoskar) - Saturday, 13 February 2016, 13:06 GMT
I'm sorry for those inline logs in the body of task, here I attach them instead.
Comment by Sébastien Luttringer (seblu) - Saturday, 12 March 2016, 17:49 GMT
Hello, except I didn't correctly understand what you want, we will not debug kernel trace issues down here. Directly deal with kernel guys, they will be really more useful.

Loading...