FS#67963 - [netctl] interface-dependencies time out when fsck takes too long (?)

Attached to Project: Arch Linux
Opened by Erich Eckner (deepthought) - Tuesday, 22 September 2020, 07:19 GMT
Last edited by Jelle van der Waa (jelly) - Friday, 11 August 2023, 15:48 GMT
Task Type Bug Report
Category Arch Projects
Status Closed
Assigned To Jouke Witteveen (jouke)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

This morning, my server did not bring up all its networks properly after a reboot.

There is lag0, which merges eno3 and eno4. On top of that are the vlans lag0.1, lag0.3 and lag0.4. All these failed to come up properly, but luckily the backup-networks on eno1 and eno2 did come up (so I was able to hot-fix the problem from remote).

Looking at the logs (see below), it appears, like some fsck took some time and delayed the startup of the network interfaces. Therefore, lag0 timed out (and all of its dependencies), but the profiles for eno1 and eno2 were started correctly at the end.

I think, there are two solutions to this issue:
1: like eno1 and eno2, the waiting dependency could later start lag0 and its dependent profiles (is this possible with systemd directives?)
2 (imho the "right" solution): make the network interfaces not dependent on non-vital file systems (the checked file system has "0 2" in /etc/fstab, which should make its check non-fatal and non-blocking if I understand correctly).

Additional info:
* package version(s)
netctl 1.23-1
* config and/or log files etc.

Sep 22 06:03:08 kernel: audit: initializing netlink subsys (disabled)
Sep 22 06:03:08 kernel: drop_monitor: Initializing network drop monitor service
Sep 22 06:03:08 systemd[1]: Created slice system-netctl.slice.
Sep 22 06:03:08 systemd[1]: Created slice system-systemd\x2dfsck.slice.
Sep 22 06:03:08 systemd-sysctl[310]: Not setting net/ipv4/conf/all/rp_filter (explicit setting exists).
Sep 22 06:03:08 systemd-sysctl[310]: Not setting net/ipv4/conf/default/rp_filter (explicit setting exists).
Sep 22 06:03:08 systemd-sysctl[310]: Not setting net/ipv4/conf/all/accept_source_route (explicit setting exists).
Sep 22 06:03:08 systemd-sysctl[310]: Not setting net/ipv4/conf/default/accept_source_route (explicit setting exists).
Sep 22 06:03:08 systemd-sysctl[310]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).
Sep 22 06:03:08 systemd-sysctl[310]: Not setting net/ipv4/conf/default/promote_secondaries (explicit setting exists).
Sep 22 06:03:11 systemd-fsck[381]: fsck.fat 4.1 (2017-01-24)
Sep 22 06:03:11 systemd-fsck[381]: /dev/sda1: 16 files, 12227/261373 clusters
Sep 22 06:03:11 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-3C11\x2dFE0D comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:03:11 kernel: audit: type=1130 audit(1600747391.041:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-3C11\x2dFE0D comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:03:11 systemd-fsck[382]: 4TB wurde 91 Tage ohne Überprüfung genutzt, Prüfung erzwungen.
Sep 22 06:03:11 systemd-fsck[379]: data wurde 91 Tage ohne Überprüfung genutzt, Prüfung erzwungen.
Sep 22 06:03:11 systemd[1]: Found device NetXtreme II BCM57800 1/10 Gigabit Ethernet (BCM57800 10-Gigabit Ethernet).
Sep 22 06:03:11 systemd[1]: Found device NetXtreme II BCM57800 1/10 Gigabit Ethernet (BCM57800 1-Gigabit Ethernet).
Sep 22 06:03:11 systemd[1]: Found device NetXtreme II BCM57800 1/10 Gigabit Ethernet (BCM57800 10-Gigabit Ethernet).
Sep 22 06:03:11 systemd[1]: Found device NetXtreme II BCM57800 1/10 Gigabit Ethernet (BCM57800 1-Gigabit Ethernet).
Sep 22 06:03:22 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 153190575 (auf Ebene 1) könnte kürzer sein. IGNORIERT.
Sep 22 06:03:22 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 153191418 (auf Ebene 1) könnte kürzer sein. IGNORIERT.
Sep 22 06:03:28 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 165708004 (auf Ebene 1) könnte kürzer sein. IGNORIERT.
Sep 22 06:03:31 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 168330955 (auf Ebene 1) könnte kürzer sein. IGNORIERT.
Sep 22 06:03:31 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 169705652 (auf Ebene 1) könnte kürzer sein. IGNORIERT.
Sep 22 06:03:31 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 169706536 (auf Ebene 1) könnte kürzer sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564868 (auf Ebene 2) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564877 (auf Ebene 2) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564885 (auf Ebene 2) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564886 (auf Ebene 1) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564888 (auf Ebene 2) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564897 (auf Ebene 1) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564898 (auf Ebene 2) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564899 (auf Ebene 1) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:35 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564900 (auf Ebene 2) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:36 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564904 (auf Ebene 2) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:36 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564917 (auf Ebene 1) könnte schmaler sein. IGNORIERT.
Sep 22 06:03:36 systemd-fsck[379]: data: Der Erweiterungsbaum von Inode 214564919 (auf Ebene 1) könnte schmaler sein. IGNORIERT.
Sep 22 06:04:37 systemd-fsck[379]: data: 2182653/366231552 Dateien (2.8% nicht zusammenhängend), 2330371692/5859704832 Blöcke
Sep 22 06:04:37 systemd[1]: sys-subsystem-net-devices-lag0.device: Job sys-subsystem-net-devices-lag0.device/start timed out.
Sep 22 06:04:37 systemd[1]: Timed out waiting for device /sys/subsystem/net/devices/lag0.
Sep 22 06:04:37 systemd[1]: netctl@laser_control\x2dvlan\x2dstatic\x2dlag0.service: Job netctl@laser_control\x2dvlan\x2dstatic\x2dlag0.service/start failed with result 'dependency'.
Sep 22 06:04:37 systemd[1]: netctl@laser_amplitude\x2dvlan\x2dstatic\x2dlag0.service: Job netctl@laser_amplitude\x2dvlan\x2dstatic\x2dlag0.service/start failed with result 'dependency'.
Sep 22 06:04:37 systemd[1]: netctl@lab\x2dvlan\x2dstatic\x2dlag0.service: Job netctl@lab\x2dvlan\x2dstatic\x2dlag0.service/start failed with result 'dependency'.
Sep 22 06:04:37 systemd[1]: sys-subsystem-net-devices-lag0.device: Job sys-subsystem-net-devices-lag0.device/start failed with result 'timeout'.
Sep 22 06:04:38 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-8d3979cd\x2de152\x2d45e7\x2d848a\x2dce6dd0390d42 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:04:38 kernel: audit: type=1130 audit(1600747478.004:11): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-8d3979cd\x2de152\x2d45e7\x2d848a\x2dce6dd0390d42 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:09 systemd-fsck[382]: 4TB: 39710120/244154368 Dateien (0.1% nicht zusammenhängend), 496370243/976617472 Blöcke
Sep 22 06:54:14 kernel: audit: type=1130 audit(1600750454.644:12): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-9379dd3c\x2dfafc\x2d4174\x2dba5b\x2dadbb1add7267 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:14 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-9379dd3c\x2dfafc\x2d4174\x2dba5b\x2dadbb1add7267 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:15 systemd[1]: Starting Static ethernet connection to the containers...
Sep 22 06:54:15 systemd[1]: Starting A static IP for the Hinterzimmer-net...
Sep 22 06:54:15 network[464]: Starting network profile 'containers-static'...
Sep 22 06:54:15 network[466]: Starting network profile 'lag0profile'...
Sep 22 06:54:15 network[468]: Starting network profile 'uni-dhcp'...
Sep 22 06:54:15 network[465]: Starting network profile 'hinterzimmer-static'...
Sep 22 06:54:18 kernel: bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
Sep 22 06:54:18 systemd[1]: Started Static ethernet connection to the containers.
Sep 22 06:54:18 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=netctl@containers\x2dstatic comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:18 network[464]: Started network profile 'containers-static'
Sep 22 06:54:19 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=netctl@lag0profile comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:19 network[466]: Started network profile 'lag0profile'
Sep 22 06:54:19 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=netctl@uni\x2ddhcp comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:21 systemd[1]: Started A static IP for the Hinterzimmer-net.
Sep 22 06:54:21 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=netctl@hinterzimmer\x2dstatic comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:21 kernel: audit: type=1130 audit(1600750461.431:31): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=netctl@hinterzimmer\x2dstatic comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 22 06:54:21 systemd[1]: Started Internet domain name server.
Sep 22 06:54:21 dhcpd[645]: Internet Systems Consortium DHCP Server 4.4.2
Sep 22 06:54:21 dhcpd[645]: Copyright 2004-2020 Internet Systems Consortium.
Sep 22 06:54:21 dhcpd[645]: No subnet declaration for lag0.1 (no IPv4 addresses).
Sep 22 06:54:21 dhcpd[645]: you want, please write a subnet declaration
Sep 22 06:54:21 dhcpd[645]: in your dhcpd.conf file for the network segment
Sep 22 06:54:21 systemd[1]: Started vnStat network traffic monitor.
Sep 22 06:54:21 network[465]: Started network profile 'hinterzimmer-static'
Sep 22 06:54:21 kernel: NFSD: starting 90-second grace period (net f00000a0)
Sep 22 06:54:25 kernel: NFSD: all clients done reclaiming, ending NFSv4 grace period (net f00000a0)
Sep 22 06:54:26 network[468]: Started network profile 'uni-dhcp'
Sep 22 06:54:32 named[643]: BIND 9 is maintained by Internet Systems Consortium,

for sake of completeness, here's the netctl configuration:

# cat lag0profile
Description='bonding eno3 and eno4 to lag0'
Interface=lag0
Connection=bond
# The variable name is plural, but needs precisely one interface
BindsToInterfaces=(eno3 eno4)
IP=static
Adress=('127.0.0.1')

# cat lab-vlan-static-lag0
Description='Virtual LAN 1 (labor default in switch M4100) on interface lag0'
Interface=lag0.1
Connection=vlan
# This just produces "Failed to add dependency on "netctl@lag0profile.service", ignoring: Invalid argument"
# After=('lag0profile')
BindsToInterfaces=(lag0)
VLANID=1
IP=static
Address=('192.168.4.200/23')

# cat laser_amplitude-vlan-static-lag0
Description='Virtual LAN 3 (laser_amplitude in switch M4100) on interface lag0'
Interface=lag0.3
Connection=vlan
# This just produces "Failed to add dependency on "netctl@lag0profile.service", ignoring: Invalid argument"
# After=('lag0profile')
BindsToInterfaces=(lag0)
VLANID=3
IP=static
Address=('192.168.0.200/24')

# cat laser_control-vlan-static-lag0
Description='Virtual LAN 4 (laser_control in switch M4100) on interface lag0'
Interface=lag0.4
Connection=vlan
# This just produces "Failed to add dependency on "netctl@lag0profile.service", ignoring: Invalid argument"
# After=('lag0profile')
BindsToInterfaces=(lag0)
VLANID=4
IP=static
Address=('192.168.1.200/24')

(the following interfaces came up properly after fsck finished)
# cat uni-dhcp
Description='Connection to the Uni'
Interface=eno2
Connection=ethernet
IP=dhcp

# cat hinterzimmer-static
Description='A static IP for the Hinterzimmer-net'
Interface=eno1
Connection=ethernet
IP=static
Address=('192.168.8.200/24')

Steps to reproduce:
maybe this is a minimal working example (I did not try):
create a lag of two interfaces, then force invocation of fsck on a big device on next reboot, then reboot.
This task depends upon

Closed by  Jelle van der Waa (jelly)
Friday, 11 August 2023, 15:48 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/n etctl/issues/7
Comment by Jouke Witteveen (jouke) - Tuesday, 22 September 2020, 16:11 GMT
With the After= statements in place, I guess it should work. Are you sure they cause issues?
Comment by Erich Eckner (deepthought) - Tuesday, 22 September 2020, 19:32 GMT
These files have not been touched since 2018-03-15. Maybe, something changed since then - I'll try :-)
Comment by Erich Eckner (deepthought) - Tuesday, 22 September 2020, 19:41 GMT
ok, adding the After=... line does not seem to bring up the old error. But what should I add for the lag0profile? After what should that be scheduled?
Note, that it was lag0, that timed out and made all the lag0.1, lag0.3, lag0.4 fail.
Comment by Jouke Witteveen (jouke) - Friday, 09 October 2020, 10:37 GMT
To be honest, I don't know why sys-subsystem-net-devices-lag0.device times out. I agree that there should be no reason for the system (udev?) to delay bringing up the device when an fsck is being run. Unfortunately, I can't think of any solution other than to increase your timeout values (but I am not certain which are involved here). If you ever learn more about this issue and have an idea on how it can be addressed, I would love to hear from you!
Comment by Buggy McBugFace (bugbot) - Tuesday, 08 August 2023, 19:11 GMT
This is an automated comment as this bug is open for more then 2 years. Please reply if you still experience this bug otherwise this issue will be closed after 1 month.

Loading...