FS#67382 - [netctl] netctl-ifplugd fails to resume after suspend-to-ram
Attached to Project:
Arch Linux
Opened by Nicola Mori (snack) - Friday, 24 July 2020, 12:04 GMT
Last edited by Jouke Witteveen (jouke) - Friday, 16 October 2020, 11:51 GMT
Opened by Nicola Mori (snack) - Friday, 24 July 2020, 12:04 GMT
Last edited by Jouke Witteveen (jouke) - Friday, 16 October 2020, 11:51 GMT
|
Details
Description:
On my system, netctl-ifplugd fails to resume after suspend-to-ram. Just after resuming the system (in the log below I resumed it at 13:45 i.e. 1:45 PM) it is found to be in a "deactivating" state: $ sudo systemctl status netctl-ifplugd@eth0 * netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled) Active: deactivating (stop-sigterm) since Fri 2020-07-24 13:45:17 CEST; 56s ago Docs: man:netctl.special(7) Main PID: 450 (ifplugd) Status: "online" Tasks: 1 (limit: 18771) Memory: 4.7M CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service `-450 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns lug 24 13:45:17 stryke dhcpcd[918]: eth0: deleting default route via 192.168.1.254 lug 24 13:45:17 stryke dhcpcd[918]: rt_delete: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: Program executed successfully. lug 24 13:45:17 stryke dhcpcd[918]: script_runreason: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: main: control_stop: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Connection refused lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: dhcpcd exited lug 24 13:45:18 stryke ifplugd[450]: Link beat lost. lug 24 13:45:21 stryke ifplugd[450]: Link beat detected. and after a while (~ 1 min) it definitely exits and results to be in a "failed" state: $ sudo systemctl status netctl-ifplugd@eth0 -n 40 * netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled) Active: failed (Result: timeout) since Fri 2020-07-24 13:46:47 CEST; 38s ago Docs: man:netctl.special(7) Process: 450 ExecStart=/usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns (code=killed, signal=KILL) Main PID: 450 (code=killed, signal=KILL) Status: "online" lug 24 09:43:32 stryke dhcpcd[918]: eth0: probing address 192.168.1.162/24 lug 24 09:43:38 stryke dhcpcd[918]: eth0: leased 192.168.1.162 for 86400 seconds lug 24 09:43:38 stryke dhcpcd[918]: eth0: adding route to 192.168.1.0/24 lug 24 09:43:38 stryke dhcpcd[918]: eth0: adding default route via 192.168.1.254 lug 24 09:43:38 stryke ifplugd[450]: client: Started network profile 'ethernet' lug 24 13:45:17 stryke dhcpcd[918]: eth0: carrier lost lug 24 13:45:17 stryke ifplugd[450]: client: ps_bpf_recvbpf: Network is down lug 24 13:45:17 stryke dhcpcd[999]: ps_bpf_recvbpf: Network is down lug 24 13:45:17 stryke ifplugd[450]: Killing child. lug 24 13:45:17 stryke dhcpcd[918]: script_runreason: Resource temporarily unavailable lug 24 13:45:17 stryke dhcpcd[918]: arp_free: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: client: script_runreason: Resource temporarily unavailable lug 24 13:45:17 stryke ifplugd[450]: client: arp_free: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: client: eth0: ipv4_deladdr: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: eth0: ipv4_deladdr: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: client: rt_delete: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: client: rt_delete: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: client: script_runreason: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: client: main: control_stop: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: client: ps_dostop: Connection refused lug 24 13:45:17 stryke ifplugd[450]: client: ps_dostop: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: eth0: deleting route to 192.168.1.0/24 lug 24 13:45:17 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles... lug 24 13:45:17 stryke ifplugd[450]: client: ps_root_recvmsg: Connection refused lug 24 13:45:17 stryke dhcpcd[918]: rt_delete: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: eth0: deleting default route via 192.168.1.254 lug 24 13:45:17 stryke dhcpcd[918]: rt_delete: Broken pipe lug 24 13:45:17 stryke ifplugd[450]: Program executed successfully. lug 24 13:45:17 stryke dhcpcd[918]: script_runreason: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: main: control_stop: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Connection refused lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Broken pipe lug 24 13:45:17 stryke dhcpcd[918]: dhcpcd exited lug 24 13:45:18 stryke ifplugd[450]: Link beat lost. lug 24 13:45:21 stryke ifplugd[450]: Link beat detected. lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: State 'stop-sigterm' timed out. Killing. lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 450 (ifplugd) with signal SIGKILL. lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: Main process exited, code=killed, status=9/KILL lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: Failed with result 'timeout'. At this point I have to manually restart it. Additional info: * package version(s) netctl 1.23-1 ifplugd 0.28-16 * config and/or log files etc. Profile file attached. Steps to reproduce: - start netctl-ifplugd for a wired connection - suspend to ram - resume from susped |
This task depends upon
Closed by Jouke Witteveen (jouke)
Friday, 16 October 2020, 11:51 GMT
Reason for closing: Fixed
Additional comments about closing: 833c18a
Friday, 16 October 2020, 11:51 GMT
Reason for closing: Fixed
Additional comments about closing: 833c18a
If you have any idea what is going wrong here exactly, I'd be interested to hear it.
I can provide more information if needed, in case just ask me.
$ sudo systemctl status netctl-ifplugd@eth0 -n 50
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Sun 2020-07-26 14:29:25 CEST; 16s ago
Docs: man:netctl.special(7)
Process: 6647 ExecStart=/usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns (code=killed, signal=KILL)
Main PID: 6647 (code=killed, signal=KILL)
Status: "online"
lug 26 14:27:35 stryke systemd[1]: Started Automatic wired network connection using netctl profiles.
lug 26 14:27:35 stryke ifplugd[6647]: Process 5429 died: No such process; trying to remove PID file. (/run/ifplugd.eth0.pid)
lug 26 14:27:35 stryke ifplugd[6647]: ifplugd 0.28 initializing.
lug 26 14:27:35 stryke ifplugd[6647]: Using interface eth0/00:E0:4C:A2:02:4E with driver <r8152> (version: v1.11.11)
lug 26 14:27:35 stryke ifplugd[6647]: Using detection mode: SIOCETHTOOL
lug 26 14:27:35 stryke ifplugd[6647]: Initialization complete, link beat detected.
lug 26 14:27:35 stryke ifplugd[6647]: Executing '/etc/ifplugd/netctl.action eth0 up'.
lug 26 14:27:35 stryke ifplugd[6647]: client: Starting network profile 'ethernet'...
lug 26 14:27:35 stryke dhcpcd[6691]: dhcpcd-9.1.4 starting
lug 26 14:27:35 stryke dhcpcd[6693]: DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
lug 26 14:27:35 stryke dhcpcd[6693]: eth0: IAID 4c:a2:02:4e
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: rebinding lease of 192.168.1.162
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: leased 192.168.1.162 for 86400 seconds
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: adding route to 192.168.1.0/24
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: adding default route via 192.168.1.254
lug 26 14:27:37 stryke ifplugd[6647]: client: Started network profile 'ethernet'
lug 26 14:27:55 stryke ifplugd[6647]: Killing child.
lug 26 14:27:55 stryke ifplugd[6647]: client: arp_free: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: script_runreason: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: main: control_stop: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: ps_dostop: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: ps_dostop: Broken pipe
lug 26 14:27:55 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles...
lug 26 14:27:55 stryke ifplugd[6647]: Program executed successfully.
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: State 'stop-sigterm' timed out. Killing.
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 6647 (ifplugd) with signal SIGKILL.
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: Main process exited, code=killed, status=9/KILL
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: Failed with result 'timeout'.
lug 26 14:29:25 stryke systemd[1]: Stopped Automatic wired network connection using netctl profiles.
It is interesting that after the stop the network is still up and dhcpcd is not running:
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 9c:b6:d0:be:bd:8d brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:e0:4c:a2:02:4e brd ff:ff:ff:ff:ff:ff
altname enp58s0u1u4
inet 192.168.1.162/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
valid_lft 85735sec preferred_lft 74935sec
inet6 fe80::2e0:4cff:fea2:24e/64 scope link
valid_lft forever preferred_lft forever
$ sudo killall dhcpcd
dhcpcd: no process found
In this case there is no error from dhcpcd, just some broken pipes from ifplugd. The stopping procedure then gets stuck for about 1m30s until systemd kills netctl-ifplugd. I can reproduce this consistently but I don't have any idea about the reason.
In case it matters, my eth0 device is:
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
on a Startech USB-C multiport adapter.
I'm going to try again disabling the automatic bring-up of netctl-auto when netctl-ifplugd stops, just in case the two services interfere in some unpredictable manner.
ago 04 14:19:12 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles...
ago 04 14:19:12 stryke dhcpcd[678]: eth0: removing interface
ago 04 14:19:12 stryke ifplugd[452]: client: Stopping network profile 'ethernet'...
ago 04 14:19:12 stryke dhcpcd[679]: process 679 unexpectedly terminating on signal 15
ago 04 14:19:12 stryke ifplugd[452]: client: Stopped network profile 'ethernet'
ago 04 14:19:12 stryke ifplugd[452]: Program executed successfully.
ago 04 14:19:12 stryke ifplugd[452]: Exiting.
ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: State 'stop-final-sigterm' timed out. Killing.
ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 678 (dhcpcd) with signal SIGKILL.
ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: Failed with result 'timeout'.
ago 04 14:20:42 stryke systemd[1]: Stopped Automatic wired network connection using netctl profiles.
(the "signal 15" error is a known issue of this version, see https://bugs.archlinux.org/task/66405) while on the second attempt I got no error and this log:
ago 04 14:30:33 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles...
ago 04 14:30:33 stryke ifplugd[2305]: client: Stopping network profile 'ethernet'...
ago 04 14:30:33 stryke ifplugd[2305]: client: dhcpcd not running
ago 04 14:30:33 stryke ifplugd[2305]: client: Stopped network profile 'ethernet'
ago 04 14:30:33 stryke ifplugd[2305]: Program executed successfully.
ago 04 14:30:33 stryke ifplugd[2305]: Exiting.
ago 04 14:30:33 stryke systemd[1]: netctl-ifplugd@eth0.service: Succeeded.
ago 04 14:30:33 stryke systemd[1]: Stopped Automatic wired network connection using netctl profiles.
At this point I'm a bit clueless and I don't know how to proceed.
br
$ sudo systemctl status dhcpcd@eth0 -n 30
* dhcpcd@eth0.service - dhcpcd on eth0
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor preset: disabled)
Active: inactive (dead)
ago 07 12:20:10 stryke dhcpcd[2221]: sending signal TERM to pid 1312
ago 07 12:20:10 stryke dhcpcd[2221]: waiting for pid 1312 to exit
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Resource temporarily unavailable
ago 07 12:20:10 stryke dhcpcd[1312]: ipv6_deleteaddr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: arp_free: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: ipv4_deladdr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: main: control_stop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: ps_dostop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[2221]: sending signal TERM to pid 1312
ago 07 12:20:10 stryke dhcpcd[2221]: waiting for pid 1312 to exit
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Resource temporarily unavailable
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
ago 07 12:20:10 stryke dhcpcd[1312]: ipv6_deleteaddr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: arp_free: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: ipv4_deladdr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: deleting route to 192.168.1.0/24
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: deleting default route via 192.168.1.254
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: main: control_stop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: ps_dostop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: dhcpcd exited
ago 07 12:20:10 stryke dhcpcd[1313]: ps_root_recvmsg: Connection refused
ago 07 12:20:10 stryke dhcpcd[1313]: ps_root_recvmsg: Connection refused
ago 07 12:20:10 stryke systemd[1]: dhcpcd@eth0.service: Succeeded.
ago 07 12:20:10 stryke systemd[1]: Stopped dhcpcd on eth0.
This happened only once. After manually restarting dhcpcd@eth0 and suspending and resuming, it worked again (tried 2 times successfully).
I'm growing tired of this randomness...
I can understand! And these things are hard to debug too. Let's see what the dhcpcd maintainer has to say. He is usually very helpful:
https://roy.marples.name/archives/dhcpcd-discuss/0003142.html
When resuming, the network interface appears gone and since the service is bound to the interface, systemd takes the service down by sending SIGTERM to its process. Oddly enough, it does not appear as if ifplugd responds to the signal, and 90 seconds later, the service is taken down forcefully.
Is the issue fixed when you replace
BindsTo=sys-subsystem-net-devices-%i.device
on line 5 of /usr/lib/systemd/system/netctl-ifplugd@.service by
Requires=sys-subsystem-net-devices-%i.device
Let me know!
I read the reply of Roy Marples and it came to my mind that maybe all the problems stem from dhcpcd exiting with a non-zero exit code due to the broken pipe error? The fail in resuming from suspend might come from the fact that the systemd unit is a failed state due to this non-zero exit value and thus is not restarted automatically on resume. I still don't understand why systemd waits 1:30 for netctl-ifplugd to stop, maybe ifplugd shutdown if screwed by the bad return value of dhcpcd?
ago 11 14:44:20 stryke kernel: Restarting tasks ...
ago 11 14:44:20 stryke kernel: usb 4-1.4: USB disconnect, device number 3
ago 11 14:44:20 stryke kernel: mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
ago 11 14:44:20 stryke kernel: done.
ago 11 14:44:20 stryke kernel: pci_bus 0000:04: Allocating resources
ago 11 14:44:20 stryke kernel: pci_bus 0000:3a: Allocating resources
ago 11 14:44:20 stryke kernel: pci_bus 0000:04: Allocating resources
ago 11 14:44:20 stryke kernel: pci_bus 0000:3a: Allocating resources
ago 11 14:44:20 stryke upowerd[809]: treating change event as add on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-0:1.0
ago 11 14:44:20 stryke systemd-sleep[16063]: System resumed.
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: carrier lost
ago 11 14:44:20 stryke dhcpcd[3656]: ps_bpf_recvbpf: Network is down
ago 11 14:44:20 stryke dhcpcd[3656]: ps_bpf_recvbpf: Network is down
ago 11 14:44:20 stryke systemd[1]: systemd-suspend.service: Succeeded.
ago 11 14:44:20 stryke systemd[1]: Finished Suspend.
For what I understand, for some reason the USB multiport adapter is disconnected at the beginning of the task resume. This causes the loss of eth0 and this generates errors in dchpcd. After this, the dhcpcd service tries to shut down:
ago 11 14:44:20 stryke dhcpcd[16208]: sending signal TERM to pid 3647
ago 11 14:44:20 stryke dhcpcd[16208]: waiting for pid 3647 to exit
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Resource temporarily unavailable
ago 11 14:44:20 stryke dhcpcd[3647]: arp_free: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: main: control_stop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: ps_dostop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[16208]: sending signal TERM to pid 3647
ago 11 14:44:20 stryke dhcpcd[16208]: waiting for pid 3647 to exit
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Resource temporarily unavailable
ago 11 14:44:20 stryke dhcpcd[3647]: arp_free: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: deleting route to 192.168.1.0/24
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: deleting default route via 192.168.1.254
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: main: control_stop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: ps_dostop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: dhcpcd exited
ago 11 14:44:20 stryke dhcpcd[3648]: ps_root_recvmsg: Connection refused
ago 11 14:44:20 stryke kernel: audit: type=1130 audit(1597149860.150:163): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=intel-undervolt comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
ago 11 14:44:20 stryke kernel: audit: type=1131 audit(1597149860.150:164): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=intel-undervolt comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
ago 11 14:44:20 stryke dhcpcd[3648]: ps_root_recvmsg: Connection refused
ago 11 14:44:20 stryke systemd[1]: dhcpcd@eth0.service: Succeeded.
ago 11 14:44:20 stryke systemd[1]: Stopped dhcpcd on eth0.
but it cannot do it correctly (maybe because it tries to shut down eth0, which is no longer available?) and so the service enters a failed state. After this, the USB multiport is brought up again:
ago 11 14:44:20 stryke kernel: usb 4-1.4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
ago 11 14:44:20 stryke kernel: usb 4-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
ago 11 14:44:20 stryke kernel: usb 4-1.4: Product: USB 10/100/1000 LAN
ago 11 14:44:20 stryke kernel: usb 4-1.4: Manufacturer: Realtek
ago 11 14:44:20 stryke kernel: usb 4-1.4: SerialNumber: 000001
ago 11 14:44:20 stryke kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
ago 11 14:44:20 stryke kernel: r8152 4-1.4:1.0: load rtl8153a-4 v2 02/07/20 successfully
ago 11 14:44:20 stryke kernel: r8152 4-1.4:1.0 eth0: v1.11.11
ago 11 14:44:20 stryke systemd-udevd[16170]: Using default interface naming scheme 'v245'.
ago 11 14:44:20 stryke systemd-udevd[16170]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
To verify this hypothesis, i.e. dhcpcd fails if eth0 suddenly disappears, I did a simple check by unplugging the USB multiport while dhcpcd@eth0 was up and running, and this i sthe result:
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: carrier lost
ago 11 14:59:15 stryke dhcpcd[17222]: ps_bpf_recvbpf: Network is down
ago 11 14:59:15 stryke dhcpcd[17222]: ps_bpf_recvbpf: Network is down
ago 11 14:59:15 stryke systemd[1]: Stopping dhcpcd on eth0...
ago 11 14:59:15 stryke dhcpcd[18591]: sending signal TERM to pid 17201
ago 11 14:59:15 stryke dhcpcd[18591]: waiting for pid 17201 to exit
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Resource temporarily unavailable
ago 11 14:59:15 stryke dhcpcd[17201]: ipv6_deleteaddr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: arp_free: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: main: control_stop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: ps_dostop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[18591]: sending signal TERM to pid 17201
ago 11 14:59:15 stryke dhcpcd[18591]: waiting for pid 17201 to exit
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Resource temporarily unavailable
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
ago 11 14:59:15 stryke dhcpcd[17201]: ipv6_deleteaddr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: arp_free: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: deleting route to 192.168.1.0/24
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: deleting default route via 192.168.1.254
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: main: control_stop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: ps_dostop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: dhcpcd exited
ago 11 14:59:15 stryke dhcpcd[17202]: ps_root_recvmsg: Connection refused
ago 11 14:59:15 stryke dhcpcd[17202]: ps_root_recvmsg: Connection refused
ago 11 14:59:15 stryke systemd[1]: dhcpcd@eth0.service: Succeeded.
ago 11 14:59:15 stryke systemd[1]: Stopped dhcpcd on eth0.
This log seems quite similar to those after resuming from suspend, so I'd say that they are in favor of my hypothesis.
Jouke, what do you think? Does this make sense to you?
I think is some kind of race condition with ifplugd, netctl and dhcpcd.
@Nicola & Alejandro: just to be sure, you have no additional dhcpcd services active, do you?
Slowly, we're getting to the bottom of this.
Disable all netctl-auto and netctl-ifplugd services. I mean disable from autostart.
Create a file on /usr/lib/dhcpcd/dhcpcd-hooks/99-local or something, with the following:
## Sample dhcpcd hook script
wired=enp0s31f6
wireless=wlp3s0
if [ "${interface}" = $wired ]; then
case "${reason}" in NOCARRIER|BOUND)
if $if_up; then
systemctl stop netctl-auto@$wireless
elif $if_down; then
systemctl start netctl-auto@$wireless
fi
;;
esac
fi
Change wired and wireless variables for your ethernet device and wifi.
next, start the dhcpcd@eth service and try plugging and unplugging just like the behavior with ifplugd
if the outcomes fills your needs, enable the dhcpcd@eth at boot.
Hope this can help.
Alejandro,
@Nicola: At least once, we saw systemd kill dhcpcd in your logs:
> ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 678 (dhcpcd) with signal SIGKILL.
In that same log, we see that dhcpcd is active at two pids: 678 and 679, and one of them dies before the other is terminated (but doesn't respond).
Are you able to patch and recompile dhcpcd to do some more testing? If so, let me know what the effect of this patch is:
https://roy.marples.name/archives/dhcpcd-discuss/0003151.html
Thanks a lot!
To autoconfigure proxy settings in my corporate network (ethernet) vs. no proxy in WiFi mode I am using hooks.
I have enabled netctl-auto@wlp8s0.service and netctl-ifplugd@enp2s0.servide
This setup has been working flawlessly for some years, but since several weeks switching between ethernet and wifi does not work and rebooting/ shutting down takes ages because of a hanging process.
Specifically, I switch off wifi when connecting to ethernet and change my Privoxy (transparent anti-tracking/ads proxy) configuration:
/etc/netctl/interfaces/enp2s0:
#!/bin/sh
ExecUpPost="systemctl stop netctl-auto@wlp8s0.service; cp /etc/privoxy/config.employer /etc/privoxy/config"
ExecDownPre="systemctl start netctl-auto@wlp8s0.service; cp /etc/privoxy/config.direct /etc/privoxy/config"
/etc/netctl/interfaces/wlp8s0
#!/bin/sh
ExecUpPost="cp /etc/privoxy/config.direct /etc/privoxy/config"
Now, with:
/etc/netctl/hooks/debug
#!/bin/sh
NETCTL_DEBUG=yes
I looked at the journal log when plugging ethernet in and out:
the interfaces hooks are not executed and the system hangs at shutdown
Step 1: Downgraded netctl to netctl-1.22-1
Same issues, interface-hooks not executed, system hangs at shutdown
Step 2 (additional to Step 1): Downgraded dhcpcd tp 9.0.2-1
When pluggin in cable:
░░ Subject: A stop job for unit netctl-auto@wlp8s0.service has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A stop job for unit netctl-auto@wlp8s0.service has begun execution.
░░
░░ The job identifier is 2097.
And later:
Journal log:
Aug 18 17:11:43 ray systemd[1]: netctl-auto@wlp8s0.service: State 'stop-sigterm' timed out. Killing.
Aug 18 17:11:43 ray systemd[1]: netctl-auto@wlp8s0.service: Killing process 1931 (dhcpcd) with signal SIGKILL.
Aug 18 17:11:43 ray systemd[1]: netctl-auto@wlp8s0.service: Failed with result 'timeout'.
During the waiting time for killing dhcpcd:
ps -A | grep dhc
1010 ? 00:00:00 dhcpcd
1011 ? 00:00:00 dhcpcd
1079 ? 00:00:00 dhcpcd
2525 ? 00:00:00 dhcpcd
2526 ? 00:00:00 dhcpcd <defunct>
2600 ? 00:00:00 dhcpcd
After Killing process 1931 (dhcpcd) with signal SIGKILL.
ps -A | grep dhc
1010 ? 00:00:00 dhcpcd
1011 ? 00:00:00 dhcpcd
1079 ? 00:00:00 dhcpcd
2600 ? 00:00:00 dhcpcd
Step 3: downgrading package dhcpcd (9.0.2-1 => 8.1.7-1)
Hooks are correctly executed
ps -A | grep dhc
2338 ? 00:00:00 dhcpcd
Everything is fine
Step 4: Upgrading to netctl-1.23-1 again
Hook is executed once - a not anymore later
ps -A | grep dhc
1308 ? 00:00:00 dhcpcd
1961 ? 00:00:00 dhcpcd
And in Journal log:
Aug 18 17:23:54 ray dhcpcd[2003]: dhcpcd already running on pid 1308 (/run/dhcpcd-enp2s0-4.pid)
So here, I need to downgrade both netctl (to 1.22-1) AND dhcpcd (8.1.7-1) in order to make everything work
I hope that helps
1) dhcpcd@eth0 still fails when the eth0 interface is removed by removing the USB adapter or by putting the machine to sleep as below:
$ sudo systemctl status dhcpcd@eth0 -n 20
* dhcpcd@eth0.service - dhcpcd on eth0
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2020-08-23 16:44:41 CEST; 26s ago
Process: 1623 ExecStart=/usr/bin/dhcpcd -q -w eth0 (code=exited, status=0/SUCCESS)
Process: 1845 ExecStop=/usr/bin/dhcpcd -x eth0 (code=exited, status=0/SUCCESS)
Main PID: 1625 (code=exited, status=1/FAILURE)
Aug 23 16:44:15 stryke dhcpcd[1625]: eth0: adding route to 192.168.0.0/24
Aug 23 16:44:15 stryke dhcpcd[1625]: eth0: adding default route via 192.168.0.1
Aug 23 16:44:16 stryke systemd[1]: Started dhcpcd on eth0.
Aug 23 16:44:23 stryke dhcpcd[1625]: eth0: no IPv6 Routers available
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: carrier lost
Aug 23 16:44:40 stryke dhcpcd[1651]: ps_bpf_recvbpf: Network is down
Aug 23 16:44:40 stryke dhcpcd[1651]: ps_bpf_recvbpf: Network is down
Aug 23 16:44:40 stryke systemd[1]: Stopping dhcpcd on eth0...
Aug 23 16:44:40 stryke dhcpcd[1845]: sending signal TERM to pid 1625
Aug 23 16:44:40 stryke dhcpcd[1845]: waiting for pid 1625 to exit
Aug 23 16:44:40 stryke dhcpcd[1845]: sending signal TERM to pid 1625
Aug 23 16:44:40 stryke dhcpcd[1845]: waiting for pid 1625 to exit
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: deleting route to 192.168.0.0/24
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: deleting default route via 192.168.0.1
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: removing interface
Aug 23 16:44:40 stryke dhcpcd[1625]: dhcpcd exited
Aug 23 16:44:40 stryke systemd[1]: dhcpcd@eth0.service: Main process exited, code=exited, status=1/FAILURE
Aug 23 16:44:41 stryke systemd[1]: dhcpcd@eth0.service: Failed with result 'exit-code'.
Aug 23 16:44:41 stryke systemd[1]: Stopped dhcpcd on eth0.
2) netctl-ifplugd@eth0 seemed to work at a first try: I have been able to resume from sleep twice successfully, but then I hade the usual 1m30s timeout on shutdown and al swhen manually stopping the service. Then I experienced the usual failures in resuming from suspend; below the service seems to be correctly restarted after resimng frm suspend, but I had no network connectivity and dhcpcd was not running:
$ sudo systemctl status netctl-ifplugd@eth0 -n 15
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2020-08-23 16:58:29 CEST; 9min ago
Docs: man:netctl.special(7)
Main PID: 1622 (ifplugd)
Status: "online"
Tasks: 1 (limit: 18770)
Memory: 4.0M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
`-1622 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns
Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: rebinding lease of 192.168.0.8
Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: leased 192.168.0.8 for 86400 seconds
Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: adding route to 192.168.0.0/24
Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: adding default route via 192.168.0.1
Aug 23 16:58:30 stryke ifplugd[1622]: client: Started network profile 'ethernet'
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: carrier lost
Aug 23 17:07:32 stryke ifplugd[1622]: client: ps_bpf_recvbpf: Network is down
Aug 23 17:07:32 stryke dhcpcd[1676]: ps_bpf_recvbpf: Network is down
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: deleting route to 192.168.0.0/24
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: deleting default route via 192.168.0.1
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: removing interface
Aug 23 17:07:32 stryke dhcpcd[1670]: dhcpcd exited
Aug 23 17:07:32 stryke ifplugd[1622]: Program executed successfully.
Aug 23 17:07:33 stryke ifplugd[1622]: Link beat lost.
Aug 23 17:07:35 stryke ifplugd[1622]: Link beat detected.
All in all, the usual random behavior... My feeling is still that the root of all this problems is the bad behavior of dhcpcd when the interface is removed.
2307:Aug 27 12:17:52 stryke kernel: usb usb3: root hub lost power or was reset
2308:Aug 27 12:17:52 stryke kernel: usb usb4: root hub lost power or was reset
2310:Aug 27 12:17:52 stryke kernel: usb 4-1: Disable of device-initiated U1 failed.
2311:Aug 27 12:17:52 stryke kernel: usb 4-1: Disable of device-initiated U2 failed.
2313:Aug 27 12:17:52 stryke kernel: usb 1-12: reset high-speed USB device number 6 using xhci_hcd
2314:Aug 27 12:17:52 stryke kernel: usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2333:Aug 27 12:17:52 stryke upowerd[1841]: treating change event as add on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-0:1.0
2334:Aug 27 12:17:52 stryke kernel: usb 4-1.4: USB disconnect, device number 3
2463:Aug 27 12:17:53 stryke kernel: usb 4-1.4: new SuperSpeed Gen 1 USB device number 4 using xhci_hcd
2464:Aug 27 12:17:53 stryke kernel: usb 4-1.4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
2465:Aug 27 12:17:53 stryke kernel: usb 4-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
2466:Aug 27 12:17:53 stryke kernel: usb 4-1.4: Product: USB 10/100/1000 LAN
2467:Aug 27 12:17:53 stryke kernel: usb 4-1.4: Manufacturer: Realtek
2468:Aug 27 12:17:53 stryke kernel: usb 4-1.4: SerialNumber: 000001
2469:Aug 27 12:17:53 stryke kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
You can see that the USB adapter providing the eth0 interface gets disconnected (by upowerd?) and then reconnected. In the middle, the dhcpcd messages are:
2306:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: carrier lost
2307:Aug 27 12:17:52 stryke dhcpcd[5783]: ps_bpf_recvbpf: Network is down
2308:Aug 27 12:17:52 stryke dhcpcd[5783]: ps_bpf_recvbpf: Network is down
2361:Aug 27 12:17:52 stryke systemd[1]: Stopping dhcpcd on eth0...
2362:-- Subject: A stop job for unit dhcpcd@eth0.service has begun execution
2366:-- A stop job for unit dhcpcd@eth0.service has begun execution.
2369:Aug 27 12:17:52 stryke dhcpcd[36849]: sending signal TERM to pid 5755
2370:Aug 27 12:17:52 stryke dhcpcd[36849]: waiting for pid 5755 to exit
2371:Aug 27 12:17:52 stryke dhcpcd[36849]: sending signal TERM to pid 5755
2372:Aug 27 12:17:52 stryke dhcpcd[36849]: waiting for pid 5755 to exit
2396:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
2397:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: deleting route to 192.168.0.0/24
2398:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: deleting default route via 192.168.0.1
2399:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: removing interface
2400:Aug 27 12:17:52 stryke dhcpcd[5755]: dhcpcd exited
2401:Aug 27 12:17:52 stryke systemd[1]: dhcpcd@eth0.service: Main process exited, code=exited, status=1/FAILURE
2406:-- An ExecStart= process belonging to unit dhcpcd@eth0.service has exited.
2409:Aug 27 12:17:52 stryke systemd[1]: dhcpcd@eth0.service: Failed with result 'exit-code'.
2414:-- The unit dhcpcd@eth0.service has entered the 'failed' state with result 'exit-code'.
2415:Aug 27 12:17:52 stryke systemd[1]: Stopped dhcpcd on eth0.
2416:-- Subject: A stop job for unit dhcpcd@eth0.service has finished
2420:-- A stop job for unit dhcpcd@eth0.service has finished.
2423:Aug 27 12:17:52 stryke audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dhcpcd@eth0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
2424:Aug 27 12:17:52 stryke kernel: audit: type=1131 audit(1598523472.969:108): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dhcpcd@eth0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
So once again it seems that the sudden disappearance of the eth0 interface makes dhcpcd exit with a failure code and the dhcpcd@eth0 unit enters a failed state.
This is the same part of the log for the second, successful trial:
2793:Aug 27 14:34:05 stryke kernel: usb usb3: root hub lost power or was reset
2794:Aug 27 14:34:05 stryke kernel: usb usb4: root hub lost power or was reset
2796:Aug 27 14:34:05 stryke kernel: usb 4-1: Disable of device-initiated U1 failed.
2797:Aug 27 14:34:05 stryke kernel: usb 4-1: Disable of device-initiated U2 failed.
2799:Aug 27 14:34:05 stryke kernel: usb 1-12: reset high-speed USB device number 6 using xhci_hcd
2800:Aug 27 14:34:05 stryke kernel: usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2801:Aug 27 14:34:05 stryke kernel: usb 1-5.3.1: reset high-speed USB device number 7 using xhci_hcd
2802:Aug 27 14:34:05 stryke kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
2776:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: carrier lost
2778:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
2779:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: deleting route to 192.168.0.0/24
2780:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: deleting default route via 192.168.0.1
2872:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: carrier acquired
2874:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: IAID 4c:a2:02:4e
2875:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: adding address fe80::145e:4e49:22aa:5f8d
2876:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: soliciting an IPv6 router
2877:Aug 27 14:34:09 stryke dhcpcd[37091]: eth0: rebinding lease of 192.168.0.2
2878:Aug 27 14:34:09 stryke dhcpcd[37091]: eth0: probing address 192.168.0.2/24
2891:Aug 27 14:34:14 stryke dhcpcd[37091]: eth0: leased 192.168.0.2 for 86400 seconds
2892:Aug 27 14:34:14 stryke dhcpcd[37091]: eth0: adding route to 192.168.0.0/24
2893:Aug 27 14:34:14 stryke dhcpcd[37091]: eth0: adding default route via 192.168.0.1
2896:Aug 27 14:34:20 stryke dhcpcd[37091]: eth0: no IPv6 Routers available
Now there's no disconnect/reconnect event for the USB adapter, and dhcpcd simply looses and re-acquires the carrier, thus resuming correctly.
$ sudo dhcpcd -dB eth0
dhcpcd-9.1.4 starting
spawned privileged actioneer on PID 46365
spawned network proxy on PID 46366
spawned controller proxy on PID 46367
DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
chrooting to `/var/lib/dhcpcd' as dhcpcd
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' PREINIT
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
eth0: waiting for carrier
eth0: carrier acquired
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' CARRIER
eth0: IAID 4c:a2:02:4e
eth0: adding address fe80::145e:4e49:22aa:5f8d
eth0: pltime infinity, vltime infinity
eth0: delaying IPv6 router solicitation for 0.1 seconds
eth0: delaying IPv4 for 0.4 seconds
eth0: carrier lost
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
eth0: deleting address fe80::145e:4e49:22aa:5f8d
eth0: carrier acquired
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' CARRIER
eth0: IAID 4c:a2:02:4e
eth0: adding address fe80::145e:4e49:22aa:5f8d
eth0: pltime infinity, vltime infinity
eth0: delaying IPv6 router solicitation for 0.0 seconds
eth0: delaying IPv4 for 0.0 seconds
eth0: reading lease `/var/lib/dhcpcd/eth0.lease'
eth0: rebinding lease of 192.168.0.2
eth0: sending REQUEST (xid 0x7e44c107), next in 3.2 seconds
eth0: soliciting an IPv6 router
eth0: delaying Router Solicitation for LL address
eth0: acknowledged 192.168.0.2 from 192.168.0.1
eth0: probing address 192.168.0.2/24
eth0: probing for 192.168.0.2
eth0: ARP probing 192.168.0.2 (1 of 3), next in 1.5 seconds
eth0: sending Router Solicitation
spawned listener fe80::145e:4e49:22aa:5f8d on PID 46392
eth0: ARP probing 192.168.0.2 (2 of 3), next in 1.7 seconds
eth0: ARP probing 192.168.0.2 (3 of 3), next in 2.0 seconds
eth0: DAD completed for 192.168.0.2
eth0: leased 192.168.0.2 for 86400 seconds
eth0: renew in 43200 seconds, rebind in 75600 seconds
eth0: writing lease `/var/lib/dhcpcd/eth0.lease'
eth0: adding IP address 192.168.0.2/24 broadcast 192.168.0.255
eth0: adding route to 192.168.0.0/24
eth0: adding default route via 192.168.0.1
eth0: ARP announcing 192.168.0.2 (1 of 2), next in 2.0 seconds
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' BOUND
spawned listener 192.168.0.2 on PID 46450
eth0: sending Router Solicitation
eth0: ARP announcing 192.168.0.2 (2 of 2)
eth0: sending Router Solicitation
eth0: sending Router Solicitation
eth0: no IPv6 Routers available
<HERE I DETACH THE USB ADAPTER>
eth0: carrier lost
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
ps_bpf_recvbpf: Network is down
eth0: deleting address fe80::145e:4e49:22aa:5f8d
eth0: deleting IP address 192.168.0.2/24
eth0: deleting route to 192.168.0.0/24
eth0: deleting default route via 192.168.0.1
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' EXPIRE
eth0: interface departed
eth0: removing interface
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' DEPARTED
dhcpcd exited
$ echo $?
1
The differences I see with the tap0 case are:
- when executing the NOCARRIER hook the message "ps_bpf_recvbpf: Network is down" appears. This is seen as an error in systemd logs, i.e. it's usually printed in red;
- the exit code is not 0, so it's seen as an error (I guess that with tap0 it exits with 0).
I don't know if this can be useful at all. If it helps I can send this directly to Roy Marple's mailing list. Thanks.
Anyway, part of what you are seeing can maybe be explained by the
BindsTo=sys-subsystem-net-devices-%i.device
statement in the service file of both dhcpcd@ and netctl-ifplugd@. Due to this statement, systemd terminates a service when the interface disappears. However, it will not restart a service when the interface reappears and I do not think there is a property to achieve this. That's why I suggested using Requires= instead, as that will only have an effect on the initial startup of the service. What I don't understand, though, is that after this line
Aug 23 17:07:35 stryke ifplugd[1622]: Link beat detected.
There is no sign of ifplugd running the netctl script... That could be something that needs to be fixed in netctl.
While I don't fully understand what is going on, the recent dhcpcd patch does seem to improve things, right?
About the "Link beat detected" after quitting netctl-ifplugd, it was with the unpatched dhcpcd and actually there was a leftover dhcpcd instance running, if I correctly remember.
The problem is visible after starting netctl-ifplugd@INTERFACE as "[netctl.action] <defunct>" in the output of `ps`.
Many thanks to Roy Marples, the author of dhcpcd.
However, resume from suspend is still broken on my system, and I believe it's still due to the fact that my eth0 is on an external USB adapter that gets disconnected on suspend, and on resume it's still not available when dhcpcd is resumed so dhcpcd exits badly:
$ sudo systemctl status netctl-ifplugd@eth0
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2020-09-08 09:31:24 CEST; 53s ago
Docs: man:netctl.special(7)
Main PID: 10469 (ifplugd)
Status: "online"
Tasks: 1 (limit: 18770)
Memory: 4.9M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
`-10469 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns
set 08 09:31:33 stryke ifplugd[10469]: client: Started network profile 'ethernet'
set 08 09:31:33 stryke ifplugd[10469]: Program executed successfully.
set 08 09:32:03 stryke dhcpcd[10528]: eth0: carrier lost <-------------------- Resume starts here
set 08 09:32:03 stryke dhcpcd[10534]: ps_bpf_recvbpf: Network is down
set 08 09:32:03 stryke dhcpcd[10528]: eth0: deleting route to 192.168.1.0/24
set 08 09:32:03 stryke dhcpcd[10528]: eth0: deleting default route via 192.168.1.254
set 08 09:32:03 stryke dhcpcd[10528]: eth0: removing interface
set 08 09:32:03 stryke dhcpcd[10528]: dhcpcd exited
set 08 09:32:04 stryke ifplugd[10469]: Link beat lost.
set 08 09:32:08 stryke ifplugd[10469]: Link beat detected.
The unit itself is still active, and ifplugd still seems to work: it detects the link beat when the eth0 device is added again bit at that point dhcpcd is dead so no dhcp lease is re-acquired. This is just my hypothesis based on the logs, but I might be wrong since I'm not an expert at all. However, if I'm correct then maybe the issue could be fixed by ifplugd spawning a new dhcpcd instance if the old one is no longer alive?
Alejandro,
The fact that dhcpcd is not running once the interface is added again is expected. However, it should have been shut down because ifplugd told it to. I guess the reason ifplugd does not forward the loss of a link to netctl is because of its default down delay of 5 seconds. I see two possible mitigations.
1) Lower/disable the down delay of ifplugd (e.g. `ifplugd -d 0`).
2) Disable dhcpcd's internal link management.
The second option you could test by adding
DhcpcdOptions="--nolink"
to your profile. Does it work then?
Another solution that is somewhat appealing to me would be to add this configurable down delay also to dhcpcd.
set 23 14:37:02 stryke systemd[1]: Started Automatic wired network connection using netctl profiles.
set 23 14:37:02 stryke ifplugd[61760]: ifplugd 0.28 initializing.
set 23 14:37:02 stryke ifplugd[61760]: Using interface eth0/00:E0:4C:A2:02:4E with driver <r8152> (version: v1.11.11)
set 23 14:37:02 stryke ifplugd[61760]: Using detection mode: SIOCETHTOOL
set 23 14:37:02 stryke ifplugd[61760]: Initialization complete, link beat not detected.
set 23 14:37:03 stryke ifplugd[61760]: Link beat detected.
set 23 14:37:04 stryke ifplugd[61760]: Executing '/etc/ifplugd/netctl.action eth0 up'.
set 23 14:37:04 stryke ifplugd[61760]: client: Starting network profile 'ethernet'...
set 23 14:37:04 stryke dhcpcd[61837]: dhcpcd-9.2.0 starting
set 23 14:37:04 stryke dhcpcd[61839]: DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
set 23 14:37:04 stryke dhcpcd[61839]: eth0: IAID 4c:a2:02:4e
set 23 14:37:05 stryke dhcpcd[61839]: eth0: rebinding lease of 192.168.1.162
set 23 14:37:05 stryke dhcpcd[61839]: eth0: probing address 192.168.1.162/24
set 23 14:37:10 stryke dhcpcd[61839]: eth0: leased 192.168.1.162 for 86400 seconds
set 23 14:37:10 stryke dhcpcd[61839]: eth0: adding route to 192.168.1.0/24
set 23 14:37:10 stryke dhcpcd[61839]: eth0: adding default route via 192.168.1.254
set 23 14:37:10 stryke ifplugd[61760]: client: Started network profile 'ethernet'
set 23 14:37:10 stryke ifplugd[61760]: Program executed successfully.
set 23 14:38:02 stryke ifplugd[61760]: Link beat lost. <------- 1st resume attempt
set 23 14:38:05 stryke ifplugd[61760]: Link beat detected.
set 23 14:41:09 stryke ifplugd[61760]: Link beat lost. <------- 2nd resume attempt
set 23 14:41:12 stryke ifplugd[61760]: Link beat detected.
I'll monitor this in the next days given the random behavior sees in the past.
There are still strange errors when unplugging the USB-C adapter with the network port:
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2020-09-23 14:37:02 CEST; 6min ago
Docs: man:netctl.special(7)
Main PID: 61760 (ifplugd)
Tasks: 1 (limit: 18770)
Memory: 4.6M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
`-61760 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns
set 23 14:43:23 stryke dhcpcd[61852]: ps_bpf_recvbpf: Network is down <------ Adapter unplugged
set 23 14:43:23 stryke dhcpcd[61839]: eth0: pid 0 deleted IP address 192.168.1.162/24
set 23 14:43:23 stryke dhcpcd[61839]: eth0: deleting route to 192.168.1.0/24
set 23 14:43:23 stryke dhcpcd[61839]: eth0: deleting default route via 192.168.1.254
set 23 14:43:23 stryke dhcpcd[61839]: eth0: rebinding lease of 192.168.1.162
set 23 14:43:23 stryke dhcpcd[61839]: eth0: if_getmtu: No such device <------- Error?
set 23 14:43:23 stryke dhcpcd[61839]: eth0: removing interface
set 23 14:43:23 stryke dhcpcd[61839]: dhcpcd exited
set 23 14:43:24 stryke ifplugd[61760]: Link beat lost.
set 23 14:43:30 stryke ifplugd[61760]: Executing '/etc/ifplugd/netctl.action eth0 down'.
set 23 14:43:30 stryke ifplugd[61760]: client: Stopping network profile 'ethernet'...
set 23 14:43:31 stryke ifplugd[61760]: client: /usr/lib/netctl/interface: line 46: /sys/class/net/eth0/flags: No such file or directory <------- ??
set 23 14:43:31 stryke ifplugd[61760]: client: Stopped network profile 'ethernet'
set 23 14:43:31 stryke ifplugd[61760]: Program executed successfully.
However, this is not a big issue since replugging the adapter a link beat is detected so dhcp is restarted correctly.
set 24 14:29:38 stryke dhcpcd[893]: ps_bpf_recvbpf: Network is down
set 24 14:29:38 stryke dhcpcd[888]: eth0: pid 0 deleted IP address 192.168.1.162/24
set 24 14:29:38 stryke dhcpcd[888]: eth0: deleting route to 192.168.1.0/24
set 24 14:29:38 stryke dhcpcd[888]: eth0: deleting default route via 192.168.1.254
set 24 14:29:38 stryke dhcpcd[888]: eth0: rebinding lease of 192.168.1.162
set 24 14:29:38 stryke dhcpcd[888]: eth0: if_getmtu: No such device
set 24 14:29:38 stryke dhcpcd[888]: eth0: removing interface
set 24 14:29:38 stryke ifplugd[467]: Link beat lost.
set 24 14:29:38 stryke dhcpcd[888]: dhcpcd exited
set 24 14:29:43 stryke ifplugd[467]: Link beat detected.
So I'd say that adding the `DhcpcdOptions="--nolink"` line to the profile does not fix the problem. I'm going to try option 1, i.e. `ifplugd -d 0` by modifying the systemd unit file (@Jouke is this the right way to do that?).
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2020-09-25 09:58:25 CEST; 3h 54min ago
Docs: man:netctl.special(7)
Main PID: 482 (ifplugd)
Status: "online"
Tasks: 6 (limit: 18770)
Memory: 9.5M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
|- 482 /usr/bin/ifplugd -d 0 -i eth0 -r /etc/ifplugd/netctl.action -bfIns
|-30947 dhcpcd: eth0 [ip4]
|-30948 dhcpcd: [privileged actioneer] eth0 [ip4]
|-30949 dhcpcd: [control proxy] eth0 [ip4]
|-30952 dhcpcd: [BPF ARP] eth0 192.168.1.162
`-31116 dhcpcd: [network proxy] 192.168.1.162
set 25 09:58:34 stryke dhcpcd[701]: eth0: probing address 192.168.1.162/24
set 25 09:58:38 stryke dhcpcd[701]: eth0: leased 192.168.1.162 for 86400 seconds
set 25 09:58:38 stryke dhcpcd[701]: eth0: adding route to 192.168.1.0/24
set 25 09:58:38 stryke dhcpcd[701]: eth0: adding default route via 192.168.1.254
set 25 09:58:38 stryke ifplugd[482]: client: Started network profile 'ethernet'
set 25 09:58:38 stryke ifplugd[482]: Program executed successfully.
set 25 13:52:04 stryke dhcpcd[701]: eth0: carrier lost <---------- Resume begins here
set 25 13:52:04 stryke dhcpcd[712]: ps_bpf_recvbpf: Network is down
set 25 13:52:04 stryke dhcpcd[701]: eth0: deleting route to 192.168.1.0/24
set 25 13:52:04 stryke dhcpcd[701]: eth0: deleting default route via 192.168.1.254
set 25 13:52:04 stryke dhcpcd[701]: eth0: removing interface
set 25 13:52:04 stryke dhcpcd[701]: dhcpcd exited
set 25 13:52:05 stryke ifplugd[482]: Link beat lost.
set 25 13:52:06 stryke ifplugd[482]: Executing '/etc/ifplugd/netctl.action eth0 down'.
set 25 13:52:06 stryke ifplugd[482]: client: Stopping network profile 'ethernet'...
set 25 13:52:06 stryke ifplugd[482]: client: Stopped network profile 'ethernet'
set 25 13:52:06 stryke ifplugd[482]: Program executed successfully.
set 25 13:52:08 stryke ifplugd[482]: Link beat detected.
set 25 13:52:09 stryke ifplugd[482]: Executing '/etc/ifplugd/netctl.action eth0 up'.
set 25 13:52:09 stryke ifplugd[482]: client: Starting network profile 'ethernet'...
set 25 13:52:09 stryke dhcpcd[30945]: dhcpcd-9.2.0 starting
set 25 13:52:09 stryke dhcpcd[30947]: DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
set 25 13:52:09 stryke dhcpcd[30947]: eth0: IAID 4c:a2:02:4e
set 25 13:52:10 stryke dhcpcd[30947]: eth0: rebinding lease of 192.168.1.162
set 25 13:52:13 stryke dhcpcd[30947]: eth0: probing address 192.168.1.162/24
set 25 13:52:18 stryke dhcpcd[30947]: eth0: leased 192.168.1.162 for 86400 seconds
set 25 13:52:18 stryke dhcpcd[30947]: eth0: adding route to 192.168.1.0/24
set 25 13:52:18 stryke dhcpcd[30947]: eth0: adding default route via 192.168.1.254
set 25 13:52:18 stryke ifplugd[482]: client: Started network profile 'ethernet'
set 25 13:52:18 stryke ifplugd[482]: Program executed successfully.
So it seems that on resume dhcpcd is shut down because eth0 is still not available, ifplugd looses the link beat and stops the profile, then when eth0 is re-added it detects the link beat and restarts dhcpcd. So far so good, no failed resume attempts up to now.
I still refrain to affirm that the issue is fixed, I'd like to test for a couple of days more, but for the first time I'm becoming optimistic.
https://git.archlinux.org/netctl.git/commit/?id=65c5f5514dbad36ff978d0f05ba7a9937b923201
https://git.archlinux.org/netctl.git/commit/?id=833c18ae8cbf012bf147e577e99a762f4aad523f