FS#67382 - [netctl] netctl-ifplugd fails to resume after suspend-to-ram

Attached to Project: Arch Linux
Opened by Nicola Mori (snack) - Friday, 24 July 2020, 12:04 GMT
Last edited by Jouke Witteveen (jouke) - Friday, 16 October 2020, 11:51 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Jouke Witteveen (jouke)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
On my system, netctl-ifplugd fails to resume after suspend-to-ram. Just after resuming the system (in the log below I resumed it at 13:45 i.e. 1:45 PM) it is found to be in a "deactivating" state:

$ sudo systemctl status netctl-ifplugd@eth0
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) since Fri 2020-07-24 13:45:17 CEST; 56s ago
Docs: man:netctl.special(7)
Main PID: 450 (ifplugd)
Status: "online"
Tasks: 1 (limit: 18771)
Memory: 4.7M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
`-450 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns

lug 24 13:45:17 stryke dhcpcd[918]: eth0: deleting default route via 192.168.1.254
lug 24 13:45:17 stryke dhcpcd[918]: rt_delete: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: Program executed successfully.
lug 24 13:45:17 stryke dhcpcd[918]: script_runreason: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: main: control_stop: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Connection refused
lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: dhcpcd exited
lug 24 13:45:18 stryke ifplugd[450]: Link beat lost.
lug 24 13:45:21 stryke ifplugd[450]: Link beat detected.

and after a while (~ 1 min) it definitely exits and results to be in a "failed" state:

$ sudo systemctl status netctl-ifplugd@eth0 -n 40
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Fri 2020-07-24 13:46:47 CEST; 38s ago
Docs: man:netctl.special(7)
Process: 450 ExecStart=/usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns (code=killed, signal=KILL)
Main PID: 450 (code=killed, signal=KILL)
Status: "online"

lug 24 09:43:32 stryke dhcpcd[918]: eth0: probing address 192.168.1.162/24
lug 24 09:43:38 stryke dhcpcd[918]: eth0: leased 192.168.1.162 for 86400 seconds
lug 24 09:43:38 stryke dhcpcd[918]: eth0: adding route to 192.168.1.0/24
lug 24 09:43:38 stryke dhcpcd[918]: eth0: adding default route via 192.168.1.254
lug 24 09:43:38 stryke ifplugd[450]: client: Started network profile 'ethernet'
lug 24 13:45:17 stryke dhcpcd[918]: eth0: carrier lost
lug 24 13:45:17 stryke ifplugd[450]: client: ps_bpf_recvbpf: Network is down
lug 24 13:45:17 stryke dhcpcd[999]: ps_bpf_recvbpf: Network is down
lug 24 13:45:17 stryke ifplugd[450]: Killing child.
lug 24 13:45:17 stryke dhcpcd[918]: script_runreason: Resource temporarily unavailable
lug 24 13:45:17 stryke dhcpcd[918]: arp_free: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: client: script_runreason: Resource temporarily unavailable
lug 24 13:45:17 stryke ifplugd[450]: client: arp_free: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: client: eth0: ipv4_deladdr: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: eth0: ipv4_deladdr: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: client: rt_delete: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: client: rt_delete: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: client: script_runreason: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: client: main: control_stop: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: client: ps_dostop: Connection refused
lug 24 13:45:17 stryke ifplugd[450]: client: ps_dostop: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: eth0: deleting route to 192.168.1.0/24
lug 24 13:45:17 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles...
lug 24 13:45:17 stryke ifplugd[450]: client: ps_root_recvmsg: Connection refused
lug 24 13:45:17 stryke dhcpcd[918]: rt_delete: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: eth0: deleting default route via 192.168.1.254
lug 24 13:45:17 stryke dhcpcd[918]: rt_delete: Broken pipe
lug 24 13:45:17 stryke ifplugd[450]: Program executed successfully.
lug 24 13:45:17 stryke dhcpcd[918]: script_runreason: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: main: control_stop: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Connection refused
lug 24 13:45:17 stryke dhcpcd[918]: ps_dostop: Broken pipe
lug 24 13:45:17 stryke dhcpcd[918]: dhcpcd exited
lug 24 13:45:18 stryke ifplugd[450]: Link beat lost.
lug 24 13:45:21 stryke ifplugd[450]: Link beat detected.
lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: State 'stop-sigterm' timed out. Killing.
lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 450 (ifplugd) with signal SIGKILL.
lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: Main process exited, code=killed, status=9/KILL
lug 24 13:46:47 stryke systemd[1]: netctl-ifplugd@eth0.service: Failed with result 'timeout'.


At this point I have to manually restart it.


Additional info:
* package version(s)
netctl 1.23-1
ifplugd 0.28-16

* config and/or log files etc.
Profile file attached.


Steps to reproduce:
- start netctl-ifplugd for a wired connection
- suspend to ram
- resume from susped
   ethernet (0.3 KiB)
This task depends upon

Closed by  Jouke Witteveen (jouke)
Friday, 16 October 2020, 11:51 GMT
Reason for closing:  Fixed
Additional comments about closing:  833c18a
Comment by Jouke Witteveen (jouke) - Friday, 24 July 2020, 17:15 GMT
Interesting. I can't really tell from this log whether it is dhcpcd, ifplugd, or systemd that chokes. If it is dhcpcd, than there should be no problem with suspend-to-ram for profiles with a static address. If it is ifplugd that falters and your are only interested in dhcpcd, I believe dhcpcd has carrier detection itself nowadays, so you wouldn't need ifplugd in that case. This can also be put under control of netctl with an ordinary dhcp profile that contains SkipNoCarrier=yes.

If you have any idea what is going wrong here exactly, I'd be interested to hear it.
Comment by Nicola Mori (snack) - Sunday, 26 July 2020, 12:24 GMT
@jouke Thanks for your comments. I use netctl-ifplugd since I like the wired profile to be activated when I plug the cable in. Also, with netctl-ifplugd and my configuration file I can automatically start netctl-auto for a wireless connection when I unplug the cable, and vice-versa. This does not work if I simply start netctl with the ethernet profile (although it correctly detects the cable and acquires/releases the dhcp bind), and it's the main reason why I'd like to stick with netctl-ifplugd.

I can provide more information if needed, in case just ask me.
Comment by Nicola Mori (snack) - Sunday, 26 July 2020, 12:41 GMT
By the way, I did a test by manually stopping netctl-ifplugd:

$ sudo systemctl status netctl-ifplugd@eth0 -n 50
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Sun 2020-07-26 14:29:25 CEST; 16s ago
Docs: man:netctl.special(7)
Process: 6647 ExecStart=/usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns (code=killed, signal=KILL)
Main PID: 6647 (code=killed, signal=KILL)
Status: "online"

lug 26 14:27:35 stryke systemd[1]: Started Automatic wired network connection using netctl profiles.
lug 26 14:27:35 stryke ifplugd[6647]: Process 5429 died: No such process; trying to remove PID file. (/run/ifplugd.eth0.pid)
lug 26 14:27:35 stryke ifplugd[6647]: ifplugd 0.28 initializing.
lug 26 14:27:35 stryke ifplugd[6647]: Using interface eth0/00:E0:4C:A2:02:4E with driver <r8152> (version: v1.11.11)
lug 26 14:27:35 stryke ifplugd[6647]: Using detection mode: SIOCETHTOOL
lug 26 14:27:35 stryke ifplugd[6647]: Initialization complete, link beat detected.
lug 26 14:27:35 stryke ifplugd[6647]: Executing '/etc/ifplugd/netctl.action eth0 up'.
lug 26 14:27:35 stryke ifplugd[6647]: client: Starting network profile 'ethernet'...
lug 26 14:27:35 stryke dhcpcd[6691]: dhcpcd-9.1.4 starting
lug 26 14:27:35 stryke dhcpcd[6693]: DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
lug 26 14:27:35 stryke dhcpcd[6693]: eth0: IAID 4c:a2:02:4e
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: rebinding lease of 192.168.1.162
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: leased 192.168.1.162 for 86400 seconds
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: adding route to 192.168.1.0/24
lug 26 14:27:37 stryke dhcpcd[6693]: eth0: adding default route via 192.168.1.254
lug 26 14:27:37 stryke ifplugd[6647]: client: Started network profile 'ethernet'
lug 26 14:27:55 stryke ifplugd[6647]: Killing child.
lug 26 14:27:55 stryke ifplugd[6647]: client: arp_free: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: script_runreason: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: main: control_stop: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: ps_dostop: Broken pipe
lug 26 14:27:55 stryke ifplugd[6647]: client: ps_dostop: Broken pipe
lug 26 14:27:55 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles...
lug 26 14:27:55 stryke ifplugd[6647]: Program executed successfully.
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: State 'stop-sigterm' timed out. Killing.
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 6647 (ifplugd) with signal SIGKILL.
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: Main process exited, code=killed, status=9/KILL
lug 26 14:29:25 stryke systemd[1]: netctl-ifplugd@eth0.service: Failed with result 'timeout'.
lug 26 14:29:25 stryke systemd[1]: Stopped Automatic wired network connection using netctl profiles.

It is interesting that after the stop the network is still up and dhcpcd is not running:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 9c:b6:d0:be:bd:8d brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:e0:4c:a2:02:4e brd ff:ff:ff:ff:ff:ff
altname enp58s0u1u4
inet 192.168.1.162/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
valid_lft 85735sec preferred_lft 74935sec
inet6 fe80::2e0:4cff:fea2:24e/64 scope link
valid_lft forever preferred_lft forever

$ sudo killall dhcpcd
dhcpcd: no process found

In this case there is no error from dhcpcd, just some broken pipes from ifplugd. The stopping procedure then gets stuck for about 1m30s until systemd kills netctl-ifplugd. I can reproduce this consistently but I don't have any idea about the reason.
In case it matters, my eth0 device is:

Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

on a Startech USB-C multiport adapter.
Comment by Alejandro Rodriguez (alejometal) - Wednesday, 29 July 2020, 14:35 GMT
I've the same behavior since some weeks ago. I use ifplugd and netctl-auto to accomplish use net from ethernet when plugged and use wifi profiles when unplugged. I read some update of dhcpcd to version 9 showed 1:30 stop job delay. This affect me as when I need to shutdown, it makes me wait more than 2 minutes to get laptop powered off. Thank you.
Comment by Alejandro Rodriguez (alejometal) - Wednesday, 29 July 2020, 14:52 GMT
I just tried to downgrade dhcpcd to 8.1.7 version and the issue disappeared. Maybe you can try downgrading it and see if it works for you in order to figure out the root cause.
Comment by Alejandro Rodriguez (alejometal) - Wednesday, 29 July 2020, 16:12 GMT
You can also try to use dhclient and remove dhcpcd. check here to set dhclient on netctl configuration. https://wiki.archlinux.org/index.php/netctl#Using_hooks
Comment by Jouke Witteveen (jouke) - Wednesday, 29 July 2020, 20:04 GMT
So do I understand correctly that the error appears to be linked to a dhcpcd update? It would be interesting to know which was the last version that worked to see whether it is a bug in dhcpcd, or something that netctl should do differently.
Comment by Nicola Mori (snack) - Thursday, 30 July 2020, 07:12 GMT
@alejometal thanks for sharing your findings. I will try downgrading dhcpcd at the beginning of the next week, currently I'm out on holidays and have no wired connection to do the test. I will report my findings here.
Comment by Nicola Mori (snack) - Tuesday, 04 August 2020, 11:09 GMT
I did some tests but I found an extremely erratic behavior. I confirm that with dhcpcd 8.1.7 the timeout issue is not present, but when trying more recent versions to identify which one introduced the problem I found that the timeout issue and the error messages in the logs appear randomly. Just as an example, with dhcpcd 9.0.1-4 I got the timeout issue two times over a dozen tries.
I'm going to try again disabling the automatic bring-up of netctl-auto when netctl-ifplugd stops, just in case the two services interfere in some unpredictable manner.

Comment by Nicola Mori (snack) - Tuesday, 04 August 2020, 12:36 GMT
I definitely cannot find any rationale in the system service stop behavior. I installed version 9.0.2-1, rebooted, started and stopped the service twice. On the first attempt I got the timeout issue and this error message:

ago 04 14:19:12 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles...
ago 04 14:19:12 stryke dhcpcd[678]: eth0: removing interface
ago 04 14:19:12 stryke ifplugd[452]: client: Stopping network profile 'ethernet'...
ago 04 14:19:12 stryke dhcpcd[679]: process 679 unexpectedly terminating on signal 15
ago 04 14:19:12 stryke ifplugd[452]: client: Stopped network profile 'ethernet'
ago 04 14:19:12 stryke ifplugd[452]: Program executed successfully.
ago 04 14:19:12 stryke ifplugd[452]: Exiting.
ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: State 'stop-final-sigterm' timed out. Killing.
ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 678 (dhcpcd) with signal SIGKILL.
ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: Failed with result 'timeout'.
ago 04 14:20:42 stryke systemd[1]: Stopped Automatic wired network connection using netctl profiles.

(the "signal 15" error is a known issue of this version, see https://bugs.archlinux.org/task/66405) while on the second attempt I got no error and this log:

ago 04 14:30:33 stryke systemd[1]: Stopping Automatic wired network connection using netctl profiles...
ago 04 14:30:33 stryke ifplugd[2305]: client: Stopping network profile 'ethernet'...
ago 04 14:30:33 stryke ifplugd[2305]: client: dhcpcd not running
ago 04 14:30:33 stryke ifplugd[2305]: client: Stopped network profile 'ethernet'
ago 04 14:30:33 stryke ifplugd[2305]: Program executed successfully.
ago 04 14:30:33 stryke ifplugd[2305]: Exiting.
ago 04 14:30:33 stryke systemd[1]: netctl-ifplugd@eth0.service: Succeeded.
ago 04 14:30:33 stryke systemd[1]: Stopped Automatic wired network connection using netctl profiles.

At this point I'm a bit clueless and I don't know how to proceed.
Comment by j (potatoface) - Thursday, 06 August 2020, 12:39 GMT
i had the same problem. i solved the problem by installing dhclient package

br
Comment by Alejandro Rodriguez (alejometal) - Thursday, 06 August 2020, 17:51 GMT
I also tried with dhclient, and it eliminate the problem, but dhclient have issues with openresolv in my setup and it became worst than to have to wait 1:30 at shutdown time. I think the code of netctl needs to be updated to new dhcpcd versions as this is not happening with versions older than 9.x
Comment by Jouke Witteveen (jouke) - Thursday, 06 August 2020, 21:44 GMT
Indeed, this needs to be fixed. I can't think of what netctl can do about it, so I'll forward this bug to the dhcpcd mailing list. Perhaps it is a bug in dhcpcd.
Comment by Jouke Witteveen (jouke) - Thursday, 06 August 2020, 21:47 GMT
It occurred to me: is this reproducible without netctl? Just suspend-to-ram with the dhcpcd@eth0 service running...
Comment by Alejandro Rodriguez (alejometal) - Thursday, 06 August 2020, 22:12 GMT
I'll do that test as soon as I finish my work. Thanks for the help
Comment by Nicola Mori (snack) - Friday, 07 August 2020, 10:19 GMT
I tried with dhcpcd@eth0 without netctl-ifplugd. It is correctly restarted after resuming from suspend-to-ram, and does not give a timeout problem during shutdown. So I'd say that the issue is entirely in netctl-ifplugd.
Comment by Nicola Mori (snack) - Friday, 07 August 2020, 10:31 GMT
As usual, I spoke too early. Another try with suspend and resume after a fresh boot resulted in this error:

$ sudo systemctl status dhcpcd@eth0 -n 30
* dhcpcd@eth0.service - dhcpcd on eth0
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor preset: disabled)
Active: inactive (dead)

ago 07 12:20:10 stryke dhcpcd[2221]: sending signal TERM to pid 1312
ago 07 12:20:10 stryke dhcpcd[2221]: waiting for pid 1312 to exit
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Resource temporarily unavailable
ago 07 12:20:10 stryke dhcpcd[1312]: ipv6_deleteaddr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: arp_free: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: ipv4_deladdr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: main: control_stop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: ps_dostop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[2221]: sending signal TERM to pid 1312
ago 07 12:20:10 stryke dhcpcd[2221]: waiting for pid 1312 to exit
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Resource temporarily unavailable
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
ago 07 12:20:10 stryke dhcpcd[1312]: ipv6_deleteaddr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: arp_free: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: ipv4_deladdr: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: deleting route to 192.168.1.0/24
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: eth0: deleting default route via 192.168.1.254
ago 07 12:20:10 stryke dhcpcd[1312]: rt_delete: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: script_runreason: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: main: control_stop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: ps_dostop: Broken pipe
ago 07 12:20:10 stryke dhcpcd[1312]: dhcpcd exited
ago 07 12:20:10 stryke dhcpcd[1313]: ps_root_recvmsg: Connection refused
ago 07 12:20:10 stryke dhcpcd[1313]: ps_root_recvmsg: Connection refused
ago 07 12:20:10 stryke systemd[1]: dhcpcd@eth0.service: Succeeded.
ago 07 12:20:10 stryke systemd[1]: Stopped dhcpcd on eth0.

This happened only once. After manually restarting dhcpcd@eth0 and suspending and resuming, it worked again (tried 2 times successfully).

I'm growing tired of this randomness...
Comment by Jouke Witteveen (jouke) - Saturday, 08 August 2020, 16:33 GMT
> I'm growing tired of this randomness

I can understand! And these things are hard to debug too. Let's see what the dhcpcd maintainer has to say. He is usually very helpful:
https://roy.marples.name/archives/dhcpcd-discuss/0003142.html
Comment by Jouke Witteveen (jouke) - Monday, 10 August 2020, 19:00 GMT
Okay, here is a new hypothesis on what is happening:

When resuming, the network interface appears gone and since the service is bound to the interface, systemd takes the service down by sending SIGTERM to its process. Oddly enough, it does not appear as if ifplugd responds to the signal, and 90 seconds later, the service is taken down forcefully.

Is the issue fixed when you replace

BindsTo=sys-subsystem-net-devices-%i.device

on line 5 of /usr/lib/systemd/system/netctl-ifplugd@.service by

Requires=sys-subsystem-net-devices-%i.device

Let me know!
Comment by Alejandro Rodriguez (alejometal) - Monday, 10 August 2020, 22:55 GMT
I did that change on systemd service, but the behavior is the same. 1:30 delay at shutdown...
Comment by Nicola Mori (snack) - Tuesday, 11 August 2020, 06:53 GMT
Also for me, no benefit in changing the service file as proposed. Might it be significant that my ethernet adapter is not internal but on a USB-C external adapter? I already wrote this but it might have been overlooked.

I read the reply of Roy Marples and it came to my mind that maybe all the problems stem from dhcpcd exiting with a non-zero exit code due to the broken pipe error? The fail in resuming from suspend might come from the fact that the systemd unit is a failed state due to this non-zero exit value and thus is not restarted automatically on resume. I still don't understand why systemd waits 1:30 for netctl-ifplugd to stop, maybe ifplugd shutdown if screwed by the bad return value of dhcpcd?
Comment by Nicola Mori (snack) - Tuesday, 11 August 2020, 13:02 GMT
I did some more checks after dhcpcd@eth0 failed in resuming from suspend. After resuming the system journalctl says:

ago 11 14:44:20 stryke kernel: Restarting tasks ...
ago 11 14:44:20 stryke kernel: usb 4-1.4: USB disconnect, device number 3
ago 11 14:44:20 stryke kernel: mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
ago 11 14:44:20 stryke kernel: done.
ago 11 14:44:20 stryke kernel: pci_bus 0000:04: Allocating resources
ago 11 14:44:20 stryke kernel: pci_bus 0000:3a: Allocating resources
ago 11 14:44:20 stryke kernel: pci_bus 0000:04: Allocating resources
ago 11 14:44:20 stryke kernel: pci_bus 0000:3a: Allocating resources
ago 11 14:44:20 stryke upowerd[809]: treating change event as add on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-0:1.0
ago 11 14:44:20 stryke systemd-sleep[16063]: System resumed.
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: carrier lost
ago 11 14:44:20 stryke dhcpcd[3656]: ps_bpf_recvbpf: Network is down
ago 11 14:44:20 stryke dhcpcd[3656]: ps_bpf_recvbpf: Network is down
ago 11 14:44:20 stryke systemd[1]: systemd-suspend.service: Succeeded.
ago 11 14:44:20 stryke systemd[1]: Finished Suspend.

For what I understand, for some reason the USB multiport adapter is disconnected at the beginning of the task resume. This causes the loss of eth0 and this generates errors in dchpcd. After this, the dhcpcd service tries to shut down:

ago 11 14:44:20 stryke dhcpcd[16208]: sending signal TERM to pid 3647
ago 11 14:44:20 stryke dhcpcd[16208]: waiting for pid 3647 to exit
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Resource temporarily unavailable
ago 11 14:44:20 stryke dhcpcd[3647]: arp_free: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: main: control_stop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: ps_dostop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[16208]: sending signal TERM to pid 3647
ago 11 14:44:20 stryke dhcpcd[16208]: waiting for pid 3647 to exit
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Resource temporarily unavailable
ago 11 14:44:20 stryke dhcpcd[3647]: arp_free: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: deleting route to 192.168.1.0/24
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: eth0: deleting default route via 192.168.1.254
ago 11 14:44:20 stryke dhcpcd[3647]: rt_delete: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: script_runreason: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: main: control_stop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: ps_dostop: Broken pipe
ago 11 14:44:20 stryke dhcpcd[3647]: dhcpcd exited
ago 11 14:44:20 stryke dhcpcd[3648]: ps_root_recvmsg: Connection refused
ago 11 14:44:20 stryke kernel: audit: type=1130 audit(1597149860.150:163): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=intel-undervolt comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
ago 11 14:44:20 stryke kernel: audit: type=1131 audit(1597149860.150:164): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=intel-undervolt comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
ago 11 14:44:20 stryke dhcpcd[3648]: ps_root_recvmsg: Connection refused
ago 11 14:44:20 stryke systemd[1]: dhcpcd@eth0.service: Succeeded.
ago 11 14:44:20 stryke systemd[1]: Stopped dhcpcd on eth0.

but it cannot do it correctly (maybe because it tries to shut down eth0, which is no longer available?) and so the service enters a failed state. After this, the USB multiport is brought up again:

ago 11 14:44:20 stryke kernel: usb 4-1.4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
ago 11 14:44:20 stryke kernel: usb 4-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
ago 11 14:44:20 stryke kernel: usb 4-1.4: Product: USB 10/100/1000 LAN
ago 11 14:44:20 stryke kernel: usb 4-1.4: Manufacturer: Realtek
ago 11 14:44:20 stryke kernel: usb 4-1.4: SerialNumber: 000001
ago 11 14:44:20 stryke kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
ago 11 14:44:20 stryke kernel: r8152 4-1.4:1.0: load rtl8153a-4 v2 02/07/20 successfully
ago 11 14:44:20 stryke kernel: r8152 4-1.4:1.0 eth0: v1.11.11
ago 11 14:44:20 stryke systemd-udevd[16170]: Using default interface naming scheme 'v245'.
ago 11 14:44:20 stryke systemd-udevd[16170]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

To verify this hypothesis, i.e. dhcpcd fails if eth0 suddenly disappears, I did a simple check by unplugging the USB multiport while dhcpcd@eth0 was up and running, and this i sthe result:

ago 11 14:59:15 stryke dhcpcd[17201]: eth0: carrier lost
ago 11 14:59:15 stryke dhcpcd[17222]: ps_bpf_recvbpf: Network is down
ago 11 14:59:15 stryke dhcpcd[17222]: ps_bpf_recvbpf: Network is down
ago 11 14:59:15 stryke systemd[1]: Stopping dhcpcd on eth0...
ago 11 14:59:15 stryke dhcpcd[18591]: sending signal TERM to pid 17201
ago 11 14:59:15 stryke dhcpcd[18591]: waiting for pid 17201 to exit
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Resource temporarily unavailable
ago 11 14:59:15 stryke dhcpcd[17201]: ipv6_deleteaddr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: arp_free: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: main: control_stop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: ps_dostop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[18591]: sending signal TERM to pid 17201
ago 11 14:59:15 stryke dhcpcd[18591]: waiting for pid 17201 to exit
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Resource temporarily unavailable
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
ago 11 14:59:15 stryke dhcpcd[17201]: ipv6_deleteaddr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: arp_free: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: ipv4_deladdr: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: deleting route to 192.168.1.0/24
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: eth0: deleting default route via 192.168.1.254
ago 11 14:59:15 stryke dhcpcd[17201]: rt_delete: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: script_runreason: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: main: control_stop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: ps_dostop: Broken pipe
ago 11 14:59:15 stryke dhcpcd[17201]: dhcpcd exited
ago 11 14:59:15 stryke dhcpcd[17202]: ps_root_recvmsg: Connection refused
ago 11 14:59:15 stryke dhcpcd[17202]: ps_root_recvmsg: Connection refused
ago 11 14:59:15 stryke systemd[1]: dhcpcd@eth0.service: Succeeded.
ago 11 14:59:15 stryke systemd[1]: Stopped dhcpcd on eth0.

This log seems quite similar to those after resuming from suspend, so I'd say that they are in favor of my hypothesis.

Jouke, what do you think? Does this make sense to you?
Comment by Alejandro Rodriguez (alejometal) - Wednesday, 12 August 2020, 15:11 GMT
In my case, my ethernet is the one internal on my laptop, BTW.
I think is some kind of race condition with ifplugd, netctl and dhcpcd.
Comment by Jouke Witteveen (jouke) - Friday, 14 August 2020, 10:27 GMT
@Nicola: This makes sense to me. I have forwarded your findings to the dhcpcd mailing list.

@Nicola & Alejandro: just to be sure, you have no additional dhcpcd services active, do you?

Slowly, we're getting to the bottom of this.
Comment by Nicola Mori (snack) - Friday, 14 August 2020, 10:29 GMT
Thanks Jouke. I have disabled all the netctl services and after boot I have no network connectivity and start dhcpcd@eth0 manually. So I'm pretty sure there's no other dhcpcd instance around.
Comment by Nicola Mori (snack) - Friday, 14 August 2020, 10:34 GMT
@Jouke I read your reply on the dhcpcd mailing list and I think there's actually a small error. systemd does not forcefully kill dhcpcd@eth0 after a timeout: this service is stopped immediately, but with errors so the unit is placed in a failed state. It is netctl-ifplugd@eth0 which is killed after a timeout.
Comment by Alejandro Rodriguez (alejometal) - Friday, 14 August 2020, 15:13 GMT
I managed to fix my issue getting rid of ifplugd and setup a hook for dhcpcd:
Disable all netctl-auto and netctl-ifplugd services. I mean disable from autostart.
Create a file on /usr/lib/dhcpcd/dhcpcd-hooks/99-local or something, with the following:

## Sample dhcpcd hook script
wired=enp0s31f6
wireless=wlp3s0

if [ "${interface}" = $wired ]; then
case "${reason}" in NOCARRIER|BOUND)
if $if_up; then
systemctl stop netctl-auto@$wireless
elif $if_down; then
systemctl start netctl-auto@$wireless
fi
;;
esac
fi

Change wired and wireless variables for your ethernet device and wifi.

next, start the dhcpcd@eth service and try plugging and unplugging just like the behavior with ifplugd

if the outcomes fills your needs, enable the dhcpcd@eth at boot.

Hope this can help.

Alejandro,
Comment by Jouke Witteveen (jouke) - Monday, 17 August 2020, 05:59 GMT
@Alejandro: Good that you fixed it! Unfortunately, your fix does not give any more insight into the issue at hand...

@Nicola: At least once, we saw systemd kill dhcpcd in your logs:
> ago 04 14:20:42 stryke systemd[1]: netctl-ifplugd@eth0.service: Killing process 678 (dhcpcd) with signal SIGKILL.
In that same log, we see that dhcpcd is active at two pids: 678 and 679, and one of them dies before the other is terminated (but doesn't respond).

Are you able to patch and recompile dhcpcd to do some more testing? If so, let me know what the effect of this patch is:
https://roy.marples.name/archives/dhcpcd-discuss/0003151.html

Thanks a lot!
Comment by Gunnar Waterstraat (hansmitdampf) - Tuesday, 18 August 2020, 15:28 GMT
I think, I stumbled upon the same issue.
To autoconfigure proxy settings in my corporate network (ethernet) vs. no proxy in WiFi mode I am using hooks.

I have enabled netctl-auto@wlp8s0.service and netctl-ifplugd@enp2s0.servide
This setup has been working flawlessly for some years, but since several weeks switching between ethernet and wifi does not work and rebooting/ shutting down takes ages because of a hanging process.

Specifically, I switch off wifi when connecting to ethernet and change my Privoxy (transparent anti-tracking/ads proxy) configuration:

/etc/netctl/interfaces/enp2s0:
#!/bin/sh
ExecUpPost="systemctl stop netctl-auto@wlp8s0.service; cp /etc/privoxy/config.employer /etc/privoxy/config"
ExecDownPre="systemctl start netctl-auto@wlp8s0.service; cp /etc/privoxy/config.direct /etc/privoxy/config"

/etc/netctl/interfaces/wlp8s0
#!/bin/sh
ExecUpPost="cp /etc/privoxy/config.direct /etc/privoxy/config"

Now, with:
/etc/netctl/hooks/debug
#!/bin/sh
NETCTL_DEBUG=yes

I looked at the journal log when plugging ethernet in and out:
the interfaces hooks are not executed and the system hangs at shutdown

Step 1: Downgraded netctl to netctl-1.22-1
Same issues, interface-hooks not executed, system hangs at shutdown

Step 2 (additional to Step 1): Downgraded dhcpcd tp 9.0.2-1

When pluggin in cable:
░░ Subject: A stop job for unit netctl-auto@wlp8s0.service has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A stop job for unit netctl-auto@wlp8s0.service has begun execution.
░░
░░ The job identifier is 2097.

And later:

Journal log:
Aug 18 17:11:43 ray systemd[1]: netctl-auto@wlp8s0.service: State 'stop-sigterm' timed out. Killing.
Aug 18 17:11:43 ray systemd[1]: netctl-auto@wlp8s0.service: Killing process 1931 (dhcpcd) with signal SIGKILL.
Aug 18 17:11:43 ray systemd[1]: netctl-auto@wlp8s0.service: Failed with result 'timeout'.


During the waiting time for killing dhcpcd:
ps -A | grep dhc
1010 ? 00:00:00 dhcpcd
1011 ? 00:00:00 dhcpcd
1079 ? 00:00:00 dhcpcd
2525 ? 00:00:00 dhcpcd
2526 ? 00:00:00 dhcpcd <defunct>
2600 ? 00:00:00 dhcpcd


After Killing process 1931 (dhcpcd) with signal SIGKILL.
ps -A | grep dhc
1010 ? 00:00:00 dhcpcd
1011 ? 00:00:00 dhcpcd
1079 ? 00:00:00 dhcpcd
2600 ? 00:00:00 dhcpcd

Step 3: downgrading package dhcpcd (9.0.2-1 => 8.1.7-1)
Hooks are correctly executed
ps -A | grep dhc
2338 ? 00:00:00 dhcpcd
Everything is fine

Step 4: Upgrading to netctl-1.23-1 again
Hook is executed once - a not anymore later

ps -A | grep dhc
1308 ? 00:00:00 dhcpcd
1961 ? 00:00:00 dhcpcd

And in Journal log:
Aug 18 17:23:54 ray dhcpcd[2003]: dhcpcd already running on pid 1308 (/run/dhcpcd-enp2s0-4.pid)

So here, I need to downgrade both netctl (to 1.22-1) AND dhcpcd (8.1.7-1) in order to make everything work

I hope that helps
Comment by Nicola Mori (snack) - Sunday, 23 August 2020, 15:15 GMT
@Jouke Sorry for the delay, I was out on holidays. I tried the patch you linked, but it didn't change much:

1) dhcpcd@eth0 still fails when the eth0 interface is removed by removing the USB adapter or by putting the machine to sleep as below:

$ sudo systemctl status dhcpcd@eth0 -n 20
* dhcpcd@eth0.service - dhcpcd on eth0
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2020-08-23 16:44:41 CEST; 26s ago
Process: 1623 ExecStart=/usr/bin/dhcpcd -q -w eth0 (code=exited, status=0/SUCCESS)
Process: 1845 ExecStop=/usr/bin/dhcpcd -x eth0 (code=exited, status=0/SUCCESS)
Main PID: 1625 (code=exited, status=1/FAILURE)

Aug 23 16:44:15 stryke dhcpcd[1625]: eth0: adding route to 192.168.0.0/24
Aug 23 16:44:15 stryke dhcpcd[1625]: eth0: adding default route via 192.168.0.1
Aug 23 16:44:16 stryke systemd[1]: Started dhcpcd on eth0.
Aug 23 16:44:23 stryke dhcpcd[1625]: eth0: no IPv6 Routers available
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: carrier lost
Aug 23 16:44:40 stryke dhcpcd[1651]: ps_bpf_recvbpf: Network is down
Aug 23 16:44:40 stryke dhcpcd[1651]: ps_bpf_recvbpf: Network is down
Aug 23 16:44:40 stryke systemd[1]: Stopping dhcpcd on eth0...
Aug 23 16:44:40 stryke dhcpcd[1845]: sending signal TERM to pid 1625
Aug 23 16:44:40 stryke dhcpcd[1845]: waiting for pid 1625 to exit
Aug 23 16:44:40 stryke dhcpcd[1845]: sending signal TERM to pid 1625
Aug 23 16:44:40 stryke dhcpcd[1845]: waiting for pid 1625 to exit
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: deleting route to 192.168.0.0/24
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: deleting default route via 192.168.0.1
Aug 23 16:44:40 stryke dhcpcd[1625]: eth0: removing interface
Aug 23 16:44:40 stryke dhcpcd[1625]: dhcpcd exited
Aug 23 16:44:40 stryke systemd[1]: dhcpcd@eth0.service: Main process exited, code=exited, status=1/FAILURE
Aug 23 16:44:41 stryke systemd[1]: dhcpcd@eth0.service: Failed with result 'exit-code'.
Aug 23 16:44:41 stryke systemd[1]: Stopped dhcpcd on eth0.

2) netctl-ifplugd@eth0 seemed to work at a first try: I have been able to resume from sleep twice successfully, but then I hade the usual 1m30s timeout on shutdown and al swhen manually stopping the service. Then I experienced the usual failures in resuming from suspend; below the service seems to be correctly restarted after resimng frm suspend, but I had no network connectivity and dhcpcd was not running:

$ sudo systemctl status netctl-ifplugd@eth0 -n 15
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2020-08-23 16:58:29 CEST; 9min ago
Docs: man:netctl.special(7)
Main PID: 1622 (ifplugd)
Status: "online"
Tasks: 1 (limit: 18770)
Memory: 4.0M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
`-1622 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns

Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: rebinding lease of 192.168.0.8
Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: leased 192.168.0.8 for 86400 seconds
Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: adding route to 192.168.0.0/24
Aug 23 16:58:30 stryke dhcpcd[1670]: eth0: adding default route via 192.168.0.1
Aug 23 16:58:30 stryke ifplugd[1622]: client: Started network profile 'ethernet'
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: carrier lost
Aug 23 17:07:32 stryke ifplugd[1622]: client: ps_bpf_recvbpf: Network is down
Aug 23 17:07:32 stryke dhcpcd[1676]: ps_bpf_recvbpf: Network is down
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: deleting route to 192.168.0.0/24
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: deleting default route via 192.168.0.1
Aug 23 17:07:32 stryke dhcpcd[1670]: eth0: removing interface
Aug 23 17:07:32 stryke dhcpcd[1670]: dhcpcd exited
Aug 23 17:07:32 stryke ifplugd[1622]: Program executed successfully.
Aug 23 17:07:33 stryke ifplugd[1622]: Link beat lost.
Aug 23 17:07:35 stryke ifplugd[1622]: Link beat detected.

All in all, the usual random behavior... My feeling is still that the root of all this problems is the bad behavior of dhcpcd when the interface is removed.
Comment by Nicola Mori (snack) - Thursday, 27 August 2020, 13:02 GMT
Another piece for the puzzle: using dhcpcd@eth0, I suspended and resumed two times. The first time the network failed to resume, while the second time it succeeded. This is the log of USB events during resume on the first, failed try:

2307:Aug 27 12:17:52 stryke kernel: usb usb3: root hub lost power or was reset
2308:Aug 27 12:17:52 stryke kernel: usb usb4: root hub lost power or was reset
2310:Aug 27 12:17:52 stryke kernel: usb 4-1: Disable of device-initiated U1 failed.
2311:Aug 27 12:17:52 stryke kernel: usb 4-1: Disable of device-initiated U2 failed.
2313:Aug 27 12:17:52 stryke kernel: usb 1-12: reset high-speed USB device number 6 using xhci_hcd
2314:Aug 27 12:17:52 stryke kernel: usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2333:Aug 27 12:17:52 stryke upowerd[1841]: treating change event as add on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-0:1.0
2334:Aug 27 12:17:52 stryke kernel: usb 4-1.4: USB disconnect, device number 3
2463:Aug 27 12:17:53 stryke kernel: usb 4-1.4: new SuperSpeed Gen 1 USB device number 4 using xhci_hcd
2464:Aug 27 12:17:53 stryke kernel: usb 4-1.4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
2465:Aug 27 12:17:53 stryke kernel: usb 4-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
2466:Aug 27 12:17:53 stryke kernel: usb 4-1.4: Product: USB 10/100/1000 LAN
2467:Aug 27 12:17:53 stryke kernel: usb 4-1.4: Manufacturer: Realtek
2468:Aug 27 12:17:53 stryke kernel: usb 4-1.4: SerialNumber: 000001
2469:Aug 27 12:17:53 stryke kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd

You can see that the USB adapter providing the eth0 interface gets disconnected (by upowerd?) and then reconnected. In the middle, the dhcpcd messages are:

2306:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: carrier lost
2307:Aug 27 12:17:52 stryke dhcpcd[5783]: ps_bpf_recvbpf: Network is down
2308:Aug 27 12:17:52 stryke dhcpcd[5783]: ps_bpf_recvbpf: Network is down
2361:Aug 27 12:17:52 stryke systemd[1]: Stopping dhcpcd on eth0...
2362:-- Subject: A stop job for unit dhcpcd@eth0.service has begun execution
2366:-- A stop job for unit dhcpcd@eth0.service has begun execution.
2369:Aug 27 12:17:52 stryke dhcpcd[36849]: sending signal TERM to pid 5755
2370:Aug 27 12:17:52 stryke dhcpcd[36849]: waiting for pid 5755 to exit
2371:Aug 27 12:17:52 stryke dhcpcd[36849]: sending signal TERM to pid 5755
2372:Aug 27 12:17:52 stryke dhcpcd[36849]: waiting for pid 5755 to exit
2396:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
2397:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: deleting route to 192.168.0.0/24
2398:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: deleting default route via 192.168.0.1
2399:Aug 27 12:17:52 stryke dhcpcd[5755]: eth0: removing interface
2400:Aug 27 12:17:52 stryke dhcpcd[5755]: dhcpcd exited
2401:Aug 27 12:17:52 stryke systemd[1]: dhcpcd@eth0.service: Main process exited, code=exited, status=1/FAILURE
2406:-- An ExecStart= process belonging to unit dhcpcd@eth0.service has exited.
2409:Aug 27 12:17:52 stryke systemd[1]: dhcpcd@eth0.service: Failed with result 'exit-code'.
2414:-- The unit dhcpcd@eth0.service has entered the 'failed' state with result 'exit-code'.
2415:Aug 27 12:17:52 stryke systemd[1]: Stopped dhcpcd on eth0.
2416:-- Subject: A stop job for unit dhcpcd@eth0.service has finished
2420:-- A stop job for unit dhcpcd@eth0.service has finished.
2423:Aug 27 12:17:52 stryke audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dhcpcd@eth0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
2424:Aug 27 12:17:52 stryke kernel: audit: type=1131 audit(1598523472.969:108): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dhcpcd@eth0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

So once again it seems that the sudden disappearance of the eth0 interface makes dhcpcd exit with a failure code and the dhcpcd@eth0 unit enters a failed state.
This is the same part of the log for the second, successful trial:

2793:Aug 27 14:34:05 stryke kernel: usb usb3: root hub lost power or was reset
2794:Aug 27 14:34:05 stryke kernel: usb usb4: root hub lost power or was reset
2796:Aug 27 14:34:05 stryke kernel: usb 4-1: Disable of device-initiated U1 failed.
2797:Aug 27 14:34:05 stryke kernel: usb 4-1: Disable of device-initiated U2 failed.
2799:Aug 27 14:34:05 stryke kernel: usb 1-12: reset high-speed USB device number 6 using xhci_hcd
2800:Aug 27 14:34:05 stryke kernel: usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2801:Aug 27 14:34:05 stryke kernel: usb 1-5.3.1: reset high-speed USB device number 7 using xhci_hcd
2802:Aug 27 14:34:05 stryke kernel: usb 4-1.4: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd

2776:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: carrier lost
2778:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: deleting address fe80::145e:4e49:22aa:5f8d
2779:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: deleting route to 192.168.0.0/24
2780:Aug 27 14:34:05 stryke dhcpcd[37091]: eth0: deleting default route via 192.168.0.1
2872:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: carrier acquired
2874:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: IAID 4c:a2:02:4e
2875:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: adding address fe80::145e:4e49:22aa:5f8d
2876:Aug 27 14:34:07 stryke dhcpcd[37091]: eth0: soliciting an IPv6 router
2877:Aug 27 14:34:09 stryke dhcpcd[37091]: eth0: rebinding lease of 192.168.0.2
2878:Aug 27 14:34:09 stryke dhcpcd[37091]: eth0: probing address 192.168.0.2/24
2891:Aug 27 14:34:14 stryke dhcpcd[37091]: eth0: leased 192.168.0.2 for 86400 seconds
2892:Aug 27 14:34:14 stryke dhcpcd[37091]: eth0: adding route to 192.168.0.0/24
2893:Aug 27 14:34:14 stryke dhcpcd[37091]: eth0: adding default route via 192.168.0.1
2896:Aug 27 14:34:20 stryke dhcpcd[37091]: eth0: no IPv6 Routers available

Now there's no disconnect/reconnect event for the USB adapter, and dhcpcd simply looses and re-acquires the carrier, thus resuming correctly.
Comment by Nicola Mori (snack) - Thursday, 27 August 2020, 13:12 GMT
@Jouke: I read the post of Roy Marples about his tries with tap0, so I did the same with eth0:

$ sudo dhcpcd -dB eth0
dhcpcd-9.1.4 starting
spawned privileged actioneer on PID 46365
spawned network proxy on PID 46366
spawned controller proxy on PID 46367
DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
chrooting to `/var/lib/dhcpcd' as dhcpcd
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' PREINIT
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
eth0: waiting for carrier
eth0: carrier acquired
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' CARRIER
eth0: IAID 4c:a2:02:4e
eth0: adding address fe80::145e:4e49:22aa:5f8d
eth0: pltime infinity, vltime infinity
eth0: delaying IPv6 router solicitation for 0.1 seconds
eth0: delaying IPv4 for 0.4 seconds
eth0: carrier lost
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
eth0: deleting address fe80::145e:4e49:22aa:5f8d
eth0: carrier acquired
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' CARRIER
eth0: IAID 4c:a2:02:4e
eth0: adding address fe80::145e:4e49:22aa:5f8d
eth0: pltime infinity, vltime infinity
eth0: delaying IPv6 router solicitation for 0.0 seconds
eth0: delaying IPv4 for 0.0 seconds
eth0: reading lease `/var/lib/dhcpcd/eth0.lease'
eth0: rebinding lease of 192.168.0.2
eth0: sending REQUEST (xid 0x7e44c107), next in 3.2 seconds
eth0: soliciting an IPv6 router
eth0: delaying Router Solicitation for LL address
eth0: acknowledged 192.168.0.2 from 192.168.0.1
eth0: probing address 192.168.0.2/24
eth0: probing for 192.168.0.2
eth0: ARP probing 192.168.0.2 (1 of 3), next in 1.5 seconds
eth0: sending Router Solicitation
spawned listener fe80::145e:4e49:22aa:5f8d on PID 46392
eth0: ARP probing 192.168.0.2 (2 of 3), next in 1.7 seconds
eth0: ARP probing 192.168.0.2 (3 of 3), next in 2.0 seconds
eth0: DAD completed for 192.168.0.2
eth0: leased 192.168.0.2 for 86400 seconds
eth0: renew in 43200 seconds, rebind in 75600 seconds
eth0: writing lease `/var/lib/dhcpcd/eth0.lease'
eth0: adding IP address 192.168.0.2/24 broadcast 192.168.0.255
eth0: adding route to 192.168.0.0/24
eth0: adding default route via 192.168.0.1
eth0: ARP announcing 192.168.0.2 (1 of 2), next in 2.0 seconds
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' BOUND
spawned listener 192.168.0.2 on PID 46450
eth0: sending Router Solicitation
eth0: ARP announcing 192.168.0.2 (2 of 2)
eth0: sending Router Solicitation
eth0: sending Router Solicitation
eth0: no IPv6 Routers available
<HERE I DETACH THE USB ADAPTER>
eth0: carrier lost
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
ps_bpf_recvbpf: Network is down
eth0: deleting address fe80::145e:4e49:22aa:5f8d
eth0: deleting IP address 192.168.0.2/24
eth0: deleting route to 192.168.0.0/24
eth0: deleting default route via 192.168.0.1
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' EXPIRE
eth0: interface departed
eth0: removing interface
eth0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' DEPARTED
dhcpcd exited
$ echo $?
1

The differences I see with the tap0 case are:
- when executing the NOCARRIER hook the message "ps_bpf_recvbpf: Network is down" appears. This is seen as an error in systemd logs, i.e. it's usually printed in red;
- the exit code is not 0, so it's seen as an error (I guess that with tap0 it exits with 0).

I don't know if this can be useful at all. If it helps I can send this directly to Roy Marple's mailing list. Thanks.
Comment by Jouke Witteveen (jouke) - Friday, 28 August 2020, 12:15 GMT
I'm am sorry that I still haven't found the time to replicate this on my own machine. It shouldn't be too hard and I want to fix this bug.

Anyway, part of what you are seeing can maybe be explained by the
BindsTo=sys-subsystem-net-devices-%i.device
statement in the service file of both dhcpcd@ and netctl-ifplugd@. Due to this statement, systemd terminates a service when the interface disappears. However, it will not restart a service when the interface reappears and I do not think there is a property to achieve this. That's why I suggested using Requires= instead, as that will only have an effect on the initial startup of the service. What I don't understand, though, is that after this line
Aug 23 17:07:35 stryke ifplugd[1622]: Link beat detected.
There is no sign of ifplugd running the netctl script... That could be something that needs to be fixed in netctl.

While I don't fully understand what is going on, the recent dhcpcd patch does seem to improve things, right?
Comment by Nicola Mori (snack) - Friday, 28 August 2020, 12:38 GMT
I tried to replace BindsTo with Requires in dhcpcd@.service but it didn't change anything. In fact, I think that it is dhcpcd itself that reacts badly to the removal of the interface, not systemd. About the patch, it seems that it improved something in the sense that now when the error happens it seems to always be "ps_bpf_recvbpf: Network is down".

About the "Link beat detected" after quitting netctl-ifplugd, it was with the unpatched dhcpcd and actually there was a leftover dhcpcd instance running, if I correctly remember.
Comment by Jouke Witteveen (jouke) - Friday, 04 September 2020, 14:08 GMT
I finally found the time to investigate this a bit on my own machine. What I found was that with dhcpcd 9, ifplugd somehow doesn't correctly get notified of the termination of the netctl action script. I expect that this is the key to fixing this bug, as it might be the final thing that prevents ifplugd from restarting the profile on resume.

The problem is visible after starting netctl-ifplugd@INTERFACE as "[netctl.action] <defunct>" in the output of `ps`.
Comment by Jouke Witteveen (jouke) - Saturday, 05 September 2020, 16:24 GMT
It looks like this may have been an oddity in the behavior of ifplugd, but the next version of dhcpcd will likely contain changes that make it work again. In the next version of netctl, I will make some adjustments to make netctl-ifplugd@ more robust, but already with the next version of dhcpcd the previous level of functionality (netctl 1.22 and dhcpcd 8) should be available again.

Many thanks to Roy Marples, the author of dhcpcd.
Comment by Nicola Mori (snack) - Tuesday, 08 September 2020, 07:42 GMT
I just updated dhcpcd to 9.2.0-1 and now netctl-ifplugd can be stopped correctly and without the 1m30s timeout, so it seems that at least this issue is fixed.

However, resume from suspend is still broken on my system, and I believe it's still due to the fact that my eth0 is on an external USB adapter that gets disconnected on suspend, and on resume it's still not available when dhcpcd is resumed so dhcpcd exits badly:

$ sudo systemctl status netctl-ifplugd@eth0
* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2020-09-08 09:31:24 CEST; 53s ago
Docs: man:netctl.special(7)
Main PID: 10469 (ifplugd)
Status: "online"
Tasks: 1 (limit: 18770)
Memory: 4.9M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
`-10469 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns

set 08 09:31:33 stryke ifplugd[10469]: client: Started network profile 'ethernet'
set 08 09:31:33 stryke ifplugd[10469]: Program executed successfully.
set 08 09:32:03 stryke dhcpcd[10528]: eth0: carrier lost <-------------------- Resume starts here
set 08 09:32:03 stryke dhcpcd[10534]: ps_bpf_recvbpf: Network is down
set 08 09:32:03 stryke dhcpcd[10528]: eth0: deleting route to 192.168.1.0/24
set 08 09:32:03 stryke dhcpcd[10528]: eth0: deleting default route via 192.168.1.254
set 08 09:32:03 stryke dhcpcd[10528]: eth0: removing interface
set 08 09:32:03 stryke dhcpcd[10528]: dhcpcd exited
set 08 09:32:04 stryke ifplugd[10469]: Link beat lost.
set 08 09:32:08 stryke ifplugd[10469]: Link beat detected.

The unit itself is still active, and ifplugd still seems to work: it detects the link beat when the eth0 device is added again bit at that point dhcpcd is dead so no dhcp lease is re-acquired. This is just my hypothesis based on the logs, but I might be wrong since I'm not an expert at all. However, if I'm correct then maybe the issue could be fixed by ifplugd spawning a new dhcpcd instance if the old one is no longer alive?
Comment by Alejandro Rodriguez (alejometal) - Tuesday, 08 September 2020, 14:38 GMT
I just tried with dhcpcd-9.2 and is working now as before 9.x, Thank you so much Jouke and Nicola.
Alejandro,
Comment by Nicola Mori (snack) - Friday, 18 September 2020, 12:02 GMT
Hi Jouke, any progress on this? Thanks.
Comment by Jouke Witteveen (jouke) - Monday, 21 September 2020, 16:15 GMT
@Nicola, this is still on my radar, but I can't reproduce it and have very limited time to investigate. I plan to release the next version of netctl once this issue is addressed.

The fact that dhcpcd is not running once the interface is added again is expected. However, it should have been shut down because ifplugd told it to. I guess the reason ifplugd does not forward the loss of a link to netctl is because of its default down delay of 5 seconds. I see two possible mitigations.

1) Lower/disable the down delay of ifplugd (e.g. `ifplugd -d 0`).
2) Disable dhcpcd's internal link management.

The second option you could test by adding
DhcpcdOptions="--nolink"
to your profile. Does it work then?

Another solution that is somewhat appealing to me would be to add this configurable down delay also to dhcpcd.
Comment by Nicola Mori (snack) - Wednesday, 23 September 2020, 12:48 GMT
@Jouke I tried option 2 and so far it seems to work for resume from suspend. I tried twice:

set 23 14:37:02 stryke systemd[1]: Started Automatic wired network connection using netctl profiles.
set 23 14:37:02 stryke ifplugd[61760]: ifplugd 0.28 initializing.
set 23 14:37:02 stryke ifplugd[61760]: Using interface eth0/00:E0:4C:A2:02:4E with driver <r8152> (version: v1.11.11)
set 23 14:37:02 stryke ifplugd[61760]: Using detection mode: SIOCETHTOOL
set 23 14:37:02 stryke ifplugd[61760]: Initialization complete, link beat not detected.
set 23 14:37:03 stryke ifplugd[61760]: Link beat detected.
set 23 14:37:04 stryke ifplugd[61760]: Executing '/etc/ifplugd/netctl.action eth0 up'.
set 23 14:37:04 stryke ifplugd[61760]: client: Starting network profile 'ethernet'...
set 23 14:37:04 stryke dhcpcd[61837]: dhcpcd-9.2.0 starting
set 23 14:37:04 stryke dhcpcd[61839]: DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
set 23 14:37:04 stryke dhcpcd[61839]: eth0: IAID 4c:a2:02:4e
set 23 14:37:05 stryke dhcpcd[61839]: eth0: rebinding lease of 192.168.1.162
set 23 14:37:05 stryke dhcpcd[61839]: eth0: probing address 192.168.1.162/24
set 23 14:37:10 stryke dhcpcd[61839]: eth0: leased 192.168.1.162 for 86400 seconds
set 23 14:37:10 stryke dhcpcd[61839]: eth0: adding route to 192.168.1.0/24
set 23 14:37:10 stryke dhcpcd[61839]: eth0: adding default route via 192.168.1.254
set 23 14:37:10 stryke ifplugd[61760]: client: Started network profile 'ethernet'
set 23 14:37:10 stryke ifplugd[61760]: Program executed successfully.
set 23 14:38:02 stryke ifplugd[61760]: Link beat lost. <------- 1st resume attempt
set 23 14:38:05 stryke ifplugd[61760]: Link beat detected.
set 23 14:41:09 stryke ifplugd[61760]: Link beat lost. <------- 2nd resume attempt
set 23 14:41:12 stryke ifplugd[61760]: Link beat detected.

I'll monitor this in the next days given the random behavior sees in the past.
There are still strange errors when unplugging the USB-C adapter with the network port:

* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2020-09-23 14:37:02 CEST; 6min ago
Docs: man:netctl.special(7)
Main PID: 61760 (ifplugd)
Tasks: 1 (limit: 18770)
Memory: 4.6M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
`-61760 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns


set 23 14:43:23 stryke dhcpcd[61852]: ps_bpf_recvbpf: Network is down <------ Adapter unplugged
set 23 14:43:23 stryke dhcpcd[61839]: eth0: pid 0 deleted IP address 192.168.1.162/24
set 23 14:43:23 stryke dhcpcd[61839]: eth0: deleting route to 192.168.1.0/24
set 23 14:43:23 stryke dhcpcd[61839]: eth0: deleting default route via 192.168.1.254
set 23 14:43:23 stryke dhcpcd[61839]: eth0: rebinding lease of 192.168.1.162
set 23 14:43:23 stryke dhcpcd[61839]: eth0: if_getmtu: No such device <------- Error?
set 23 14:43:23 stryke dhcpcd[61839]: eth0: removing interface
set 23 14:43:23 stryke dhcpcd[61839]: dhcpcd exited
set 23 14:43:24 stryke ifplugd[61760]: Link beat lost.
set 23 14:43:30 stryke ifplugd[61760]: Executing '/etc/ifplugd/netctl.action eth0 down'.
set 23 14:43:30 stryke ifplugd[61760]: client: Stopping network profile 'ethernet'...
set 23 14:43:31 stryke ifplugd[61760]: client: /usr/lib/netctl/interface: line 46: /sys/class/net/eth0/flags: No such file or directory <------- ??
set 23 14:43:31 stryke ifplugd[61760]: client: Stopped network profile 'ethernet'
set 23 14:43:31 stryke ifplugd[61760]: Program executed successfully.

However, this is not a big issue since replugging the adapter a link beat is detected so dhcp is restarted correctly.
Comment by Nicola Mori (snack) - Thursday, 24 September 2020, 12:33 GMT
I just got the same issue after resuming from suspend:

set 24 14:29:38 stryke dhcpcd[893]: ps_bpf_recvbpf: Network is down
set 24 14:29:38 stryke dhcpcd[888]: eth0: pid 0 deleted IP address 192.168.1.162/24
set 24 14:29:38 stryke dhcpcd[888]: eth0: deleting route to 192.168.1.0/24
set 24 14:29:38 stryke dhcpcd[888]: eth0: deleting default route via 192.168.1.254
set 24 14:29:38 stryke dhcpcd[888]: eth0: rebinding lease of 192.168.1.162
set 24 14:29:38 stryke dhcpcd[888]: eth0: if_getmtu: No such device
set 24 14:29:38 stryke dhcpcd[888]: eth0: removing interface
set 24 14:29:38 stryke ifplugd[467]: Link beat lost.
set 24 14:29:38 stryke dhcpcd[888]: dhcpcd exited
set 24 14:29:43 stryke ifplugd[467]: Link beat detected.

So I'd say that adding the `DhcpcdOptions="--nolink"` line to the profile does not fix the problem. I'm going to try option 1, i.e. `ifplugd -d 0` by modifying the systemd unit file (@Jouke is this the right way to do that?).
Comment by Jouke Witteveen (jouke) - Thursday, 24 September 2020, 21:59 GMT
Thanks for testing, Nicola! Yup, for now that is the right way...
Comment by Nicola Mori (snack) - Friday, 25 September 2020, 12:00 GMT
@Jouke I think we are at the end of the journey. In my last resume log I see:

* netctl-ifplugd@eth0.service - Automatic wired network connection using netctl profiles
Loaded: loaded (/usr/lib/systemd/system/netctl-ifplugd@.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2020-09-25 09:58:25 CEST; 3h 54min ago
Docs: man:netctl.special(7)
Main PID: 482 (ifplugd)
Status: "online"
Tasks: 6 (limit: 18770)
Memory: 9.5M
CGroup: /system.slice/system-netctl\x2difplugd.slice/netctl-ifplugd@eth0.service
|- 482 /usr/bin/ifplugd -d 0 -i eth0 -r /etc/ifplugd/netctl.action -bfIns
|-30947 dhcpcd: eth0 [ip4]
|-30948 dhcpcd: [privileged actioneer] eth0 [ip4]
|-30949 dhcpcd: [control proxy] eth0 [ip4]
|-30952 dhcpcd: [BPF ARP] eth0 192.168.1.162
`-31116 dhcpcd: [network proxy] 192.168.1.162

set 25 09:58:34 stryke dhcpcd[701]: eth0: probing address 192.168.1.162/24
set 25 09:58:38 stryke dhcpcd[701]: eth0: leased 192.168.1.162 for 86400 seconds
set 25 09:58:38 stryke dhcpcd[701]: eth0: adding route to 192.168.1.0/24
set 25 09:58:38 stryke dhcpcd[701]: eth0: adding default route via 192.168.1.254
set 25 09:58:38 stryke ifplugd[482]: client: Started network profile 'ethernet'
set 25 09:58:38 stryke ifplugd[482]: Program executed successfully.
set 25 13:52:04 stryke dhcpcd[701]: eth0: carrier lost <---------- Resume begins here
set 25 13:52:04 stryke dhcpcd[712]: ps_bpf_recvbpf: Network is down
set 25 13:52:04 stryke dhcpcd[701]: eth0: deleting route to 192.168.1.0/24
set 25 13:52:04 stryke dhcpcd[701]: eth0: deleting default route via 192.168.1.254
set 25 13:52:04 stryke dhcpcd[701]: eth0: removing interface
set 25 13:52:04 stryke dhcpcd[701]: dhcpcd exited
set 25 13:52:05 stryke ifplugd[482]: Link beat lost.
set 25 13:52:06 stryke ifplugd[482]: Executing '/etc/ifplugd/netctl.action eth0 down'.
set 25 13:52:06 stryke ifplugd[482]: client: Stopping network profile 'ethernet'...
set 25 13:52:06 stryke ifplugd[482]: client: Stopped network profile 'ethernet'
set 25 13:52:06 stryke ifplugd[482]: Program executed successfully.
set 25 13:52:08 stryke ifplugd[482]: Link beat detected.
set 25 13:52:09 stryke ifplugd[482]: Executing '/etc/ifplugd/netctl.action eth0 up'.
set 25 13:52:09 stryke ifplugd[482]: client: Starting network profile 'ethernet'...
set 25 13:52:09 stryke dhcpcd[30945]: dhcpcd-9.2.0 starting
set 25 13:52:09 stryke dhcpcd[30947]: DUID 00:04:4c:4c:45:44:00:50:4b:10:80:4d:c4:c0:4f:4a:52:32
set 25 13:52:09 stryke dhcpcd[30947]: eth0: IAID 4c:a2:02:4e
set 25 13:52:10 stryke dhcpcd[30947]: eth0: rebinding lease of 192.168.1.162
set 25 13:52:13 stryke dhcpcd[30947]: eth0: probing address 192.168.1.162/24
set 25 13:52:18 stryke dhcpcd[30947]: eth0: leased 192.168.1.162 for 86400 seconds
set 25 13:52:18 stryke dhcpcd[30947]: eth0: adding route to 192.168.1.0/24
set 25 13:52:18 stryke dhcpcd[30947]: eth0: adding default route via 192.168.1.254
set 25 13:52:18 stryke ifplugd[482]: client: Started network profile 'ethernet'
set 25 13:52:18 stryke ifplugd[482]: Program executed successfully.

So it seems that on resume dhcpcd is shut down because eth0 is still not available, ifplugd looses the link beat and stops the profile, then when eth0 is re-added it detects the link beat and restarts dhcpcd. So far so good, no failed resume attempts up to now.
I still refrain to affirm that the issue is fixed, I'd like to test for a couple of days more, but for the first time I'm becoming optimistic.
Comment by Nicola Mori (snack) - Monday, 28 September 2020, 16:07 GMT
The resume-from-suspend issue seems to be fixed by the '-d 0' additional command line parameter for ifplugd. @Jouke thank you very much for your support and efforts!
Comment by Jouke Witteveen (jouke) - Friday, 09 October 2020, 12:05 GMT Comment by Nicola Mori (snack) - Monday, 12 October 2020, 16:27 GMT
Hi Jouke, I just manually patched the service file and for now it looks good. I did a couple of suspend/resume tests and found no problems. If you prefer me to do some more testing then I'll keep tou updated in the next days. Thanks.
Comment by Nicola Mori (snack) - Tuesday, 13 October 2020, 13:56 GMT
@Jouke The patched service file is ok for me. Thanks for your support, again.
Comment by Jouke Witteveen (jouke) - Friday, 16 October 2020, 11:50 GMT
@Nicola: Sorry for taking so long to get this fixed and sorry also for forgetting to acknowledge your help explicitly in the commit message of 833c18a. Thanks!

Loading...