FS#77571 - [systemd] 253-1 changes behaviour of systemd-networkd-wait-online.service, fail to start services
Attached to Project:
Arch Linux
Opened by Joakim Ekblad (aio) - Sunday, 19 February 2023, 13:27 GMT
Last edited by Toolybird (Toolybird) - Thursday, 15 June 2023, 05:51 GMT
Opened by Joakim Ekblad (aio) - Sunday, 19 February 2023, 13:27 GMT
Last edited by Toolybird (Toolybird) - Thursday, 15 June 2023, 05:51 GMT
|
Details
Description:
If you have services listening to a specific IP-address e.g. "ListenAddress" with sshd or any other service, they will probably fail to start at boot as of systemd-253-1. There is a change of behavior in systemd-networkd-wait-online, where you will need to add either --any or specific --interface to maintain the logical and old way wait-online worked. Services listening on 0.0.0.0 or "any" will probably start just fine. It's only the ones with a specific listen address that fails, since network-online.target will be reached regardless if the network is up or not due to changes in systemd. Maybe this is worthy of a "manual intervention" notification or some generic change to the systemd-neworkd-wait-online.service file? Additional info: * systemd-253-1 * network configured with systemd-networkd (no netctl, no NetworkManager etc.) * Info: https://github.com/systemd/systemd/releases (see the part about systemd-networkd-wait-online) Steps to reproduce: Upgrade systemd, see your hardened servers fail with all its glory :D |
This task depends upon
Closed by Toolybird (Toolybird)
Thursday, 15 June 2023, 05:51 GMT
Reason for closing: Upstream
Additional comments about closing: See comments
Thursday, 15 June 2023, 05:51 GMT
Reason for closing: Upstream
Additional comments about closing: See comments
"network configured with systemd-networkd (no netctl, no NetworkManager etc.)"
I have no un-fixed system currently. I changed my systemd-networkd-wait-online.servcice, on all 20 or so systems, to have the --interface option to the binary, as suggested in the initial report. So i can't really provide much more info without breaking one of the systems.
What I believe happens is: network-online.target becomes available directly (not waiting for any network-devices to come online) and services depending on this obviously try to start as expected when network-online.target is reached.
For example i have sshd listening on a specific IP (NIC) with "ListenAddress" set to my Ethernet cards IP, same with sockd and tinyproxy. They all listen on a specific IP. All of them failed. But services listening on any interface started. Since that address is not resolvable until the nics are up the daemons fail.
Sooner or alter systemd-networkd are done, and the services could be manually restarted successfully. This stuff just affects the dependencies between services during boot.
I suppose many people will wonder why services won't start anymore (if listening on specific IP's) though.
Adding options to the call to "systemd-networkd-wait-online" like -4 or -i with the particular interface seem not to have any effect and/or change.
I can see that on a system running 252-5.1 the following targets are reached (in this order):
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Local Encrypted Volumes.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Local Integrity Protected Volumes.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Path Units.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Remote File Systems.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Slice Units.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Swaps.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Local Verity Protected Volumes.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Preparation for Local File Systems.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target Local File Systems.
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Reached target System Time Set.
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Reached target Host and Network Name Lookups.
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Reached target System Initialization.
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Reached target Timer Units.
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Reached target Socket Units.
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Reached target Basic System.
Feb 25 06:25:16 ip-172-31-50-96 systemd[1]: Reached target Preparation for Network.
Feb 25 06:25:16 ip-172-31-50-96 systemd[1]: Reached target Network.
Feb 25 06:25:20 ip-172-31-50-96 systemd[1]: Reached target Cloud-config availability.
Feb 25 06:25:20 ip-172-31-50-96 systemd[1]: Reached target Network is Online.
Feb 25 06:25:20 ip-172-31-50-96 systemd[1]: Reached target Login Prompts.
Feb 25 06:25:20 ip-172-31-50-96 systemd[1]: Reached target Multi-User System.
Feb 25 06:25:21 ip-172-31-50-96 systemd[1]: Reached target Cloud-init target.
After upgrading the system to 253 we only get these targets:
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Local Encrypted Volumes.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Local Integrity Protected Volumes.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Path Units.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Slice Units.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Swaps.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Local Verity Protected Volumes.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Preparation for Network.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Preparation for Local File Systems.
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Reached target Local File Systems.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target System Time Set.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target Network.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target Host and Network Name Lookups.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target System Initialization.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target Timer Units.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target Socket Units.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target Basic System.
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Reached target Multi-User System.
In essence these targets are never reached after the upgrade:
Remote File Systems.
Cloud-config availability.
Network is Online.
Login Prompts.
Cloud-init target.
Services started on 252.5-1
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Create List of Static Device Nodes...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Load Kernel Module configfs...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Load Kernel Module drm...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Load Kernel Module fuse...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Load Kernel Modules...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Generate network units from Kernel command line...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Remount Root and Kernel File Systems...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Coldplug All udev Devices...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Load/Save Random Seed...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Apply Kernel Variables...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Create System Users...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Create Static Device Nodes in /dev...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Rebuild Dynamic Linker Cache...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Journal Service...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Rule-based Manager for Device Events and Files...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Flush Journal to Persistent Storage...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Create Volatile Files and Directories...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Security Auditing Service...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Rebuild Journal Catalog...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Network Name Resolution...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Network Time Synchronization...
Feb 25 06:25:08 ip-172-31-50-96 systemd[1]: Starting Update is Completed...
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Starting Record System Boot/Shutdown in UTMP...
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Starting Initial cloud-init job (pre-networking)...
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Starting D-Bus System Message Bus...
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Starting Name Service Cache Daemon...
Feb 25 06:25:09 ip-172-31-50-96 systemd[1]: Starting User Login Management...
Feb 25 06:25:16 ip-172-31-50-96 systemd[1]: Starting Network Configuration...
Feb 25 06:25:17 ip-172-31-50-96 systemd[1]: Starting Wait for Network to be Configured...
Feb 25 06:25:17 ip-172-31-50-96 systemd[1]: Starting Hostname Service...
Feb 25 06:25:19 ip-172-31-50-96 systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Feb 25 06:25:20 ip-172-31-50-96 systemd[1]: Starting Apply the settings specified in cloud-config...
Feb 25 06:25:20 ip-172-31-50-96 systemd[1]: Starting Permit User Sessions...
Feb 25 06:25:20 ip-172-31-50-96 systemd[1]: Starting Execute cloud user/final scripts...
Feb 25 06:25:44 ip-172-31-50-96 systemd[1]: Starting User Runtime Directory /run/user/1000...
Feb 25 06:25:44 ip-172-31-50-96 systemd[1]: Starting User Manager for UID 1000...
Services Started after upgrade to 253.1 (no changes where made to the systemd settings):
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Create List of Static Device Nodes...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Load Kernel Module configfs...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Load Kernel Module dm_mod...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Load Kernel Module fuse...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Load Kernel Module loop...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Load Kernel Modules...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Generate network units from Kernel command line...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Remount Root and Kernel File Systems...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Coldplug All udev Devices...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Load/Save OS Random Seed...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Apply Kernel Variables...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Create System Users...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Create Static Device Nodes in /dev...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Rebuild Dynamic Linker Cache...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Journal Service...
Feb 25 06:28:14 ip-172-31-50-96 systemd[1]: Starting Rule-based Manager for Device Events and Files...
Feb 25 06:28:15 ip-172-31-50-96 systemd[1]: Starting Flush Journal to Persistent Storage...
Feb 25 06:28:15 ip-172-31-50-96 systemd[1]: Starting Network Configuration...
Feb 25 06:28:15 ip-172-31-50-96 systemd[1]: Starting Create Volatile Files and Directories...
Feb 25 06:28:16 ip-172-31-50-96 systemd[1]: Starting Rebuild Journal Catalog...
Feb 25 06:28:16 ip-172-31-50-96 systemd[1]: Starting Network Name Resolution...
Feb 25 06:28:16 ip-172-31-50-96 systemd[1]: Starting Network Time Synchronization...
Feb 25 06:28:16 ip-172-31-50-96 systemd[1]: Starting Record System Boot/Shutdown in UTMP...
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Starting Update is Completed...
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Starting D-Bus System Message Bus...
Feb 25 06:28:17 ip-172-31-50-96 systemd[1]: Starting Hostname Service...
Reported this with systemd: https://github.com/systemd/systemd/issues/26613
And I am not even sure the issue described by zee99 is the same as initially reported here.
I removed my work-around with "--interface ens0" on one of my qemu-guests again (like it was before all this started). Still making sshd only listen on a specific (ens0) IP address. With "ListenAddress". And the problem remains (or the new behavior remains, rather). systemd-networkd-wait-online.service just falls through, not waiting for anything and the boot-up process will after a while try to start sshd.service, and sshd can not bind to the specified interface, since it does not exist yet. Resulting in "Failed to start OpenSSH Daemon", obviously.
If I after a while manually login (using VNC to the QEMU instance in this case, since the machines do not have any physical keyboard/monitor), i can se that the NIC is eventually brought up, and i can ofc manually start/restart sshd.service once they are.
So I will go back to manually editing an override with "--interface ens0" in the systemd-networkd-wait-online.service. One can wonder... Why systemd-project did not provide systemd-networkd-wait-really-online.service since this new "feature" was implemented. :D
Quoting the release notes of systemd:
"systemd-networkd-wait-online exits successfully when all interfaces
are ready or unmanaged. Previously, if neither '--any' nor
'--interface=' options were used, at least one interface had to be in
configured state. This change allows the case where systemd-networkd
is enabled, but no interfaces are configured, to be handled
gracefully. It may occur in particular when a different network
manager is also enabled and used."
My /etc/systemd/network/ens0.network has "Virtualization=true" ... And I have static IP assignment and such. So it should come up quite quickly. Which I believe is correct (i may believe wrong) if that makes any difference in the grans scheme of things.
Re
$ networkctl
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier unmanaged
2 ens3 ether routable configured
(on a system that is booted up)
systemd-networkd-wait-online.service unit file example of the above:
[Service]
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --timeout=0
Does this make a difference? As --timeout=0 from the manual page states:
'Fail the service if the network is not online by the time the timeout elapses. A timeout of 0 disables the timeout. Defaults to 120 seconds.'
My guess: Your system boots and the services start, including `systemd-networkd-wait-online.service` which waits for all (currently none) available network interfaces. Some services fail due to missing address on interface. Some time later the network interface (ens0? or ens3? - both are mentioned above, probably different systems) becomes available and is configured.
Any chance you verify this from log messages?
And what does your device configuration in Qemu look like? Would be interesting that know what virtual hardware is used... Perhaps selecting something like "virtio-net-pci-non-transitional" (which is non-hotplug if I understand correctly) helps?
Possibly related [1]? (probably not..)
[1] https://gitlab.archlinux.org/archlinux/arch-boxes/-/issues/158#note_92582
@eworm : the same thing happened on non-virtual machines too.
Has no-one else set-up their services to listen on a specific IP (NIC) and experienced the same issue?
I see that there has come quite a lot of systemd releases since I reported this, tho.
[1] https://github.com/systemd/systemd/issues/27822