FS#56812 - [systemd] does not start enabled units for no reason

Attached to Project: Arch Linux
Opened by Noel Kuntze (thermi) - Thursday, 21 December 2017, 21:18 GMT
Last edited by Dave Reisner (falconindy) - Tuesday, 02 January 2018, 16:21 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To No-one
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

systemd does not start units, despite them being enabled and the dependencies already having started successfully.
The target that the unit wants is the target systemd is configured to boot into.
I have reproduced this problem by disabling and enabling a unit. It seems to happen randomly to services, too. E.g. named.service (belongs to bind9).
This happens on at least two distinct machines (x86_64).

The unit starts normally and works, if it's started using `systemctl start <unit>`.

Additional info:
* package version(s)
systemd version 235
* config and/or log files etc.


Steps to reproduce:
1. Enable a unit/have a unit enabled
2. Verify it's actually pulled in by a target
3. reboot
4. check the status
This task depends upon

Closed by  Dave Reisner (falconindy)
Tuesday, 02 January 2018, 16:21 GMT
Reason for closing:  Not a bug
Additional comments about closing:  User configuration error
Comment by Noel Kuntze (thermi) - Thursday, 21 December 2017, 21:19 GMT
Addition: There is no error from the unit in the journal, nor an indication that systemd ever tried to run it.
Comment by Daniel M. Capella (polyzen) - Thursday, 21 December 2017, 22:40 GMT
sheep from IRC asked, "does his unit have an Install section with a WantedBy directive?" Please provide more details, such as the units involved.
Comment by Noel Kuntze (thermi) - Thursday, 21 December 2017, 22:49 GMT
Yes, of course it does. It enables fine, there's no error. This even occurs with units that were enabled before, as is mentioned in the description.
Units I encountered this so far:
lightdm.service
named.service
strongswan-swanctl.service


The units are only running because I started them manually.
As you can see, they're enabled and hence should start automatically.
multi-user.target is pulled in.

```
systemctl status -n0 named
● named.service - Internet domain name server
Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/named.service.d
└─override.conf
Active: active (running) since Thu 2017-12-21 22:11:44 CET; 1h 32min ago
Main PID: 5300 (named)
Tasks: 5 (limit: 4915)
Memory: 33.3M
CGroup: /system.slice/named.service
└─5300 /usr/bin/named -f -u named

systemctl cat named
# /usr/lib/systemd/system/named.service
[Unit]
Description=Internet domain name server
After=network.target

[Service]
ExecStart=/usr/bin/named -f -u named
ExecReload=/usr/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/named.service.d/override.conf
[Unit]
Before=dnsmasq.service
```

```
systemctl cat lightdm
# /usr/lib/systemd/system/lightdm.service
[Unit]
Description=Light Display Manager
Documentation=man:lightdm(1)
Conflicts=getty@tty1.service
After=getty@tty1.service systemd-user-sessions.service plymouth-quit.service acpid.service

[Service]
ExecStart=/usr/bin/lightdm
Restart=always
IgnoreSIGPIPE=no
BusName=org.freedesktop.DisplayManager

[Install]
Alias=display-manager.service

systemctl status lightdm
● lightdm.service - Light Display Manager
Loaded: loaded (/usr/lib/systemd/system/lightdm.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2017-12-21 17:25:59 CET; 6h ago
Docs: man:lightdm(1)
Main PID: 838
Tasks: 13 (limit: 4915)
CGroup: /system.slice/lightdm.service
├─838 /usr/bin/lightdm
└─953 /usr/lib/xorg-server/Xorg :7 -seat seat0 -auth /run/lightdm/root/:7 -nolisten tcp vt7 -novtswitch

Dez 21 17:25:57 thermi-pc.thermicorp.lan systemd[1]: Starting Light Display Manager...
Dez 21 17:25:59 thermi-pc.thermicorp.lan systemd[1]: Started Light Display Manager.
Dez 21 17:26:14 thermi-pc.thermicorp.lan lightdm[838]: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Dez 21 17:26:14 thermi-pc.thermicorp.lan lightdm[3729]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Dez 21 17:26:46 thermi-pc.thermicorp.lan lightdm[838]: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Dez 21 17:26:46 thermi-pc.thermicorp.lan lightdm[5949]: pam_unix(lightdm:session): session opened for user thermi by (uid=0)

```

```
┌─[root][thermi][/home/thermi]
└──╼ systemctl status strongswan-swanctl -n0
● strongswan-swanctl.service - strongSwan IPsec IKEv1/IKEv2 daemon using swanctl
Loaded: loaded (/etc/systemd/system/strongswan-swanctl.service; enabled; vendor preset: disabled)
Active: inactive (dead)
┌─[root][thermi][/home/thermi]
└──╼ systemctl cat strongswan-swanctl -n0
# /etc/systemd/system/strongswan-swanctl.service
[Unit]
Description=strongSwan IPsec IKEv1/IKEv2 daemon using swanctl
Requires=network.target
After=network.target

[Service]
Type=notify
ExecStartPre=-/usr/bin/ip xfrm policy flush
ExecStartPre=-/usr/bin/ip xfrm state flush
ExecStartPre=-/usr/bin/ip route flush table 220
ExecStart=/usr/bin/charon-systemd
ExecStartPost=-/usr/bin/swanctl --load-all --noprompt
ExecReload=-/usr/bin/swanctl --reload
Restart=on-failure

PermissionsStartOnly=true
#User=strongswan
#Group=strongswan

CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW CAP_NET_BIND_SERVICE
NoNewPrivileges=true
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
MemoryDenyWriteExecute=true
ProtectSystem=full
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
RestrictRealtime=true
ReadWritePaths=/var/log/charon.log
[Install]
WantedBy=multi-user.target
┌─[root][thermi][/home/thermi]
└──╼ systemctl status multi-user.target
● multi-user.target - Multi-User System
Loaded: loaded (/usr/lib/systemd/system/multi-user.target; indirect; vendor preset: disabled)
Active: active since Fri 2017-12-08 10:46:44 CET; 1 weeks 6 days ago
Docs: man:systemd.special(7)

Dez 08 10:46:44 thermi.consulting systemd[1]: Reached target Multi-User System.
```
Comment by Doug Newgard (Scimmia) - Saturday, 23 December 2017, 08:03 GMT
There's nothing usable here. No logs besides ones that show everything working as it should.
Comment by Doug Newgard (Scimmia) - Wednesday, 27 December 2017, 18:13 GMT
Ping? Are we going to get usable logs?
Comment by Noel Kuntze (thermi) - Friday, 29 December 2017, 01:50 GMT
Doug, there are no usable logs anywhere. There is neither
* any error regarding the units in the logs
* Nor any indication it ever tried starting it

The units' status is "dead". systemd itself says it never tried starting it.
The units run fine though if they are started manually, *which is what I did and also wrote in the details*.
Comment by Dave Reisner (falconindy) - Friday, 29 December 2017, 01:57 GMT
> * Nor any indication it ever tried starting it
> The units' status is "dead". systemd itself says it never tried starting it.

You can't have it both ways...

Booting with debug level logging will produce useful logs.
Comment by Noel Kuntze (thermi) - Tuesday, 02 January 2018, 13:44 GMT
Dave, that is wrong.
systemctl says exactly that for units that it never started. They are dead.

systemd detected a dependency cycle and deleted a start job. That somehow caused itself to delete the start jobs for units that are enabled themselves. It only logs that if debug logging is enabled. The complete log is 13016 lines long.

Jan 02 14:35:33 thermi.consulting systemd[1]: network.target: Found ordering cycle on iptables.service/start
Jan 02 14:35:33 thermi.consulting systemd[1]: network.target: Found dependency on dns_ipset.service/start
Jan 02 14:35:33 thermi.consulting systemd[1]: network.target: Found dependency on network.target/start
Jan 02 14:35:33 thermi.consulting systemd[1]: network.target: Job iptables.service/start deleted to break ordering cycle start
ing with network.target/start
Jan 02 14:35:33 thermi.consulting systemd[1]: smbd.service: Deleting job smbd.service/start as dependency of job iptables.serv
ice/start
Jan 02 14:35:33 thermi.consulting systemd[1]: network.target: Deleting job network.target/start as dependency of job iptables.
service/start
Jan 02 14:35:33 thermi.consulting systemd[1]: znc.service: Deleting job znc.service/start as dependency of job network.target/
start
Jan 02 14:35:33 thermi.consulting systemd[1]: privrouteserver.service: Deleting job privrouteserver.service/start as dependenc
y of job network.target/start
Jan 02 14:35:33 thermi.consulting systemd[1]: strongswan-swanctl.service: Deleting job strongswan-swanctl.service/start as dependency of job network.target/start

Comment by Dave Reisner (falconindy) - Tuesday, 02 January 2018, 14:15 GMT
"dead" does not mean "never started". Do not conflate the two.

The dependency cycle is nearly always shown (not just in debug logging) since it's done at the warning level:

https://github.com/systemd/systemd/blob/master/src/core/transaction.c#L411-L416

I suppose because there's an ordering cycle, there's some amount of undefined behavior and you see random units not included in the initial transaction. The fix is clear -- you need to resolve your ordering cycle, and I'd suggest looking at both dns_ipset.service and privrouteserver.service as the first suspects.
Comment by Noel Kuntze (thermi) - Tuesday, 02 January 2018, 15:36 GMT
In this case, it means exactly that. Shown in this example with the unused rsyslog.service.

systemctl status rsyslog
● rsyslog.service - System Logging Service
Loaded: loaded (/etc/systemd/system/rsyslog.service; disabled; vendor preset: disabled)
Active: inactive (dead)

I fixed the dependency cycles now. The problems were not with those units. The problems were with the ordering in iptables.service and some other units.
Comment by Dave Reisner (falconindy) - Tuesday, 02 January 2018, 16:20 GMT
"never ran" and "ran at some time in the past" are both described as dead.

Loading...