FS#72614 - [lxd] ExecStop= defined in lxd.service interferes with lxd.socket, causing shutdown to hang

Attached to Project: Community Packages
Opened by Marcel Menzel (WRMSR) - Tuesday, 02 November 2021, 14:07 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:02 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To George Rawlinson (rawlinsong)
Morten Linderud (Foxboron)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: LXD hangs on shutdown because the lxd.socket systemd unit keeps the socket open while "lxd shutdown" (defined in ExecStop=) waits for it to close.


Additional info:
* package version(s) LXD 4.19
* config and/or log files etc. Default installation
* link to upstream bug report, if any N/A

Steps to reproduce:

- Install LXD (pacman -S lxd)
- Enable & start it (systemctl enable --now lxd)
- Try to stop lxd (systemctl stop lxd)

It hangs because lxd shutdown (defined in ExecStop=) waits until the socket closes, but the socket is held open by the lxd.socket unit.
Masking lxd.socket (systemctl mask lxd.socket) fixes the problem for me. Overriding lxd.service with an empty ExecStop= also fixes it, because LXD performs a proper shutdown on receiving a SIGTERM as well:

^CINFO[11-02|15:07:28] Received signal signal=interrupt
INFO[11-02|15:07:28] Starting shutdown sequence


# /usr/lib/systemd/system/lxd.service
[Unit]
Description=LXD Container Hypervisor
After=network-online.target lxcfs.service
Requires=network-online.target lxcfs.service lxd.socket
Documentation=man:lxd(1)

[Service]
Environment=LXD_OVMF_PATH=/usr/share/ovmf/x64
ExecStart=/usr/bin/lxd --group=lxd --logfile=/var/log/lxd/lxd.log
ExecStartPost=/usr/bin/lxd waitready --timeout=600
ExecStop=/usr/bin/lxd shutdown
TimeoutStartSec=600s
TimeoutStopSec=30s
Restart=on-failure
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/lxd.service.d/override.conf
[Service]
ExecStop=
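
The drop-in above can be created by hand or with a small script. The following is a minimal sketch, assuming a POSIX shell; the function name write_override and its argument are illustrative and not part of the lxd package. Note that clearing ExecStop= means "lxd shutdown" is no longer run when the unit stops.

```shell
#!/bin/sh
# Sketch: write a drop-in that clears ExecStop= for a unit.
# write_override is a hypothetical helper name used for illustration only.
write_override() {
    unit_dir="$1"   # e.g. /etc/systemd/system/lxd.service.d
    mkdir -p "$unit_dir" || return 1
    # An empty ExecStop= resets the list of stop commands inherited
    # from the packaged unit file.
    printf '[Service]\nExecStop=\n' > "$unit_dir/override.conf"
}

# Typical use as root (commented out so the sketch has no side effects):
# write_override /etc/systemd/system/lxd.service.d
# systemctl daemon-reload
# systemctl cat lxd.service   # prints the unit followed by its drop-ins
```

Running "systemctl edit lxd.service" is an interactive alternative that creates the same override.conf and reloads the unit afterwards.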


Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:02 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/packaging/packages/lxd/issues/1
Comment by Marcel Menzel (WRMSR) - Wednesday, 03 November 2021, 08:35 GMT
Unfortunately, I just found out that "lxd shutdown" is needed in order to shut down containers gracefully. So I need to keep lxd.socket masked to get a proper shutdown procedure.
Comment by Michal Potrzebicz (elevendroids) - Sunday, 07 November 2021, 02:20 GMT
Hi Marcel,

I was looking into the same issue, as delayed shutdowns are a bit annoying.
After investigating it a bit, it seems that "lxd shutdown" was broken in 4.19: it reactivated the LXD daemon (via the Unix socket) just after the daemon had shut down as requested.

This seems to be fixed in 4.20 - which has just been released:
https://github.com/lxc/lxd/pull/9334
https://github.com/lxc/lxd/releases/tag/lxd-4.20

An updated Arch package was released two days ago (2021-11-05), and shutdown now works fine.
Comment by Morten Linderud (Foxboron) - Sunday, 07 November 2021, 11:12 GMT
I'm not going to close this even if lxd shutdown was broken. There are some issues with the current service which are similar in vein to the raised issue. This is mostly because the LXD services are not distributed upstream.

The plan is to get these services/sockets more in line with how OpenSUSE does it, but I have just forgotten it in between upgrades.

https://build.opensuse.org/package/show/Virtualization:containers/lxd
Comment by Michal Potrzebicz (elevendroids) - Sunday, 07 November 2021, 12:37 GMT
OK, fair enough.

Just a side note:
The OpenSUSE repo does not seem to use socket activation or graceful container shutdown (no lxd.socket, no lxd shutdown invocation in the service).

I'd suggest looking at the lxd COPR package for Fedora - which follows the behaviour of original Debian/Ubuntu packages (https://github.com/lxc/lxd-pkg-ubuntu/tree/dpm-xenial/debian):
https://github.com/ganto/copr-lxc4/tree/master/lxd

The idea is to use an auxiliary "lxd-containers" service which activates the main daemon when needed (if there are any containers/vms to start) and stops all containers on shutdown with some generous timeout to make sure everything has cleanly exited (as noted by the reporter in the first comment here).
Note that they're using an extra "shutdown" script which calls "lxd shutdown" only when the main daemon has been activated (otherwise "lxd shutdown" would start up the daemon via socket activation only to tell it to shutdown).
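
The guard described above could look roughly like the following. This is a minimal sketch, not the script shipped in the COPR package; the function name shutdown_if_active, the --run argument, and the LXD_UNIT, IS_ACTIVE_CMD and SHUTDOWN_CMD variables are assumptions made for illustration. It relies only on "systemctl is-active --quiet" and the "lxd shutdown" command already used in lxd.service.

```shell
#!/bin/sh
# Sketch: call "lxd shutdown" only if the daemon is already running, so that
# the request does not start the daemon through socket activation.
# All names below are hypothetical; the *_CMD variables exist so the logic
# can be exercised without systemd or LXD installed.
shutdown_if_active() {
    unit="${LXD_UNIT:-lxd.service}"
    # is-active --quiet exits 0 only when the unit is currently active.
    if ${IS_ACTIVE_CMD:-systemctl is-active --quiet} "$unit"; then
        ${SHUTDOWN_CMD:-/usr/bin/lxd shutdown}
    else
        echo "lxd daemon not active; skipping 'lxd shutdown'"
    fi
}

# Only act when explicitly asked to, e.g. from an ExecStop= line that
# passes --run (an illustrative argument, not an LXD option).
if [ "${1:-}" = "--run" ]; then
    shutdown_if_active
fi
```

Checking the service unit rather than lxd.socket matters here: the socket unit stays active while it listens, even when no daemon is running behind it.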
