FS#72611 - Loop during systemd-shutdown with root on ZFS. 'Failed to unmount /oldroot: Device or resource busy'

Attached to Project: Arch Linux
Opened by Sean Champ (spchamp) - Tuesday, 02 November 2021, 12:42 GMT
Last edited by Toolybird (Toolybird) - Thursday, 14 September 2023, 02:50 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Christian Hesse (eworm)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

I've installed Arch Linux to ZFS under a virtual machine environment, using an Arch Linux installation on an Ext4 partition under a second virtual machine environment. The Arch Linux installation on ZFS now boots from the volumes under the ZFS boot and root pools successfully. However, I'm noticing a persistent loop during systemd-shutdown.

I've enabled serial logging with the Linux kernel, via Grub configuration. Subsequently, I've configured the virtual machine to send all ttyS0 output to a file on the virtual machine host. From the logging output that I've captured there, the following console output appears to represent the main section of the loop during systemd-shutdown:

----
[ 1561.324253] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 1561.336930] systemd-shutdown[1]: Unmounting file systems.
[ 1561.341650] [563]: Remounting '/var' read-only in with options 'noxattr,noacl'.
[ 1561.783555] [564]: Unmounting '/var'.
[ 1561.790194] [565]: Remounting '/usr' read-only in with options 'noxattr,noacl'.
[ 1561.796853] [566]: Remounting '/' read-only in with options 'noxattr,noacl'.
[ 1561.803420] systemd-shutdown[1]: All filesystems unmounted.
[ 1561.809423] systemd-shutdown[1]: Deactivating swaps.
[ 1561.813672] systemd-shutdown[1]: All swaps deactivated.
[ 1561.818688] systemd-shutdown[1]: Detaching loop devices.
[ 1561.824354] systemd-shutdown[1]: All loop devices detached.
[ 1561.829085] systemd-shutdown[1]: Stopping MD devices.
[ 1561.833346] systemd-shutdown[1]: All MD devices stopped.
[ 1561.837982] systemd-shutdown[1]: Detaching DM devices.
[ 1561.842299] systemd-shutdown[1]: All DM devices detached.
[ 1561.847501] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 1561.856852] systemd-shutdown[1]: Successfully changed into root pivot.
[ 1561.863478] systemd-shutdown[1]: Returning to initrd...
[ 1562.306958] shutdown[1]: Syncing filesystems and block devices.
[ 1562.311377] shutdown[1]: Sending SIGTERM to remaining processes...
[ 1562.316903] shutdown[1]: Sending SIGKILL to remaining processes...
[ 1562.329374] shutdown[1]: Unmounting file systems.
[ 1562.334319] sd-remou[569]: Remounting '/' read-only in with options 'noxattr,noacl'.
[ 1562.346902] sd-umoun[570]: Unmounting '/oldroot/sys/kernel/config'.
[ 1562.354595] sd-umoun[571]: Unmounting '/oldroot/sys/fs/fuse/connections'.
[ 1562.361525] sd-umoun[572]: Unmounting '/oldroot/sys/kernel/tracing'.
[ 1562.368232] sd-umoun[573]: Unmounting '/oldroot/sys/kernel/debug'.
[ 1562.373715] sd-umoun[574]: Unmounting '/oldroot/dev/mqueue'.
[ 1562.380105] sd-umoun[575]: Unmounting '/oldroot/dev/hugepages'.
[ 1562.386843] sd-umoun[576]: Unmounting '/oldroot/sys/fs/bpf'.
[ 1562.391984] sd-umoun[577]: Unmounting '/oldroot/sys/fs/pstore'.
[ 1562.397417] sd-umoun[578]: Unmounting '/oldroot/sys/fs/cgroup'.
[ 1562.404234] audit: type=1334 audit(1635363440.667:81): prog-id=8 op=UNLOAD
[ 1562.410100] sd-umoun[579]: Unmounting '/oldroot/dev/pts'.
[ 1562.414579] audit: type=1334 audit(1635363440.667:82): prog-id=6 op=UNLOAD
[ 1562.420101] sd-umoun[580]: Unmounting '/oldroot/dev/shm'.
[ 1562.426852] sd-umoun[581]: Unmounting '/oldroot/sys/kernel/security'.
[ 1562.432165] sd-umoun[582]: Unmounting '/oldroot/usr'.
[ 1562.443869] sd-remou[583]: Remounting '/oldroot/run' read-only in with options 'noxattr,noacl'.
[ 1562.456905] sd-umoun[584]: Unmounting '/oldroot/run'.
[ 1562.461323] sd-umoun[585]: Unmounting '/oldroot'.
[ 1562.466800] sd-umoun[585]: Failed to unmount /oldroot: Device or resource busy
[ 1562.473458] sd-umoun[586]: Unmounting '/oldroot/dev'.
[ 1562.478395] sd-umoun[587]: Unmounting '/oldroot/sys'.
[ 1562.483288] sd-umoun[588]: Unmounting '/oldroot/proc'.
[ 1562.488082] sd-umoun[589]: Unmounting '/oldroot'.
[ 1562.493279] shutdown[1]: All filesystems unmounted.
[ 1562.498851] shutdown[1]: Deactivating swaps.
[ 1562.502456] shutdown[1]: All swaps deactivated.
[ 1562.506839] shutdown[1]: Detaching loop devices.
[ 1562.511505] shutdown[1]: All loop devices detached.
[ 1562.515404] shutdown[1]: Stopping MD devices.
[ 1562.520083] shutdown[1]: All MD devices stopped.
[ 1562.523666] shutdown[1]: Detaching DM devices.
[ 1562.527617] shutdown[1]: All DM devices detached.
[ 1562.531822] shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 1562.540145] shutdown[1]: Successfully changed into root pivot.
[ 1562.544940] shutdown[1]: Returning to initrd...
----

I believe that the error message,"Failed to unmount /oldroot: Device or resource busy'" might be indicative of why the loop is occurring?

Trying to develop any workaround for the issue of this loop during shutdown, I've added a systemd service that exports all of the online ZFS pools on the system, after filesystems are unmounted. The console output shows that that service is reached, so I assume that the boot and root pools are being cleanly exported before the shutdown process reaches systemd-shutdown. The loop still occurs during shutdown.

Subsequently, I've configured journalctl for volatile logging and added a service definition that runs '/usr/bin/journalctl --flush' after boot. The '/var' filesystem is now successfully unmounted during shutdown, and I don't believe I'm loosing any lot of log data with this configuration change. The loop still occurs.

I've located the source code for systemd-shutdown.
https://github.com/systemd/systemd/blob/v249/src/shutdown/shutdown.c

I'm not certain if I'd need to patch and rebuild systemd, to try to determine what's hanging up the unmount for `/oldroot` in that instance?

This is occurring with systemd versions 249.5-2 and 249.5-3, linux pkg version 5.14.14.arch-1, zfs-linux pkg version 2.1.1_5.14.14.arch-1.1-1, mkinitcpio version 30-2

I'm not certain if this issue may be specific to ZFS systems. I've not seen it with the Arch Linux installation under ext4, under the same virtual machine host system, namely VirtualBox. I am seeing this loop with the Arch Linux installation on ZFS, in a VritualBox VM.

This loop during shutdown does not prevent normal system usage, up to the point of the loop, when the system then loops indefinitely. It does appear to require attention throughout the shutdown process, however. I'm only able to power off the virtual machine with hard poweroff, once that loop is entered.

Steps to reproduce:

I'm not certain of how reproducible this bug may be. I've not as yet created any second root-on-ZFS installation with Arch, although I assume that it would arrive at the same loop. In the present root-on-ZFS installation, I believe it was a fairly straightforward install -- I'd created the ZFS pools with a VM running FreeBSD, subsequently activated the same virtual disk image in the Arch VM again, then installed Arch with 'pacstrap' and subsequent tools under Arch. After working out the Grub configuration for the Arch Linux installation on ZFS - using separate boot and root pools, with the boot pool compatible with Grub - the system has booted successfully and appears to be normally usable, except for that loop during shutdown.

I'm not certain what's hanging up the unmount for `/oldroot'` in that instance. I believe systemd-shutdown is running under the initramfs system, by then?
This task depends upon

Closed by  Toolybird (Toolybird)
Thursday, 14 September 2023, 02:50 GMT
Reason for closing:  Not a bug
Additional comments about closing:  ZFS unsupported here. Likely been fixed anyway, see  FS#63697 
Comment by Sean Champ (spchamp) - Tuesday, 02 November 2021, 12:47 GMT
Maybe there's a dangling mount under /oldroot at that point? The error message resembles what 'umount' shows when trying to unmount a filesystem that still has a volume mounted under the same.

and yet, from the earlier console output, it would seem that everything was successfully unmounted before systemd-shutdown was reached:
----
[ OK ] Stopped Create Volatile Files and Directories.
[ OK ] Stopped target Local File Systems.
Unmounting /boot/grub...
Unmounting /home...
Unmounting /opt...
Unmounting Temporary Directory /tmp...
Unmounting /usr/local/libexec...
Unmounting /usr/local/src...
Unmounting /usr/pkg...
Unmounting /usr/src...
Unmounting /var/cache/ccache...
Unmounting /var/cache/pacman...
Unmounting /var/crash...
Unmounting /var/lib...
Unmounting /var/service...
Unmounting /var/spool...
[ OK ] Stopped Network Configuration.
[ OK ] Unmounted /boot/grub.
[ OK ] Closed Network Service Netlink Socket.
Unmounting /boot...
[ OK ] Stopped Apply Kernel Variables.
[ OK ] Stopped Load Kernel Modules.
[ OK ] Unmounted /boot.
[ OK ] Unmounted /home.
[ OK ] Unmounted /opt.
[ OK ] Unmounted Temporary Directory /tmp.
[ OK ] Unmounted /usr/local/libexec.
[ OK ] Unmounted /usr/local/src.
[ OK ] Unmounted /usr/pkg.
[ OK ] Unmounted /usr/src.
[ OK ] Unmounted /var/cache/ccache.
[ OK ] Unmounted /var/cache/pacman.
[ OK ] Unmounted /var/crash.
[ OK ] Unmounted /var/lib.
[ OK ] Unmounted /var/service.
[ OK ] Unmounted /var/spool.
[ OK ] Stopped target Swaps.
Unmounting /usr/local...
Unmounting /var/cache...
[ OK ] Unmounted /usr/local.
[ OK ] Unmounted /var/cache.
Unmounting /var...
[ OK ] Unmounted /var.
[ OK ] Stopped target Preparation for Local File Systems.
[ OK ] Reached target Unmount All Filesystems.
Stopping Monitoring of LVM…meventd or progress polling...
[ OK ] Stopped Remount Root and Kernel File Systems.
----
Comment by Sean Champ (spchamp) - Tuesday, 02 November 2021, 13:15 GMT
For information purposes, the systemd services I've added:

/usr/lib/systemd/system/zfs-export.service
----
[Unit]
Description=ZPool Export
DefaultDependencies=no
After=local-fs-pre.target

[Service]
ExecStart=/bin/true
ExecStop=/usr/bin/zpool export -a
RemainAfterExit=yes
Type=oneshot

[Install]
WantedBy=halt.target
WantedBy=poweroff.target
WantedBy=reboot.target
WantedBy=kexec.target
----

The previous service may serve to ensure that all online ZFS pools are exported during shutdown. Perhaps it may be redundant, beside any functionality that may be provided in ZFS on Linux.


Secondly /usr/lib/systemd/system/journalctl/journalctl-pivot.service
----
[Unit]
Description=Journalctl var pivot
DefaultDependencies=no
After=local-fs.target

[Service]
ExecStart=/usr/bin/journalctl --flush
ExecStop=/bin/true
ExecStopPost=/usr/bin/journalctl --relinquish-var
RemainAfterExit=yes
Type=oneshot

[Install]
WantedBy=local-fs.target
----

Albeit, I'm not certain if the 'ExecStop' or 'ExecStopPost' actions in the journalctl-pivot service are ever evaluated. The 'ExecStart' action does show up in the console log.

The journalctl-pivot.service was installed after setting 'Storage=volatile' in /etc/systemd/journald.conf.

Together with that configuration change for journald, the journalctl-pivot service was created to ensure that the /var volume would be successfully unmounted during shutdown. With `journalctl --flush` being called during boot -- ideally, to ensure that the volatile log data will start synching to the filesystem and continue to do so until shutdown -- it does not appear to have resulted in a significant loss of logging data, not at least for local log messages produced with logger(1)

Unfortunately, the loop still begins, even with these configuration changes. Could it be due to any dangling mount point under /oldroot at that time?

Maybe there's a virtual filesystem still mounted, by then (??)

Notably (??) it does not ever say, "Unmounted /usr." in the log.

Incidentally, most of the unmounted filesystems are from separate ZFS volumes under the Arch Linux root filesystem (ZFS). There's also a volume for '/usr' such that contains all the volumes mounted under '/usr'. I wonder why it doesn't ever show up as being unmounted, the '/usr' volume? It's a normal ZFS filesystem volume with a mount point, '/usr'. The volume's full ZFS volume path is 'irpool/ROOT/archlinux/usr' with the 'irpool/ROOT/archlinux' volume providing the root filesystem for this Arch Linux installation on ZFS.

Similar to the other ZFS volumes being used under this root filesystem, the 'irpool/ROOT/archlinux/usr' volume seems to have been mounted automatically, during system boot. It's a subvolume of the root filesystem volume and does not have a property e.g 'canmount=off' or alternatrely, 'mountpoint=legacy'. The volume has a mountpoint=/usr (inherited in a manner of the mountpoint property in ZFS filesystem volumes) with canmount=noauto inherited from the irpool/ROOT/archlinux volume. The filesystem contents of the volume would represent any files and directories not placed in the filesystem of some managed subvolume, e.g with a mount point `/usr/local'. I'd assumed it was being unmounted by systemd, along with the rest of the mounted volumes, but it's not showing up as such.

It may seem that systemd has unmounted every physical mount point -- whether a ZFS volume, ext2 filesystem, or otherwise -- under the root FS, every one except '/usr'?

As a workaround, maybe a oneshot service could be used to unmount '/usr' explicitly, at some certain point in time during the shutdown, such as immediately before the 'zpool export -a' call? But how peculiar it may seem, if that single mount point is being missed during automatic unmount.

If this may be what's causing the error during systemd-shutdown, but ideally that volume would be unmounted at least during the `zpool export' call, assuming that no active process has a file descriptor still open on that volume's filesystem or on the filesystem of any subvolume. Even if this may not be the actual cause, I wonder why that mount point is being missed?

I've now created a systemd service that unmounts /usr directly, immediately before the 'zpool export' service. The loop is still being entered.

Reviewing that section of the log output, approximately at when the loop is entered in shutdown, the /usr volume was probably at least being unmounted as '/oldroot/usr'

----
[ 2940.240932] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 2940.280752] systemd-shutdown[1]: Successfully changed into root pivot.
[ 2940.285888] systemd-shutdown[1]: Returning to initrd...
[ 2940.753284] shutdown[1]: Syncing filesystems and block devices.
[ 2940.759803] shutdown[1]: Sending SIGTERM to remaining processes...
[ 2940.770848] shutdown[1]: Sending SIGKILL to remaining processes...
[ 2940.780813] shutdown[1]: Unmounting file systems.
[ 2940.790503] sd-remou[767]: Remounting '/' read-only in with options 'noxattr,noacl'.
[ 2940.803309] sd-umoun[768]: Unmounting '/oldroot/sys/kernel/config'.
[ 2940.810250] sd-umoun[769]: Unmounting '/oldroot/sys/fs/fuse/connections'.
[ 2940.819951] sd-umoun[770]: Unmounting '/oldroot/sys/kernel/tracing'.
[ 2940.825322] sd-umoun[771]: Unmounting '/oldroot/sys/kernel/debug'.
[ 2940.834464] sd-umoun[772]: Unmounting '/oldroot/dev/mqueue'.
[ 2940.843471] sd-umoun[773]: Unmounting '/oldroot/dev/hugepages'.
[ 2940.850006] sd-umoun[774]: Unmounting '/oldroot/sys/fs/bpf'.
[ 2940.854552] sd-umoun[775]: Unmounting '/oldroot/sys/fs/pstore'.
[ 2940.860437] sd-umoun[776]: Unmounting '/oldroot/sys/fs/cgroup'.
[ 2940.870038] audit: type=1334 audit(1635838737.612:120): prog-id=8 op=UNLOAD
[ 2940.876863] sd-umoun[777]: Unmounting '/oldroot/dev/pts'.
[ 2940.882780] audit: type=1334 audit(1635838737.612:121): prog-id=6 op=UNLOAD
[ 2940.888128] sd-umoun[778]: Unmounting '/oldroot/dev/shm'.
[ 2940.897821] sd-umoun[779]: Unmounting '/oldroot/sys/kernel/security'.
[ 2940.903434] sd-umoun[780]: Unmounting '/oldroot/usr'.
[ 2940.923332] sd-remou[781]: Remounting '/oldroot/run' read-only in with options 'noxattr,noacl'.
[ 2940.936629] sd-umoun[782]: Unmounting '/oldroot/run'.
[ 2940.943450] sd-umoun[783]: Unmounting '/oldroot'.
[ 2940.948601] sd-umoun[783]: Failed to unmount /oldroot: Device or resource busy
[ 2940.954064] sd-umoun[784]: Unmounting '/oldroot/dev'.
[ 2940.959963] sd-umoun[785]: Unmounting '/oldroot/sys'.
[ 2940.966777] sd-umoun[786]: Unmounting '/oldroot/proc'.
[ 2940.973281] sd-umoun[787]: Unmounting '/oldroot'.
[ 2940.981882] shutdown[1]: All filesystems unmounted.
[ 2940.986610] shutdown[1]: Deactivating swaps.
[ 2940.990046] shutdown[1]: All swaps deactivated.
[ 2940.994642] shutdown[1]: Detaching loop devices.
[ 2941.000044] shutdown[1]: All loop devices detached.
[ 2941.004413] shutdown[1]: Stopping MD devices.
[ 2941.009299] shutdown[1]: All MD devices stopped.
[ 2941.013326] shutdown[1]: Detaching DM devices.
[ 2941.017743] shutdown[1]: All DM devices detached.
[ 2941.023299] shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 2941.030297] shutdown[1]: Successfully changed into root pivot.
[ 2941.036123] shutdown[1]: Returning to initrd...
----

Still, even with /usr explicitly unmounted before systemd-shutdown is called, the shutdown process enters the loop

Of course, while it fails to unmount '/oldroot' once but then it tries to unmount '/oldroot' again - after the rest of the filesystems are unmounted. That time, it seemingly manages to do so...

The unmount initially fails in sd-umount. This process seems to be managed by systemd-shutdown, with implementation under umount_with_timeout in umount.c
https://github.com/systemd/systemd/blob/v249/src/shutdown/umount.c
included from systemd shutdown.c
https://github.com/systemd/systemd/blob/v249/src/shutdown/shutdown.c


Why, then is the shutdown process not continuing after systemd-shutdown?
Comment by Christian Hesse (eworm) - Tuesday, 02 November 2021, 19:40 GMT
Well, zfs is not officially supported by use...
On the other hand it could be a duplicate of  FS#72499 .

Loading...