FS#63697 - [systemd] 243.0-1 poweroff/reboot umount errors
Attached to Project:
Arch Linux
Opened by Scott K. (Pillgar) - Monday, 09 September 2019, 00:36 GMT
Last edited by Toolybird (Toolybird) - Thursday, 15 June 2023, 07:38 GMT
Details
Description: after upgrading from systemd 242.84-2 to
243.0-1 (lib32 included), I've noticed two errors during
shutdown or reboot.
sd-umount[1636]: Failed to umount /oldroot: Device or resource busy
sd-umount[1637]: Failed to umount /oldroot/sys: Device or resource busy
Might be a regression from https://bbs.archlinux.org/viewtopic.php?id=233820&p=2. I vaguely remember seeing this in a previous version a while ago.
Additional info:
* package version(s): systemd 243.0-1
Steps to reproduce: upgrade to systemd 243.0-1 and reboot.
This task depends upon
Closed by Toolybird (Toolybird)
Thursday, 15 June 2023, 07:38 GMT
Reason for closing: Fixed
Additional comments about closing: Fixed upstream in systemd 252. See comments
printk: shutdown: 4 output lines suppressed due to ratelimiting
If I add the "debug" parameter to the kernel command line, I see the "Failed to umount /oldroot" message as well, as reported in this bug.
However, if I downgrade to systemd 242.84-2, I no longer see the "lines suppressed due to ratelimiting" message, but I still see the "Failed to umount /oldroot" message. So it appears that, at least on my system, the "Failed to umount /oldroot" message always appeared with the "debug" kernel parameter, while the "lines suppressed due to ratelimiting" message started appearing with this upgrade.
Now, if on 243.0-1 I run "sudo sysctl -w kernel.printk_devkmsg=on -w kernel.printk_ratelimit=0", the "lines suppressed due to ratelimiting" message disappears, with no apparent ill effect (I tried various debugging parameters like "debug", "printk_devkmsg=on", etc. and didn't find any further error).
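If one wants those two sysctl settings to survive a reboot, a sysctl.d fragment along these lines should do it (the filename is just my choice, nothing official):

```
# /etc/sysctl.d/99-printk-ratelimit.conf
kernel.printk_devkmsg = on
kernel.printk_ratelimit = 0
```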
My current hypothesis is:
* The "Failed to umount /oldroot" message is not a regression from 242.84-2 -> 243.0-1, it was there but I simply didn't see it because it only appears with the "debug" kernel parameter.
* The "lines suppressed due to ratelimiting" message is a regression in 243.0-1. But it's possible that it's simply that systemd is too chatty in the shutdown process and this triggers a ratelimiting message even though there is really no error condition.
Can someone else reproduce this behaviour as well?
If I stop RLS myself and then shut down, it works without errors.
It seems systemd is not following proper shutdown logic and isn't waiting for all processes to be killed before attempting the unmount.
sd-umount[1334]: Failed to unmount /oldroot: Device or resource busy
shutdown[1]: Failed to finalize file systems, ignoring.
I already asked upstream (https://github.com/systemd/systemd/issues/14298) and Lennart Poettering thinks this must have something to do with Arch's initrd.
I would like to investigate this further, but I don't know how. Debugging systemd during shutdown following https://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1 led to the following log: https://pastebin.com/raw/Ws6RRmNM
At least I cannot identify anything problematic in the log.
As a side note, I don't have the rust language server installed.
[some_number] sd-umount[some_number]: Failed to unmount /oldroot: Device or resource busy
[some_number] sd-umount[some_number]: Failed to unmount /oldroot/sys: Device or resource busy
[some_number] shutdown[1]: Could not detach DM /dev/dm-0: Device or resource busy
[some_number] shutdown[1]: Failed to finalize file systems, DM devices, ignoring
I am using a simple dm-crypt root (so, NO LVM) + EFISTUB booting, with no swap or any other partitions mounted.
I guess you have 'shutdown' in there? Any side effects removing this?
my hooks are: `HOOKS=(base udev autodetect modconf block keyboard keymap encrypt resume filesystems fsck)`
No impact on shutdown time, but I'm worried that the filesystem may become corrupted if the system is not shut down properly.
EDIT: Also, cwd=/ so that isn't the culprit it seems.
EDIT 2: Also, something that looks interesting as well is that if one boots into an emergency shell and reboots, no message happens. If one boots into a rescue shell and reboots, then the 'Failed to unmount /oldroot' message is printed. One can also boot into a rescue shell and then run 'systemctl emergency' to go into an emergency shell, in which case the message is also printed upon reboot. So, it looks like something in between emergency and rescue is getting stuck.
How a normal user-level process can cause this is another question, I guess.
* If I remove those two qemu AUR packages, the error goes away (even though I also have wine installed which also deploys files in /lib/binfmt.d/, so systemd-binfmt.service is still started).
* If I mask systemd-binfmt.service, the error also goes away. On an emergency shell the error doesn't happen normally, but it does if I start systemd-binfmt.service.
I managed to replicate this in a virtual machine as well.
As a simple test case, I created a dummy file in /lib/binfmt.d/ containing just:
:TestFormat:M::XX::/usr/bin/true:F
Then the problem starts to reproduce. Removing the last "F" in the line fixes it. The meaning of the "F" flag is explained here: https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html and seems to imply that it keeps a reference to the interpreter binary, so that's probably the file that's still open and causes the unmount to fail.
If I clean the entry from /proc/sys/fs/binfmt_misc/ manually by echoing -1 to it, it allows the unmount to work correctly. However systemd-binfmt does not do this cleanup itself: https://github.com/systemd/systemd/issues/13129 . And unmounting the binfmt_misc filesystem also doesn't make the kernel deregister the formats. So this should explain why the unmount is blocked, the interpreter binary is still registered and with a reference open.
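To illustrate, the cleanup that systemd-binfmt could do amounts to a small shell loop. This is only a sketch: it relies on the documented binfmt_misc behavior that writing -1 to an entry removes it, and the directory is passed as a parameter (my choice) so the loop can be dry-run against a mock directory without root; on a real system it would be /proc/sys/fs/binfmt_misc.

```shell
# Deregister every binfmt_misc entry so that held interpreter references are
# dropped before the final unmount. On the real binfmt_misc filesystem,
# writing -1 to an entry file removes that entry.
deregister_binfmts() {
  dir=$1
  for entry in "$dir"/*; do
    name=${entry##*/}
    # 'register' and 'status' are control files, not format entries
    case "$name" in register|status) continue ;; esac
    echo -1 > "$entry"
  done
}
```

Against a mock directory this just writes "-1" into each format file, which is enough to check that the control files are skipped.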
You could ask for additional special handling of binfmt_misc.
[1] https://github.com/systemd/systemd/blob/v244/src/shutdown/umount.c#L105
In my machine it's the same qemu story.
So what's the best way to fix this? Is it systemd that needs to clean up the binfmt entries?
@jpegxguy: If you are experiencing this due to the binfmt problem I mentioned, this is what I'm testing right now and seems to work. Run 'systemctl edit systemd-binfmt' and paste this (then run systemctl daemon-reload or reboot for the changes to apply):
[Service]
# Workaround for
# Workaround for FS#63697 / systemd GitHub issue #13129
# See: https://bugs.archlinux.org/task/63697, https://github.com/systemd/systemd/issues/13129
ExecStop=/usr/bin/bash -c "echo -1 > /proc/sys/fs/binfmt_misc/status"
I tried the ExecStop=/usr/bin/bash -c "echo -1 > /proc/sys/fs/binfmt_misc/status". It didn't remedy it for me. This just creates an override.conf in /etc/systemd/system/systemd-binfmt.service.d? Though ls -l /proc/sys/fs/binfmt_misc didn't show any qemu entries either. And I'm just not smart enough to find the culprit. At least it's getting narrowed down now. TY
What files do you have on /proc/sys/fs/binfmt_misc other than 'register' and 'status'? (If you don't have any more, then this definitely won't help)
The problem with this bug is that anything that leaves a file open until shutdown can cause it. So it's possible your case doesn't have anything to do with binfmt. But it's good that we try to find typical causes for it.
```
CLR:
enabled
interpreter /usr/bin/mono
flags:
offset 0
magic 4d5a
```
and
```
DOSWin:
enabled
interpreter /usr/bin/wine
flags:
offset 0
magic 4d5a
```
If you or someone else wants to try to find out what it is, I suggest disabling all autostart applications and systemd services you can, until it doesn't reproduce and you find the culprit. If you run out of things to disable, then try booting into the systemd rescue and/or emergency shells and see if you can reproduce it from those, and once you can't, try to do a "gradual" bootup.
For example in my case where the problem reproduced in rescue but not emergency, I looked into 'systemctl list-dependencies rescue.target' and started services/targets/mounts one by one until I found the culprit. It's a bit tricky due to some services refusing to start unless some dependencies have already started and some auto-starting a whole chain of dependencies themselves (you can see this on journalctl, after starting systemd-journald). So expect it to require a lot of trial and error and be time-consuming, but once you find the exact service/application that's causing the problem, that narrows down things a lot.
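The one-by-one search described above can be sketched as a tiny helper. The 'check' command here is an assumption/mock: on a real system, checking a unit means starting it and rebooting to see whether the message appears, which can't be automated in a single script, so the function only captures the search logic.

```shell
# Walk a list of candidate units and report the first one for which 'check'
# says the shutdown error reproduces. 'check' is any command taking a unit
# name and returning 0 when the problem shows up after starting that unit.
find_culprit() {
  check=$1; shift
  for unit in "$@"; do
    if "$check" "$unit"; then
      echo "$unit"
      return 0
    fi
  done
  return 1
}
```

In practice each 'check' invocation costs a reboot, so in the real procedure one would also recurse into `systemctl list-dependencies` of whatever this flags.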
I first encountered my rootfs not being cleanly unmounted when I enabled caching.
It seems like there is a dependency problem with the services (not every service is stopped before shutdown.target). However, I cannot spot it.
I can confirm the observation that the rescue.target of systemd leads to the same error, while the (very minimalistic) emergency.target does not.
Mr. Poettering says that systemd cannot unmount the rootfs if it is living on an LV (https://github.com/systemd/systemd/issues/15004) and suggests using a special initrd created with dracut.
dracut will execute the symlinked systemd-shutdown/reboot binary. This again will try to unmount things and we have the same situation again.
Without the compat package installed, one gets this error during dracut image generation:
dracut: *** Including module: shutdown ***
dracut-install: ERROR: installing 'poweroff'
dracut-install: ERROR: installing 'reboot'
dracut-install: ERROR: installing 'halt'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.kR65CE/initramfs -a umount poweroff reboot halt losetup stat sleep
and one is trapped in a plain emergency shell when doing a reboot,
BUT the rootfs was unmounted successfully.
However, there is no way to properly reboot/shutdown anymore (only the reset/power button will do the job).
Does anyone know how to get the (presumably C-written) shutdown command of sysvinit back into play?
```
systemd[1]: Shutting down.
systemd-shutdown[1]: Syncing filesystems and block devices.
systemd-shutdown[1]: Sending SIGTERM to remaining processes...
systemd-shutdown[1]: Waiting for process: systemd-udevd, systemd-udevd
systemd-udevd[1250]: hdaudioC1D0: Worker [1317] processing SEQNUM=4814 killed
systemd-udevd[1250]: Worker [1317] terminated by signal 9 (KILL)
systemd-udevd[1250]: hdaudioC1D0: Worker [1317] failed
systemd-shutdown[1]: Sending SIGKILL to remaining processes...
systemd-shutdown[1]: Unmounting file systems.
[1592]: Remounting '/' read-only in with options '(null)'.
EXT4-fs (dm-2): re-mounted. Opts: (null)
systemd-shutdown[1]: All filesystems unmounted.
systemd-shutdown[1]: Deactivating swaps.
systemd-shutdown[1]: All swaps deactivated.
systemd-shutdown[1]: Detaching loop devices.
systemd-shutdown[1]: All loop devices detached.
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: Detaching DM /dev/dm-1 (254:1).
systemd-shutdown[1]: Detaching DM /dev/dm-0 (254:0).
systemd-shutdown[1]: Could not detach DM /dev/dm-0: Device or resource busy
systemd-shutdown[1]: Not all DM devices detached, 2 left.
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: Detaching DM /dev/dm-0 (254:0).
systemd-shutdown[1]: Could not detach DM /dev/dm-0: Device or resource busy
systemd-shutdown[1]: Not all DM devices detached, 2 left.
systemd-shutdown[1]: Can not finalize remaining DM devices, continuing
systemd-shutdown[1]: Successfully changed into root pivot.
systemd-shutdown[1]: Returning to initrd...
Unmounting all devices.
Detaching loop devices.
Disassembling stacked devices.
kvm: exiting hardware virtualization
```
As @gamezelda suggested, I started dependencies from rescue.target in emergency mode.
When I boot into rescue mode, the system produces the log above and does not shut down properly.
In emergency mode there are no errors. When I start services until systemd-udevd / systemd-udev-trigger produces some udev device
detection output (especially NVIDIA stuff), I can reproduce the system hang.
Since the log contains hdaudioC1D0 I tried to disable the Audio device GM204
HDA (device ID: 0x0fbb, vendor ID: 0x10de). First I tried to blacklist the modules
snd_hda_codec_realtek and snd_hda_codec_generic with the following kernel parameters:
modprobe.blacklist=snd_hda_codec_realtek modprobe.blacklist=snd_hda_codec_generic
This did not work, since the device was still detected as snd_hda_codec_generic.
To disable the device I used `lspci | grep Audio` to determine the PCI device address.
With the command `echo 1 > /sys/bus/pci/devices/0000:01:00.1/remove` I removed the PCI device.
The first time this worked and I could power off without issues.
The second time, the ZSH shell froze and was not interruptible. The remove file was gone.
I tried to shutdown, but the system got a similar error log, but with ZSH.
To disable the device permanently I created a udev rule /etc/udev/rules.d/01-disable-nvidia-hda.rules:
```
ACTION=="add", KERNEL=="0000:01:00.1", SUBSYSTEM=="pci", RUN+="/bin/sh -c 'echo 1 > /sys/bus/pci/devices/0000:01:00.1/remove'"
```
I tried 3 shutdowns and all shutdowns finished properly.
The only drawback of this workaround is that the NVIDIA HDA audio device is removed.
Since my built in speaker is wired to the Intel audio device, I have no issues with playing audio.
PS: Note that this fix won't do anything for your problem if it's not due to the binfmts issue, as appears to be your case.
NOTE: the drive for me is NOT root, and the issue is not caused by binfmts but by either RLS (rust language server) or rust-analyzer, or possibly even just VSCode having a file open on the disk before reboot is requested.
In either case the issue persists.
I know 100% that RLS (rust language server) causes it as well as rust-analyzer (another extension with similar functionality). I'm not sure if VSCode itself (the editor) would cause it too, or if ANY process that keeps a file open in that partition would cause it.
I can try reproducing with a custom script or something to that effect to see if I can get it to happen out of these specific cases.
Also tested the udev rule to disable the NVIDIA audio device (adapted the PCI path, of course), without success.
@gamezelda I would be delighted if you could elaborate a bit on your approach to incrementally starting services from the emergency target (i.e. your strategy); you would save me a lot of headache. Thanks a lot in advance!
The service responsible for unpacking the initrd to /run/initramfs starts /usr/lib/dracut/dracut-initramfs-restore.
The resulting unpacked environment in /run/initramfs cannot be chrooted to, e.g. chroot /run/initramfs fails with:
chroot: failed to run command ‘/bin/bash’: No such file or directory
If I extract the image via lsinitrd --unpack (which is also a script provided by dracut), I'm able to chroot manually and during the shutdown process my rootfs gets properly unmounted.
I posted this upstream: https://github.com/dracutdevs/dracut/issues/925
Is it true that mkinitcpio does not provide a shutdown procedure?
cd /run/initramfs && /usr/bin/lsinitrd --unpack
For some reason dracut-shutdown.service does not get executed at the right time, so systemd-shutdown cannot chroot into /run/initramfs. That is why I came up with lsinitrd workaround.
Anyway, if my memory serves me well, what I did was boot into the lowest mode where the message wasn't printed (in my case, emergency). Once there, I looked into which services are loaded in the mode where the problem happened (rescue) by doing 'systemctl list-dependencies rescue.target'. Then I picked one or some of those in the list, started them with 'systemctl start whatever', and tried to reboot. If the message wasn't printed, that service isn't causing the problem; if it was printed, the service or one of its dependencies is. In that case I did 'systemctl list-dependencies whatever', or looked at the journal with 'journalctl' (note that you need to start the journal service beforehand for this to work!), to figure out the dependencies, then rebooted again, started one or some of those dependencies, and checked whether the message appeared. If it did, I looked at the dependencies of the dependencies, and so on, until I narrowed the problem down to a single service (such that starting all its dependencies was fine, but starting the service itself made the message appear). All of this took many reboots and quite some time, unfortunately...
To add to the problem, I also remember some services not auto-starting their dependencies and thus failing to load; I don't remember too well what I did, probably just figured out what needed to be started beforehand by trial and error.
So unless you're confident you'll be able to figure out some way to mess around with systemd, and don't mind having to do a lot of trial and error, I don't recommend doing this... Hopefully someone with more systemd knowledge can explain the proper way to do this, and not this Pepe Silvia way...
Any updates on this? I've tried practically every solution offered here. With NVIDIA, I didn't have the modules in the file to begin with. With binfmt, I have DOSWin in /proc/sys/fs/binfmt_misc, but not the F flag (also, when trying to start systemd-binfmt.service in emergency mode I get an error, code=exited, with no log I could find). I don't seem to have the rust server in VSCode; I even tried uninstalling it, but that didn't help.
Also, at the start @gamezelda mentioned that he managed to hide the message prints without any ill effects, but it seems like on my laptop, if I try to shut down in any way besides a long press on the power button (unhealthy...), it causes the PC to seriously heat up over a few minutes, sometimes hours, and eventually completely discharge the battery, as if it's not completely off (currently, to avoid that, every time I shut down I have to reboot into Windows and shut down from there). Any idea what could cause this?
Also, for me the exact line points to "umount: /oldroot/dev/pts: target is busy", if that changes anything.
What does sudo systemctl poweroff do?
```
##
## Generated by mhwd - Manjaro Hardware Detection
##
nvidia
nvidia-drm
```
and modules.conf, which is empty.
When I tried to remove mhwd-gpu.conf, nothing happened, and I still got the umount print (also, it notes that it's generated by mhwd, but after reboot it didn't recreate it).
sudo systemctl poweroff did a regular shutdown (unless I missed some print in the shutdown process, but I can hardly follow it normally), and the message was displayed.
Solution:
1. Copy the `shutdown` hook files to `/etc/initcpio` and rename them to e.g. `my-shutdown` for clarity:
```
cp /usr/lib/initcpio/shutdown /etc/initcpio/my-shutdown
cp /usr/lib/initcpio/install/shutdown /etc/initcpio/install/my-shutdown
cp /usr/lib/initcpio/hooks/shutdown /etc/initcpio/hooks/my-shutdown
```
2. Edit `/etc/initcpio/install/my-shutdown`, change the `add_file` line to `add_file "/etc/initcpio/my-shutdown" "/shutdown" 755`
3. Edit `/etc/initcpio/my-shutdown` and insert the following after the line `cd /sys/class/block` (no need for the ZFS parts if you're not using ZFS, of course):
```
cd /sys/class/block
date
printf '%s\n' "Unloading NVIDIA modules."
rmmod nvidia_drm nvidia_uvm nvidia_modeset nvidia
date
printf '%s\n' "Unmounting all devices."
umount --recursive /oldroot
date
printf '%s\n' "Exporting ZFS pools."
zpool export -a
```
4. Add `my-shutdown` to `HOOKS` then mkinitcpio.
So now NVIDIA no longer holds /oldroot/sys and ZFS no longer holds /dev/dm-0.
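For step 4, the HOOKS edit can be sketched as a small function. This is an assumption-laden sketch: it assumes HOOKS is on a single line and that `my-shutdown` should simply be appended. It operates on a file passed as an argument so it can be dry-run on a copy; on the real system one would pass /etc/mkinitcpio.conf and then rebuild the images as root with `mkinitcpio -P`.

```shell
# Append 'my-shutdown' to the HOOKS array in a mkinitcpio.conf-style file,
# doing nothing if it is already present.
add_my_shutdown_hook() {
  conf=$1
  grep -q 'my-shutdown' "$conf" && return 0
  sed -i 's/^HOOKS=(\(.*\))/HOOKS=(\1 my-shutdown)/' "$conf"
}
```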
Debug:
Cases may vary. To help debug, you can first set up a non-encrypted partition (I just used `/boot`), then add something like the following in your `/etc/initcpio/my-shutdown` to store some logs:
```
# after umount, etc.
debug_log() {
mkdir /debug
mount /dev/sdz1 /debug
date | tee -a /debug/log.txt
mount 2>&1 | tee -a /debug/log.txt
fuser -vm "$1" 2>&1 | tee -a /debug/log.txt
umount /debug
}
if findmnt /oldroot/sys; then
debug_log /oldroot/sys
elif findmnt /oldroot; then
debug_log /oldroot
fi
```
I'll add these to https://wiki.archlinux.org/index.php/Mkinitcpio#Troubleshooting if you guys find it helpful.
(As usual, this is just _another_ thing that can make this problem happen. So, it's possible that you don't have the NVIDIA modules in mkinitcpio and it still happens to you).
That's my case.
Confirmed that after shutting down lightdm and running
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm && modprobe -r nvidia
the issue disappeared.
It sounds like the nvidia module keeps something open on disk.
Create /usr/lib/systemd/system-shutdown/nvidia.shutdown:
```
#!/bin/sh
# remove nvidia modules
/usr/bin/modprobe -r nvidia_drm nvidia_modeset nvidia_uvm && /usr/bin/modprobe -r nvidia
```
and make it executable. This helped me resolve the issue.
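The drop-in above can be created with a few commands. In this sketch the target root is parameterized (my own addition) so the steps can be staged in a scratch directory without root; pass "" (and run as root) for a real install.

```shell
# Install the system-shutdown drop-in described above under a given root
# prefix, and mark it executable as systemd requires for these scripts.
install_nvidia_shutdown() {
  root=$1
  dir="$root/usr/lib/systemd/system-shutdown"
  mkdir -p "$dir"
  cat > "$dir/nvidia.shutdown" <<'EOF'
#!/bin/sh
# remove nvidia modules so nothing keeps /oldroot/sys busy
/usr/bin/modprobe -r nvidia_drm nvidia_modeset nvidia_uvm && /usr/bin/modprobe -r nvidia
EOF
  chmod 755 "$dir/nvidia.shutdown"
}
```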
MODULES=(intel_agp i915 vfat nls_ascii nls_cp437 nls_iso8859-1 f2fs)
(and the issue does appear to be caused by the nvidia modules: unloading them with a `/usr/lib/systemd/system-shutdown/*` script does prevent it).
If the module was loaded before the root fs, then, provided another instance of /sys was mounted on top of the root fs, that might avoid the issue.
A cleaner solution would be for NVIDIA to provide something that runs at shutdown, either removing the modules or notifying the module to close the open file.
As an alternative, systemd could modify its shutdown logic to move all pseudo-filesystems from on top of real filesystems to on top of the pivoted ram disk, so /oldroot is not blocked from being unmounted.
The workaround that Ivan Lyapunov (dront78) posted seems to work. I hope it gets fixed sometime. I did some Google searches, and it seems this bug has been around for quite a few years.
[1] https://github.com/systemd/systemd/commit/4acbe1f6cc7fbab22f7a016db58a035c80e6dd7a