FS#61925 - [Linux] Can't halt system with Linux 5.0

Attached to Project: Arch Linux
Opened by LucaS (luca020400) - Wednesday, 06 March 2019, 07:51 GMT
Last edited by Jan de Groot (JGC) - Monday, 13 May 2019, 08:33 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To No-one
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 7
Private No

Details

Description:
Right after installing Linux 5.0 I noticed my system won't reboot or shut down
unless I manually unmount /home before attempting to halt the system.

/home is an f2fs partition on an SSD; reverting to the old 4.20 f2fs sources
didn't fix the issue.

I dropped into a root shell and killed every process using resources from /home
before halting, to no avail.

Additional info:
* package version(s): linux: 5.0
* cmdline: pti=off l1tf=off spectre_v1=off spectre_v2=off spectre_v2_user=off spec_store_bypass_disable=off module_blacklist=sp5100_tco
* mount options: rw,noatime,lazytime,background_gc=on,discard,no_heap,inline_xattr,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,fsync_mode=nobarrier
* link to upstream bug report: None (yet)?

Steps to reproduce:
reboot
This task depends upon

Closed by  Jan de Groot (JGC)
Monday, 13 May 2019, 08:33 GMT
Reason for closing:  Fixed
Comment by LucaS (luca020400) - Wednesday, 06 March 2019, 09:25 GMT
So I figured out the issue: it's profile-sync-daemon.
I think its overlayfs isn't unmounted properly and somehow still references /home when systemd triggers the partition unmounts during halt.
At this point I'm wondering whether systemd doesn't unmount the partitions in the correct order (unlikely, since it worked fine before)
or whether something in the kernel broke recursive unmounting of partitions.
Comment by LucaS (luca020400) - Wednesday, 06 March 2019, 09:54 GMT
Or it was completely random and just happened to work 8 times in a row. So there is new info: sometimes it reboots fine.
Comment by Mortan (Mortan1961) - Thursday, 07 March 2019, 04:30 GMT
I have also been affected since 5.0. It is inconsistent, but it fails most of the time on poweroff or reboot.
Comment by Jeyhun (jeyhunn) - Thursday, 07 March 2019, 07:01 GMT
I also have the same problem with linux 5.0 and f2fs root and home partitions on an SSD. Only after waiting 5-10 minutes does it reboot or shut down. With the previous kernel, and also with linux-lts, I didn't have this kind of problem.
Comment by LucaS (luca020400) - Thursday, 07 March 2019, 07:51 GMT
@Mortan: Are you using f2fs?
I'm now quite sure it's an f2fs issue, but using the 4.20 fs/f2fs code under 5.0 doesn't fix it, so it's likely coming from the fs or block stack.
Comment by Dark Wav (DarkWav) - Thursday, 07 March 2019, 08:23 GMT
I'm not entirely sure I'm having the same problem as the other people in here, but I happen to be affected by this bug too, and I'm using a Btrfs root / ext4 home partition. I did some research and found that in my case, kernel 5.0 for whatever reason causes systemd to unload dbus.service before basic.target has been shut down, causing the shutdown service to get overwritten and the entire shutdown process to hang. For me, the way to fix it was adding a config file at /usr/lib/systemd/system/dbus.service.d/dbus-load.conf with the following contents:
[Unit]
Before=basic.target
This effectively avoids the glitch and survives an update of the dbus package. What it does is force systemd to keep dbus.service running until basic.target has been shut down.
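For reference, a minimal sketch of creating that drop-in from a root shell (same path and contents as above):

mkdir -p /usr/lib/systemd/system/dbus.service.d
printf '[Unit]\nBefore=basic.target\n' > /usr/lib/systemd/system/dbus.service.d/dbus-load.conf
systemctl daemon-reload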
HOWEVER:
This is a "dirty workaround". It would be better if the glitch would be actually fixed by either the developers of the Kernel or Systemd.
You can try out the workaround; it's not guaranteed to work, though, so use it at your own risk. Remember to remove the config file once the bug gets fixed or if it does not help you.
Here are my sources: https://bbs.archlinux.org/viewtopic.php?id=170756
EDIT: My DBUS theory is garbage, it just "luckily" fixed the issue improperly. By all means, use the LVM patch instead.
Comment by Mortan (Mortan1961) - Thursday, 07 March 2019, 19:56 GMT
FAT32 /boot, LUKS2 ext4 /, and hidepid=2 /proc all on NVMe.
Comment by Dark Wav (DarkWav) - Friday, 08 March 2019, 06:47 GMT
OK, it does not seem to be file-system specific...
Comment by LucaS (luca020400) - Friday, 08 March 2019, 07:31 GMT
So now we've ruled out the filesystem.
After trying the dbus hack (which didn't help), I now see the real issue:
it fails to unmount /run/user/1000.
Comment by Dark Wav (DarkWav) - Friday, 08 March 2019, 13:45 GMT
@LucaS It certainly seems to be my USB stick's fault too... If I unplug it, the system will sometimes reboot just fine even without my dbus hack. This bug is getting super weird.
EDIT: It still happens sometimes, but much less often than with the USB stick plugged in. Very weird. For me, it seems like systemd 241.7-2 is not ready for kernel 5.0 yet.
Comment by LucaS (luca020400) - Friday, 08 March 2019, 23:11 GMT
The only media I have connected are the GNOME GVfs Google Drive filesystem and my Android device (though only in charging mode, with the debug bridge enabled).
But I doubt either is the issue, since Google Drive isn't mounted by default, and it seems like something else is keeping dbus stuck.
Indeed, Linux 5.0 doesn't play nice with systemd.
Comment by Mortan (Mortan1961) - Saturday, 09 March 2019, 00:51 GMT
I don't use any external media. Also, apparently it will power off after waiting long enough.
Comment by Dark Wav (DarkWav) - Saturday, 09 March 2019, 08:22 GMT
@Mortan yes, it always does after 90 seconds, indicating a problem with systemd.
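Side note: 90 seconds matches systemd's default unit stop timeout (DefaultTimeoutStopSec=90s). For reference, a minimal sketch of where that knob lives, value illustrative only; lowering it merely shortens the hang rather than fixing the underlying problem:

# /etc/systemd/system.conf
[Manager]
DefaultTimeoutStopSec=10s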
Comment by Varakh (Varakh) - Monday, 11 March 2019, 16:43 GMT
I can confirm this issue. My system won't power off or reboot with the latest kernel and latest systemd version.
Comment by Dark Wav (DarkWav) - Thursday, 14 March 2019, 21:24 GMT
EDIT: the bug still happens; as mentioned, it fails to unmount /run/user/1000.
Comment by LucaS (luca020400) - Friday, 15 March 2019, 10:58 GMT
Sadly 5.0.1 didn't fix it for me.
Comment by Varakh (Varakh) - Monday, 25 March 2019, 20:52 GMT
Still the case for 5.0.4 with my setup.

- Hardware is AMD 1700X, RTX 2070, SSD
- I use startx with openbox and nvidia drivers
- ext4 on / and /home
- FAT32 on /boot

What I've tried which doesn't work/change anything:
- don't mount any network drives
- try to debug it with https://wiki.archlinux.org/index.php/systemd#Shutdown.2Freboot_takes_terribly_long (see the note after this list); I haven't found anything useful in the resulting logs, but I can still attach them if needed
- create a new user with just my xinitrc and openbox configuration (to rule out user specific services and timers)
- adjust timeout values in the systemd.conf file
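A note on the wiki debugging method above: it boils down to booting with verbose shutdown logging enabled on the kernel command line, roughly these parameters (as documented on the wiki; treat as a sketch and adjust as needed):

systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on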

What did work but feels hacky and is probably not a good solution:
- forcing it, e.g. systemctl poweroff --force
- the dbus.service/basic.target workaround mentioned here

Other observations:
- Not starting X at all, going to tty2 (where getty is not enabled), logging in and issuing the commands there works most of the time, but not 100%.
- Using the current LTS kernel (4.19.x) works 100% correctly.

I also have the same problem on my work laptop (T480, Intel graphics), but I only performed the steps above on my desktop machine.
Comment by Mortan (Mortan1961) - Wednesday, 27 March 2019, 03:55 GMT
Comment by Varakh (Varakh) - Wednesday, 03 April 2019, 11:37 GMT
The following command shows all affected service files which might be responsible for the long shutdown/reboot:

find /usr/lib/systemd/system/*.service -type f \
    | xargs grep --files-with-matches 'DefaultDependencies=no' \
    | xargs grep --files-with-matches 'Conflicts=.*shutdown.target' \
    | xargs grep --files-without-match 'Before=.*shutdown.target'

I think the solution to be implemented (according to GitHub) is to just show a warning about which services are responsible.
Maybe this should be handled by the lvm2 package then? At least on my machine, that's the only package which ships such service files. Should we close this, or wait for the lvm2 package to adapt (upstream)?
Comment by loqs (loqs) - Wednesday, 03 April 2019, 12:18 GMT
@Varakh does the override from https://github.com/systemd/systemd/issues/11821#issuecomment-477545885 resolve your issue?
Comment by Varakh (Varakh) - Thursday, 04 April 2019, 17:46 GMT
Yes, seems to solve it.

How do we proceed here? I think the solution cannot be to manually modify the file?
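For anyone applying the linked override by hand in the meantime, a drop-in avoids modifying the unit file that lvm2 ships; a minimal sketch as root (the ordering is taken from the linked comment, the file name is illustrative):

mkdir -p /etc/systemd/system/lvm2-lvmetad.service.d
printf '[Unit]\nBefore=shutdown.target\n' > /etc/systemd/system/lvm2-lvmetad.service.d/override.conf
systemctl daemon-reload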
Comment by Dark Wav (DarkWav) - Sunday, 07 April 2019, 10:52 GMT
My DBUS theory is garbage, it just "luckily" fixed the issue improperly. By all means, use the LVM patch instead. It's much, much cleaner. I hope the LVM/systemd developers fix this; I heard there is an official LVM patch on the way. For now I have created a PKGBUILD which creates a package that applies the fix.
If you want me to post the PKGBUILD, let me know.
Comment by Varakh (Varakh) - Sunday, 07 April 2019, 11:40 GMT
I think systemd will not fix it, as the behavior is 100% intended, as far as I understood.

LVM needs to fix its service files. In the meantime, I think the best approach would be a pacman hook that is automatically applied after upgrading the LVM package for as long as it's not fixed upstream; a sketch of what such a hook could look like follows below. If you have this, I'd be grateful if you shared it.
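For illustration, such a hook could look roughly like this (the file name and Exec payload are hypothetical; the [Trigger]/[Action] layout is standard pacman hook syntax):

# /etc/pacman.d/hooks/lvm-patch.hook (hypothetical)
[Trigger]
Operation = Install
Operation = Upgrade
Type = Package
Target = lvm2

[Action]
Description = Re-applying shutdown.target ordering for lvm2-lvmetad.service
When = PostTransaction
Exec = /bin/sh -c 'install -d /etc/systemd/system/lvm2-lvmetad.service.d && printf "[Unit]\nBefore=shutdown.target\n" > /etc/systemd/system/lvm2-lvmetad.service.d/lvm-patch.conf'

(As the next comment notes, a drop-in isn't removed by package upgrades anyway, which makes the hook unnecessary.)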
Comment by Dark Wav (DarkWav) - Sunday, 07 April 2019, 11:54 GMT
To my knowledge, an override config file does not get destroyed when the package itself is updated. I created an override for lvm2-lvmetad.service by adding /usr/lib/systemd/system/lvm2-lvmetad.service.d/lvm-patch.conf and did not edit /usr/lib/systemd/system/lvm2-lvmetad.service itself, so we should not run into problems during updates of systemd or lvm.
Anyway, here's the PKGBUILD:

pkgname=lvm-patch
pkgver=1.0.0
pkgrel=1
pkgdesc="Order lvm2-lvmetad.service before shutdown.target via a systemd drop-in"
arch=('any')
license=('unknown')

package() {
    # Install a drop-in next to the unit that lvm2 ships instead of editing
    # the unit itself, so the override survives lvm2 and systemd upgrades.
    local dropin="${pkgdir}/usr/lib/systemd/system/lvm2-lvmetad.service.d"
    mkdir -p "${dropin}"
    printf '[Unit]\nBefore=shutdown.target\n' > "${dropin}/lvm-patch.conf"
}

To install this, just put the text into a file named "PKGBUILD" and run "makepkg -rsi" in the directory containing it.
This will build and install a package called "lvm-patch".
After installing it, the problem should be fixed.
To remove the patch, just run "sudo pacman -R lvm-patch" and it will be gone.
Comment by Varakh (Varakh) - Sunday, 07 April 2019, 12:53 GMT
Oh yes, you're right. I like that, thanks. Maybe you could add this to the AUR? Just for the time until this is fixed upstream.
Comment by Dark Wav (DarkWav) - Sunday, 07 April 2019, 14:21 GMT
Comment by Dark Wav (DarkWav) - Monday, 08 April 2019, 13:01 GMT
I created an issue at lvm2's github repo: https://github.com/lvmteam/lvm2/issues/17
EDIT: Good news, there is an upstream patch for this issue on the way. Just wait for the newest version of lvm2 to pass testing (shouldn't take long, since we're on Arch) and the issue will be resolved.
Comment by Dark Wav (DarkWav) - Wednesday, 10 April 2019, 11:42 GMT
Fixed with lvm2 2.02.184-2 (currently in the testing repos).
Comment by Varakh (Varakh) - Thursday, 11 April 2019, 21:08 GMT
I'm afraid that after uninstalling your patch and installing the -3 version of the package you mentioned, I still have the same error.
Comment by loqs (loqs) - Thursday, 11 April 2019, 21:14 GMT
@Varakh: lvm2 2.02.184-3 is equivalent to lvm2 2.02.184-1; it reverted the patch due to FS#62302.
lvm2 2.02.184-4, currently in testing, has the patch and the fix for FS#62302.
Comment by Varakh (Varakh) - Thursday, 11 April 2019, 21:38 GMT
Thanks for the heads-up.
Comment by Dark Wav (DarkWav) - Thursday, 18 April 2019, 07:30 GMT
Issue fixed upstream with lvm2 2.02.184-4. Bug report can be closed.
Comment by LucaS (luca020400) - Saturday, 27 April 2019, 11:04 GMT
Issue can be closed, lvm2 2.02.184-4 fixed the issue.

Thanks!
