FS#28110 - [initscripts] shutdown hangs on Unmounting Swap-backed filesystems

Attached to Project: Arch Linux
Opened by Heinrich Siebmanns (Harvey) - Thursday, 26 January 2012, 10:18 GMT
Last edited by Eric Belanger (Snowman) - Friday, 16 November 2012, 17:57 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tom Gundersen (tomegun)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
I use central storage on my home server for my pacman packages. It is distributed via NFS4. When doing 'pacman -Syu' I mount this share to /var/cache/pacman/pkg by executing
'mount server:/srv/pacman/cache/x86_64 /var/cache/pacman/pkg/'

I use the same scenario for i686-boxes (of course with another share for this architecture) and it's all the same there.

If I forget to unmount this share manually and the system shuts down, the process hangs forever, saying 'Unmounting Swap-backed filesystems'.

If I unmount this manually by 'umount /var/cache/pacman/pkg' before the shutdown process all is well.
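
A minimal sketch of how the manual mount, upgrade and unmount could be tied together so the share is never left mounted by accident. The share and mount point are the ones from this report; the function name upgrade_with_shared_cache is hypothetical, and the script assumes it is run as root.

```shell
# Sketch only: mount the shared pacman cache, upgrade, then always unmount.
# SHARE and TARGET come from the report; adjust them per architecture.
SHARE="server:/srv/pacman/cache/x86_64"
TARGET="/var/cache/pacman/pkg"

upgrade_with_shared_cache() {
    # Give up early if the share cannot be mounted.
    mount "$SHARE" "$TARGET" || return 1

    # Remember pacman's exit status without aborting before the unmount.
    status=0
    pacman -Syu || status=$?

    # Unmount regardless of pacman's result so nothing is left for shutdown.
    umount "$TARGET" || echo "warning: $TARGET is still mounted" >&2
    return "$status"
}
```

Calling upgrade_with_shared_cache instead of a bare 'pacman -Syu' returns pacman's exit status and prints a warning if the unmount itself fails.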

I use multilib and all testing repos and upgrade on a daily basis. Current version of
initscripts: 2012.01.3-1

Closed by  Eric Belanger (Snowman)
Friday, 16 November 2012, 17:57 GMT
Reason for closing:  Won't fix
Comment by Heinrich Siebmanns (Harvey) - Monday, 27 February 2012, 07:53 GMT
Since the latest version of initscripts I can't even mount my NFS share from fstab without the shutdown process hanging. Come on, this is ridiculous! There must be a way to catch this long-standing bug!
Comment by Heinrich Siebmanns (Harvey) - Monday, 27 February 2012, 08:01 GMT
OK, now that the first anger is gone I have to clarify a bit more. I use this setup on two medium-sized networks (~10-20 boxes each), with every box sharing its pacman cache on an NFS4 share. They all mount it via fstab. When I log in remotely to update the installations I am 600 km away, so it is a no-go if they don't reboot properly. Manually mounting and unmounting is not really an option because of the danger of forgetting to unmount. I have to get someone on the phone to press the reset button for me... Not very professional.
Comment by Tom Gundersen (tomegun) - Sunday, 04 March 2012, 18:28 GMT
I have tried to figure this one out, but as I'm not able to reproduce the problem, I have not had much luck.

My only guess so far is that this is some sort of kernel bug (as umount should never hang, even if it fails).

We at least need your rc.conf and fstab to be able to make sense of this.

Please also attach some relevant logs/output so we can see what is going on (one way to do this would be to add a call to 'bash' just before the umount call in rc.shutdown, that should give you a terminal so you can call umount manually with verbose output to see what happens).
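
A rough sketch of what could be run from that interactive shell to capture useful output. It only uses 'findmnt' and 'umount -v'; the function name debug_umount is hypothetical, the default path is the mount point from this report, and where exactly the 'bash' call goes in rc.shutdown is not shown here.

```shell
# Sketch only: show what is mounted at a path, then try a verbose unmount.
debug_umount() {
    target="${1:-/var/cache/pacman/pkg}"

    echo "== mounts at $target =="
    # findmnt exits non-zero when nothing is mounted at the given path.
    findmnt "$target" || echo "(nothing mounted at $target)"

    echo "== umount -v $target =="
    # The function returns umount's exit status so a failure stays visible.
    umount -v "$target"
}
```

Running 'debug_umount' with no argument checks the pacman cache mount point; pass another path to inspect a different mount.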
Comment by Heinrich Siebmanns (Harvey) - Monday, 05 March 2012, 08:36 GMT
Ok, some configs:
First, /etc/exports of the fileserver
____________
/srv/pacman/cache/x86_64 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/srv/pacman/cache/i686 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
____________
fairly trivial in my eyes.

Then, my /etc/fstab on the client
_____________
#
# /etc/fstab: static file system information
#
# <file system> <dir> <type> <options> <dump> <pass>
devpts /dev/pts devpts defaults 0 0
shm /dev/shm tmpfs nodev,nosuid 0 0
tmpfs /tmp tmpfs nodev,nosuid,noatime,size=1000M,mode=1777 0 0
#/dev/fd0 /media/fl auto user,noauto 0 0
/dev/sda2 /boot ext2 defaults 0 1
/dev/mapper/root / ext4 defaults 0 1
/dev/mapper/home /home ext4 defaults 0 1
/dev/mapper/swapDevice swap swap defaults 0 0
# NFS4-shares auf Server
teefax:/srv/pacman/cache/x86_64 /var/cache/pacman/pkg nfs4 defaults 0 1
___________________________
This may look complicated, but it also happens on systems that only have the last line in addition to the default fstab.

The rc.conf on my system:
_______________________
#
# /etc/rc.conf - Main Configuration for Arch Linux
#
# See 'man 5 rc.conf' for more details
#

# LOCALIZATION
# ------------
#
HARDWARECLOCK="UTC"
TIMEZONE="Europe/Berlin"
KEYMAP="de-latin1"
CONSOLEFONT=
CONSOLEMAP=
LOCALE="de_DE.UTF-8"
DAEMON_LOCALE="yes"
USECOLOR="yes"

# HARDWARE
# --------
#
MODULES=(floppy vboxdrv vboxnetflt vboxnetadp)
USEDMRAID="no"
USEBTRFS="no"
USELVM="no"

# NETWORKING
# ----------
#
HOSTNAME="obelix"

interface=eth0
address=
netmask=
broadcast=
gateway=

NETWORK_PERSIST="no"

# DAEMONS
# -------
#
DAEMONS=(dbus acpid syslog-ng network rpcbind nfs-common netfs crond cupsd pcscd openntpd bluetooth sensors)
______________________
Again, this also happens on systems with a rather uncomplicated rc.conf. acpid, pcscd, bluetooth and sensors are only running on my system but not on other affected boxes. virtualbox-modules and floppy are also only loaded on my system.

BTW, the shutdown process doesn't hang any longer when the share is mounted through fstab. I can only confirm the hang when the NFS share is mounted manually. In that case shutdown hangs at the line 'Unmounting Swap-backed filesystems'. When the share is unmounted manually before shutdown, all is well.

Hope this helps a bit more. I will provide some logs later this day when I have some spare time.
Comment by Tom Gundersen (tomegun) - Monday, 05 March 2012, 09:03 GMT
What would be interesting would be:

1) comment out the nfs entry from fstab
2) restart
3) mount your nfs entry manually
4) # rc.d stop netfs
5) check "findmnt" to see if your netfs is still mounted or not
Comment by Tom Gundersen (tomegun) - Monday, 05 March 2012, 09:03 GMT
(and if there are no nfs mounts left, check if shutdown still hangs).
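
The check in step 5 could be sketched as follows, assuming the share shows up with filesystem type nfs or nfs4. The function name check_nfs_gone is hypothetical; the findmnt options used are -n (no header), -o TARGET (print only the mount point) and -t (filter by type).

```shell
# Sketch only: report whether any NFS mounts remain after 'rc.d stop netfs'.
check_nfs_gone() {
    # findmnt exits non-zero when no filesystem matches, so ignore its status
    # and decide on the captured output instead.
    mounts=$(findmnt -n -o TARGET -t nfs,nfs4) || true

    if [ -n "$mounts" ]; then
        echo "still mounted:"
        echo "$mounts"
        return 1
    fi
    echo "no NFS mounts left"
}
```

If it prints "no NFS mounts left", the next thing to test is whether shutdown still hangs.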
Comment by Heinrich Siebmanns (Harvey) - Monday, 05 March 2012, 12:04 GMT
Well, now I am completely puzzled. I can't reproduce it anymore. I even tried another machine to be sure. Thinking about it, I remember that it had gone away before and then came back. I know this is not the kind of error report you want to hear :( I have stumbled over this almost exclusively when I used pacman: I ssh into a box, do my updates and restart it via ssh. Could it be that pacman's kernel update process is to blame? Next time this bites me I will pay more attention to the circumstances. For now, invest your precious time in something more 'fleshy' ;)
Comment by Tom Gundersen (tomegun) - Monday, 05 March 2012, 12:19 GMT
Thanks for reporting back. Sadly, several people are experiencing the same: when they start investigating, the bug disappears. A real Heisenbug ;-)

Looking forward to more info if it reoccurs.
Comment by Robin Smith (toplard) - Friday, 09 March 2012, 10:01 GMT
I've been seeing this for some time now. It's persistent. I've tried all the above.

What do you suggest to investigate further please?
Comment by Tom Gundersen (tomegun) - Wednesday, 21 March 2012, 13:49 GMT
Please confirm whether or not this is still an issue with initscripts-2012.03.1-1.
Comment by Hannes (archannes) - Wednesday, 28 March 2012, 22:17 GMT
I've also experienced the aforementioned issue for a long time now, but after the last pacman -Syu, which also included initscripts 2012.03.2-1, the problem seems to be solved. I tried this twice, but not more often.
So if you experienced this bug in the past, please try again to shut down your box with manually mounted NFS share(s) and write down your result here.
Comment by Heinrich Siebmanns (Harvey) - Thursday, 29 March 2012, 07:43 GMT
I have updated ~15 Boxes during the last days using manually mounted nfs shares and had no errors anymore. Seems to be solved (Fingers crossed).
Comment by Tom Gundersen (tomegun) - Thursday, 29 March 2012, 10:24 GMT
Thanks for the feedback guys! I'll close this as solved :-)
Comment by Willem van Asperen (wasperen) - Monday, 28 May 2012, 00:31 GMT
  • Field changed: Percent Complete (100% → 0%)
I have this issue, even with a box I installed only yesterday. There is an nfs mount to another box which I did manually... (so, not through fstab)...
Comment by Splith (splith) - Saturday, 21 July 2012, 09:08 GMT
I have this bug too, though my situation is a bit different... I do not have ANY network filesystems in use at all. It's running on a server, and the kernel is a custom one I compiled using grsec and selinux. It doesn't hang, but stays on this task for a good 8 minutes before finally turning off.
Comment by Tom Gundersen (tomegun) - Sunday, 04 November 2012, 16:21 GMT
If anyone wants this fixed, post a patch to the projects ml. I'm not going to be working on this.
Comment by Heinrich Siebmanns (Harvey) - Sunday, 04 November 2012, 16:40 GMT
If you think this is only occurring on initscripts systems I have to disappoint you. I have seen this on at least one newly installed pure systemd system.
Comment by Tom Gundersen (tomegun) - Sunday, 04 November 2012, 16:52 GMT
That would be a very different-looking bug, so please open a separate bug against systemd.
Comment by Heinrich Siebmanns (Harvey) - Sunday, 04 November 2012, 17:48 GMT
O.K, will do so as soon as I have one of these systems again. Can't be more clear now, as this specific box is not here anymore.
