FS#25472 - [initscripts] Processes still running when unmount initiated

Attached to Project: Arch Linux
Opened by jason ryan (jasonwryan) - Tuesday, 09 August 2011, 04:54 GMT
Last edited by Tom Gundersen (tomegun) - Thursday, 20 October 2011, 23:04 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Dave Reisner (falconindy)
Tom Gundersen (tomegun)
Architecture i686
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 7
Private No

Details

Description: During shutdown from a TTY, laptop-mode-tools is still shutting down (from a script in /etc/rc.local.shutdown) when the umount command is invoked, so the root filesystem is not cleanly unmounted/remounted.

If I log into X, the longer shutdown process gives LMT enough time to shut down, and the filesystem is unmounted/remounted cleanly.


Additional info:
* package version(s): initscripts 2011.07.3-1, kernel 3.0, laptop-mode-tools 1.59-1
* config and/or log files etc. Log files here: https://bbs.archlinux.org/viewtopic.php?pid=972383#p972383


Steps to reproduce: Log into a TTY and either reboot or shut down, with laptop-mode-tools being shut down separately from /etc/rc.local.shutdown.

Closed by  Tom Gundersen (tomegun)
Thursday, 20 October 2011, 23:04 GMT
Reason for closing:  No response
Comment by Asher Higgs (alphaniner) - Tuesday, 09 August 2011, 17:10 GMT
I believe I am affected by the same bug, but the trigger for me is the existence of an LVM snapshot.

I have a system with /, /var, /home and swap on LVs. All filesystems unmount cleanly when no snapshots exist. But the existence of a snapshot of any LV causes 'Unmounting...' to fail to unmount at shutdown/restart. Upon reboot, the LV containing /var is checked for consistency. I initially encountered the problem on a 64-bit installation to a physical machine, but have duplicated it on a 32-bit installation to a VBox VM.

Further, I simplified the configuration by copying the contents of the /var and /home LVs to the / LV. In this case, 'Unmounting...' and 'Remounting Root...' fail when a snapshot exists, even a snapshot of the (now unused) /home or /var LVs.
Comment by Tom Gundersen (tomegun) - Tuesday, 09 August 2011, 20:04 GMT
Hi guys, thanks for reporting. I'm not able to reproduce this problem, but I'm very interested in getting to the bottom of it.

There seem to be several problems going on. First of all, no processes should stay around until the "kill_all" phase, but even if they do, they should, well..., die when they are killed.

@jasonwryan:

Could you post rc.local.shutdown (and any non-standard scripts it might call)?

@alphaniner:

Could you post the output of ps and lsof, obtained like jason did in the above forum thread? Also, could you post your fstab and rc.conf?
Comment by Tom Gundersen (tomegun) - Tuesday, 09 August 2011, 20:13 GMT
@alphaniner: I just found your thread on the forums, so no need to post the ps output (dmeventd is the culprit), just fstab and rc.conf.

One quick suggestion: could you try to see if the problem is solved by replacing


# stop monitoring of lvm2 groups before unmounting filesystems
[[ $USELVM = [Yy][Ee][Ss] && -x $(type -P lvm) && -d /sys/block ]] &&
status "Deactivating monitoring of LVM2 groups" \
vgchange --monitor n &>/dev/null


by simply:

vgchange --monitor n

?
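The two versions differ in that the original only runs `vgchange` when `USELVM` is set and `lvm` and `/sys/block` are present, and it sends all output to /dev/null, so an error would go unnoticed. A minimal sketch of a debugging variant, assuming bash and Linux `/proc`, that runs the step unconditionally, keeps `vgchange`'s output, and then lists any `dmeventd` processes still alive. The function names `unmonitor_lvm` and `list_procs_named` are hypothetical, not part of initscripts:

```shell
#!/bin/bash
# Hypothetical helpers for debugging; not part of initscripts.

# Print the PID of every process whose command name (/proc/PID/comm) is
# exactly $1, one per line. Reads /proc directly instead of using pgrep.
list_procs_named() {
    local name=$1 dir comm
    for dir in /proc/[0-9]*; do
        # A process can exit while we scan, so skip unreadable entries.
        { read -r comm < "$dir/comm"; } 2>/dev/null || continue
        if [[ $comm == "$name" ]]; then
            echo "${dir#/proc/}"
        fi
    done
    return 0
}

# The rc.shutdown step without the USELVM/lvm/sysfs guard and without
# discarding vgchange's output, followed by a check for dmeventd.
unmonitor_lvm() {
    local rc=0 pids
    echo "Deactivating monitoring of LVM2 groups"
    vgchange --monitor n || rc=$?
    pids=$(list_procs_named dmeventd)
    if [[ -n $pids ]]; then
        echo "dmeventd still running, PIDs: ${pids//$'\n'/ }"
    fi
    return "$rc"
}
```

Calling `unmonitor_lvm` in place of the guarded block would show both any `vgchange` error and whether `dmeventd` outlives the unmonitor step.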
Comment by jason ryan (jasonwryan) - Tuesday, 09 August 2011, 20:40 GMT
Thanks Tom. Script is a basic one:
--
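#!/bin/bash

# Non-empty if laptop-mode appears in the rc.d daemon list
lm="$(rc.d list | grep laptop)"
# Non-empty if acpi reports the AC adapter as off-line
ofl="$(grep "off-line" <(acpi -V))"

if [ -n "$ofl" ] && [ -n "$lm" ]; then
    rc.d stop laptop-mode
fi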
--

# edit: also, it occasionally does fail when shutting down from X...
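If laptop-mode-tools still has work in progress after `rc.d stop laptop-mode` returns, the script exits and the shutdown sequence moves on regardless. A hedged sketch of a more defensive variant, assuming bash and Linux `/proc`: after stopping the daemon it polls for up to about 5 seconds until no process with `laptop_mode` in its command line remains. That pattern, and the helper names `procs_matching`, `wait_until_gone` and `stop_laptop_mode_and_wait`, are hypothetical; check the pattern against real `ps` output before relying on it.

```shell
#!/bin/bash
# Hypothetical helpers; not part of initscripts or laptop-mode-tools.

# Succeed if any live process other than this shell has $1 somewhere in
# its command line (/proc/PID/cmdline is NUL-separated; NULs become spaces).
procs_matching() {
    local pattern=$1 dir cmdline
    for dir in /proc/[0-9]*; do
        [[ ${dir#/proc/} == "$$" ]] && continue
        cmdline=$( { tr '\0' ' ' < "$dir/cmdline"; } 2>/dev/null ) || continue
        [[ $cmdline == *"$pattern"* ]] && return 0
    done
    return 1
}

# Poll every 0.25 s, at most $2 times (default 20, about 5 s), until no
# process matches $1. Return 0 when none is left, 1 on timeout.
wait_until_gone() {
    local pattern=$1 polls=${2:-20} i
    for (( i = 0; i < polls; i++ )); do
        procs_matching "$pattern" || return 0
        sleep 0.25
    done
    return 1
}

# Hypothetical replacement for the bare "rc.d stop laptop-mode" call.
# Assumption: laptop-mode-tools processes contain "laptop_mode" in their
# command line.
stop_laptop_mode_and_wait() {
    rc.d stop laptop-mode
    if ! wait_until_gone laptop_mode 20; then
        echo "laptop_mode still running after about 5 s" >&2
    fi
}
```

In the script above, `stop_laptop_mode_and_wait` would replace the bare `rc.d stop laptop-mode` inside the `if`. Bounding the number of polls keeps a hung process from blocking shutdown indefinitely.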
Comment by Asher Higgs (alphaniner) - Tuesday, 09 August 2011, 21:42 GMT
Tom, the change you suggested to rc.shutdown did not solve the problem, though FWIW I was able to confirm that all LVs became unmonitored.

Also, I don't know if this is useful, but the 'Sending SIGTERM...' step takes much longer when a snapshot exists. I tried to redirect the output of the killall5 commands in the kill_all function from /etc/rc.d/functions:

stat_busy "Sending SIGTERM To Processes"
local i
killall5 -15 ${omit_pids[@]/#/-o } &>/kill_all
for (( i=0; i<20 && $?!=2; i++ )); do
    sleep .25 # 1/4 second
    killall5 -18 ${omit_pids[@]/#/-o } &>/kill_all-$i
done
stat_done


but all I got was empty files. Interestingly (maybe), I had files named kill_all-0 through kill_all-19, so it looks like the step is timing out. But you may have already gathered that.

In any case, thank you.

fstab and rc.conf attached.
   fstab (0.7 KiB)
   rc.conf (3.8 KiB)
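For reference, the `kill_all` loop quoted above sends SIGTERM once, then every 0.25 s sends SIGCONT (`-18`) with `killall5`, which also wakes any stopped process so it can act on the pending SIGTERM. It is meant to stop early when `killall5` exits with status 2, which the killall5 man page documents as "no process was killed". With at most 20 rounds the step gives up after roughly 5 seconds, so files `kill_all-0` through `kill_all-19` mean the loop ran all 20 rounds. The same TERM-then-poll pattern, applied to an explicit list of PIDs instead of every process, can be sketched as follows, assuming bash and Linux `/proc`; `is_alive` and `term_and_poll` are hypothetical names and do not use `killall5`:

```shell
#!/bin/bash
# Hypothetical helpers; a sketch of the kill_all idea, not initscripts code.

# True if PID $1 exists and is not a zombie. A zombie has already exited
# and is only waiting to be reaped, so it holds no open files.
is_alive() {
    local stat
    { read -r stat < "/proc/$1/stat"; } 2>/dev/null || return 1
    # The state field follows the ") " that closes the command name.
    stat=${stat##*") "}
    [[ ${stat%% *} != Z ]]
}

# Send SIGTERM to every PID given, then check up to 20 times, 0.25 s apart
# (about 5 s, like kill_all). Return 0 once none is alive, 1 on timeout.
term_and_poll() {
    local pid i alive
    kill -TERM "$@" 2>/dev/null || true
    for (( i = 0; i < 20; i++ )); do
        alive=0
        for pid in "$@"; do
            if is_alive "$pid"; then
                alive=1
                # SIGCONT lets a stopped process act on the pending SIGTERM.
                kill -CONT "$pid" 2>/dev/null || true
            fi
        done
        (( alive )) || return 0
        sleep 0.25
    done
    return 1
}
```

Any PID still alive when `term_and_poll` returns 1 would be a candidate for `kill -KILL`, which is what the older `dmeventd` process needed later in this thread.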
Comment by Tom Gundersen (tomegun) - Tuesday, 09 August 2011, 21:51 GMT
@alphaniner: thanks for the feedback. Do you still observe the problem if you "unmonitor" the devices manually before shutting down? And if you unmonitor them, do you still see dmeventd in the output of ps? If so, when you try to kill it manually (first using SIGTERM, then SIGKILL), does it die eventually?
Comment by Asher Higgs (alphaniner) - Tuesday, 09 August 2011, 22:50 GMT
Yes, problem still occurs when I manually unmonitor, and dmeventd is still in ps output.

During normal operation, I have one /sbin/dmeventd proc:

root 378 0.0 1.3 14212 13988 ? S<Lsl 18:45 0:00 /sbin/dmeventd

After unmonitor, a second one appears:

root 378 0.0 1.3 14212 13988 ? S<Lsl 18:45 0:00 /sbin/dmeventd
root 610 0.0 0.0 2304 524 ? Ss 18:45 0:00 /sbin/dmeventd

610 went down with a TERM; 378 took a KILL. But unmounting still failed.
Comment by 6arms1leg (6arms1leg) - Thursday, 11 August 2011, 14:08 GMT
I have exactly the same problem as the opener of this bug report on my ThinkPad T61p, but without any script in /etc/rc.local.shutdown.
Comment by Dave Reisner (falconindy) - Thursday, 11 August 2011, 14:11 GMT
@6arms1leg: Then you need to go through exactly the same procedure and post the output of ps and lsof prior to shutdown as well as any other information that you might have noticed being pertinent while reading the rest of this bug report.
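One way to gather that information is to dump `ps` and `lsof` output to a file from the shutdown sequence, shortly before filesystems are unmounted, onto a filesystem that is still mounted read-write. A minimal sketch, assuming bash; `snapshot_procs` and the default directory `/root/shutdown-debug` are hypothetical, the exact hook point in the shutdown scripts is left open, and `lsof` is only run if it is installed:

```shell
#!/bin/bash
# Hypothetical helper for debugging shutdown; not part of initscripts.

# Save a snapshot of running processes and open files into directory $1
# (default: the hypothetical /root/shutdown-debug), each file stamped
# with the current time. Prints the path of the ps snapshot.
snapshot_procs() {
    local dir=${1:-/root/shutdown-debug} stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dir" || return 1
    ps aux > "$dir/ps-$stamp.txt" 2>&1 || true
    if command -v lsof >/dev/null 2>&1; then
        lsof > "$dir/lsof-$stamp.txt" 2>&1 || true
    fi
    # Flush the files to disk before the unmount attempt.
    sync
    echo "$dir/ps-$stamp.txt"
}
```

For example, `snapshot_procs /root/shutdown-debug` prints the path of the `ps` snapshot it wrote; the `sync` call is there so the files reach the disk even if the root filesystem is then not cleanly unmounted.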
Comment by 6arms1leg (6arms1leg) - Thursday, 11 August 2011, 14:48 GMT
Sorry.

/etc/rc.local.shutdown is empty, other files are in attachment.

# edit:
packages:
kernel 3.0.1-1
initscripts 2011.07.3-1
laptop-mode-tools 1.59-1
   fstab (0.8 KiB)
   rc.conf (3.9 KiB)
   ps.out (5.5 KiB)
   lsof.out (15.8 KiB)
Comment by Tom Gundersen (tomegun) - Sunday, 09 October 2011, 12:42 GMT
Can this problem still be reproduced with the initscripts from [testing]? I refactored the killall logic a bit there, so the timeouts should be longer, and the error reporting should hopefully be better, making it easier to tell whether there is a problem. I'm still unable to reproduce this myself.
