FS#16368 - [initscripts] remount all filesystems readonly on shutdown

Attached to Project: Arch Linux
Opened by André Fettouhi (A.Fettouhi) - Sunday, 27 September 2009, 09:35 GMT
Last edited by Tom Gundersen (tomegun) - Sunday, 27 March 2011, 17:58 GMT
Task Type Bug Report
Category Initscripts
Status Closed
Assigned To Dale Blount (dale)
Aaron Griffin (phrakture)
Thomas Bächler (brain0)
Tom Gundersen (tomegun)
Architecture i686
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 9
Private No

Details

Edit: Altered title to reflect current issue - Allan



Original report:
[glibc] nscd shutdown causes errors on root filesystem

Description:
I'm running Arch i686 with all the latest updates (27.09.2009) and with KDEmod4. My network is configured via DHCP, and when I shut down my machine I see that nscd is also shut down, but for some reason this causes the superblock on my root partition to be corrupted. I get a "write time is in the future" error at the next boot and I have to run a check of my root partition to fix the errors. Why is nscd being run at all and why does it appear in the shutdown process? I recently switched to DHCP; I was using a static IP before.


Additional info:
* package version(s)
* config and/or log files etc.

glibc-2.10.1-4


Steps to reproduce:

PS. The error only appears if I log into my KDE4 session. If I shutdown/reboot directly from KDM then nscd does not appear in the shutdown process list.

Closed by  Tom Gundersen (tomegun)
Sunday, 27 March 2011, 17:58 GMT
Reason for closing:  None
Additional comments about closing:  See my last comment. This FS is conflating many issues, mostly bugs in other packages and a feature request.
Comment by Thomas Bächler (brain0) - Sunday, 27 September 2009, 11:48 GMT
nscd is only run when you explicitly run it in rc.conf. If you didn't do that, you'll have to ask the kdemod people why they launch nscd without you knowing it, KDE/KDM doesn't do that by itself.

About the superblock problems, I'll run nscd manually and reboot and see what happens. It sounds weird though.
Comment by André Fettouhi (A.Fettouhi) - Sunday, 27 September 2009, 11:58 GMT
No, nscd is not in my rc.conf. I'll post in the kdemod forums and ask about it because it is very weird.
Comment by Thomas Bächler (brain0) - Sunday, 27 September 2009, 12:22 GMT
Btw, why do you think that your superblock being corrupted has anything to do with nscd being started? It doesn't make any sense. What is the exact message that leads you to believe there is "corruption"?
Comment by André Fettouhi (A.Fettouhi) - Sunday, 27 September 2009, 12:32 GMT
When I shut down or reboot my machine I see nscd shutting down in the list, and when the filesystems are unmounted I get the "device is busy" warning for the root partition. Then, when the machine reboots and reaches the filesystem check, it fails and I get the error "write time is in the future". I then scan the root partition, reboot, and everything is OK. In the cases where nscd doesn't appear in the shutdown list there is no problem, i.e. no "device is busy" warning and my machine boots fine.
Comment by Thomas Bächler (brain0) - Sunday, 27 September 2009, 12:52 GMT
The "device is busy" should be unrelated to the "write time in the future", at least as far as I see it. It all doesn't make any sense to me.
Comment by Allan McRae (Allan) - Sunday, 27 September 2009, 12:55 GMT
+1 confused...

Does starting nscd in /etc/rc.conf cause this issue too?
Comment by André Fettouhi (A.Fettouhi) - Sunday, 27 September 2009, 14:02 GMT
Yes it does, but it seems that the "device is busy" issue only appears if I log into KDE. If I stay in KDM and reboot or shut down from there, I don't get the "device is busy" problem and therefore no "superblock: last write time is in the future" error. So this is most likely a KDE problem (KDEmod).

Regards

André
Comment by Jan de Groot (JGC) - Sunday, 27 September 2009, 18:25 GMT
Shouldn't we remount read-only every filesystem that can't be unmounted on shutdown or reboot?
Comment by André Fettouhi (A.Fettouhi) - Tuesday, 29 September 2009, 18:52 GMT
For some reason nscd is running on my system even though the daemon is not present in my rc.conf. So far the only way I have found to get rid of the filesystem error is to manually run this command before shutting down:

sudo /etc/rc.d/nscd stop

After that, shutdown/reboot proceeds as normal and booting up gives no problems at the filesystem checking stage.

Regards

André
Comment by Dale Blount (dale) - Wednesday, 30 September 2009, 15:57 GMT
Jan, yes we should. I've seen this happening on systems without kde/kdemod/nscd even.
Comment by Thomas Bächler (brain0) - Wednesday, 30 September 2009, 16:05 GMT
But how is this caused here? killall5; killall5 -9 should have killed everything that keeps us from unmounting.
Comment by Jan de Groot (JGC) - Wednesday, 30 September 2009, 21:44 GMT
What about systems that use nss modules like nss_ldap? Those systems have a /usr that is unmountable because libldap is still in use by nss_ldap.
Comment by Thomas Bächler (brain0) - Wednesday, 30 September 2009, 22:01 GMT
Patches welcome :)
Comment by André Fettouhi (A.Fettouhi) - Thursday, 22 October 2009, 10:38 GMT
I have done a complete re-installation of my machine (running 64 bit and ext4 now) and I don't have the problem anymore with nscd, meaning it doesn't run.

Regards

André
Comment by Mukul (mukul_s) - Thursday, 05 November 2009, 14:52 GMT
I am also facing this issue. I am running fully updated Archlinux i686 and KDEmod. The symptoms are same but I don't know whether it's related to nscd or not.
This bug is keeping me away from Arch Linux :(
Comment by Thomas Bächler (brain0) - Thursday, 05 November 2009, 16:09 GMT
Again, patches welcome.

However, I have to repeat this again for Mukul: This bug seems to be caused by KDEmod in some obscure fashion which I cannot understand. What I can not and will not do is support KDEmod in any way. So unless you reproduce this bug without KDEmod, don't expect any help here. This bug is actually not keeping you away from Arch, but from KDEmod.


As for fixing the corruption problem itself: I cannot see a way to reproduce it so that we can understand it, so I cannot fix it. Remounting everything read-only might be a workaround (patches welcome, again ...), but doesn't fix the underlying problem.
Comment by Phlogi (Phlogiston) - Thursday, 05 November 2009, 16:21 GMT
I just had this issue. Is there anything I should have a look at or check to find out whats causing it?
Comment by Phlogi (Phlogiston) - Thursday, 05 November 2009, 16:31 GMT
Comment by Stefan Hermansen (scorpyn) - Thursday, 19 November 2009, 01:38 GMT
Attaching a remount-ro patch.

It appears to be working on my system, but otoh I don't have the problem mentioned here. Everything should probably be remounted ro at shutdown anyway though.
Comment by solsTiCe (zebul666) - Monday, 23 November 2009, 13:25 GMT
The bug I opened, FS#17247, has been closed (for no reason? I don't see it as a duplicate). Mine is random.

I DO NOT RUN KDE or KDEmod. I use GNOME. As far as I know, nscd is not running. I have switched from DHCP to static IP recently; the bugs I reported happened with both.
I have not seen nscd being stopped at shutdown.
I have applied the above patch.
I'll wait and see.
Comment by Stefan Hermansen (scorpyn) - Tuesday, 24 November 2009, 11:22 GMT
If the issue is that / can't be remounted ro because it's busy, then the patch I made will not help.

The only difference is that with the patch I made, it's looking for more than just / and tries to remount the rest of the mounted partitions read-only as well. In most cases, however, everything except / should already have been unmounted at that point in the shutdown script.

I should probably modify the patch and add a lsof command or something to make it easier to see which process is causing problems.
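
The remount-everything idea and the suggested diagnostic can be sketched roughly as follows. This is not the attached patch, only a minimal illustration under some assumptions: the function names `list_mounts_reverse` and `remount_ro_all` are made up for this example, `fuser` (from psmisc) stands in for the `lsof` idea, and mount points containing spaces (escaped as `\040` in /proc/mounts) are not handled.

```shell
# Print the mount points of device-backed filesystems from a
# /proc/mounts-style table, most recently mounted first.
# (Hypothetical helper; pseudo filesystems such as proc or tmpfs are
# skipped because their source field does not start with "/".)
list_mounts_reverse() {
    awk '$1 ~ /^\// { m[n++] = $2 }
         END { while (n > 0) print m[--n] }' "${1:-/proc/mounts}"
}

# Try to remount each of those filesystems read-only. When that fails,
# show which processes still use the filesystem.
remount_ro_all() {
    list_mounts_reverse "$@" | while read -r mp; do
        if ! mount -o remount,ro "$mp"; then
            echo "cannot remount $mp read-only; still in use by:" >&2
            fuser -vm "$mp" >&2
        fi
    done
}
```

Calling `remount_ro_all` with no argument reads /proc/mounts and needs root. `list_mounts_reverse` on its own is safe to run as any user and shows the order in which the filesystems would be tried.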
Comment by Thomas Bächler (brain0) - Wednesday, 09 June 2010, 17:53 GMT
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 13 July 2010, 22:50 GMT
Since all filesystems are unmounted on shutdown, and remounted read-only if that fails, what is the status of this task?
Comment by Alex Merry (pippin) - Tuesday, 27 July 2010, 23:40 GMT
This appears to be the same or a similar issue to FS#20292, which I just added (before I found this one). In my case, nscd wasn't running (as far as I'm aware), but bash (executing rc.shutdown) had /var/db/nscd/passwd open. Remounting read-only didn't work for me either.
Comment by Alex Merry (pippin) - Tuesday, 27 July 2010, 23:43 GMT
Oh, it only started happening for me a few days ago - I didn't change anything notable, other than just doing pacman -Suy, and installing openvpn.
Comment by Alex Merry (pippin) - Tuesday, 27 July 2010, 23:50 GMT
It seems something starts nscd shortly after booting finishes (someone on the forums suggested wicd as a possible culprit). If I kill it _before I shut down_ (ie: before the bash instance running rc.shutdown starts), the file systems are unmounted cleanly.
Comment by Alex Merry (pippin) - Wednesday, 28 July 2010, 12:08 GMT
It appears that wicd is starting nscd when I connect to the wireless network.

Note: I have a script that makes wicd start openvpn at my office, but this also happens when I'm at home, in which case the vpn isn't started. However, the problems started happening when I changed wicd's configuration, adding scripts to start and stop openvpn.
Comment by Alex Merry (pippin) - Wednesday, 28 July 2010, 13:48 GMT
Incidentally, I figured why the problem suddenly appeared for me: I installed openresolv from AUR, which was configured incorrectly (at build time) and started nscd every time it was told to change /etc/resolv.conf.

Of course, that doesn't solve the problem that having nscd running when the shutdown script starts prevents whatever partition /var/db/nscd/passwd is on from being unmounted.

One possible approach is to split rc.shutdown into two parts. The first part stops all daemons and kills all processes, then execs the second part, which finishes everything else off.
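
A rough sketch of that split, under stated assumptions: the names `stage1` and `handoff_to_stage2` and any second-stage path are hypothetical, the body of `stage1` is only a placeholder, and `sh` is used as the interpreter to keep the example portable (rc.shutdown itself is run by bash). The relevant property is that `exec` replaces the running shell with a new program image, so memory mappings held by the old shell are not carried over; open file descriptors are kept unless they are marked close-on-exec. Whether this is enough to release /var/db/nscd/passwd in practice is untested here.

```shell
# First part: stop daemons and kill leftover processes.
# Placeholder only; the commented commands are an assumption about
# what this stage would contain, not the real rc.shutdown contents.
stage1() {
    #   /etc/rc.d/nscd stop
    #   killall5 -15; sleep 5; killall5 -9
    :
}

# Hand over to the second part. exec does not return on success: the
# current shell process is replaced by a fresh one running "$1", and
# any remaining arguments are passed through to it.
handoff_to_stage2() {
    stage2=$1
    shift
    exec sh "$stage2" "$@"
}
```

In a real script the first part would end with something like `stage1; handoff_to_stage2 /path/to/second-part "$@"`, and the second part would do the unmounting and the final halt or reboot.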
Comment by Tom Gundersen (tomegun) - Sunday, 27 March 2011, 17:56 GMT
I guess this is no longer a big problem, as things seem to have quieted down. Here is my take on the situation anyway:

I guess remounting read-only will not necessarily work (if a file is open for writing), so a robust shutdown algorithm would certainly be welcome. I guess a loop that alternates between killing processes and attempting to unmount devices would be needed (one process might block a mountpoint from unmounting and at the same time a different mountpoint might block the process from terminating). There are some special cases we have to consider as well. A good place to start looking is in the systemd implementation, as I know it is fairly robust.
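
A minimal sketch of such a kill/unmount loop, to make the idea concrete. It is not taken from initscripts or systemd: the function names, the retry limit, and the one-second pause are assumptions, and it also assumes that `umount -a -r` exits non-zero while something is still busy. The special cases mentioned above are not covered.

```shell
# Hook: signal everything outside the current session. A polite SIGTERM
# on the first round, SIGKILL on later rounds.
kill_remaining() {
    if [ "$1" -eq 1 ]; then
        killall5 -15
    else
        killall5 -9
    fi
    sleep 1   # give processes a moment to exit
}

# Hook: unmount everything that can be unmounted; -r asks umount to
# fall back to a read-only remount when unmounting fails.
try_umount_all() {
    umount -a -r 2>/dev/null
}

# Alternate between killing and unmounting until the unmount step
# succeeds or the retry budget (first argument, default 5) runs out.
# Returns 0 on success and 1 if something was still busy at the end.
kill_umount_loop() {
    max_rounds=${1:-5}
    round=0
    while [ "$round" -lt "$max_rounds" ]; do
        round=$((round + 1))
        kill_remaining "$round"
        if try_umount_all; then
            return 0
        fi
    done
    return 1
}
```

Keeping the two steps in separate hook functions makes the loop easy to dry-run: both can be redefined as harmless stubs before calling `kill_umount_loop`.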

If anyone is interested in having a go at this, please email <arch-projects@archlinux.org>, where we can continue the discussion.

About the specific bugs people are experiencing: While a kill/unmount loop would probably fix your problems, and it certainly would be a nice feature, this is not the underlying problem (AFAIU), so I'll close this bug. I suggest opening bugs against the specific programs that are not shutting down properly.
