FS#17438 - [initscripts] causes fsck: Superblock last write time is in the future.

Attached to Project: Arch Linux
Opened by Tomas Mudrunka (harvie) - Tuesday, 08 December 2009, 12:32 GMT
Last edited by Tom Gundersen (tomegun) - Sunday, 27 March 2011, 16:37 GMT
Task Type Bug Report
Category Initscripts
Status Closed
Assigned To Aaron Griffin (phrakture)
Thomas Bächler (brain0)
Roman Kyrylych (Romashka)
Tom Gundersen (tomegun)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 7
Private No

Details

Description: Almost every restart I get this annoying message. I suppose it's related to switching between UTC and local time during init, or something similar, and it forces me to run fsck -a /dev/root_device manually. For example, this happened even when the computer had been turned off for about 12 hours (so the fs errors were probably not caused before the reboot, but I get these errors even when I reboot the computer without delay):

EXT3-fs: write access will be enabled during recovery.
usb 2-2: configuration #1 chosen from 1 choice
kjournald starting. Commit interval 5 seconds
EXT3-fs: sda2: orphan cleanup on readonly fs
EXT3-fs: sda2: 9 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with writeback data mode.
kinit: Mounted root (ext3 filesystem) readonly.
INIT: version 2.86 booting

> Arch Linux

> http://www.archlinux.org
> Copyright 2002-2007 Judd Vinet
> Copyright 2007-2009 Aaron Griffin
> Distributed under the GNU General Public License (GPL)

------------------------------
:: Starting UDev Daemon [DONE]
:: Triggering UDev uevents [DONE]
:: Loading Modules [DONE]
:: Loading standard ACPI modules [DONE]
:: Waiting for UDev uevents to be processed [DONE]
> UDev uevent processing time: 53ms
:: Bringing up loopback interface [DONE]
:: Mounting Root Read-only [DONE]
:: Checking Filesystems [BUSY]
/dev/sda2: Superblock last write time (Tue Dec 8 13:36:39 2009,
now = Tue Dec 8 12:36:52 2009) is in the future. <-- THiS iS THe PRoBLeM
[FAIL]

***************** FILESYSTEM CHECK FAILED ****************
* *
* Please repair manually and reboot. Note that the root *
* file system is currently mounted read-only. To remount *
* it read-write type: mount -n -o remount,rw / *
* When you exit the maintenance shell the system will *
* reboot automatically. *
* *
************************************************************

Give root password for maintenance
(or type Control-D to continue):
[root@(none) ~]# fsck -a /dev/sda2
fsck from util-linux-ng 2.16
/dev/sda2 contains a file system with errors, check forced.
/dev/sda2: 564860/3571712 files (5.2% non-contiguous), 12816672/14281785 blocks



Additional info:
* package version(s)
sysvinit 2.86-5
e2fsprogs 1.41.9-1

* config and/or log files etc.
/etc/rc.conf:
...
LOCALE="cs_CZ.utf8"
HARDWARECLOCK="localtime"
USEDIRECTISA="no"
TIMEZONE="Europe/Prague"
KEYMAP="cz"
CONSOLEFONT="ter-v14n"
CONSOLEMAP=
USECOLOR="yes"
...
DAEMONS=(syslog-ng nscd hal fam gdm @laptop-init @gpm !network @alsa @crond @acpid @networkmanager @miredo @openntpd @autofs @privoxy @tor @netfs @sshd @lighttpd @ntpd @cups @cpufreq)


Steps to reproduce:

Closed by Tom Gundersen (tomegun)
Sunday, 27 March 2011, 16:37 GMT
Reason for closing: Works for me
Additional comments about closing: Closure requested by the reporter. Please reopen if someone fits the criteria in my last comment.
Comment by Gerardo Exequiel Pozzi (djgera) - Wednesday, 09 December 2009, 01:01 GMT
  • Field changed: Summary ([sysvinit] causes fsck: Superblock last write time is in the future. → [initscripts] causes fsck: Superblock last write time is in the future.)
  • Field changed: Status (Unconfirmed → Assigned)
  • Field changed: Category (Packages: Extra → Packages: Core)
  • Task assigned to Aaron Griffin (phrakture), Thomas Bächler (brain0)
your version of initscripts is?
Comment by Tomas Mudrunka (harvie) - Wednesday, 09 December 2009, 01:39 GMT
djgera: i am up to date (but not using testing):

0 ;) harvie@harvie-ntb ~ $ pacman -Qs initscripts
local/initscripts 2009.08-1 (base)
System initialization/bootup scripts
Comment by Roman Kyrylych (Romashka) - Friday, 11 December 2009, 11:41 GMT
Why do you have both openntpd and ntpd in DAEMONS?
Please try removing them both and see if your problem disappears.
Do you have some other OS in a dualboot?
After you reboot your system - is the time correct in BIOS?
Comment by Tomas Mudrunka (harvie) - Friday, 11 December 2009, 15:23 GMT
Romashka: it doesn't matter...
LANG=c cat /etc/rc.d/ntpd
cat: /etc/rc.d/ntpd: No such file or directory
Comment by Josef Lusticky (EVRAMP) - Tuesday, 22 December 2009, 09:41 GMT
Comment out USEDIRECTISA; after booting successfully, sync your clock using "# ntpdate europe.pool.ntp.org". ntpdate is part of the ntp package.
I can see this message after every improper shutdown (it wants me to fsck disks). What if this happens on a remote server? :X
Comment by Tomas Mudrunka (harvie) - Tuesday, 22 December 2009, 10:24 GMT
"What if this happens on a remote server?"
The problem is in "/etc/rc.sysinit", which IMHO does not handle the fsck return value properly. I have checked "man fsck" and I think the problem is there.
I have Arch on a "remote server" and it gets stuck every time I need to kill it, and it has to be rebooted until it starts working again.

The problem is in the
"Please repair manually and reboot... Give root password for maintenance (or type Control-D to continue)"
message: it should time out after a while if no password is given. When I press ^G, the system just reboots and fixes the filesystems automagically, but IMHO they could be fixed automagically without the need for a reboot. Why does it wait for the root password on the first reboot, and on the second reboot just repair the system without any questions?

The exit code returned by fsck is the sum of the following conditions:
0 - No errors
1 - File system errors corrected
2 - System should be rebooted
4 - File system errors left uncorrected
8 - Operational error
16 - Usage or syntax error
32 - Fsck canceled by user request
128 - Shared library error
The exit code returned when multiple file systems are checked is the bit-wise OR of the exit codes for each file system that is checked.
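
Because the exit code is a sum of flags, it can be decoded bit by bit. The following is only a sketch: `decode_fsck_ret` is a hypothetical helper name, not something shipped in initscripts, and the short messages are paraphrases of the list above.

```shell
#!/bin/sh
# decode_fsck_ret: hypothetical helper (not part of initscripts).
# Prints the conditions encoded in an fsck exit status, using the
# bit values listed in fsck(8).
decode_fsck_ret() {
    ret=$1
    if [ "$ret" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in \
        "1:errors corrected" \
        "2:reboot needed" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:canceled by user" \
        "128:shared library error"
    do
        bit=${pair%%:*}   # number before the colon
        msg=${pair#*:}    # text after the colon
        if [ $(( ret & bit )) -ne 0 ]; then
            out="${out:+$out, }$msg"
        fi
    done
    echo "$out"
}
```

For example, `decode_fsck_ret 5` prints "errors corrected, errors left uncorrected", since 5 = 1 + 4.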
Comment by Tomas Mudrunka (harvie) - Tuesday, 22 December 2009, 10:30 GMT
look:

if [ ${fsckret} -gt 1 -a ${fsckret} -ne 32 ]; then
echo
echo "***************** FILESYSTEM CHECK FAILED ****************"
echo "* *"
echo "* Please repair manually and reboot. Note that the root *"
echo "* file system is currently mounted read-only. To remount *"
echo "* it read-write type: mount -n -o remount,rw / *"
echo "* When you exit the maintenance shell the system will *"
echo "* reboot automatically. *"
echo "* *"
echo "************************************************************"
echo
/sbin/sulogin -p
fsck_reboot
fi

The ${fsckret} -gt 1 conditional is IMHO absolutely wrong. The return value of fsck is a sum, so it should be handled in a somewhat different manner (according to the meaning of each number). Maybe the problem is also with multiple filesystems, when fsck repairs one filesystem now and the other after the reboot because it's launched with different arguments.
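
A bitwise test of the kind suggested here might look like the sketch below. `fsck_action` is a hypothetical name, and the mapping of bits to actions is an assumption made for illustration; it is not the code in rc.sysinit.

```shell
#!/bin/sh
# fsck_action: hypothetical helper; the bit-to-action mapping is an
# assumption for illustration, not the actual rc.sysinit logic.
fsck_action() {
    ret=$1
    # 4, 8, 16 and 128 mean fsck could not finish the job on its own.
    if [ $(( ret & (4 | 8 | 16 | 128) )) -ne 0 ]; then
        echo manual
    elif [ $(( ret & 2 )) -ne 0 ]; then
        echo reboot
    else
        # 0, 1 (errors corrected) and 32 (canceled by user) let boot continue.
        echo continue
    fi
}
```

One difference from the quoted condition: a value of 33 (errors corrected plus a canceled check) satisfies `-gt 1 -a -ne 32` and would drop to the maintenance shell, whereas this sketch prints "continue" for it.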
Comment by Seán Connolly (januszeal) - Friday, 05 February 2010, 17:06 GMT
This has happened on remote machines and has prevented them from coming back up until I could go on location the next morning. IMO this should be given higher priority.
Comment by Thomas Bächler (brain0) - Friday, 05 February 2010, 18:12 GMT
The fsck return values are handled properly. If it says "System should be rebooted" (2 is set in the return value), the system is rebooted. Any other error greater than 1 besides 32 requires manual intervention and cannot be automatically handled.

There are two real problems here:
1) The underlying cause of the problem needs to be found - I have never been able to reproduce the problem, so I cannot do this.
2) It is entirely stupid to make "File system write time in the future" a critical error and this needs to be changed upstream. I have no idea what the reasoning behind this is.

As a workaround, set your hardware clock to UTC, then everything will work as expected. It actually makes no sense to me why anyone would not do this.
Comment by Seán Connolly (januszeal) - Friday, 05 February 2010, 18:15 GMT
@Thomas Bächler
It's set like this now, but a workaround is not a fix. Also, the issue doesn't seem to affect people west of GMT.
Comment by Thomas Bächler (brain0) - Friday, 05 February 2010, 18:31 GMT
It doesn't affect everyone east of GMT either, only some people. However, you are the one who could find out what goes wrong. At the point where fsck is executed, your time should already be set properly.
Comment by Tomas Mudrunka (harvie) - Saturday, 06 February 2010, 11:27 GMT
brain0: "requires manual intervention and cannot be automatically handled" = WRONG!
Arch Linux thinks that the filesystem has to be repaired manually, but when I reboot, it repairs itself automatically during the next boot, or, without rebooting, I can manually run "fsck -a /dev/sd??", which also fixes it. That means it can be fixed without user intervention even when Arch offers us a root shell instead of fixing it, so Arch DEFINITELY NEEDS BETTER HANDLING OF THE FSCK RETURN CODE. BTW, for some strange reason fsck sometimes needs to be run several times, which Arch also does not try.
Comment by Thomas Bächler (brain0) - Saturday, 06 February 2010, 11:29 GMT
Can you please give the exact return code when this happens?
Comment by Tomas Mudrunka (harvie) - Saturday, 06 February 2010, 12:55 GMT
brain0: if [ ${fsckret} -gt 1 -a ${fsckret} -ne 32 ]; then

IMHO first step is to run fsck two times and save only the fsckret from second run...
Comment by Thomas Bächler (brain0) - Saturday, 06 February 2010, 13:14 GMT
We might consider that.

However, please add "echo fsck return value: ${fsckret}" and tell me the value that is returned.
Comment by Thomas Bächler (brain0) - Monday, 15 February 2010, 01:09 GMT
Okay, this error happens so randomly, due to various user errors (I had it when experimenting with several operating systems earlier and once set the time wrong in one of them). And although we handle the time setting in our initscripts in a way that this error shouldn't happen, it still does. I am suggesting that we add the following /etc/e2fsck.conf by default:

[options]
buggy_init_scripts = true

This will still cause an fs check, but not abort the boot process. The underlying cause of the problem is still unknown, as none of the affected users seem to be able to properly investigate why their time is not set correctly during fsck on boot.
Comment by Tomas Mudrunka (harvie) - Monday, 15 February 2010, 12:13 GMT
brain0: we can do this, but there are two problems
1.) we don't want to disable fsck on unclean system halt
2.) this is only problem of UI (can't we just re-run fsck and continue booting instead of waiting for root password) - maybe we can check how other distros are handling this...
Comment by Thomas Bächler (brain0) - Saturday, 20 February 2010, 18:03 GMT
This is not a problem of UI. This is a problem of the time not being set early at boot, a problem which I cannot reproduce at all because our initscripts are supposed to handle it fine.

So, I see two people on this bug report that have this problem. By any chance, are you using a custom kernel? If so, this might be related to FS#18078.
Comment by Tomas Mudrunka (harvie) - Saturday, 20 February 2010, 19:07 GMT
brain0: This is both...
1.) problem of UI, which prevents Arch from booting even when the filesystem can be repaired automatically; this can happen for some other reason even when the time is set correctly (!crucial on remote servers!)
2.) problem of setting time, which causes fsck to fail
Comment by Roman Kyrylych (Romashka) - Saturday, 20 February 2010, 19:29 GMT
assuming you have initscripts-2010-1 add the following code:
at line 56: echo $RTC_MAJOR
at line 66 (ie. before /sbin/hwclock): echo "clock is setup with $HWCLOCK_PARAMS"
report the results
Comment by Thomas Bächler (brain0) - Sunday, 21 February 2010, 00:45 GMT
harvie, About the UI problem: fsck explicitly tells us that automatic repair is impossible. If that is not true (as you claim), then it's a bug in (e2)fsck.

While I still like to find and fix the time setup bug, the e2fsck.conf option I mentioned above is a valid workaround - it causes e2fsck to make a full file system check instead of bailing out entirely.

In addition to what Roman asked before, the output of "ls -l /dev/rtc*" once the system has booted might be helpful.
Comment by Kevin (unhammer) - Tuesday, 23 February 2010, 08:07 GMT
I'm in GMT+1 and believe I have the same problem (guiding non-tech users to e2fsck manually over the phone is not fun, nor is waiting for that 800GB check just because of a "future" mount date).
Comment by Thomas Bächler (brain0) - Tuesday, 23 February 2010, 08:38 GMT
I've about had it here. I have requested several important pieces of information, as has Roman (the last bit I requested doesn't even require a reboot, just entering a single command), and nobody ever replied with that. It is impossible to find the underlying cause of this problem if everybody is simply whining instead of providing the information needed to allow us to fix it. I am sorry, but my crystal ball is on vacation.

EDIT: And if you are using the Arch kernel, add "lsmod | grep rtc" to the list of things to post.
Comment by Kevin (unhammer) - Tuesday, 23 February 2010, 09:11 GMT
I added the initscripts stuff (I assume /etc/rc.sysinit was meant?), but had to reboot a few times to catch the output:

253
clock is setup with --hctosys --utc

$ dmesg |grep rtc
rtc_cmos 00:03: RTC can wake from S4
rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs

$ lsmod |grep rtc
rtc_cmos 8904 0
rtc_core 14631 1 rtc_cmos
rtc_lib 1810 1 rtc_core

$ ls -l /dev/rtc*
lrwxrwxrwx 1 root 4 2010-02-23 10:03 /dev/rtc -> rtc0
crw-rw-r-- 1 root 253, 0 2010-02-23 10:03 /dev/rtc0
Comment by Roman Kyrylych (Romashka) - Tuesday, 23 February 2010, 10:01 GMT
> clock is setup with --hctosys --utc
see, there should not be --utc
this means that you don't have HARDWARECLOCK set up properly in rc.conf
it must be HARDWARECLOCK="localtime"
(I assume your hardware clock is actually using localtime, because users with UTC have zero problems)
Comment by Roman Kyrylych (Romashka) - Tuesday, 23 February 2010, 10:05 GMT
Hm, I got confused.
If you have --utc, then you shouldn't have any problems at all.
Do you guys have Windows in dual-boot maybe?
or is something else messing with UTC/localtime etc between boots?
Comment by Kevin (unhammer) - Tuesday, 23 February 2010, 10:12 GMT
No dual-boot here. I have HARDWARECLOCK="UTC" in my rc.conf now yes (if it was not set up properly, rc.sysinit would set that hwclock parameter to empty), but I had localtime earlier and had the same trouble; I'll try switching back to localtime again and see what happens.
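
The relationship between HARDWARECLOCK and the hwclock arguments discussed here can be sketched as follows. This is a reconstruction from what the thread reports (UTC gives "--hctosys --utc", an unrecognised value gives an empty string), not a copy of rc.sysinit; `hwclock_params` is a hypothetical name and the localtime branch is an assumption.

```shell
#!/bin/sh
# hwclock_params: hypothetical helper reconstructed from the thread.
# Maps the rc.conf HARDWARECLOCK value to hwclock arguments.
hwclock_params() {
    case "$1" in
        UTC)       echo "--hctosys --utc" ;;
        localtime) echo "--hctosys --localtime" ;;  # assumed mapping
        *)         echo "" ;;                       # unset or misspelled value
    esac
}
```

With this mapping, a misspelled value such as "utc" would yield empty parameters, which is one way a wrong setting could show up in the debug output requested earlier.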
Comment by Thomas Bächler (brain0) - Tuesday, 23 February 2010, 11:02 GMT
Trouble is to be expected if you switch between localtime and UTC, but only once. Anyway, I don't see how any problem could occur in your particular case:
- You use UTC. Unless something messes with your hardware clock during system downtime, the Linux system time should always be correct.
- Your rtc device is created as expected (contrary to the case of FS#18078, which I thought might be related, but it seems it isn't).
Can we get the output of "date" at the beginning of initscripts (at some point before hwclock) and directly before the (failed) file system check? We need to verify that the date is actually set incorrectly on boot (which really shouldn't happen for UTC).

I hope the other two reporters can also provide the information needed - I suspect the problem reported by you (unhammer) might only be loosely related to theirs.
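
The `date` checkpoints requested in this comment could be added with a small temporary helper along these lines. `log_clock` is a hypothetical name and the placement in the comments is an assumption; the helper prints both local time and UTC so a one-hour offset is easy to spot.

```shell
#!/bin/sh
# log_clock: hypothetical debugging helper, meant to be removed afterwards.
# Prints the current system time, in local time and in UTC, with a label.
log_clock() {
    echo "clock check ($1): local=$(date) utc=$(date -u)"
}

# Assumed placement in the boot script:
#   log_clock "before hwclock"   # at some point before /sbin/hwclock runs
#   log_clock "before fsck"      # directly before the file system check
```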
Comment by Tomas Mudrunka (harvie) - Tuesday, 23 February 2010, 12:34 GMT
lsmod | grep rtc
rtc_cmos 7504 0
rtc_core 12011 1 rtc_cmos
rtc_lib 1450 1 rtc_core
Comment by Tomas Mudrunka (harvie) - Tuesday, 23 February 2010, 12:44 GMT
1.) I am using "localtime" in rc.conf... I think fsck should work for both UTC and localtime...
2.) I found interesting information on this issue: http://lists.openwall.net/linux-ext4/2009/10/12/12
3.) BTW, rc.sysinit is using the deprecated fsck -a instead of the recommended fsck -p
Comment by Thomas Bächler (brain0) - Tuesday, 23 February 2010, 12:54 GMT
1) Yes, it should.
2) Oh my ... can you confirm that your "now" time is always the correct time when the error occurs? In this case, looking at initscripts won't actually be very helpful, as WE are doing everything right, but e2fsck isn't. I'll look into this.
3) Oh yes, we should change that.
Comment by Tomas Mudrunka (harvie) - Tuesday, 23 February 2010, 12:59 GMT
2.) Can't reproduce... I don't reboot very often (once a week at most) ;) but the last few times I rebooted, fsckret was 0. Maybe there was some fsck upgrade. BTW, fsck is blaming distributions when writing about this topic...
Comment by Thomas Bächler (brain0) - Tuesday, 23 February 2010, 13:28 GMT
To be able to reproduce this problem, I think you must reboot twice in a row.
Comment by Thomas Bächler (brain0) - Tuesday, 23 February 2010, 15:11 GMT
Okay, this problem just got even more confusing. To solve it, we must find out whether the time during fsck is incorrect or the last mount/check/write time is incorrect.
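
One way to look at the stored superblock times from a booted system is to filter the output of `tune2fs -l`. The filter below is a sketch; `superblock_times` is a hypothetical name, and the device in the usage comment is the one from the original report.

```shell
#!/bin/sh
# superblock_times: hypothetical filter. Reads `tune2fs -l` output on
# stdin and keeps only the timestamp fields of the superblock.
superblock_times() {
    grep -E '^(Last mount time|Last write time|Last checked):'
}

# Usage (needs root and e2fsprogs):
#   tune2fs -l /dev/sda2 | superblock_times
#   date        # compare against the current system time
```

Comparing these values with `date` right after boot would show which side is off: the clock at fsck time or the timestamps written at the previous shutdown.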
Comment by Roman Kyrylych (Romashka) - Friday, 26 February 2010, 14:24 GMT
From the e2fsprogs 1.41.10 changelog:

Add new e2fsck.conf configuration option, default/broken_system_clock
to support systems with broken CMOS hardware clocks. Also, since too
many distributions seem to have broken virtualization scripts now,
e2fsck will by default accept dates which are off by up to 24 hours by
default. (Addresses Debian Bugs: #559776, #557636)
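
To see what a 24-hour tolerance means for the original report: the two timestamps there (13:36:39 and "now" 12:36:52 on the same day) differ by 59 min 47 s, or 3587 seconds, which is consistent with a one-hour localtime/UTC offset. The sketch below only illustrates the comparison; `write_time_verdict` is a hypothetical helper and is not how e2fsck is implemented.

```shell
#!/bin/sh
# write_time_verdict: hypothetical helper, not e2fsck's implementation.
# Arguments: last write time and current time as epoch seconds, plus an
# optional tolerance in seconds (default 86400 = 24 hours).
write_time_verdict() {
    last_write=$1
    now=$2
    tolerance=${3:-86400}
    skew=$(( last_write - now ))
    if [ "$skew" -le 0 ]; then
        echo "ok"
    elif [ "$skew" -le "$tolerance" ]; then
        echo "in the future by ${skew}s, within tolerance"
    else
        echo "in the future by ${skew}s, beyond tolerance"
    fi
}
```

With the skew from the report, `write_time_verdict 3587 0` prints "in the future by 3587s, within tolerance", while a zero tolerance (`write_time_verdict 3587 0 0`) prints "in the future by 3587s, beyond tolerance".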

Do we want to find the root of the problem or do we ignore it in case the new version solves the issue for those of you who still experience it?
Comment by Thomas Bächler (brain0) - Friday, 26 February 2010, 14:29 GMT
I'd really like to know how and why this goes wrong.
Comment by Roman Kyrylych (Romashka) - Friday, 26 February 2010, 14:58 GMT
Then I'm asking the people affected by the issue to not update to e2fsprogs-1.41.10-1 until the root of the problem is found.
Comment by Thomas Bächler (brain0) - Friday, 26 February 2010, 15:54 GMT
No, people should upgrade and see what happens.
Comment by Roman Kyrylych (Romashka) - Friday, 26 February 2010, 16:26 GMT
Okay, it is worthwhile to see if this fixes the problem (I predict it does),
but if it does, that doesn't really tell us what caused it (or at least I don't see how),
so it would have to be downgraded afterwards to track down the problem.
Comment by Roman Kyrylych (Romashka) - Saturday, 06 March 2010, 14:59 GMT
any new information on this?
Comment by Tomas Mudrunka (harvie) - Saturday, 06 March 2010, 16:11 GMT
fsck should be fixed (according to another bug report), but I think that we should not close this issue before we find out what's wrong with the clock...
Comment by Tomas Mudrunka (harvie) - Sunday, 02 May 2010, 18:14 GMT
The exact opposite problem is possible: I am afraid that the filesystem is not checked at all... Even when my kernel crashes or when I pull the power from the PC, I am still getting fsckret=0... Isn't it strange? The only case where I see fsck doing something is when I am mounting the filesystem for the 27th time... Which may mean two things...

1.) the filesystem is really OK and does not need to be repaired after a kernel crash (and I was just misled by fsck running almost every boot a few months earlier)
2.) fsck is ignoring such problems because of some workaround...

Have you observed similar behaviour?
Comment by Tomas Mudrunka (harvie) - Sunday, 02 May 2010, 20:07 GMT
oh my dear... i am using data=writeback all the time :-)
Comment by Leonid Isaev (lisaev) - Friday, 13 August 2010, 21:57 GMT
@harvie, what is the config of your ntpd? Does the problem still exist if you shut it down? The only time I had problems like yours was when OpenNTPD screwed up my time as DST took effect (I probably deserved it, since I used LOCALTIME).

Otherwise, there is something wrong with your clock...

See also this https://bugzilla.redhat.com/show_bug.cgi?id=522969 and links therein...
Comment by Tomas Mudrunka (harvie) - Saturday, 14 August 2010, 14:08 GMT
lisaev: well i have not experienced this issue again, so i can't tell right now what is the problem...
Comment by Karol Błażewicz (karol) - Saturday, 14 August 2010, 14:11 GMT
I had a similar problem a long time ago, but it somehow went away and I've never had (open)ntpd installed.
Comment by Tomas Mudrunka (harvie) - Sunday, 27 March 2011, 01:51 GMT
Once again: I've been using data=writeback mode, which disabled the journal. It promised some performance improvement in some "100% guaranteed Linux speedup" article, but I didn't know that it would disable the journal too. You should all check your fstab for occurrences of strings like "data=writeback" or "ext2" too.
Comment by Tomas Mudrunka (harvie) - Sunday, 27 March 2011, 01:53 GMT
BTW, most admins prefer to mount /boot as ext2, so if your PC crashes after modifying the contents of /boot before the disk was synced, this can cause the reported error message during the next boot. The same applies to the data=writeback mode of ext3 (and maybe even ext4).
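
The fstab check suggested in these two comments can be sketched with awk. `risky_fstab_entries` is a hypothetical helper name; it only lists entries whose type is ext2 or whose options include data=writeback and makes no judgement beyond that.

```shell
#!/bin/sh
# risky_fstab_entries: hypothetical helper. Prints device, mount point and
# the reason for fstab entries that are ext2 or use data=writeback.
# Argument: path to an fstab file (defaults to /etc/fstab).
risky_fstab_entries() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }   # skip comments and short lines
        $3 == "ext2" { print $1, $2, "ext2 (no journal)"; next }
        $4 ~ /(^|,)data=writeback(,|$)/ { print $1, $2, "data=writeback" }
    ' "${1:-/etc/fstab}"
}
```

Running `risky_fstab_entries` with no argument reads /etc/fstab; an empty result means no such entries were found.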
Comment by Tom Gundersen (tomegun) - Sunday, 27 March 2011, 11:57 GMT
Are people still experiencing this? I'd like to hear from someone who can reproduce the problem and
1) is using UTC
2) is not using dualboot
3) is running up-to-date standard Arch kernel and other packages
