FS#72806 - [mkinitcpio] Potential root fs corruption on every boot for non-ext{2,3,4}

Attached to Project: Arch Linux
Opened by Alexey Rychkov (nightfog) - Sunday, 21 November 2021, 15:57 GMT
Last edited by Giancarlo Razzolini (grazzolini) - Wednesday, 01 December 2021, 18:28 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Giancarlo Razzolini (grazzolini)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

At every boot, a non-ext root filesystem is checked with `fsck.ext2 -a ...` (yes, with automatic repair!). This could damage it permanently.

The proposed solution is at https://github.com/archlinux/mkinitcpio/pull/72.

Mkinitcpio version: 30-2.

Closed by  Giancarlo Razzolini (grazzolini)
Wednesday, 01 December 2021, 18:28 GMT
Reason for closing:  Not a bug
Comment by Giancarlo Razzolini (grazzolini) - Monday, 22 November 2021, 16:34 GMT
Do you actually have an example of a FS that was damaged by this? The main problem I see with your PR is that rootfstype is not an option that is passed by all boot loaders, if any.
Comment by Alexey Rychkov (nightfog) - Monday, 22 November 2021, 17:40 GMT
No. At the moment I conclude that my fs corruption is not related to this issue, because fsck.ext2 is not copied to the initcpio in my case.
But it still may arise for someone else. fsck is definitely trying to call fsck.ext2.
Another drawback is that fsck.realrootfstype is not called at all.
For bootloaders, I use GRUB and can simply pass rootfstype as a kernel parameter. Yes, I've done this manually. See no problem there.
Comment by Alexey Rychkov (nightfog) - Saturday, 27 November 2021, 16:23 GMT
We can copy the rootfstype parameter provided by the autodetect hook into the initramfs as a fallback option. fsck will then check both and give up if neither is declared.
I've updated the PR. Do you think that's a reasonable way?
Comment by Giancarlo Razzolini (grazzolini) - Sunday, 28 November 2021, 13:25 GMT
I took a look at your PR. It seems reasonable to me, but I'll need to do some more testing. Also, I'd really like to know a case where you saw corruption. Because we detect the root filesystem type at image creation.
Comment by loqs (loqs) - Sunday, 28 November 2021, 16:39 GMT
The fallback image does not use the autodetect hook, so in that case fsck would fail unless rootfstype is passed by the bootloader?
Comment by Alexey Rychkov (nightfog) - Sunday, 28 November 2021, 17:18 GMT
@Giancarlo Razzolini: I currently see two scenarios in which bad things can happen.
1. autodetect hook is used, but /usr is a separate partition and is ext{2,3,4}.
2. autodetect hook is not used (this includes the fallback image).
In both cases fsck.ext2 is copied into the initramfs and is called by default. That's bad if the root fs is not ext{2,3,4}.
At image creation we just copy the modules and fsck.{fstype}. The important thing is that fsck doesn't know the fs type and just falls back to ext.
It's designed to get the fs type from /etc/fstab, but that file is empty in the initramfs.

@loqs: yes, it seems so (not failed, actually, but skipped). Isn't that better than blindly calling fsck.ext2? Or is some additional logic needed?
Autodetect means using the current rootfs type. Without it, the kernel can detect the fs type automatically in most cases, but fsck cannot.
Comment by loqs (loqs) - Sunday, 28 November 2021, 19:41 GMT
The base hook adds blkid. So if blkid is present in the initrd you could use blkid -o value -s TYPE "$root" to find an fstype to try.
Does that seem reasonable as a fallback before skipping?
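
The fallback order suggested here could be sketched roughly as follows. This is only an illustration, not code from mkinitcpio or from the PR: the function name `detect_fstype` and the overridable `BLKID` variable are hypothetical, and the only external call assumed is the `blkid -o value -s TYPE` invocation quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mkinitcpio): pick a filesystem type for fsck.
# Order: explicit rootfstype= value, then a blkid probe, then give up.

# BLKID may be overridden (e.g. in tests); defaults to the real blkid binary.
: "${BLKID:=blkid}"

detect_fstype() {
    dev=$1           # block device to inspect, e.g. "$root"
    cmdline_type=$2  # value of rootfstype= from the kernel command line, may be empty

    # 1. Trust an explicit type from the command line, unless it is "auto".
    if [ -n "$cmdline_type" ] && [ "$cmdline_type" != auto ]; then
        printf '%s\n' "$cmdline_type"
        return 0
    fi

    # 2. Ask blkid to probe the device; it prints only the TYPE value.
    probed=$("$BLKID" -o value -s TYPE "$dev" 2>/dev/null) || probed=""
    if [ -n "$probed" ]; then
        printf '%s\n' "$probed"
        return 0
    fi

    # 3. Unknown type: signal the caller to skip fsck rather than guess.
    return 1
}
```

A caller could then run something like `fstype=$(detect_fstype "$root" "$rootfstype") && fsck -t "$fstype" -a "$root"`, and skip the check when the function returns non-zero.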
Comment by Giancarlo Razzolini (grazzolini) - Monday, 29 November 2021, 12:18 GMT
@Alexey

I have tried to "destroy" a non ext{2,3,4} fs (in my case btrfs and a swap, just to make things interesting) and e2fsck not only didn't detect a superblock in either, but it also detected that it was a different fs and didn't do anything. Don't get me wrong, better runtime fs detection is always good, but I'd drop the FUD though.
Comment by Alexey Rychkov (nightfog) - Monday, 29 November 2021, 16:00 GMT
Yes, the bug is not as critical as I initially thought.
But there is still a bug. fsck is not checking a non-ext fs at all.

Actually, I've adopted the tip from loqs. I don't see what else we can do on this.
Comment by Giancarlo Razzolini (grazzolini) - Monday, 29 November 2021, 16:52 GMT
The bug is that the FS is detected at image creation time for the root partition only, and the corresponding command is added to the initramfs and run at boot time. Covering a separate usr partition is broken, as it calls fsck_device, which in turn uses /sbin/fsck, which will be the fsck program for the root partition. If the FS for usr is different from that of the root partition, fsck won't run for it. If there's a bug in fsck's detection of other FS's, then yes, potential corruption could occur. But since the FS's differ a lot among themselves, it's very, very unlikely.

So, to sum up, there's room for improvement for detecting other partitions FS's at build time, including their fsck in the image and calling the appropriate fsck for them at initramfs run time.
Comment by Alexey Rychkov (nightfog) - Monday, 29 November 2021, 18:00 GMT
Let's start from the beginning. At image creation time the root fs is detected and the corresponding command (fsck.{rootfstype}) is added to the initramfs, but it's NEVER RUN AT BOOT TIME for a non-ext fs!
You can check `man fsck`: it does not magically detect the proper fsck.{our_rootfstype}. In the current setup it ALWAYS runs fsck.ext2, NO MATTER WHAT the options are.
This is all about fixing that. If we throw away this fix, that is equivalent to saying: "It's not a problem to call 'fsck.ext2 -a $root' on any filesystem."
Please take your time, recheck the problem or get advice about this, as I think we would be closing our eyes to a serious bug.
Comment by Giancarlo Razzolini (grazzolini) - Wednesday, 01 December 2021, 11:27 GMT
Again, please stop with the FUD.

This line here: https://github.com/archlinux/mkinitcpio/blob/ca7796a27aa62bae94fe180e6f3717a0d6171101/install/fsck#L14 adds fsck for non ext partitions. This line here calls the fsck binary for running fsck on the filesystem: https://github.com/archlinux/mkinitcpio/blob/ca7796a27aa62bae94fe180e6f3717a0d6171101/init_functions#L264

It is NOT always fsck.ext2. fsck is a frontend for the actual fsck binary for the filesystem, which is detected at install time by the fsck hook. Also, even if it was the case that we were running the wrong command on the filesystem, please, show me actual file system corruption.
Comment by Alexey Rychkov (nightfog) - Wednesday, 01 December 2021, 15:32 GMT
Unfortunately, it is always fsck.ext2. And you are absolutely right that it's detected at install time. The problem is that it's not detected at boot time.
You can simply check this yourself. Add "break=y" to the kernel command line in the bootloader, and run "fsck -N _your_root_device_". It will show the subcommand to be invoked: always fsck.ext2.

I can't show you any fs corruption; it may never hurt me or you. But there is a risk. If it can be eliminated, why not do so?
Comment by Giancarlo Razzolini (grazzolini) - Wednesday, 01 December 2021, 18:28 GMT
No, this is not correct. During init, fsck_root is called, which calls fsck_device, which in turn runs the frontend fsck binary (https://github.com/archlinux/mkinitcpio/blob/master/init_functions#L246-261). I tested this on ext2, ext4 and btrfs filesystems, and the appropriate fsck is called.

When you have the usr hook, it ALSO calls fsck_device which is going through the same codepath and is going to call the appropriate fsck binary for the usr filesystem. So, I'm closing this for now and also the PR, unless an actual bug and/or harm is demonstrated.
