FS#72806 - [mkinitcpio] Potential root fs corruption on every boot for non-ext{2,3,4}
Attached to Project: Arch Linux
Opened by Alexey Rychkov (nightfog) - Sunday, 21 November 2021, 15:57 GMT
Last edited by Giancarlo Razzolini (grazzolini) - Wednesday, 01 December 2021, 18:28 GMT
Details
At every boot, a non-ext root filesystem is checked with
`fsck.ext2 -a ...` (yes, with automatic repair!). This could
damage it permanently.
The proposed solution is at https://github.com/archlinux/mkinitcpio/pull/72. mkinitcpio version: 30-2.
Closed by Giancarlo Razzolini (grazzolini)
Wednesday, 01 December 2021, 18:28 GMT
Reason for closing: Not a bug
But it still may arise for someone else. fsck is definitely trying to call fsck.ext2.
Another drawback is that fsck.realrootfstype is not called at all.
As for bootloaders, I use GRUB and can simply pass rootfstype as a kernel parameter. Yes, I've done this manually. I see no problem there.
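To illustrate the rootfstype approach mentioned above, here is a minimal sketch of how a script could read that parameter from a kernel command line. The function name `get_rootfstype` is hypothetical and is not taken from mkinitcpio. The sketch assumes the first `rootfstype=` occurrence is the one that matters.

```shell
# Hypothetical helper: extract the value of rootfstype= from a kernel
# command line string passed as $1.
# Prints the value of the first occurrence; returns 1 if it is absent.
get_rootfstype() (
    set -f                      # no globbing while splitting into words
    for word in $1; do
        case $word in
            rootfstype=*)
                printf '%s\n' "${word#rootfstype=}"
                return 0
                ;;
        esac
    done
    return 1
)

# On a running system the real command line could be passed in like this:
#   get_rootfstype "$(cat /proc/cmdline)"
```

With GRUB, the parameter itself is typically added to the kernel line (for example `rootfstype=btrfs`), which is what makes it visible in /proc/cmdline.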
I've updated the PR. Do you think that's a reasonable approach?
1. autodetect hook is used, but /usr is a separate partition and is ext{2,3,4}.
2. autodetect hook is not used (this includes the fallback image).
In both cases fsck.ext2 is copied into the initramfs and is called by default. That's bad if the root fs is not ext{2,3,4}.
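One way to see which fsck helpers actually ended up in an image is to filter its file listing. The sketch below is only an illustration: `list_fsck_helpers` is a hypothetical name, and it assumes the listing has one path per line, as `lsinitcpio` prints it.

```shell
# Hypothetical helper: reduce a file listing on stdin (one path per line)
# to the names of type-specific fsck helpers, e.g. fsck.ext4 or fsck.btrfs.
# The plain "fsck" frontend is deliberately not matched.
list_fsck_helpers() {
    grep -E '(^|/)fsck\.[^/]+$' | sed 's|.*/||' | sort -u
}

# Possible usage (the image path is an assumption, adjust to your system):
#   lsinitcpio /boot/initramfs-linux-fallback.img | list_fsck_helpers
```

If the root filesystem type does not show up in that list, no matching helper is available inside the initramfs.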
At image creation we just copy the modules and fsck.{fstype}. The important thing is that fsck doesn't know the fs type and just falls back to ext.
It's designed to get the fs type from /etc/fstab, but that file is empty in the initramfs.
@loqs: yes, seems so (not failed actually, but skipped). Isn't that better than blindly calling fsck.ext2? Or is some additional logic needed?
Autodetect means using the current rootfs type. Without it, the kernel can detect the fs type automatically in most cases, but fsck cannot.
Does that seem reasonable as a fallback before skipping?
I have tried to "destroy" a non-ext{2,3,4} fs (in my case btrfs and a swap, just to make things interesting). e2fsck not only failed to find a superblock in either, it also detected that each was a different fs and didn't do anything. Don't get me wrong, better runtime fs detection is always good, but I'd drop the FUD.
But there is still a bug. fsck is not checking a non-ext fs at all.
Actually, I've adopted a tip from loqs; I don't see what else we can do on this.
So, to sum up, there's room for improvement in detecting other partitions' filesystems at build time, including their fsck in the image, and calling the appropriate fsck for them at initramfs run time.
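The run-time side of that summary (use an explicit type if given, otherwise a detected one, otherwise skip the check) can be sketched as below. This is not the code from the PR. `pick_fsck_type` is a hypothetical name, and the detection step is assumed to be done by the caller.

```shell
# Hypothetical helper: decide which filesystem type to hand to fsck.
#   $1: type given explicitly (e.g. from rootfstype=); may be empty or "auto"
#   $2: type detected at run time by some probing tool; may be empty
# Prints the chosen type; returns 1 to signal "skip the check".
pick_fsck_type() {
    explicit=$1
    detected=$2
    if [ -n "$explicit" ] && [ "$explicit" != auto ]; then
        printf '%s\n' "$explicit"
    elif [ -n "$detected" ]; then
        printf '%s\n' "$detected"
    else
        return 1
    fi
}

# Possible usage, assuming a blkid that supports -o/-s is in the image and
# that $rootfstype and $root are set by the caller:
#   if fstype=$(pick_fsck_type "$rootfstype" "$(blkid -o value -s TYPE "$root")"); then
#       fsck -t "$fstype" -a "$root"
#   fi
```

Returning a failure status instead of a default type keeps the "skip rather than guess" behaviour discussed earlier in the thread.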
You could check `man fsck`: it does not magically detect the proper fsck.{our_rootfstype}. In the current setup it ALWAYS runs fsck.ext2, NO MATTER WHAT the options are.
This is all about fixing that. If we throw away this fix, that is equivalent to saying: "It's not a problem to call 'fsck.ext2 -a $root' on any filesystem."
Please take your time, recheck the problem, or get advice about this, as I think we may be turning a blind eye to a serious bug.
This line here: https://github.com/archlinux/mkinitcpio/blob/ca7796a27aa62bae94fe180e6f3717a0d6171101/install/fsck#L14 adds fsck for non ext partitions. This line here calls the fsck binary for running fsck on the filesystem: https://github.com/archlinux/mkinitcpio/blob/ca7796a27aa62bae94fe180e6f3717a0d6171101/init_functions#L264
It is NOT always fsck.ext2. fsck is a frontend for the actual fsck binary for the filesystem, which is detected at install time by the fsck hook. Also, even if it was the case that we were running the wrong command on the filesystem, please, show me actual file system corruption.
You can simply check this yourself. Add "break=y" to the kernel command line in the bootloader, and run "fsck -N _your_root_device_". It will show the subcommand to be invoked: always fsck.ext2.
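The check described above could be scripted roughly as follows. The helper only prints the dry-run command rather than running it, so it is safe to try anywhere. `fsck_dry_run_cmd` is a hypothetical name and the device path in the usage note is a placeholder.

```shell
# Hypothetical helper: print the fsck dry-run command for a device.
# -N asks fsck to show what it would do without executing anything.
# An optional second argument forces the filesystem type via -t.
fsck_dry_run_cmd() {
    dev=$1
    fstype=${2:-}
    if [ -n "$fstype" ]; then
        printf 'fsck -N -t %s %s\n' "$fstype" "$dev"
    else
        printf 'fsck -N %s\n' "$dev"
    fi
}

# In the initramfs shell, print both variants and run them by hand to
# compare which helper fsck selects with and without an explicit type:
#   fsck_dry_run_cmd /dev/sdXn
#   fsck_dry_run_cmd /dev/sdXn btrfs
```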
I can't show you fs corruption; it may never hurt me or you. But there is a risk. If it can be eliminated, why not do so?
When you have the usr hook, it ALSO calls fsck_device which is going through the same codepath and is going to call the appropriate fsck binary for the usr filesystem. So, I'm closing this for now and also the PR, unless an actual bug and/or harm is demonstrated.