FS#77596 - [linux] 6.2 fails to boot on F2FS root
Attached to Project:
Arch Linux
Opened by Gereon Schomber (IncredibleLaser) - Tuesday, 21 February 2023, 09:06 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:22 GMT
Details
Description:
After installing Linux 6.2 from [testing], my system wouldn't fully boot anymore. My F2FS root was mounted RO (it is marked RW in /etc/fstab), so no logs were saved. I got a TTY at one point, but since the system was RO, not much could be done there. I rebooted into a live USB system and installed 6.1.12 from [core], which continues to work.

Additional info:
* package version(s): Linux 6.2
* /etc/fstab root entry:
# /dev/nvme0n1p2 LABEL=root
UUID=15fddbaa-f25b-4a84-b048-607af48664ae / f2fs rw,relatime,lazytime,background_gc=on,no_heap,inline_xattr,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,fsync_mode=posix 0 0

Steps to reproduce:
* Install Linux 6.2 on F2FS root
* Reboot
Closed by Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:22 GMT
Reason for closing: Moved
Additional comments about closing: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/issues/9
The bug also happens with a self-compiled kernel, so the issue is upstream.
[1] https://wiki.archlinux.org/title/Kernel#Debugging_regressions
I have not bisected the kernel yet.
https://drive.google.com/file/d/1KutGCid-3xO_kBNDNB-txw_5dMEGviWC/view?usp=share_link linux-6.2-1.4-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1m_SGqSnJbis9_3jtoHlCzLNWeVRohvQy/view?usp=share_link linux-headers-6.2-1.4-x86_64.pkg.tar.zst
This is again an upstream build with no additions, from the commit before the f2fs pull was merged:
https://drive.google.com/file/d/1aEGho2uJBalKr2PpHJ7umYsB1uz_hKYc/view?usp=share_link linux-6.1.r10907.geb67d239f3aa-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1CxweMIFLB66ehOK8RfapxAPvAp-OcR39/view?usp=share_link linux-headers-6.1.r10907.geb67d239f3aa-1-x86_64.pkg.tar.zst
I have attached dmesg output from the non-working boot, but I don't see any real hints in there; it complains that the FS is RO, but not why it was mounted RO.
Edit: The reason is that my kernel command line did not contain the rw option. Previously, the kernel would mount root RW anyway; it seems this has changed. Also, `/usr/share/systemd/bootctl/arch.conf` does not list the option by default, and this is the template I used when I installed the system.
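A minimal sketch of the workaround implied here: append `rw` to a systemd-boot entry's `options` line when neither `rw` nor `ro` is present. `add_rw_option` is a hypothetical helper written for illustration; it only prints the adjusted line and does not edit any entry file, and it leaves an explicit `ro` alone on the assumption that it was set deliberately.

```sh
#!/bin/sh
# Hypothetical helper: print the given loader "options" line with "rw"
# appended, unless an explicit "ro" or "rw" token is already present.
# The body runs in a subshell so "set -f" does not leak to the caller.
add_rw_option() (
    set -f                      # disable globbing while word-splitting
    line=$1
    for tok in $line; do        # split on whitespace into tokens
        case "$tok" in
            ro|rw)
                # An explicit mode is already set: print unchanged.
                printf '%s\n' "$line"
                exit 0
                ;;
        esac
    done
    printf '%s rw\n' "$line"
)
```

For example, `add_rw_option 'options root=PARTUUID=XXXX rootfstype=f2fs add_efi_memmap'` prints the same line with ` rw` at the end, which could then be pasted into the entry by hand.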
If you set it rw does the boot succeed or the error change?
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=967eaad1fed5f6335ea97a47d45214744dc57925
The bootparam(7) documentation says the following about ro and rw [0]:
> The 'ro' option tells the kernel to mount the root
> filesystem as 'read-only' so that filesystem consistency
> check programs (fsck) can do their work on a quiescent
> filesystem. No processes can write to files on the
> filesystem in question until it is 'remounted' as
> read/write capable, for example, by 'mount -w -n -o
> remount /'. (See also mount(8).)
> The 'rw' option tells the kernel to mount the root
> filesystem read/write. This is the default.
Do we know why the default appears to be ro now?
Assuming there's a good reason for that change, I wouldn't consider manually setting rw to be a satisfying fix.
Removing the flush_merge option, as I did, isn't better either; the documentation makes it sound rather useful [1]:
> Merge concurrent cache_flush commands as much as possible
> to eliminate redundant command issues. If the underlying
> device handles the cache_flush command relatively slowly,
> recommend to enable this option.
[0] https://github.com/mkerrisk/man-pages/blob/master/man7/bootparam.7#L173
[1] https://www.kernel.org/doc/Documentation/filesystems/f2fs.txt
The default mode of mkinitcpio is readonly: https://gitlab.archlinux.org/archlinux/mkinitcpio/mkinitcpio/-/blob/master/init_functions#L398
The default mode of systemd-fstab-generator is readonly: https://github.com/systemd/systemd/blob/9200e520adb9acd44e602ee6f80bf21ffed4c969/src/fstab-generator/fstab-generator.c#L901
I think the manpage is just wrong?
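To check which mode a given command line implies under the defaults linked above, a small sketch. `root_mode_from_cmdline` is hypothetical and is not taken from mkinitcpio or systemd: it assumes read-only when neither `ro` nor `rw` appears (as the two links indicate) and that the last `ro`/`rw` token wins when both appear.

```sh
#!/bin/sh
# Hypothetical helper: print "ro" or "rw" for a kernel command line string.
# Assumptions: default is read-only; the last ro/rw token wins.
root_mode_from_cmdline() (
    set -f                      # avoid glob expansion of tokens
    mode=ro                     # assumed default with no ro/rw token
    for tok in $1; do           # whitespace word-splitting
        case "$tok" in
            ro) mode=ro ;;
            rw) mode=rw ;;
        esac                    # other tokens (root=, rootfstype=...) ignored
    done
    printf '%s\n' "$mode"
)
```

On a running system it could be called as `root_mode_from_cmdline "$(cat /proc/cmdline)"`.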
- https://github.com/torvalds/linux/commit/967eaad1fed5f6335ea97a47d45214744dc57925
- https://github.com/torvalds/linux/commit/ed8ac22b6b75804743f1dae6563d75f85cfd1483
Both were added in 6.2-rc1. Looks like an upstream bug.
If f2fs cannot support FLUSH_MERGE in read-only mode, then enforcing that requirement does not seem to be a bug; the alternative would be to drop the option.
If ro plus rootflags=noflush_merge (assuming rootflags=noflush_merge is valid) still produces the issue, that would appear to be a bug.
This assumes a workflow of mounting read-only with noflush_merge, then running fsck, after which the fstab entry can switch it to read-write with flush_merge if desired.
So if flush_merge + ro does not work, it seems to fall back to noflush_merge when the option is not explicitly set, and to fail only when flush_merge is set. That sounds sensible. And indeed, rootflags=flush_merge won't work since root defaults to ro; I am with you here.
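A toy model of the behaviour described here, to make the accepted and rejected combinations explicit. `f2fs_mount_check` is hypothetical and is not the kernel's code; it assumes the last of `flush_merge`/`noflush_merge` wins and that an unset option means off in ro mode, and it rejects only ro combined with flush_merge, mirroring the `FLUSH_MERGE not compatible with readonly mode` message.

```sh
#!/bin/sh
# Toy model (not kernel code): exit status 0 = mount accepted,
# 1 = rejected. $1 is the ro/rw state the check sees, $2 the
# comma-separated mount options.
f2fs_mount_check() (
    mode=$1
    flush=off                   # assumed default in ro mode when unset
    IFS=,                       # split options on commas
    set -f
    for opt in $2; do
        case "$opt" in
            flush_merge)   flush=on ;;
            noflush_merge) flush=off ;;
        esac                    # last occurrence wins
    done
    if [ "$mode" = ro ] && [ "$flush" = on ]; then
        echo "FLUSH_MERGE not compatible with readonly mode" >&2
        exit 1
    fi
    exit 0
)
```

Under this model, `f2fs_mount_check ro relatime,lazytime` succeeds, while `f2fs_mount_check ro relatime,flush_merge` fails, matching the rootflags=flush_merge case discussed above.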
IMO it would still be more sensible to just disable flush_merge instead of failing to mount the root filesystem...
However, for me, remounting the root filesystem rw didn't work because I had flush_merge in fstab: mount -o rw,remount / produced the same error message. So it looks like the kernel code checks the current ro/rw state rather than the requested one.
$ cat /proc/cmdline
initrd=\intel-ucode.img initrd=\initramfs-linux.img root=UUID=df0fb79c-d647-41a5-9284-7c211cd9512c rootfstype=f2fs add_efi_memmap
$ systemctl status systemd-remount-fs.service
[...]
Mar 15 09:18:17 kiste2 systemd-remount-fs[206]: /usr/bin/mount for / exited with exit status 32.
[...]
$ dmesg | grep f2fs
[ 2.703924] F2FS-fs (nvme0n1p2): FLUSH_MERGE not compatible with readonly mode
$ mount -o rw,remount /
mount: /: mount point not mounted or bad option.
dmesg(1) may have more information after failed mount system call.
$ mount -o rw,remount,noflush_merge /
[ no output, it worked! ]
$ mount -o remount,flush_merge /
[ it worked again! ]
$ mount | grep f2fs
/dev/nvme0n1p2 on / type f2fs (rw,relatime,lazytime,background_gc=on,discard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,barrier,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,discard_unit=block,memory=normal)
That confirms it: you currently can't remount with rw and flush_merge at the same time; you have to do it in two steps. This breaks systemd-remount-fs.service remounting the root filesystem rw. Maybe this is a different bug, but the root cause seems to be the same kernel change.
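The two-step remount from the session above can be sketched as a dry-run helper. `remount_root_rw_plan` is hypothetical: it takes the root entry's fstab options and only prints the mount commands (the same ones that worked in the session) rather than running them. It assumes the last of `flush_merge`/`noflush_merge` wins.

```sh
#!/bin/sh
# Hypothetical dry-run helper: print the mount commands needed to bring
# "/" to rw, splitting into two steps when fstab requests flush_merge.
remount_root_rw_plan() (
    IFS=,                       # fstab options are comma-separated
    set -f
    wants_flush=no
    for opt in $1; do
        case "$opt" in
            flush_merge)   wants_flush=yes ;;
            noflush_merge) wants_flush=no ;;
        esac
    done
    if [ "$wants_flush" = yes ]; then
        # Step 1: leave read-only mode with flush_merge explicitly off.
        echo "mount -o rw,remount,noflush_merge /"
        # Step 2: root is rw now, so flush_merge is accepted.
        echo "mount -o remount,flush_merge /"
    else
        echo "mount -o rw,remount /"
    fi
)
```

It could be fed from fstab with `remount_root_rw_plan "$(findmnt --fstab -n -o OPTIONS /)"`.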