Arch Linux


FS#80074 - [btrfs-progs] Boot failure after 6.5.3 upgrade

Attached to Project: Arch Linux
Opened by Scott Shawcroft (tannewt) - Tuesday, 24 October 2023, 17:38 GMT
Last edited by Toolybird (Toolybird) - Sunday, 29 October 2023, 20:03 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa), Sébastien Luttringer (seblu)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
This morning I upgraded btrfs-progs from 6.5.2-2 to 6.5.3-1. I rebooted and the computer wouldn't boot because it couldn't find the root partition by PARTUUID. The root is a mirrored btrfs filesystem, and the kernel command line refers to it by the PARTUUID of one of the mirrors (not the shared filesystem UUID).

Steps to reproduce:
1. Upgrade to 6.5.3-1
2. Reboot
3. Dropped into emergency shell after `mount: new_root: can't find PARTUUID=foo.` This is after the `:: running hook (udev)` line.

I was able to recover via PXE boot by downgrading to btrfs-progs 6.5.2-2 and rebooting.

Closed by Toolybird (Toolybird)
Sunday, 29 October 2023, 20:03 GMT
Reason for closing: Works for me
Additional comments about closing: Please see comments
Comment by loqs (loqs) - Tuesday, 24 October 2023, 18:24 GMT
Can you bisect the issue [1] and report it upstream?

[1]: https://wiki.archlinux.org/title/Bisecting_bugs_with_Git
Comment by Toolybird (Toolybird) - Tuesday, 24 October 2023, 20:17 GMT
Thanks, but this report is missing vital information like used bootloader, boot config, logfiles, etc. Please ensure your bug reports are effective [1].

[1] https://wiki.archlinux.org/title/Bug_reporting_guidelines#Gather_useful_information
Comment by Scott Shawcroft (tannewt) - Tuesday, 24 October 2023, 21:00 GMT
I'm booting via EFI stub. I attached a picture of what shows on the monitor. I doubt this is logged anywhere because the root drive isn't mounted. It is an Asus ROG Dark Hero X570 with a 5950x. Kernel version 6.5.8-arch1-1.

$ efibootmgr -u
BootCurrent: 0002
Timeout: 1 seconds
BootOrder: 0002,0000,0003
Boot0000* Windows Boot Manager HD(1,GPT,db13a694-564d-11ed-975d-b534cf79e5c0,0x800,0x32000)/File(\EFI\MICROSOFT\BOOT\BOOTMGFW.EFI)䥗䑎坏S
Boot0002* Arch Linux NVMe EPP HD(1,GPT,03944fc5-fcc8-4d4e-b944-40a878f10c5c,0x800,0x113000)/File(\vmlinuz-linux)root=PARTUUID=6a2bde8c-ffff-4a72-b5a0-d7d363618c57 rw initrd=\amd-ucode.img initrd=\initramfs-linux.img loglevel=5 amd_pstate=active
Boot0003* UEFI: Generic Flash Disk 8.07 PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/Pci(0x8,0x0)/Pci(0x0,0x1)/USB(1,0)

Boot0002 is what failed to find PARTUUID=6a2...

$ btrfs filesystem show
Label: 'skhynix' uuid: 81e882c8-7cec-47cb-bd51-a00ed4ed2302
Total devices 2 FS bytes used 839.06GiB
devid 1 size 1.82TiB used 1.17TiB path /dev/nvme0n1p2
devid 2 size 1.82TiB used 1.17TiB path /dev/nvme1n1p2

$ blkid
/dev/nvme0n1p2: LABEL="skhynix" UUID="81e882c8-7cec-47cb-bd51-a00ed4ed2302" UUID_SUB="405073ec-a222-460f-baa8-767fcb5587f9" BLOCK_SIZE="4096" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="6a2bde8c-ffff-4a72-b5a0-d7d363618c57"
/dev/sdb2: BLOCK_SIZE="512" UUID="FC2836D72836911E" TYPE="ntfs" PARTUUID="db13a695-564d-11ed-975d-b534cf79e5c0"
/dev/sdb3: BLOCK_SIZE="512" UUID="68D26D0BD26CDF36" TYPE="ntfs" PARTUUID="db13a696-564d-11ed-975d-b534cf79e5c0"
/dev/sdb1: UUID="060E-984C" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="db13a694-564d-11ed-975d-b534cf79e5c0"
/dev/nvme2n1p1: LABEL="scratch" UUID="acade821-19ba-458f-a164-d4f40b4c63eb" UUID_SUB="b18f16d4-0352-4309-a429-aaed7174ff77" BLOCK_SIZE="4096" TYPE="btrfs" PARTUUID="94c7f3d1-724a-ab48-9023-189533744674"
/dev/nvme1n1p2: LABEL="skhynix" UUID="81e882c8-7cec-47cb-bd51-a00ed4ed2302" UUID_SUB="6c6c3f69-bb06-408a-aa12-e55fda01042b" BLOCK_SIZE="4096" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="a650997a-f9eb-4046-849d-69a778212019"
/dev/nvme1n1p1: UUID="8C17-69AB" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI system partition" PARTUUID="03944fc5-fcc8-4d4e-b944-40a878f10c5c"
/dev/sda1: LABEL="streams" UUID="4578aeb4-f786-4a1d-a66c-b31f2b0b7c77" UUID_SUB="8e3a5731-e9bd-44fd-9799-ea90d4edbe5c" BLOCK_SIZE="4096" TYPE="btrfs" PARTUUID="33a2eada-4228-4341-a83e-78774357f944"
/dev/sdc: UUID="D44C-8C99" BLOCK_SIZE="512" TYPE="vfat"
/dev/nvme0n1p1: PARTLABEL="EFI system partition" PARTUUID="0f38e1bc-be09-4ce0-8328-75fd2e4ccb0c"
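For reference, the `root=PARTUUID=...` value from the boot entry can be cross-checked against `blkid` output like the listing above. The following is only a minimal sketch: `find_by_partuuid` is a hypothetical helper name, not part of util-linux, and it reads blkid-style lines on stdin so it can also be run against saved output.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the device whose
# PARTUUID equals $1. Reads blkid-style lines on stdin.
find_by_partuuid() {
    want=$1
    # Each line looks like: /dev/XXX: KEY="value" ... PARTUUID="..."
    # The leading space keeps the match on the whole PARTUUID key.
    grep -F " PARTUUID=\"$want\"" | cut -d: -f1
}

# On a running system (assumes root and blkid from util-linux):
#   blkid | find_by_partuuid 6a2bde8c-ffff-4a72-b5a0-d7d363618c57
# blkid can also do the lookup itself:
#   blkid -t PARTUUID=6a2bde8c-ffff-4a72-b5a0-d7d363618c57 -o device
```

With the listing above as input, the PARTUUID from Boot0002 should resolve to /dev/nvme0n1p2, one of the two btrfs mirrors.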
Comment by Toolybird (Toolybird) - Tuesday, 24 October 2023, 21:50 GMT
Browsing the upstream repo, this [1] is the only thing that jumps out as suspicious. Just brainstorming here, but in theory you could debug this without trashing your system by booting an Arch ISO and upgrading btrfs-progs *on the live RAM system only* then execute btrfs commands (btrfs device scan, etc) to see if anything weird reveals itself. Otherwise, git-bisect as mentioned by @loqs.
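A rough sketch of what such a live-ISO check could look like is below. It is one possible sanity check, not a known reproducer for this report: `check_devices` is a hypothetical helper that reads `btrfs filesystem show` output on stdin and compares the reported "Total devices" count with the number of `devid` lines actually listed.

```shell
#!/bin/sh
# Hypothetical helper: compare "Total devices N" against the number of
# devid lines in `btrfs filesystem show` output read from stdin.
check_devices() {
    awk '
        /Total devices/      { expected += $3 }   # third field is the count
        /^[[:space:]]*devid/ { found++ }
        END {
            printf "expected=%d found=%d\n", expected, found
            exit (expected == found ? 0 : 1)
        }'
}

# On the live ISO only (assumes root; changes stay in RAM):
#   pacman -Sy btrfs-progs
#   btrfs device scan
#   btrfs filesystem show | check_devices
```

The helper exits non-zero when the counts differ, so it can be used in a simple `if` while trying different btrfs-progs versions on the live system.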

[1] https://github.com/kdave/btrfs-progs/issues/630
Comment by Scott Shawcroft (tannewt) - Tuesday, 24 October 2023, 22:06 GMT
Thanks for the non-bisect suggestion. This is my main work computer so a less destructive option is definitely preferred. I'll give that a shot when I have a chance. Thanks!
Comment by loqs (loqs) - Tuesday, 24 October 2023, 22:13 GMT
The package linked below contains 6.5.3 with [1] applied which is the fix for the issue @Toolybird referenced.

https://drive.google.com/file/d/1R_o76eNH4o_d6kVb-s_JH3Ayi_2M5aRA/view?usp=sharing btrfs-progs-6.5.3-1.1-x86_64.pkg.tar.zst

[1]: https://github.com/kdave/btrfs-progs/commit/c1d297d57e1cc6ea184b1fafb2d206cc8570be15
Comment by Scott Shawcroft (tannewt) - Wednesday, 25 October 2023, 19:40 GMT
I rebooted to the live RAM system again and confirmed that 6.5.3 does have the device usage bug where 0 != 2 in my case.

I then restarted, installed the patched build of 6.5.3 from @loqs, and rebooted. Unfortunately, it still failed to boot, so I booted the live RAM system again and downgraded to 6.5.2.

So, it doesn't look like that fixes it. I'm wondering if the issue is with the boot image built after the install or the device search afterwards. I could compare boot images if that'd be helpful.
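One way to compare boot images, sketched here under assumptions: list both initramfs images with `lsinitcpio` (from mkinitcpio) and diff the file lists. `diff_listings` is a hypothetical helper, and the listing file names are placeholders. It compares file names only, so a binary that changed in place would not show up.

```shell
#!/bin/sh
# Hypothetical helper: print paths that appear in only one of two listings.
# Lines unique to the first file are unindented; lines unique to the
# second file are prefixed with a tab (standard comm -3 layout).
diff_listings() {
    a=$(mktemp)
    b=$(mktemp)
    # comm requires sorted input.
    sort "$1" > "$a"
    sort "$2" > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Assumed workflow (file names are placeholders):
#   lsinitcpio /boot/initramfs-linux.img > with-6.5.2.txt   # known good
#   ...upgrade btrfs-progs, let mkinitcpio rebuild the image...
#   lsinitcpio /boot/initramfs-linux.img > with-6.5.3.txt
#   diff_listings with-6.5.2.txt with-6.5.3.txt
```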
Comment by loqs (loqs) - Wednesday, 25 October 2023, 20:02 GMT
If you copy btrfs-progs-6.5.3-1.1-x86_64.pkg.tar.zst to a thumb drive, boot the Arch ISO, and install the package from the thumb drive, can you still reproduce the device usage bug?
Comment by Scott Shawcroft (tannewt) - Wednesday, 25 October 2023, 20:06 GMT
I don't have time to try that at the moment. I think I did try that after I installed to my main system and before I rebooted. It worked then I think.
Comment by Toolybird (Toolybird) - Thursday, 26 October 2023, 05:17 GMT
Comment by Scott Shawcroft (tannewt) - Thursday, 26 October 2023, 21:10 GMT
  • Field changed: Percent Complete (100% → 0%)
My boot is still broken with 6.5.3. Only the device usage command worked with the patched version.
Comment by Toolybird (Toolybird) - Thursday, 26 October 2023, 21:11 GMT
Well, unless you are prepared to pitch in and do better with the debugging steps, there is not much we can do. Nobody else seems to be having your issue...
Comment by Toolybird (Toolybird) - Thursday, 26 October 2023, 21:26 GMT
PS: I don't use EFISTUB, but surely it should be possible to keep a "known good" kernel image around for emergency booting? In other words, if a change you are testing fails, you can use the motherboard boot menu to boot into your "good" image.
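As an illustration of that idea, the sketch below only prints an `efibootmgr` command for a second EFISTUB entry; it does not run it. Everything specific is an assumption: `fallback_entry_cmd` is a hypothetical helper, and `\vmlinuz-linux-good` and `\initramfs-linux-good.img` stand for copies of the working kernel and initramfs made on the ESP before upgrading.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) an efibootmgr command that
# would register a "known good" EFISTUB boot entry.
fallback_entry_cmd() {
    disk=$1      # disk holding the EFI system partition, e.g. /dev/nvme1n1
    part=$2      # ESP partition number, e.g. 1
    partuuid=$3  # PARTUUID of the root partition
    # File names below are assumed copies of the working images on the ESP.
    printf '%s\n' "efibootmgr --create --disk $disk --part $part --label 'Arch Linux (known good)' --loader '\\vmlinuz-linux-good' --unicode 'root=PARTUUID=$partuuid rw initrd=\\amd-ucode.img initrd=\\initramfs-linux-good.img'"
}

# Example: review the printed command, then run it manually as root.
#   fallback_entry_cmd /dev/nvme1n1 1 6a2bde8c-ffff-4a72-b5a0-d7d363618c57
```

Printing instead of executing keeps the sketch safe to try; the resulting entry could then be picked from the firmware boot menu if the default image fails.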
Comment by Tobias Powalowski (tpowa) - Friday, 27 October 2023, 09:13 GMT
Joining the btrfs-devel libera channel can help to get debugging help in real time.
Comment by Toolybird (Toolybird) - Sunday, 29 October 2023, 20:03 GMT
There is no Arch packaging issue here, and this report is the only instance. Please move the discussion to the more appropriate support channels (Forum/IRC/Mailing Lists/Reddit/etc.) for troubleshooting and debugging assistance.

It's either:
1. an upstream issue that needs to be identified and reported upstream (git bisect to identify the problem)
2. a configuration issue on your machine triggered by 6.5.3
3. some unknown situation that needs to be figured out (in the support channels)

"I don't have time" responses are not helpful in a community based distro.
