FS#74888 - [linux] kernel 5.18.0: LVM volume on RAID10 based PV does not mount

Attached to Project: Arch Linux
Opened by Sachin Garg (randompie) - Sunday, 29 May 2022, 02:16 GMT
Last edited by Jan Alexander Steffens (heftig) - Monday, 30 May 2022, 19:23 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 17
Private No

Details

Description:

I have 5 RAID devices:
* 3 RAID 10 devices that serve as PVs for an LVM VG.
* 2 RAID 0 devices.

Upon upgrading to kernel 5.18.0, the RAID 10 devices *no* longer get assembled. The RAID 0 devices do get assembled.

RAID 10 assembly works fine with both 5.15.43-1-lts and 5.17.9-arch1-1 *without* any configuration changes.


Additional info:
* package version(s)
Linux Kernel Package: 5.18.0-arch1-1

LVM version: 2.03.16(2) (2022-05-18)
Library version: 1.02.185 (2022-05-18)
Driver version: 4.45.0

mdadm - v4.2 - 2021-12-30


Steps to reproduce (a rough command sketch follows the list):

* Create partitions for use as Linux RAID devices
* Combine these partitions in a software RAID 10.
* Use this Software RAID 10 as a PV for LVM - create a VG and LV
* Set the LV to mount on boot (using UUID scheme in /etc/fstab)
* Add "mdadm_udev" and "lvm2" hooks in mkinitcpio.conf
* Install Linux 5.18.0-arch1-1 package and configure
* Reboot
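
A rough command sketch of the steps above; the device paths, VG/LV names, and mount point are placeholders rather than my exact setup, and the HOOKS line is the stock one plus the two hooks mentioned:

# create the RAID 10 array from four example partitions
mdadm --create /dev/md/home --level=10 --metadata=1.2 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

# LVM on top: PV -> VG -> LV, then a filesystem
pvcreate /dev/md/home
vgcreate vg_data /dev/md/home
lvcreate -l 100%FREE -n lv_home vg_data
mkfs.ext4 /dev/vg_data/lv_home

# mount on boot via UUID in /etc/fstab
echo "UUID=$(blkid -s UUID -o value /dev/vg_data/lv_home) /home ext4 defaults 0 2" >> /etc/fstab

# /etc/mkinitcpio.conf: HOOKS=(base udev autodetect modconf block mdadm_udev lvm2 filesystems keyboard fsck)
mkinitcpio -P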
This task depends upon

Closed by  Jan Alexander Steffens (heftig)
Monday, 30 May 2022, 19:23 GMT
Reason for closing:  Fixed
Additional comments about closing:  linux 5.18.1.arch1-1
Comment by loqs (loqs) - Sunday, 29 May 2022, 04:23 GMT
Related forum issue https://bbs.archlinux.org/viewtopic.php?id=276716

Seems to affect both RAID 10 and RAID 1 but not RAID 0 or RAID 6. LVM2 does not appear to be a common factor.
Comment by Simon Perry (pezz) - Sunday, 29 May 2022, 07:20 GMT
Can confirm, I have RAID 1 with no LVM here, does not assemble.

Sticking with 5.17.9 for now.
Comment by Marc Cousin (cousinm) - Sunday, 29 May 2022, 07:28 GMT
I'm having this on a 5-drive raid5 on top of LVM.
Comment by Scott H (scott_fakename) - Sunday, 29 May 2022, 08:55 GMT
Just for clarity's sake I will say that I am the one who chimed in on the forum post who was using raid-6, and this bug is affecting me as well.
Comment by Ken (gtaluvit) - Sunday, 29 May 2022, 12:45 GMT
Same issue here with a non-boot RAID1 that works fine on 5.17. I have tried "mdadm" and "mdadm_udev" in mkinitcpio.conf, mdmon in the BINARIES section, and manually modprobing "md", since even with those other configurations /proc/mdstat would not exist until I loaded the module. I have also tried adding the ARRAY line explicitly to mdadm.conf, with no luck. mdadm --detail --scan will not find the array even after all of that.
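In rough commands (illustrative only; the md core module is md_mod, the config lines are paraphrased):

# /etc/mkinitcpio.conf: tried HOOKS=(... mdadm_udev ...) as well as the older mdadm hook
# /etc/mkinitcpio.conf: BINARIES=(mdmon)
modprobe md_mod            # /proc/mdstat does not exist until this is loaded
cat /proc/mdstat           # no arrays listed
mdadm --detail --scan      # still finds nothing
# /etc/mdadm.conf: added the ARRAY line explicitly, no change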
Comment by Todd Harbold (MrToddarama) - Sunday, 29 May 2022, 14:07 GMT
I experienced the same issue with upgrading to 5.18 - one RAID1 array that continued to start cleanly and one RAID10 array that timed out. In my particular case, removing my custom device name from the ARRAY definition and replacing it with the default /dev/md127 device name resolved the issue.

My old ARRAY definition from /etc/mdadm.conf that worked prior to the 5.18 kernel update:
ARRAY /dev/userraid10 metadata=1.2 name=phenom:UserRAID10 UUID=bee9ca99:c9a86e5e:0d3e9c1a:c5473a21

Changing the definition to the following now works with 5.18:
ARRAY /dev/md127 metadata=1.2 name=phenom:UserRAID10 UUID=bee9ca99:c9a86e5e:0d3e9c1a:c5473a21

Hopefully this gives developers additional info to help pinpoint the root cause.
Comment by Samir (Zgembo) - Sunday, 29 May 2022, 19:46 GMT
Same here, fixed it by changing /etc/mdadm.conf entry from

ARRAY /dev/md/hostname:home devices=/dev/sda1,/dev/sdb1 metadata=1.2 name=hostname:home UUID=...

into

ARRAY /dev/md127 devices=/dev/sda1,/dev/sdb1 metadata=1.2 name=hostname:home UUID=...

It looks like the array device name detected under 5.18 must match the identifier in /etc/mdadm.conf.
Comment by Alex Henrie (alex.henrie) - Monday, 30 May 2022, 01:16 GMT
Same problem here with a RAID 5. I previously didn't have to edit /etc/mdadm.conf at all for /dev/md127 to come up automatically on boot.
Comment by Sachin Garg (randompie) - Monday, 30 May 2022, 02:30 GMT
The behaviour described in the comments above by @MrToddarama, @Zgembo, and @alex.henrie, along with the BBS comment by ecruz1986 [https://bbs.archlinux.org/viewtopic.php?pid=2038181#p2038181], is also part of the issue I experienced.

These are the lines in my /etc/mdadm.conf - all working before upgrade to 5.18 and also working now (downgraded to 5.17.9):


## Commented out because mdadm_udev is present
#DEVICE partitions
##ARRAY /dev/md/sysresccd:home metadata=1.2 name=sysresccd:home UUID=60b0eaff:867d1d1d:3b630859:855507ac
##ARRAY /dev/md/triveni:124 metadata=1.2 name=triveni:124 UUID=114bf81d:3ee322d4:24f1733a:c7bab01b
##ARRAY /dev/md/sysresccd:1 metadata=1.2 name=sysresccd:1 UUID=cf9ab491:f2d0050c:0e585f82:31573738
##ARRAY /dev/md125 metadata=1.2 name=triveni.d.navankur.net:125 UUID=0bd12c43:640ea1be:7e7de184:91c6a7fb

A scan of the devices shows these:

$ sudo mdadm -Esv
ARRAY /dev/md/125 level=raid0 metadata=1.2 num-devices=2 UUID=0bd12c43:640ea1be:7e7de184:91c6a7fb name=triveni.d.navankur.net:125
devices=/dev/sdd3,/dev/sdc3
ARRAY /dev/md/home level=raid10 metadata=1.2 num-devices=4 UUID=60b0eaff:867d1d1d:3b630859:855507ac name=sysresccd:home
devices=/dev/sde1,/dev/sdd1,/dev/sdc1,/dev/sdb1
ARRAY /dev/md/124 level=raid10 metadata=1.2 num-devices=4 UUID=114bf81d:3ee322d4:24f1733a:c7bab01b name=triveni:124
devices=/dev/sde3,/dev/sdd2,/dev/sdc2,/dev/sdb3
ARRAY /dev/md/123 level=raid0 metadata=1.2 num-devices=2 UUID=a8f77e85:46336115:dcbd2489:be540e35 name=triveni.d.navankur.net:123
devices=/dev/sde5,/dev/sdb5
ARRAY /dev/md/122 level=raid10 metadata=1.2 num-devices=4 UUID=3e0c80ef:fca94bda:76259412:b2049262 name=triveni:122
devices=/dev/sde7,/dev/sdd6,/dev/sdc6,/dev/sdb7

With 5.18.0, when I tried to assemble the array /dev/md/home I got the error "mdadm: unexpected failure opening /dev/md127". Replacing "/dev/md/home" with "/dev/md127", I was able to assemble and start the array - that is what pointed me in the direction of this being a potential kernel bug.
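
For reference, a minimal sketch of the two assembly attempts (component devices taken from the scan output above; the exact flags may have differed slightly):

# fails under 5.18.0 when using the named device
mdadm --assemble /dev/md/home /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
#   mdadm: unexpected failure opening /dev/md127

# works when using the numeric node directly
mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1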
Comment by f (bakgwailo) - Monday, 30 May 2022, 05:24 GMT
Also have this issue booting into 5.18 - dropped to the recovery/emergency console with my RAID 10. Reboot with linux-lts and all fine.

if it helps:

# mdadm configuration file
#
# mdadm will function properly without the use of a configuration file,
# but this file is useful for keeping track of arrays and member disks.
# In general, a mdadm.conf file is created, and updated, after arrays
# are created. This is the opposite behavior of /etc/raidtab which is
# created prior to array construction.
#
#
# the config file takes two types of lines:
#
# DEVICE lines specify a list of devices of where to look for
# potential member disks
#
# ARRAY lines specify information about how to identify arrays so
# so that they can be activated
#


# You can have more than one device line and use wild cards. The first
# example includes SCSI the first partition of SCSI disks /dev/sdb,
# /dev/sdc, /dev/sdd, /dev/sdj, /dev/sdk, and /dev/sdl. The second
# line looks for array slices on IDE disks.
#
#DEVICE /dev/sd[bcdjkl]1
#DEVICE /dev/hda1 /dev/hdb1
#
# The designation "partitions" will scan all partitions found in
# /proc/partitions
DEVICE partitions


# ARRAY lines specify an array to assemble and a method of identification.
# Arrays can currently be identified by using a UUID, superblock minor number,
# or a listing of devices.
#
# super-minor is usually the minor number of the metadevice
# UUID is the Universally Unique Identifier for the array
# Each can be obtained using
#
# mdadm -D <md>
#
# To capture the UUIDs for all your RAID arrays to this file, run these:
# to get a list of running arrays:
# # mdadm -D --scan >>/etc/mdadm.conf
# to get a list from superblocks:
# # mdadm -E --scan >>/etc/mdadm.conf
#
#ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
#ARRAY /dev/md1 super-minor=1
#ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1
#
# ARRAY lines can also specify a "spare-group" for each array. mdadm --monitor
# will then move a spare between arrays in a spare-group if one array has a
# failed drive but no spare
#ARRAY /dev/md4 uuid=b23f3c6d:aec43a9f:fd65db85:369432df spare-group=group1
#ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977 spare-group=group1
#


# When used in --follow (aka --monitor) mode, mdadm needs a
# mail address and/or a program. To start mdadm's monitor mode, enable
# mdadm.service in systemd.
#
# If the lines are not found, mdadm will exit quietly
#MAILADDR root@mydomain.tld
#PROGRAM /usr/sbin/handle-mdadm-events
ARRAY /dev/md/raid10 metadata=1.2 name=desktop:raid10 UUID=2cf5d240:d1576c3b:b59a2e7b:1eb89875
Comment by Luc (iq2luc) - Monday, 30 May 2022, 06:44 GMT
Confirming the same issue, FakeRAID (RAID0) on Intel 82801.
Cannot boot with new kernel, everything OK with previous ones.
Comment by Christian Braun (hcb) - Monday, 30 May 2022, 08:48 GMT
Did not boot with this RAID0:
Personalities : [raid0]
md127 : active raid0 nvme3n1p1[2] nvme1n1p1[0] nvme4n1p1[3] nvme2n1p1[1]
3906514944 blocks super 1.2 512k chunks
Comment by Sid Karunaratne (sakaru) - Monday, 30 May 2022, 11:53 GMT
In my case my `/etc/mdadm.conf` did not contain any `ARRAY` line, and the linux-5.18.0 update also broke something related to the auto discovery. Adding:
`ARRAY /dev/md127 metadata=1.2 name=renegade:media UUID=7f658d31:e75ec0d2:0ba31f5c:be9fccd1`
to the config file, as well as modifying `/etc/fstab`, solved the issue. Though it doesn't allow using the prettier `/dev/md/media` reference.
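
A hypothetical sketch of the fstab change (the mount point and filesystem type are made up; only the device reference matters):

# before (the named device no longer appears under 5.18):
# /dev/md/media  /srv/media  ext4  defaults  0 2
# after:
# /dev/md127     /srv/media  ext4  defaults  0 2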

As suggested in the linked BBS post, adding `CONFIG_BLOCK_LEGACY_AUTOLOAD=Y` to the kernel config, as well as removing the `ARRAY` line from `/etc/mdadm.conf` got me back to the previous behaviour. Though it's not a viable long term solution as this kernel config option will disappear in 5.19.

So the question remains how to keep the pretty device name (i.e. `/dev/md/media` instead of `/dev/md127`) without using a soon-to-be-dropped config option.
Comment by Luc (iq2luc) - Monday, 30 May 2022, 12:36 GMT
In my case I don't care about pretty device names, I like ugly names too as long as I can boot the kernel -- which is not the case with 5.18 no matter what I tried (including CONFIG_BLOCK_LEGACY_AUTOLOAD=y, configuring / not configuring mdadm.conf, whatnot...).

Edit: I tested so many combinations of kernels and configs that I forgot about the simple ones. I confirm it indeed works with the vanilla Arch Linux 5.18 with a properly configured mdadm.conf. No pretty names, but it boots and I'm good with it.
Comment by Cedric Roijakkers (cedricroijakkers) - Monday, 30 May 2022, 13:34 GMT
> So the question remains how to keep the pretty device name (i.e. `/dev/md/media` instead of `/dev/md127`) without using a soon-to-be-dropped config option.

I use the UUID in my fstab and that works correctly after fixing `/etc/mdadm.conf`.

So long story short, as long as you have the correct `ARRAY` line in your `/etc/mdadm.conf`, it boots just fine. From what I understand, the fallback kernel option `CONFIG_BLOCK_LEGACY_AUTOLOAD=y` will be removed in 5.19, so I'm not entirely sure this is a bug, but sadly a breaking feature.

For now, the easiest fix (sketched in commands after this list) is to:
- boot from an alternative kernel (or boot disk)
- run `mdadm --detail --scan`
- change the name of the device to `/dev/md127` or whatever the "ugly" name of your device is
- add that to `/etc/mdadm.conf`
- I ran `mkinitcpio -P` just for good measure and rebooted, and I could boot via UUID again (no need to use the "ugly" device name in any other config file).
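
Roughly, with the array name and UUID as placeholders:

# from linux-lts or a live environment
mdadm --detail --scan
#   ARRAY /dev/md/media metadata=1.2 name=myhost:media UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
# copy that line into /etc/mdadm.conf, replacing the named path with the "ugly" node:
#   ARRAY /dev/md127 metadata=1.2 name=myhost:media UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
mkinitcpio -P
reboot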

Comment by Ken (gtaluvit) - Monday, 30 May 2022, 15:38 GMT
I decided to test this issue with the creation of arrays as well. In a VM running 5.18 I created two 1 GB disks, gave each a GPT with a single Linux RAID partition, and created a RAID 1 with:

mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md/myarray /dev/vdb1 /dev/vdc1

This failed. I tried again with:

mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md128 /dev/vdb1 /dev/vdc1

This succeeded. I then rebooted to see if "DEVICE partitions" would pick it up without adding an ARRAY line to mdadm.conf, and it did. mdadm --detail --scan revealed:

ARRAY /dev/md/128 metadata=1.2 name=myhostname:128 UUID=....

So it looks like autodetection DOES work, but you have to create the array without the custom name and use a plain /dev/mdN device instead.
