FS#24272 - lvm2 update won't recognize external drive at boot

Attached to Project: Arch Linux
Opened by Igor Saric (karabaja4) - Saturday, 14 May 2011, 11:02 GMT
Last edited by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 21:09 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Eric Belanger (Snowman)
Thomas Bächler (brain0)
Tom Gundersen (tomegun)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

After the lvm2 update 2.02.84-1 => 2.02.85-1, my LVM array (a bunch of PVs) broke. At boot, the system complained about a missing device with a specific UUID. The device in question was my external USB drive. After boot, I ran:

pvscan:
PV /dev/sda4 VG diskovi0 lvm2 [171.88 GiB / 0 free]
PV /dev/sdb1 VG diskovi0 lvm2 [149.05 GiB / 0 free]
PV /dev/sdd1 VG diskovi0 lvm2 [232.88 GiB / 0 free]
PV /dev/sdc1 VG diskovi0 lvm2 [596.17 GiB / 0 free]
Total: 4 [1.12 TiB] / in use: 4 [1.12 TiB] / in no VG: 0 [0 ]

lvscan:
inactive '/dev/diskovi0/media' [1.12 TiB] inherit

vgchange -ay diskovi0:
1 logical volume(s) in volume group "diskovi0" now active
/dev/mapper/diskovi0-media not set up by udev: Falling back to direct node creation.
The link /dev/diskovi0/media should had been created by udev but it was not found. Falling back to direct link creation.

Now this is interesting. /dev/mapper/diskovi0-media and /dev/diskovi0/media were not set up by udev?
Ignoring this error, I ran: mount /dev/diskovi0/media /media/disks/ and everything works as it should; the LV is there and functioning.

So why isn't the USB drive recognized at boot with the new package, when the old lvm2 version worked flawlessly?

All packages are from [testing], fully updated. I tried downgrading lvm2 and kernel26 and, just in case, rebuilt the initcpio again, but I got the same error. Looks like something broke permanently...

Closed by Tom Gundersen (tomegun)
Sunday, 15 May 2011, 21:09 GMT
Reason for closing: Won't fix
Comment by Eric Belanger (Snowman) - Sunday, 15 May 2011, 03:05 GMT
It might be caused by the latest initscripts/udev update as the lvm2 version in core has been there since February. What packages have you updated since you noticed that your system wasn't booting anymore?
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 09:37 GMT
Could you try downgrading your mkinitcpio (and using everything else from testing)? We had a discussion about a change there that might affect lvm, but we were unable to find any bugs.
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 11:37 GMT
If downgrading mkinitcpio works, could I ask you to try making the following change in /lib/udev/rules.d/10-dm.rules:

After

ACTION!="add|change", GOTO="dm_end"

add

OPTIONS+="db_persist"

Then regenerate your initramfs and see if it works.
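
A minimal sketch of how that edit could be previewed without touching the packaged file. The function name add_db_persist and the /tmp output path are made up for illustration; the sketch assumes the ACTION line appears in the rules file exactly as quoted above, at the start of a line.

```shell
#!/bin/sh
# add_db_persist FILE
# Prints FILE to stdout with OPTIONS+="db_persist" added on a new line
# directly after the ACTION!="add|change" line. FILE itself is not modified.
add_db_persist() {
    awk -v anchor='ACTION!="add|change", GOTO="dm_end"' \
        -v extra='OPTIONS+="db_persist"' \
        '{ print } index($0, anchor) == 1 { print extra }' "$1"
}

# Possible use (review the result before installing it):
#   add_db_persist /lib/udev/rules.d/10-dm.rules > /tmp/10-dm.rules
#   diff /lib/udev/rules.d/10-dm.rules /tmp/10-dm.rules
```

Writing to stdout keeps the original rules file intact, so the change is easy to inspect and to revert.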

I will try to see if I can reproduce your bug, but I have not been successful so far.
Comment by Igor Saric (karabaja4) - Sunday, 15 May 2011, 13:27 GMT
Ok, so...

downgrading to mkinitcpio 0.6.11-1 alone (and rebuilding initcpio) didn't do anything,
BUT downgrading mkinitcpio 0.6.11-1 + udev 167-2 fixed it (sort of).

I say sort of because the error was there all along, even before this bug; it's just that until this update udev somehow managed to mount the device anyway.
I took a few photos of the screen to clarify the situation (because I have no idea how to capture all those framebuffer messages):

=> This is what it looked like *BEFORE* the udev 168-1 update:
http://andromeda.kiwilight.com/~dule/upload/success_mount.jpg

Notice the error after "Activating logical volumes..." but below, udev didn't complain and the device was mounted.

=> This is what it looks like *AFTER* the udev 168-1 update:
http://andromeda.kiwilight.com/~dule/upload/failure_mount.jpg

Notice the same error after "Activating logical volumes..." but udev now complains about that same device missing.

This is what I have so far; I'll report back once I've tested the changes Tom suggested.

Comment by Igor Saric (karabaja4) - Sunday, 15 May 2011, 13:47 GMT
Ok, I tried adding OPTIONS+="db_persist" to /lib/udev/rules.d/10-dm.rules and rebuilding the initcpio...

And I got the same error as in the second photo; nothing changed.
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 13:54 GMT
Thanks for testing. Before going further with this we should try to understand and fix the first problem you are seeing (otherwise it will be too much guessing).

When you look in /dev/disk/by-uuid/ do you see the disk in question? If so my best guess is that the problem is that lvchange -ay is run too soon and that udev is not settled yet.

I still have not done the tests I want to do on my system, but will report back as soon as I know a bit better what happens with the new udev/mkinitcpio.
Comment by Igor Saric (karabaja4) - Sunday, 15 May 2011, 15:33 GMT
Hm, well... This is the /dev/disk/by-uuid/ list after the boot (vgchange -ay diskovi0 not run yet):

lrwxrwxrwx 1 root 10 May 15 17:29 50e64289-3a9a-42c4-a14d-ac0a9e9ac628 -> ../../sda1
lrwxrwxrwx 1 root 10 May 15 17:29 cb17406c-6def-461d-bf72-4db962c89036 -> ../../sda2
lrwxrwxrwx 1 root 10 May 15 17:29 fdff1678-861d-4ad6-bb7e-bc8886a954d7 -> ../../sda3

This is the disk list after the LV is activated and mounted manually with vgchange -ay diskovi0 and the mount command:

lrwxrwxrwx 1 root 10 May 15 17:29 50e64289-3a9a-42c4-a14d-ac0a9e9ac628 -> ../../sda1
lrwxrwxrwx 1 root 10 May 15 17:34 61175456-3e12-45b9-aa9e-e327ab58c3ec -> ../../dm-0
lrwxrwxrwx 1 root 10 May 15 17:29 cb17406c-6def-461d-bf72-4db962c89036 -> ../../sda2
lrwxrwxrwx 1 root 10 May 15 17:29 fdff1678-861d-4ad6-bb7e-bc8886a954d7 -> ../../sda3

So, to answer your question, no, I don't see the drive there, but I also don't see any other PV that's included in the LVM array, as displayed by pvscan:

PV /dev/sda4 VG diskovi0 lvm2 [171.88 GiB / 0 free]
PV /dev/sdb1 VG diskovi0 lvm2 [149.05 GiB / 0 free]
PV /dev/sdd1 VG diskovi0 lvm2 [232.88 GiB / 0 free]
PV /dev/sdc1 VG diskovi0 lvm2 [596.17 GiB / 0 free]
Total: 4 [1.12 TiB] / in use: 4 [1.12 TiB] / in no VG: 0 [0 ]
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 16:18 GMT
Sorry, that was a stupid suggestion.

A couple more questions:
What are the contents of:
/proc/sys/kernel/hotplug
and
/sys/kernel/uevent_helper
?

Which kernel do you use?

Could you edit your rc.sysinit to replace
/sbin/udevadm settle --quiet --timeout=${UDEV_TIMEOUT:-30}
with
/sbin/udevadm settle
?
Does this output anything during boot (if there is a problem at this point your boot might stall for a long time, please wait)?
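
The first questions above could be answered in one go with something like the following. This is only a sketch: show_file is a hypothetical helper name, and either file may be absent or empty depending on the kernel.

```shell
#!/bin/sh
# show_file PATH
# Prints "PATH: <contents>", "PATH: (empty)" or "PATH: (not present)".
show_file() {
    if [ -r "$1" ]; then
        content=$(cat "$1" 2>/dev/null)
        [ -n "$content" ] || content="(empty)"
        echo "$1: $content"
    else
        echo "$1: (not present)"
    fi
}

show_file /proc/sys/kernel/hotplug
show_file /sys/kernel/uevent_helper
uname -r   # running kernel release
```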
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 16:20 GMT
Lastly:
If you keep everything up to date, except downgrade udev, does that solve the problem?
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 16:44 GMT
And in case I have not asked enough questions: may I have your kernel config (I'm in particular wondering if you are using devtmpfs or not)?
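
As a quick proxy for the devtmpfs question, one can check whether the running kernel lists devtmpfs among its supported filesystems. A sketch, with kernel_has_devtmpfs as a made-up helper name; it only shows that support is compiled in, not whether /dev is actually mounted that way, and it does not replace the full kernel config.

```shell
#!/bin/sh
# kernel_has_devtmpfs [FILE]
# Succeeds if FILE (default: /proc/filesystems) lists devtmpfs.
kernel_has_devtmpfs() {
    awk '$NF == "devtmpfs" { found = 1 } END { exit !found }' "${1:-/proc/filesystems}"
}

if kernel_has_devtmpfs; then
    echo "devtmpfs: supported by running kernel"
else
    echo "devtmpfs: not listed"
fi
```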
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 18:23 GMT
Please try to confirm with standard arch kernel.
Comment by Thomas Bächler (brain0) - Sunday, 15 May 2011, 20:31 GMT
This is most likely another one of those annoying timing problems:

The first message is normal: It can't assemble the LVM because no USB support is included in initramfs.

The second message is a weird timing problem: Now, USB is loaded, but USB storage devices take a while to activate. With the old udev, timing was different (it probably took longer to settle), so by the time you reach the vgscan/vgchange, the USB storage was ready. With the new udev, timing is probably a bit faster, so by the time you want to vgscan/vgchange, the USB device is not ready yet. I bet if you try this 20 times, it would actually work a few times.

This never worked reliably with USB devices, and I don't know how to solve it. I wouldn't rely on USB devices to be present on system init.
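
For anyone who still wants to try, a local workaround could poll for the USB PV's device node before activating the VG. This is only a sketch of that idea, not something initscripts does: wait_for_path is a hypothetical helper, the device path in the usage comment is a placeholder, and a timeout only narrows the race described above rather than removing it.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists or the timeout (default 30) expires.
# Returns 0 if PATH appeared, 1 otherwise.
# Note: POSIX sh has no "local", so the variables below are global.
wait_for_path() {
    path=$1
    timeout=${2:-30}
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Possible use (USB_PV is a placeholder for the external drive's device node):
#   USB_PV=/dev/sdX1
#   if wait_for_path "$USB_PV" 30; then
#       vgchange -ay diskovi0
#   fi
```

If the timeout expires, the VG is simply left inactive and can still be activated by hand later, as in the original report.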
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 21:07 GMT
What Thomas said makes sense. You might also be hit by FS#24288, which might cause settle to finish too early, but that is a separate issue.

The fact is that there is no way of knowing when all your USB devices have been enumerated, so we cannot wait for this to happen. I'm closing as wontfix.
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 21:07 GMT
PS
If you think you are hit by FS#24288, please comment there with answers to the questions I asked.
