FS#42377 - [lvm2] [mkinitcpio] unable to activate dm-cache root volume due to missing modules/binaries

Attached to Project: Arch Linux
Opened by Wayne Tan (flapjack0811) - Tuesday, 14 October 2014, 15:14 GMT
Last edited by Gerardo Exequiel Pozzi (djgera) - Tuesday, 29 March 2016, 02:03 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Eric Belanger (Snowman)
Thomas Bächler (brain0)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
I'm testing out LVM + dm-cache on a VM and while setup seemed quite painless, on reboot the cached root volume could not be activated due to missing modules and binaries.

Additional info:
lvm2 2.02.111-1
mkinitcpio 18-2

Steps to reproduce:
1. Create a vanilla install of Arch using the 2014.10.01 ISO, all on one logical volume except /boot. Because of a separate issue with installing GRUB, I left /boot on a non-LVM partition.
2. Add a 2nd disk and follow lvmcache(7) to cache the root volume. Call mkinitcpio -p linux.
3. Create a VM snapshot and reboot. At this point I'm dropped into a recovery shell with the message "Unable to find root device '/dev/mapper/vg-root'". Manually activating the volume with "lvm lvchange -a y vg/root" fails with "Module dm-cache not found."
4. Roll back and add dm-cache to MODULES in mkinitcpio.conf. On reboot the message is now "/usr/bin/cache_check: execvp failed: No such file or directory".
5. Roll back and add /usr/bin/cache_check binary to BINARIES. This time there's a kernel error: "device-mapper: table: 254:3: cache: Error creating cache's policy".
6. Poking around in https://aur.archlinux.org/packages/dm-cache-rootfs gave me the idea to add dm-cache-mq to MODULES. The system now boots successfully!
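
Steps 4 to 6 amount to the following mkinitcpio.conf additions. This is a minimal sketch in the string syntax used by mkinitcpio 18, and it assumes MODULES and BINARIES were otherwise empty; any existing entries would need to be kept. The initramfs still has to be regenerated afterwards with mkinitcpio -p linux, as in step 2.

```shell
# /etc/mkinitcpio.conf (fragment): workaround from steps 4 to 6.
# Assumption: no other entries were present in MODULES or BINARIES.

# dm-cache is the cache target itself; dm-cache-mq provides the "mq" policy,
# without which the kernel reports "Error creating cache's policy".
MODULES="dm-cache dm-cache-mq"

# lvm calls cache_check when it activates the cached volume.
BINARIES="/usr/bin/cache_check"
```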

Rough ideas on how to proceed:
Merge some code from dm-cache-rootfs into mkinitcpio's lvm2 hook. I'm not sure exactly which files from upstream LVM are involved in caching, but some examples are `ls /usr/bin/cache_*`. Maybe look at other distros with lvmcache enabled as standard to see how they handle this.

Closed by  Gerardo Exequiel Pozzi (djgera)
Tuesday, 29 March 2016, 02:03 GMT
Reason for closing:  Won't implement
Additional comments about closing:  OP: So there are some backward compatibility issues but I think it's beyond the scope of this task.
Comment by Wayne Tan (flapjack0811) - Tuesday, 14 October 2014, 15:28 GMT
mkinitcpio.conf prior to step 4.
Comment by Eric Belanger (Snowman) - Friday, 14 November 2014, 22:47 GMT
Should be fixed in lvm2-2.02.112-1 in [testing] repo.
Comment by Wayne Tan (flapjack0811) - Monday, 17 November 2014, 12:17 GMT
The fix works, provided that the cache volume was created with lvm2-2.02.112-1. Otherwise, rebooting or even running lvs immediately after upgrading yields the following:

[root@arch ~]# lvs
LV vg/cache0 has uknown feature flags 0
Could not format metadata for VG vg.
Skipping volume group vg
Internal error: Volume Group vg was not unlocked
[root@arch ~]# systemctl daemon-reload
[root@arch ~]# systemctl restart lvm2-lvmetad
[root@arch ~]# lvs
LV vg/cache0 has uknown feature flags 0
Could not format metadata for VG vg.
Update of lvmetad failed. This is a serious problem.
It is strongly recommended that you restart lvmetad immediately.
LV vg/cache0 has uknown feature flags 0
Could not format metadata for VG vg.
Update of lvmetad failed. This is a serious problem.
It is strongly recommended that you restart lvmetad immediately.
Volume group "vg" not found

I'm guessing this may need to be looked at upstream.
Comment by Daniel Wendler (BMRMorph) - Monday, 24 November 2014, 10:02 GMT
Oh man... I wish I had read this bug on Friday before I upgraded to lvm2-2.02.112-1.

I have exactly the problem described above and no idea how to resolve it.
I managed to boot the system with a Linux live CD, downgrade lvm2 and rebuild the initcpio.
After that the system boots but is unable to open the cache and/or the cache drive.
I tried to delete the cache LV with lvremove but got a segfault (it seems that all commands touching the
cache drives segfault). So I tried upgrading to the latest lvm, and the above message reappeared.
So at the moment I cannot get my /home to work (I had to create a new home to restore a backup onto).
Does anyone have an idea how to remove the cache or access the original LV?

Edit: I had to downgrade device-mapper as well, and now the segfaults are gone.
Comment by Fabian Zimmermann (devfaz) - Friday, 28 November 2014, 14:26 GMT
Same here: after installing .112 the system is unable to boot.

Downgrade to .111 and everything is working as expected.
Comment by Fabian Zimmermann (devfaz) - Friday, 28 November 2014, 14:53 GMT
Looks like upstream detected a serious problem in .112, because .112 was removed from the FTP server:

LVM2.2.02.111.tgz 1453 KB 01.09.2014 00:56:00
LVM2.2.02.111.tgz.asc 1 KB 01.09.2014 00:56:00
LVM2.2.02.113.tgz 1489 KB 24.11.2014 17:49:00
LVM2.2.02.113.tgz.asc 1 KB 24.11.2014 17:49:00

I already built the current PKGBUILD against .111 and it works, so the cause must lie in upstream .112. I will now build with .113.
Comment by Fabian Zimmermann (devfaz) - Friday, 28 November 2014, 14:56 GMT
no, problem is still there in .113.

--
[root@orange trunk]# lvs --version
LVM version: 2.02.113(2) (2014-11-24)
Library version: 1.02.92 (2014-11-24)
Driver version: 4.27.0
[root@orange trunk]# lvs
Internal error: LV vg_orange/lv_root_cache has uknown feature flags 0.
Could not format metadata for VG vg_orange.
Cannot process volume group vg_orange
Internal error: Volume Group vg_orange was not unlocked
--
Comment by Fabian Zimmermann (devfaz) - Friday, 28 November 2014, 15:36 GMT
OK, it looks like they changed the internal format.

I deleted my cache pools under .111, updated to .112/.113 and created a new cache pool; now everything works.

It looks like we need some kind of migration from the .111 format to the .112+ format.

I also wrote a small howto: http://devfaz.wordpress.com/2014/11/28/create-dm-cache-backed-logical-volume-in-3-easy-steps/
Comment by Eric Belanger (Snowman) - Sunday, 30 November 2014, 02:33 GMT
Is it fixed in lvm2-2.02.114-1 in [testing] repo?
Comment by Fabian Zimmermann (devfaz) - Sunday, 30 November 2014, 08:03 GMT
still there.

--
[root@orange trunk]# lvconvert --type cache --cachepool home_cache vg_orange/home
Configuration setting "global/sparse_segtype_default" unknown.
Logical volume vg_orange/home is now cached.
[root@orange trunk]# pacman -U *.114-1*.xz
loading packages...
resolving dependencies...
looking for inter-conflicts...

Packages (2): device-mapper-2.02.114-1 lvm2-2.02.114-1

Total Installed Size: 4.85 MiB
Net Upgrade Size: 0.12 MiB

:: Proceed with installation? [Y/n]
(2/2) checking keys in keyring [###########################################] 100%
(2/2) checking package integrity [###########################################] 100%
(2/2) loading package files [###########################################] 100%
(2/2) checking for file conflicts [###########################################] 100%
(2/2) checking available disk space [###########################################] 100%
(1/2) upgrading device-mapper [###########################################] 100%
(2/2) upgrading lvm2 [###########################################] 100%
warning: /etc/lvm/lvm.conf installed as /etc/lvm/lvm.conf.pacnew
[root@orange trunk]# lvs
Internal error: LV vg_orange/home_cache has uknown feature flags 0.
Could not format metadata for VG vg_orange.
Cannot process volume group vg_orange
Internal error: Volume Group vg_orange was not unlocked

--

I don't think it will be fixed without pushing this bug upstream. Maybe they changed the internal data format?

Who handles "migration" in such cases? Should the package upgrade the data, or should lvm/device-mapper do this internally?

Comment by Kyle (2bluesc) - Friday, 12 December 2014, 06:35 GMT
Just hit this bug too on upgrade from {device-mapper,lvm2}-2.02.111-1 -> 2.02.114-1.

Rolling back to the older versions, removing the caches (which writes their contents out to the slow disk), upgrading to 2.02.114-1 and re-creating the caches worked for me.
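
The workaround reported in this thread can be sketched as a dry-run helper that only prints the steps and executes nothing. The function name and its arguments (volume group, origin LV, pool name, fast device, pool size) are hypothetical, and the LVM command forms are taken from lvmcache(7) and the lvconvert call quoted earlier in this task, so check them against the installed version before running anything.

```shell
#!/bin/bash
# Dry-run sketch of the downgrade / remove / upgrade / re-create workaround.
# It prints the plan only. All arguments are placeholders chosen by the caller.

print_recache_plan() {
    local vg=$1 origin=$2 pool=$3 fast_pv=$4 size=$5

    echo "# 1. Downgrade lvm2 and device-mapper together to 2.02.111-1."
    echo "# 2. Remove the old cache pool; this writes cached data back to the origin LV."
    echo "lvremove ${vg}/${pool}"
    echo "# 3. Upgrade lvm2 and device-mapper together to the new version."
    echo "# 4. Re-create the cache pool on the fast device and attach it."
    echo "lvcreate --type cache-pool -L ${size} -n ${pool} ${vg} ${fast_pv}"
    echo "lvconvert --type cache --cachepool ${vg}/${pool} ${vg}/${origin}"
}
```

For example, `print_recache_plan vg_orange home home_cache /dev/sdb 10G` prints the plan for the volume names used earlier in this task; the device /dev/sdb and the size 10G are made up for illustration.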
Comment by Michael J Evans (mjevans) - Thursday, 08 October 2015, 01:00 GMT
I ran across this bug yesterday with lvm2 2.02.125-1 (base)

The /usr/lib/initcpio/install/sd-lvm2 hook doesn't seem to generate an initramfs with all of the necessary tools/modules. I had to create the following file and add an additional hook to resolve this issue.

#!/bin/bash
# cat /etc/initcpio/install/lvm2-pdata-cache

build() {
    # I did not attempt to reduce this list once I had a working initramfs

    # Kernel modules: the cache target, its policies and their dependencies
    add_module dm-cache
    add_module dm-cache-mq
    add_module dm-cache-cleaner
    add_module dm-persistent-data
    add_module dm-bio-prison
    add_module dm-bufio
    add_module libcrc32c
    add_module dm-mod

    # The binary itself, followed by the tool names that exist as symlinks;
    # with one argument, add_symlink reads each link target from the running system
    add_binary /usr/bin/pdata_tools
    add_symlink /usr/bin/cache_check
    add_symlink /usr/bin/cache_dump
    add_symlink /usr/bin/cache_metadata_size
    add_symlink /usr/bin/cache_repair
    add_symlink /usr/bin/cache_restore
    add_symlink /usr/bin/era_check
    add_symlink /usr/bin/era_dump
    add_symlink /usr/bin/era_invalidate
    add_symlink /usr/bin/era_restore
    add_symlink /usr/bin/thin_check
    add_symlink /usr/bin/thin_dump
    add_symlink /usr/bin/thin_metadata_size
    add_symlink /usr/bin/thin_repair
    add_symlink /usr/bin/thin_restore
    add_symlink /usr/bin/thin_rmap
}

help() {
    echo "Adds support for dm-cache and pdata-tools, like cache_check"
}
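
One way to check that a rebuilt image really contains these pieces is to scan its file listing for the expected names. The helper below is a sketch; the function name is hypothetical, and it uses plain substring matching on whatever listing is piped in, so it makes no assumption about module compression or exact paths.

```shell
#!/bin/bash
# Report which required names are absent from an initramfs file listing.
# The listing is read from stdin; each argument is matched as a fixed substring.

missing_from_listing() {
    local listing name status=0
    listing=$(cat)
    for name in "$@"; do
        if ! grep -qF -- "$name" <<<"$listing"; then
            echo "missing: $name"
            status=1
        fi
    done
    return "$status"
}
```

Typical use would be `lsinitcpio /boot/initramfs-linux.img | missing_from_listing dm-cache.ko dm-cache-mq.ko cache_check pdata_tools`, which prints one line per absent name and returns non-zero if anything is missing.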
Comment by Ruthger (ruthger) - Saturday, 24 October 2015, 17:30 GMT
I have the same problem using LVM thin provisioning. I had to add dm-thin-pool to MODULES and thin_check to BINARIES in my mkinitcpio.conf.
Comment by Stefan (Aentfs) - Sunday, 13 December 2015, 15:15 GMT
It seems like the dm_cache_smq (or dm-cache-smq) module is missing from the LVM2 hook as well. SMQ is the default cache policy so it would be nice if it was included.
Comment by Olaf Leidinger (oleid) - Thursday, 24 December 2015, 14:44 GMT
I'm also affected by this issue. On every lvm2 update I have to apply the attached patch.

(basically copy + paste of the cache_* binaries and the additional modules from the lvm2 install file in the very same folder).
