FS#49530 - [lvm2] lvm2-pvscan errors during system boot
Attached to Project:
Arch Linux
Opened by Bernard Michaud (polaris6262) - Tuesday, 31 May 2016, 23:19 GMT
Last edited by freswa (frederik) - Sunday, 06 September 2020, 16:36 GMT
Details
lvm2 versions newer than 2.02.150 report pvscan errors during system
boot. The following failures are reported by systemd:
UNIT                         LOAD   ACTIVE SUB    DESCRIPTION
● lvm2-pvscan@259:11.service loaded failed failed LVM2 PV scan on device 259:11
● lvm2-pvscan@259:14.service loaded failed failed LVM2 PV scan on device 259:14
● lvm2-pvscan@259:17.service loaded failed failed LVM2 PV scan on device 259:17
● lvm2-pvscan@259:2.service  loaded failed failed LVM2 PV scan on device 259:2
● lvm2-pvscan@259:20.service loaded failed failed LVM2 PV scan on device 259:20
● lvm2-pvscan@259:5.service  loaded failed failed LVM2 PV scan on device 259:5
● lvm2-pvscan@259:8.service  loaded failed failed LVM2 PV scan on device 259:8
● lvm2-pvscan@8:12.service   loaded failed failed LVM2 PV scan on device 8:12
● lvm2-pvscan@8:15.service   loaded failed failed LVM2 PV scan on device 8:15
● lvm2-pvscan@8:3.service    loaded failed failed LVM2 PV scan on device 8:3
● lvm2-pvscan@8:6.service    loaded failed failed LVM2 PV scan on device 8:6

I have 12 volume groups, each comprising one PV, and there is a scan failure for every one of them. The problem started when upgrading to lvm2-2.02.153-1, and immediately disappears when downgrading back to lvm2-2.02.150-1. All of these exist, and as far as I can figure out there is nothing wrong with any of them. This is the output from one failed scan's status:

May 31 16:55:07 tovilyis.excom.com systemd[1]: Starting LVM2 PV scan on device 8:6...
May 31 16:55:08 tovilyis.excom.com lvm[1013]: Concurrent lvmetad updates failed.
May 31 16:55:08 tovilyis.excom.com lvm[1013]: Failed to update cache.
May 31 16:55:08 tovilyis.excom.com systemd[1]: lvm2-pvscan@8:6.service: Main process exited, code=exited, status=5/NOTINSTALLED
May 31 16:55:08 tovilyis.excom.com systemd[1]: Failed to start LVM2 PV scan on device 8:6.
May 31 16:55:08 tovilyis.excom.com systemd[1]: lvm2-pvscan@8:6.service: Unit entered failed state.
May 31 16:55:08 tovilyis.excom.com systemd[1]: lvm2-pvscan@8:6.service: Failed with result 'exit-code'.

The problem persists with the recently released lvm2-2.02.154-3.
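For reference, the failed instances and their per-unit logs can be inspected with the usual systemd tools; 8:6 here is just the example unit from above, so substitute your own device numbers:

systemctl --failed
systemctl status lvm2-pvscan@8:6.service
journalctl -b -u lvm2-pvscan@8:6.service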
This task depends upon
Closed by freswa (frederik)
Sunday, 06 September 2020, 16:36 GMT
Reason for closing: Fixed
Additional comments about closing: https://github.com/archlinux/svntogit-packages/commit/177624fb6c30271e04104df40e1e772f61fca3e0#diff-8d0411b338c83cd8cd8ad9d9db127101
https://bugzilla.redhat.com/show_bug.cgi?id=1334063#c16
More importantly, it does mention that fixes are forthcoming with commits due to appear in lvm2 v2.02.155. We'll just have to wait and see, won't we?
Good catch Siot!
Should be fixed by these commits (will appear in lvm2 v2.02.155):
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=7fd4119d24eee55323e888dd78c70257d7c97a15
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=591ef307b30c2d828b2a0c59918203f970974bbb
In lvm2-2.02.154-3
7fd4119d24eee55323e888dd78c70257d7c97a15 is included as https://git.archlinux.org/svntogit/packages.git/tree/trunk/02-liblvm_config_settings.patch?h=packages/lvm2
591ef307b30c2d828b2a0c59918203f970974bbb is included as https://git.archlinux.org/svntogit/packages.git/tree/trunk/03-lvm2_activation_generator.patch?h=packages/lvm2
Here are some of the warnings being posted:
Jun 14 16:00:55 tovilyis.excom.com lvm[1293]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1293]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1293]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1290]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1290]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1290]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1322]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1322]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1322]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1316]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1316]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1316]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1318]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1318]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1318]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1272]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1272]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1272]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1321]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1321]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1321]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1276]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1276]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1276]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1275]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1275]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1275]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1286]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1286]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1280]: Concurrent lvmetad updates failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1280]: WARNING: Not using lvmetad because cache update failed.
Jun 14 16:00:55 tovilyis.excom.com lvm[1280]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:00:55 tovilyis.excom.com lvm[1286]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jun 14 16:01:46 tovilyis.excom.com lvm[1280]: 9 logical volume(s) in volume group "archvg" now active
It took almost an entire minute for all of my logical volumes to be processed, with messages about many "jobs" being run to perform scans appearing on the console. No errors, yes, but it added more than a minute to LV processing, whereas this used to take only a few seconds. Arch has always booted much faster than most of my many other Linux distros, but this lvm issue has significantly eroded that advantage.
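To put a number on the slowdown, something like the following should show where the boot time is going, assuming the lvm2-pvscan units and lvmetad show up in systemd's accounting:

systemd-analyze
systemd-analyze blame | grep -i lvm
systemd-analyze critical-chain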
Version lvm2 2.02.156-1 is currently in testing. Hopefully it will fix this remaining problem. Until then, I can't consider the issue resolved.
$ journalctl -b --no-hostname -u '*lvm*' --no-pager > lvm2-2.02.156-1
I have one LVM cached volume.
These are the messages from journalctl:
Jun 16 14:18:41 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Jun 16 14:18:41 systemd[1]: Starting LVM2 PV scan on device 8:12...
Jun 16 14:18:41 systemd[1]: Starting LVM2 PV scan on device 8:15...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 259:2...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 259:5...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 259:8...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 259:11...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 259:14...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 259:17...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 8:3...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 259:20...
Jun 16 14:18:42 systemd[1]: Starting LVM2 PV scan on device 8:6...
Jun 16 14:18:43 systemd[1]: Started LVM2 metadata daemon.
Jun 16 14:18:43 systemd[1]: Starting LVM2 PV scan on device 8:9...
Jun 16 14:18:43 lvmetad[1342]: Cannot lock lockfile [/run/lvmetad.pid], error was [Resource temporarily unavailable]
And here are the scan processes, all apparently frozen:
root 76 1 0 14:18 ? 00:00:00 lvmetad
root 143 1 0 14:18 ? 00:00:00 (lvm2) pvscan --background --cache --activate ay --major 8 --minor 9
root 145 1 0 14:18 ? 00:00:00 (lvm2) pvscan --background --cache --activate ay --major 8 --minor 3
root 147 1 0 14:18 ? 00:00:00 (lvm2) pvscan --background --cache --activate ay --major 8 --minor 6
root 148 1 0 14:18 ? 00:00:00 (lvm2) pvscan --background --cache --activate ay --major 8 --minor 15
root 1311 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:12
root 1314 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:15
root 1320 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:2
root 1321 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:5
root 1327 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:8
root 1330 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:11
root 1331 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:14
root 1332 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:17
root 1333 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:3
root 1334 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:20
root 1336 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:6
root 1344 1 0 14:18 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:9
So, back to 2.02.150-1. I really hope this stabilises soon.
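For reference, the downgrade itself is just reinstalling the old packages from the pacman cache; the filenames below are from my own cache and may differ on other systems:

sudo pacman -U /var/cache/pacman/pkg/lvm2-2.02.150-1-x86_64.pkg.tar.xz /var/cache/pacman/pkg/device-mapper-2.02.150-1-x86_64.pkg.tar.xz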
With version 2.02.157-1, this is what I'm seeing:
● Multiple sets of warnings from several concurrent processes about the unavailability of lvmetad for autoactivation;
● Several internal errors from lvmetad referring to a bad format string;
● Two duplicate sets of "LVM2 PV scan started" messages from systemd;
● Multiple sets of duplicate "volume group now active" messages, apparently one set for each lvm subprocess, so in this case 12 sets for 12 volume groups.
I do want to stress that version 2.02.157-1 appears to be functional, as opposed to the preceding four, and some of the output outlined above may be normal, so it is definitely a step in the right direction. I am reverting once more to 2.02.150-1, but I am hopeful that the next iteration may be the right one, and I appreciate the work being done to fix these issues.
WARNING: Not using lvmetad because cache update failed.
Update: after entering the following in a terminal and rebooting, the problem is solved for me:
sudo /usr/bin/lvm pvscan --cache --activate ay
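If the units were already marked failed before running this, the failed state can be cleared and checked again afterwards with:

sudo systemctl reset-failed
systemctl --failed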
Persists in 2.02.158-1
root 76 1 0 11:17 ? 00:00:00 lvmetad
root 141 1 0 11:17 ? 00:00:00 (lvm2) pvscan --background --cache --activate ay --major 8 --minor 12
root 144 1 0 11:17 ? 00:00:00 (lvm2) pvscan --background --cache --activate ay --major 8 --minor 9
root 1101 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:15
root 1103 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:2
root 1106 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:12
root 1108 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:8
root 1109 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:5
root 1122 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:11
root 1123 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:14
root 1143 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:20
root 1144 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:3
root 1147 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:6
root 1184 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 259:17
root 1188 1 0 11:17 ? 00:00:00 /usr/bin/lvm pvscan --cache --activate ay 8:9
So, once again I'm backing off 2.02.160-1 and reverting to 2.02.158-1.
You could contact the upstream lvm2 developers and try to solve the issue with them.
I consider lvm2 and device-mapper to be system-level components rather than strictly application software, something that should run unobtrusively in the background. So, with all due respect, I wish the lvm2 developers could be pointed at this bug report rather than my humble self having to start the whole bug-reporting process all over again with them. I'm willing to assist by testing new releases, but I'm hesitant to take on the responsibility of getting the rewritten logical volume manager software fixed when I have no clue how the new software is even broken in the first place. You would think that if anyone should know what this is all about, it would be the lvm2 developers themselves, right?
Info about my setup (note that I'm currently using 2.02.150-1):
http://pastebin.com/3A4MjTWw
Versions 2.02.153-1 and 2.02.154-3 were outright failures, as the system wouldn't even successfully boot. Version 2.02.155-1 and the version I'm currently running, 2.02.158-1, produce many error messages such as:
Jul 11 21:32:42 lvm[1428]: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Jul 11 21:32:43 lvmetad[1354]: Internal error: Bad format string at 'update in progress'
Jul 11 21:32:43 lvm[1348]: WARNING: lvmetad is being updated by another command (pid 1340).
Jul 11 21:32:43 lvm[1348]: WARNING: Not using lvmetad because cache update failed.
Jul 11 21:32:43 lvm[1348]: WARNING: Autoactivation reading from disk instead of lvmetad.
Jul 11 21:32:44 lvm[1428]: WARNING: lvmetad is being updated, retrying (setup) for 9 more seconds.
Jul 11 21:32:46 lvm[1428]: WARNING: lvmetad is being updated, retrying (setup) for 7 more seconds.
which probably explain the very long boot times, but those versions appear to mostly work, just slowly. Version 2.02.156-2 and the newly released 2.02.160-1 both have issues with hanging pvscan processes, and none of the LV management commands work.
I'm hoping that someone more knowledgeable with lvm than myself can report this to the lvm2 developers if they aren't already aware of this situation. True, I can hold off updating the software as long as I'm able to, but I'd prefer for the problem to be fixed as a long-term strategy.
I've been struggling with this bug for a day or two. I've never seen it before, even though I've been using lvm2 + Arch for over half a year.
As soon as I merged the snapshot volume with the original one, the bug simply disappeared.
Maybe it's just a coincidence.
ago 07 10:19:11 localhost lvmetad[75]: unlink failed: No such file or directory
ago 07 10:17:41 localhost lvm[150]: m: refresh before autoactivation failed.
ago 07 10:17:41 localhost lvm[150]: Refusing activation of partial LV m/lvh. Use '--activationmode partial' to override.
ago 07 10:17:41 localhost lvm[150]: 1 logical volume(s) in volume group "m" now active
ago 07 10:17:41 localhost lvm[150]: m: autoactivation failed.
But the lvm process scans and rescans until almost everything is activated.
Maybe it's because some PVs were RAID volumes. But I want lvm cache and RAID :'(
lvm2 2.02.162-1
I'm currently using the latest version, 2.02.164-1. This has apparently stabilised the software to the point where I haven't seen any lvm-related errors at boot time in a while. The problem with the hanging pvscan processes also seems to have been resolved. What I'm left with however is a much slower booting system than before. By the way, I'm not using any "exotic" features such as lvm cache or snapshots.
Tried a simple test; downgrading both lvm2 and device-mapper packages back to 2.02.150-1 (the last *good* release), all else being equal, the *slowest* boot time I've recorded was 23 seconds (Arch has always been one of the fastest booting distros, and that's on a non-SSD 6TB SATA 6G disk). After again upgrading both packages to the latest version, 2.02.164-1, the *fastest* boot time I could manage was 56 seconds; in some cases, it was well over a minute and a half. Again, this was by only changing the lvm software.
I'm not sure what the intent was in making such a sweeping change to the lvm2 software. Usually, changes to such core system components involve making them safer, improving general performance, or possibly adding necessary or requested features. Rarely have I seen updated core system software consistently degrade performance like this while seemingly providing no discernible benefits or improvements. It took 12 releases (from 2.02.153-1 on May 12 to 2.02.164-1 on August 15) just to make it usable.
From the way things look, this may well be the best it's going to get. At least, I'm not getting errors any more.
I'm still getting errors during boot like:
Sep 06 20:44:08 davinci systemd[1]: lvm2-pvscan@254:1.service: Failed with result 'exit-code'.
Sep 06 20:44:08 davinci systemd[1]: lvm2-pvscan@254:1.service: Unit entered failed state.
Sep 06 20:44:08 davinci systemd[1]: Failed to start LVM2 PV scan on device 254:1.
Sep 06 20:44:08 davinci systemd[1]: lvm2-pvscan@254:1.service: Main process exited, code=exited, status=5/NOTINSTALLED
Sep 06 20:44:08 davinci lvm[553]: vgdata: autoactivation failed.
Sep 06 20:44:08 davinci lvm[553]: 0 logical volume(s) in volume group "vgdata" now active
Sep 06 20:44:08 davinci lvm[553]: Refusing activation of partial LV vgdata/data. Use '--activationmode partial' to override.
Sep 06 20:44:08 davinci lvm[553]: vgdata: refresh before autoactivation failed.
Sep 06 20:44:08 davinci lvm[553]: Refusing refresh of partial LV vgdata/data. Use '--activationmode partial' to override.
Sep 06 20:44:07 davinci lvm[553]: Refusing refresh of partial LV vgdata/data. Use '--activationmode partial' to override.
Sep 06 20:44:07 davinci lvm[553]: Refusing refresh of partial LV vgdata/data. Use '--activationmode partial' to override.
Sep 06 20:44:07 davinci lvm[553]: Refusing refresh of partial LV vgdata/data. Use '--activationmode partial' to override.
Sep 06 20:44:07 davinci lvm[553]: Refusing refresh of partial LV vgdata/data. Use '--activationmode partial' to override.
Sep 06 20:44:07 davinci lvm[553]: WARNING: Device for PV 1TLQrP-Wk9c-bSDf-F1b2-3kjn-2Zso-ekyk2S not found or rejected by a filter.
Sep 06 20:44:07 davinci systemd[1]: Starting LVM2 PV scan on device 254:1...
This is with the "latest" version, 2.02.164-1.
I think this might be because my physical volumes are LUKS-encrypted partitions, so maybe lvm is trying to assemble something before both partitions are fully decrypted. The error is not fatal, as everything works as expected, but I don't like having failed units in my log ;). If I restart the unit later, everything works fine, so it has to be something that is different during boot.
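If that hypothesis is right, one untested idea would be to order the pvscan instance after cryptsetup with a drop-in; lvm2-pvscan@254:1.service is just the instance from the log above, so adjust it to your own device numbers:

sudo systemctl edit lvm2-pvscan@254:1.service

and in the editor add:

[Unit]
After=cryptsetup.target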
mar 16 15:44:31 arch.local lvmetad[541]: Internal error: Bad format string at 'update in progress'
mar 16 15:44:31 arch.local lvm[537]: WARNING: lvmetad is being updated by another command (pid 536).
mar 16 15:44:31 arch.local lvm[537]: WARNING: Not using lvmetad because cache update failed.
mar 16 15:44:31 arch.local lvm[537]: WARNING: Autoactivation reading from disk instead of lvmetad.
mar 16 15:44:31 arch.local lvm[534]: WARNING: lvmetad is being updated by another command (pid 536).
mar 16 15:44:31 arch.local lvm[534]: WARNING: Not using lvmetad because cache update failed.
mar 16 15:44:31 arch.local lvm[534]: WARNING: Autoactivation reading from disk instead of lvmetad.
Recap of what is happening:
Basically, systemd/lvm2 fails to assemble/detect any LVs that have snapshots, are lvm-cached, or possibly also are lvm-raid volumes.
The boot is delayed by 1 min 30 sec if the root LV has snapshots. I don't know whether this also happens with a cached root LV.
Eventually lvm2-pvscan times out and gets killed, and the boot appears to continue as normal.
If I have no snapshots active there is no delay, but I still get some warnings and an excessive number of
"lvm[553]: Refusing refresh of partial LV vgdata/data. Use '--activationmode partial' to override." messages in the journal.
The system also randomly fails to assemble the cached vghyper/ProjectData LV.
It is fixed by manually reactivating the LV (see the sketch below).
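The manual reactivation is nothing special, just an lvchange on the affected LV; the VG/LV name below is taken from the messages above:

sudo lvchange -ay vghyper/ProjectData

(or "vgchange -ay vghyper" to activate everything in that volume group).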
I also had dm-event.service fail because of missing executable and shared library files. I was able to fix it myself.
In the /usr/lib/initcpio/install/sd-lvm2 and /usr/lib/initcpio/install/lvm2 files,
I added the following lines to the build() function:
add_binary "/usr/bin/dmeventd"
add_binary "/usr/lib/libdevmapper-event-lvm2snapshot.so"
add_binary "/usr/lib/libdevmapper-event-lvm2mirror.so"
add_binary "/usr/lib/libdevmapper-event-lvm2.so"
The related dmeventd errors were gone after updating the initramfs.
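For context, this is roughly how the addition sits inside the hook; the surrounding build() body is abbreviated here and the stock hook contents may differ between lvm2 versions, so treat it as a sketch rather than the exact file:

# /usr/lib/initcpio/install/lvm2 (same idea for sd-lvm2)
build() {
    # ... whatever the stock hook already installs (lvm binary, udev rules, config) ...

    # dmeventd and its monitoring plugins, so dm-event works from the initramfs
    add_binary "/usr/bin/dmeventd"
    add_binary "/usr/lib/libdevmapper-event-lvm2snapshot.so"
    add_binary "/usr/lib/libdevmapper-event-lvm2mirror.so"
    add_binary "/usr/lib/libdevmapper-event-lvm2.so"
}

Then regenerate the initramfs, e.g. with "mkinitcpio -P" (or "mkinitcpio -p linux" for a single preset).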
In my case this is with an lvm-cache configuration for an SSD/HDD combination, BUT I don't use snapshots at all.