FS#57275 - [lvm2] PV scan in initramfs does not finish

Attached to Project: Arch Linux
Opened by nl6720 (nl6720) - Monday, 29 January 2018, 14:50 GMT
Last edited by Christian Hesse (eworm) - Thursday, 18 April 2019, 06:39 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Christian Hesse (eworm)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 9
Private No

Details

Description:
With lvm2 2.02.177-3 I can't boot with root on a thin LVM volume.

Using lvm2 mkinitcpio hook:
A start job is running for LVM PV scan on device 8:2
A start job is running for Monitorin..td or progress polling

This goes on forever.

Using sd-lvm2 mkinitcpio hook:
A start job is running for LVM PV scan on device 8:2
A start job is running for dev-mapper-VMStorage\x2dthinswap.device
A start job is running for dev-mapper-VMStorage\x2dthinroot.device

Eventually it times out on waiting for root.

Additional info:
* package version(s)
* config and/or log files etc.
lvm2 2.02.177-3

Steps to reproduce:

Closed by Christian Hesse (eworm)
Thursday, 18 April 2019, 06:39 GMT
Reason for closing: Fixed
Additional comments about closing: lvm2 2.02.184
Comment by Macfly (macfly) - Tuesday, 30 January 2018, 05:14 GMT
Hello,

I had the same issue this morning after upgrading from 2.02.177-1 to 2.02.177-3: startup was stuck on "start job is running...". I'm also using a thin LVM volume. Downgrading to 2.02.177-1 fixed it.

Regards,
Macfly
Comment by Christian Hesse (eworm) - Tuesday, 30 January 2018, 07:59 GMT
Should be fixed with lvm2 2.02.177-4.
Comment by nl6720 (nl6720) - Wednesday, 31 January 2018, 15:07 GMT
  • Field changed: Percent Complete (100% → 0%)
Can't boot with lvm2 2.02.177-4 and lvm2 mkinitcpio hook.
Comment by Christian Hesse (eworm) - Wednesday, 31 January 2018, 15:18 GMT
Uh, what? Your initial report was about sd-lvm2 hook. Please give more specific information about your setup, what works and what does not.
Comment by nl6720 (nl6720) - Wednesday, 31 January 2018, 16:34 GMT
Read again! My initial report was about "lvm2" AND "sd-lvm2" hooks, both of them.

Now with lvm2 2.02.177-4:

lvm2 hook:
A start job is running for LVM PV scan on device 8:2
A start job is running for Monitorin..meventd or progress polling

The start jobs have "no limit", so this goes on forever and I can't boot.


sd-lvm2 hook:
A stop job is running for LVM2 metadata daemon
A stop job is running for Device-mapper event daemon

After 1m30s it times out and continues booting.
The same two stop jobs also sometimes show up on shutdown.
Comment by nl6720 (nl6720) - Wednesday, 31 January 2018, 16:54 GMT
My setup is a VG with two thin volumes, one for root, one for swap.
Comment by Dave Reisner (falconindy) - Wednesday, 31 January 2018, 17:16 GMT
Combining both the lvm2 and sd-lvm2 hooks makes no sense at all. Please do not do this.
Comment by nl6720 (nl6720) - Wednesday, 31 January 2018, 17:33 GMT
I didn't combine them, I'm not that dumb.

I tested with:
HOOKS=(base udev autodetect modconf keyboard keymap consolefont block lvm2 filesystems resume fsck)
that didn't work (see previous replies), changed it to:
HOOKS=(base systemd autodetect modconf keyboard sd-vconsole block sd-lvm2 filesystems fsck)
and regenerated initramfs. That eventually booted but I spent time waiting for the timeout (see previous replies).
Comment by Macfly (macfly) - Thursday, 01 February 2018, 05:55 GMT
Hello,

I also have the issue. I ran the same test as nl6720 with the lvm2 and sd-lvm2 hooks and got the same result: it works with sd-lvm2 after a timeout of 1m29s.

My root and swap are on a classic LV, only a docker mount point is on thinpool.

The lvm start job is blocked on 8:4 and 8:5 which are the thinpool on my laptop.

If that can help.

Regards,
Comment by Christian Hesse (eworm) - Monday, 05 February 2018, 20:28 GMT
Any change if you include libgcc_s.so.1 in the initramfs? You can do so by adding this to your /etc/mkinitcpio.conf:

FILES=('/usr/lib/libgcc_s.so.1')
Comment by nl6720 (nl6720) - Tuesday, 06 February 2018, 09:09 GMT
Nothing changed with '/usr/lib/libgcc_s.so.1' in initramfs.
Comment by Christian Hesse (eworm) - Wednesday, 07 February 2018, 20:33 GMT
Can you add the option '-d' (for debug) to ExecStart= in /usr/lib/systemd/system/dm-event.service, rebuild your initramfs and see if that gives any clues?
Comment by Christian Hesse (eworm) - Wednesday, 07 February 2018, 20:36 GMT
Oh, this supports increasing the detail of debug messages, so better add '-ddd'. ;)
Comment by nl6720 (nl6720) - Thursday, 08 February 2018, 16:32 GMT
This seems to be the relevant part:

feb 08 16:22:29 archvm3 lvm[193]: device-mapper: waitevent ioctl on LVM-yqgaHQyzPZ2MxsGXAe52FrBwprydojKZzIT96d25yHS9Ui9Slnek5hR2Er1n0Tid-tpool failed: Interrupted system call
feb 08 16:22:29 archvm3 lvm[193]: dm status LVM-yqgaHQyzPZ2MxsGXAe52FrBwprydojKZzIT96d25yHS9Ui9Slnek5hR2Er1n0Tid-tpool [ opencount noflush ] [16384] (*1)
feb 08 16:22:29 archvm3 lvm[193]: dm waitevent LVM-yqgaHQyzPZ2MxsGXAe52FrBwprydojKZzIT96d25yHS9Ui9Slnek5hR2Er1n0Tid-tpool [ opencount flush ] [16384] (*1)
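
When the debug output is long, the failed device-mapper calls can be pulled out of a saved journal dump with a small filter. The following is only a minimal Python sketch: the line format is assumed from the three lines quoted above, and the function name failed_ioctls is hypothetical, not part of lvm2 or systemd.

```python
import re

# Assumed format, taken from the excerpt above:
#   "... lvm[193]: device-mapper: <op> ioctl on <target> failed: <reason>"
FAILED_IOCTL = re.compile(
    r"device-mapper: (?P<op>\S+) ioctl on\s+(?P<target>\S+) failed: (?P<reason>.+)$"
)


def failed_ioctls(lines):
    """Yield (operation, target, reason) for each failed device-mapper ioctl."""
    for line in lines:
        match = FAILED_IOCTL.search(line)
        if match:
            yield (
                match.group("op"),
                match.group("target"),
                match.group("reason").strip(),
            )


# Usage with a shortened, made-up target name (hypothetical sample data):
sample = [
    "feb 08 16:22:29 archvm3 lvm[193]: device-mapper: waitevent ioctl on vg-tpool failed: Interrupted system call",
    "feb 08 16:22:29 archvm3 lvm[193]: dm status vg-tpool [ opencount noflush ] [16384] (*1)",
]
for op, target, reason in failed_ioctls(sample):
    print(f"{op} on {target}: {reason}")
```

Only the first sample line is reported; the "dm status" line is ordinary debug output and is skipped.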
Comment by aqua (aqua) - Friday, 09 March 2018, 21:43 GMT
After setting up lvm-cache in my Arch install I had a similar problem where pvscan would hang during boot. I figured it was due to something in mkinitcpio, because I didn't have the same problem when I used an initramfs generated with dracut. Eventually I came up with a way to fix my problem.
Now, I'm fairly new to Linux, so I don't know if my solution will help you, or even if it was the proper solution to my problem, but since it worked for me maybe it will work for you. My solution adds an override to lvm2-lvmetad.service in the initramfs so that it runs only after systemd-udevd.service has finished. (If anyone has feedback on my fix/hack, I would highly appreciate it.)

To use this fix, extract my archive in /etc, then add lvm2-fix after sd-lvm2 in the hooks list in mkinitcpio.conf and rebuild your initramfs.
Comment by nl6720 (nl6720) - Monday, 12 March 2018, 11:50 GMT
So a broken package has landed in core...


Unfortunately aqua's fix didn't work for me, nothing changed.
Comment by Noel Kuntze (thermi) - Monday, 12 March 2018, 13:15 GMT
Additionally, with 2.02.177-4, there are errors when the initcpio is built:
-> Running build hook: [sd-lvm2]
/usr/lib/initcpio/install/sd-lvm2: line 14: add_systemd_unit: command not found
/usr/lib/initcpio/install/sd-lvm2: line 15: add_systemd_unit: command not found
/usr/lib/initcpio/install/sd-lvm2: line 16: add_systemd_unit: command not found
Comment by nl6720 (nl6720) - Tuesday, 13 March 2018, 10:09 GMT
With lvm2 2.02.177-5 the issue is gone. Yay!

lvm2 2.02.177-5 removed dmeventd and libdevmapper-*.so from the hooks, which AFAIK were added because of FS#49530, so this might affect that bug report.
Comment by Macfly (macfly) - Tuesday, 13 March 2018, 13:07 GMT
Yes, same for me, it works now :)
Comment by Noel Kuntze (thermi) - Tuesday, 13 March 2018, 14:56 GMT
The problem I mentioned (add_systemd_unit not being found) persists in -5.
Comment by Christian Hesse (eworm) - Tuesday, 13 March 2018, 15:16 GMT
Do you use hook 'sd-lvm2' without 'systemd'?
Comment by Noel Kuntze (thermi) - Tuesday, 13 March 2018, 15:35 GMT
Yes and up to now, that worked just fine. It only gives that error after upgrading the lvm2 package.
Comment by Christian Hesse (eworm) - Tuesday, 13 March 2018, 15:48 GMT
And systemd comes first, sd-lvm2 later? The function add_systemd_unit() is defined in systemd hook.
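
The two hook rules raised in this thread (do not combine lvm2 and sd-lvm2; list systemd before sd-lvm2, because add_systemd_unit() is defined in the systemd hook) can be checked mechanically. Below is a minimal Python sketch, assuming the HOOKS=(...) array syntax shown in the earlier comments on a single line. The helper names parse_hooks and check_lvm_hooks are hypothetical and not part of mkinitcpio.

```python
import re
import shlex


def parse_hooks(conf_text):
    """Return the hook names from the last uncommented HOOKS=(...) line.

    Assumption: the array form on one line, as quoted in this thread.
    """
    hooks = []
    for line in conf_text.splitlines():
        match = re.match(r"\s*HOOKS=\(([^)]*)\)", line)
        if match:
            hooks = shlex.split(match.group(1))
    return hooks


def check_lvm_hooks(hooks):
    """Return a list of human-readable problems with the LVM-related hooks."""
    problems = []
    if "lvm2" in hooks and "sd-lvm2" in hooks:
        problems.append("lvm2 and sd-lvm2 are both listed; use only one")
    if "sd-lvm2" in hooks:
        if "systemd" not in hooks:
            problems.append("sd-lvm2 is listed without the systemd hook")
        elif hooks.index("systemd") > hooks.index("sd-lvm2"):
            problems.append("systemd must come before sd-lvm2")
    return problems


# Usage: the sd-lvm2 line quoted earlier in this thread passes both checks.
conf = "HOOKS=(base systemd autodetect modconf keyboard sd-vconsole block sd-lvm2 filesystems fsck)"
print(check_lvm_hooks(parse_hooks(conf)))  # prints []
```

To check a real system, the text of /etc/mkinitcpio.conf could be read and passed to parse_hooks in the same way. The sketch only looks at hook names and order; it does not verify that the hooks are installed.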
Comment by eaut (eaut) - Thursday, 22 March 2018, 21:51 GMT
Aqua's fix works fine for my lvm-cache related boot problem. Thank you Aqua!
Comment by Stefan Haller (fgrsnau) - Friday, 21 December 2018, 09:59 GMT
Since upgrading to lvm2 2.02.183-1 I have exactly the same problem. The stop job for pvscan is not running successfully and I have to wait 1:30 minutes during bootup.

Downgrading to 2.02.182-1 fixes the problem. It seems that an upstream change introduced the problem for me (maybe it is caused by something else on my system). I can reproduce this on ~4 systems that use LVM and a systemd-initrd environment.

After applying aqua's workaround I restarted 5 times and it works flawlessly with lvm2 2.02.183-1 (thanks aqua!).
Comment by Tyler Foo (ghfujianbin) - Wednesday, 09 January 2019, 06:30 GMT
Having the same issue after upgrading to lvm2 2.02.183-1. Also have a lvm-cache setup.
Comment by Christian Wolf (christianlupus) - Wednesday, 23 January 2019, 14:11 GMT
I think I have the same problem here. I will look into things on the weekend.

One thing that might have helped in my case: I used the fallback initrd image. This allowed me to boot successfully. Can you verify this in your case?
Comment by Alexander Meshcheryakov (Self-Perfection) - Friday, 25 January 2019, 01:58 GMT
I was saved by aqua's fix ( https://bugs.archlinux.org/task/57275#comment167331 ) as well.

Here is my case. For several months I had experienced 90-second lags during boot on lvmetad.service, which was eventually stopped by a timeout. I did not pay much attention, as I rarely reboot and it was not a big deal. But several days ago, after pacman -Su, the system did not boot! I had to manually run lvchange -an / -ay on several LVs in the emergency console to be able to boot. I'm not sure what caused the boot failure in my case: possibly creating a softraid and moving several LVs to a PV on the softraid mirror (this PV was somehow not found: "WARNING: Device for PV g7iq...Rs not found or rejected by a filter"), maybe enabling lvmcache on an SSD, or possibly the LVM update.

But after applying aqua's fix, the system boots again and the 90-second lag on lvmetad is gone.
Comment by Tyler Foo (ghfujianbin) - Thursday, 11 April 2019, 11:25 GMT
lvm2 2.02.184-1 fixed the problem for me, but 2.02.184-2 breaks it again. This time I cannot even boot into my system. Had to use the arch iso to downgrade.
Comment by Christian Hesse (eworm) - Thursday, 11 April 2019, 11:36 GMT
The change was reverted in 2.02.184-3 and 2.02.184-4 reintroduces it with a fix on top.
Comment by nl6720 (nl6720) - Friday, 12 April 2019, 10:21 GMT
I do not experience this bug with lvm2 2.02.184-4. I think this bug can be closed.
