FS#32403 - [lvm2] lvm-monitoring.service seems to start too early (sometimes fails, group not ready)
Attached to Project:
Arch Linux
Opened by Andrzej Giniewicz (Giniu) - Friday, 02 November 2012, 18:05 GMT
Last edited by Gerardo Exequiel Pozzi (djgera) - Wednesday, 12 February 2014, 01:59 GMT
Details
Description:
It seems that (in lvm2 from testing) lvm-monitoring.service sometimes starts too early: http://pastebin.com/9mgp67eX (I cut out the parts I believe are interesting; I can submit the full log if needed). But if I start it manually after boot, it works:
Nov 02 18:57:49 gauss systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Nov 02 18:57:49 gauss lvm[672]: 4 logical volume(s) in volume group "lvm" monitored
Nov 02 18:57:49 gauss systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
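For reference, this is roughly how I pull the unit's messages out of the journal for a boot where it fails (a sketch; the unit name is the one shipped by the lvm2 package):
systemctl status lvm-monitoring.service
journalctl -b --no-pager _SYSTEMD_UNIT=lvm-monitoring.service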
This task depends upon
Closed by Gerardo Exequiel Pozzi (djgera)
Wednesday, 12 February 2014, 01:59 GMT
Reason for closing: No response
Additional comments about closing: No response in more than one year.
Nov 08 11:10:25 gauss systemd-udevd[257]: timeout: killing '/usr/sbin/lvm pvscan --cache --activate ay --major 9 --minor 1' [380]
Nov 08 11:10:26 gauss systemd-udevd[192]: worker [257] /devices/virtual/block/md1 timeout; kill it
Nov 08 11:10:26 gauss systemd-udevd[192]: seq 2026 '/devices/virtual/block/md1' killed
Nov 08 11:10:26 gauss systemd-udevd[192]: worker [257] terminated by signal 9 (Killed)
Now, I have had boots with 4 dom0-* devices, boots with 5 devices (dom0-* plus one other), and boots with all 6 devices. It might be that lvm-monitoring is not actually at fault. Any idea how to proceed with this one? I cannot find the exact moment when this message started to show up, but it wasn't long ago, maybe a few days.
And here is the lsblk output: http://pastebin.com/VuHbNpkw
Those extra, non-dom0-* LVs were made by libvirt. This output is from after I ran vgchange -ay from the console.
When I boot and the detection hangs, lsblk looks like this: http://pastebin.com/u07Q4327
First of all, the journal output suggests two different types of failures.
Second, are there any other udev rules in play besides the one from the LVM package? Are these simply LVM snapshots, or is there more going on? (The lsblk output is really confusing to me.) Maybe there is a problem with monitoring after all: can you disable lvm-monitoring.service and start it manually once the system has booted, and see if the problem disappears?
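Something along these lines should do it (a sketch; adjust if your unit name differs):
systemctl disable lvm-monitoring.service
systemctl reboot
# after the machine is back up:
systemctl start lvm-monitoring.service
systemctl status lvm-monitoring.service
journalctl -b --no-pager _SYSTEMD_UNIT=lvm-monitoring.service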
LV snapshot status active destination for sage-fs_vorigin
and it has a COW table enabled:
LV Size 100,00 GiB
Current LE 25600
COW-table size 20,00 GiB
COW-table LE 5120
It probably looks like this because the virtual disks in libvirt were created with the option to preallocate 20 GB but reserve up to 100 GB.
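If I understand it right, that layout is roughly what a sparse LV created like this would produce (just a guess at what libvirt does under the hood; VG and LV names are taken from the output above):
# 100G virtual size backed by a 20G COW area; LVM creates the hidden
# sage-fs_vorigin device as the zero-filled virtual origin
lvcreate --snapshot --virtualsize 100G --size 20G --name sage-fs lvm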
Can you determine what triggers this failure? You say this works most of the time. Also, these volumes should be activated in initramfs already (otherwise you couldn't boot from /), so why do these failures appear so late in the boot process?
Another idea: can you get more debug output? Copy /usr/lib/systemd/system/lvmetad.service to /etc/systemd/system/ and add '-l all' to the arguments of lvmetad. Also, copy /usr/lib/udev/rules.d/69-dm-lvm-metad.rules to /etc/udev/rules.d/ and add the '-d' option to the pvscan command. This should generate much more output.
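For clarity, the edited copies would look roughly like this (a sketch; the exact ExecStart line and udev rule wording may differ in your version, and the pvscan arguments below are taken from the journal line quoted earlier):
# /etc/systemd/system/lvmetad.service: change the ExecStart line, assuming the
# daemon lives at /usr/sbin/lvmetad, to something like
ExecStart=/usr/sbin/lvmetad -l all
# /etc/udev/rules.d/69-dm-lvm-metad.rules: add -d to the pvscan invocation, e.g.
RUN+="/usr/sbin/lvm pvscan -d --cache --activate ay --major $major --minor $minor"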
Doing vgchange -ay results in
6 logical volume(s) in volume group "lvm" now active
but it takes quite a long time:
time vgchange -ay
6 logical volume(s) in volume group "lvm" now active
real 0m26.103s
user 0m0.017s
sys 0m0.297s
This time does not look good; it seems to be related.
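To narrow down where the time goes, I could also try activating the LVs one at a time and timing each (just a sketch, using the VG name "lvm" from above):
for lv in $(lvs --noheadings -o lv_name lvm); do
    echo "== lvm/$lv =="
    time lvchange -ay "lvm/$lv"
done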
That information about monitoring/not being able to close is from when I disabled lvm-monitoring.service and rebooted. Anyway, checking the dmeventd info for a single boot:
journalctl -b --no-pager _SYSTEMD_UNIT=dmeventd.service + _SYSTEMD_UNIT=dmeventd.socket
gives absolutely no output (other than "logs begin at ...")
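A quick sanity check I can do to confirm dmeventd is not running at all (a sketch; unit names as above):
systemctl status dmeventd.socket dmeventd.service
pgrep dmeventd || echo "no dmeventd process running"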
[root@gauss ~]# journalctl -b --no-pager _SYSTEMD_UNIT=systemd-udevd.service
-- Logs begin at czw, 2012-09-20 11:03:36 CEST, end at czw, 2012-11-29 10:18:15 CET. --
lis 29 10:17:55 gauss systemd-udevd[245]: conflicting device node '/dev/mapper/lvm-dom0--fs' found, link to '/dev/dm-0' will not be created
lis 29 10:17:55 gauss systemd-udevd[256]: conflicting device node '/dev/mapper/lvm-dom0--root' found, link to '/dev/dm-2' will not be created
lis 29 10:17:55 gauss systemd-udevd[257]: conflicting device node '/dev/mapper/lvm-dom0--swap' found, link to '/dev/dm-3' will not be created
lis 29 10:17:55 gauss systemd-udevd[258]: conflicting device node '/dev/mapper/lvm-dom0--virtuals' found, link to '/dev/dm-4' will not be created
lis 29 10:17:56 gauss systemd-udevd[254]: conflicting device node '/dev/mapper/lvm-dom0--home' found, link to '/dev/dm-1' will not be created
I noticed that those /dev/dm-* devices are already in the system; it looks like they are created earlier:
[root@gauss ~]# journalctl -b --no-pager | grep dm-0
lis 29 10:17:54 gauss kernel: EXT4-fs (dm-0): mounting ext3 file system using the ext4 subsystem
lis 29 10:17:54 gauss kernel: EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
lis 29 10:17:54 gauss kernel: EXT4-fs (dm-0): re-mounted. Opts: data=ordered
lis 29 10:17:55 gauss systemd-udevd[245]: conflicting device node '/dev/mapper/lvm-dom0--fs' found, link to '/dev/dm-0' will not be created
Anyway, what do you think: could this somehow cause issues when, instead of only mounting a device, LVM also has to set up a snapshot and an origin, as in the devices created by libvirt?
The errors you are seeing should not happen, and I haven't seen them with non-broken LVM builds for years. Other than a broken or very outdated initramfs (especially outdated lvm/device-mapper tools), I have no idea where these errors would come from.
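If you want to rule out a stale initramfs, compare the tools inside the image with the installed ones, roughly like this (a sketch; /boot/initramfs-linux.img is the usual Arch default and may differ on your system):
# version of the installed tools
lvm version
# list what the initramfs contains (lsinitcpio comes with mkinitcpio)
lsinitcpio /boot/initramfs-linux.img | grep -E 'lvm|dm'
If the image turns out to be old, regenerating it (e.g. mkinitcpio -p linux) would be the first thing to try.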