FS#39896 - [lvm2] regression (v2.02.105->v2.02.106): systemd fails to detect swap

Attached to Project: Arch Linux
Opened by Ronald (Rexilion) - Wednesday, 16 April 2014, 10:10 GMT
Last edited by Doug Newgard (Scimmia) - Monday, 06 July 2015, 04:34 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Eric Belanger (Snowman)
Thomas Bächler (brain0)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 4
Private No

Details

I have swap residing in an LVM volume and it fails to work with lvm2 2.02.106. The previous version (2.02.105) works. The upstream author suggests enabling some additional options to make it work (I have not verified this yet).

https://bugzilla.redhat.com/show_bug.cgi?id=1087586#c12

I tried to rebuild the package with the mentioned build flag, but it seems that Arch uses *a lot* of distribution-specific config files to make everything work.

It seems that the lvm2 PKGBUILD is used to build two packages, hence the split. On top of that, many critical config files carry modifications that are not present in the upstream package.

I'm seeing this issue on two machines, one with the stock kernel and one with a custom kernel. One has a 64-bit kernel and 32-bit userspace; the other is pure 32-bit. The former uses hyperthreading, the latter is uniprocessor. So I think we can rule out any race conditions or 32/64-bit bugs.
This task depends upon

Closed by  Doug Newgard (Scimmia)
Monday, 06 July 2015, 04:34 GMT
Reason for closing:  No response
Comment by Thomas Bächler (brain0) - Wednesday, 16 April 2014, 15:16 GMT
Last time I checked, the units in the upstream package contained lots of redhat-isms and referred to units that do not exist in upstream systemd. That's why I wrote my own units, which are way simpler than what lvm upstream provides.
Comment by Ronald (Rexilion) - Wednesday, 16 April 2014, 15:26 GMT
Upstream is interested in your opinion. They strive to use unified configuration files across all distributions.

The example I gave in #19 does not indicate a replacement of Red Hat-specific bits, but I could be wrong. I did not check that thoroughly.

Comment #21 indicates that a lot of testing goes into these sensitive settings. I want to help unify these configuration files with the little knowledge I have.
Comment by Thomas Bächler (brain0) - Wednesday, 16 April 2014, 15:33 GMT
I just went over the comments from the redhat bug report and I think I know what I should do. I am unsure when I'll have the time, but I'll keep you updated.
Comment by Ronald (Rexilion) - Wednesday, 16 April 2014, 18:57 GMT
Splendid! Let me know when you need victims ... erm testers :) .
Comment by Rasmus Edgar (ashren) - Friday, 25 April 2014, 08:59 GMT
This regression appears to have caused my separate logical volume for /usr to fail to mount. Downgrading to lvm2 2.02.105 made it work again.
Comment by Thomas Bächler (brain0) - Saturday, 26 April 2014, 12:47 GMT
Please test lvm2 2.02.106-2 in [testing]. It now uses the systemd units provided by upstream. I doubt that it fixes your problems, but at least you can now provide proper information in the upstream bug report.
Comment by Ronald (Rexilion) - Sunday, 27 April 2014, 08:49 GMT
Thank you for your time and especially your effort (judging from the commit log).

It does not fix my issue, I'll report back to the upstream developers. Let them have their spin with it.

Again, thank you.
Comment by Dark (Dark) - Wednesday, 30 April 2014, 04:57 GMT
I can also confirm this bug - it rendered my Arch box unbootable even in systemd rescue mode. I have /home on lvm+md, everything else on a basic partition. The boot process almost consistently hung (with 1 in ~20 boots succeeding - race condition?) on trying to fsck (or mount, with fsck disabled) the /home partition. Downgrading to 2.02.105-2 fixed the issue.

I question the severity of the bug report - looking at the forums it's not just me that has been left with an unbootable system since this update. Does this not qualify for critical status?
Comment by Ronald (Rexilion) - Wednesday, 30 April 2014, 08:03 GMT
@Dark & @ashren: did you try the new testing package as suggested in this bug?

Maybe I should open a new RH bug...
Comment by Thomas Bächler (brain0) - Wednesday, 30 April 2014, 10:38 GMT
You should definitely reopen the existing bug report that got closed, as this is clearly not a packaging problem. I am not of much help, since I cannot reproduce this problem.

You could post the output of systemctl status lvm2-pvscan@a:b.service (use tab completion on the status command to find out the correct a:b). For me, this returns the following (which is correct):

$ systemctl status lvm2-pvscan@8:4.service
● lvm2-pvscan@8:4.service - LVM2 PV scan on device 8:4
Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static)
Active: active (exited) since Mi 2014-04-30 10:18:01 CEST; 2h 18min ago
Docs: man:pvscan(8)
Process: 278 ExecStart=/usr/bin/lvm pvscan --cache --activate ay %i (code=exited, status=0/SUCCESS)
Main PID: 278 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/system-lvm2\x2dpvscan.slice/lvm2-pvscan@8:4.service

Apr 30 10:18:01 lije lvm[278]: 2 logical volume(s) in volume group "lije" now active
Apr 30 10:18:01 lije systemd[1]: Started LVM2 PV scan on device 8:4.

$ systemctl status lvm2-pvscan@9:0.service
● lvm2-pvscan@9:0.service - LVM2 PV scan on device 9:0
Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static)
Active: active (exited) since Mi 2014-04-30 10:18:02 CEST; 2h 18min ago
Docs: man:pvscan(8)
Process: 389 ExecStart=/usr/bin/lvm pvscan --cache --activate ay %i (code=exited, status=0/SUCCESS)
Main PID: 389 (code=exited, status=0/SUCCESS)

Apr 30 10:18:01 lije lvm[389]: 3 logical volume(s) in volume group "architect" now active
Apr 30 10:18:02 lije systemd[1]: Started LVM2 PV scan on device 9:0.

As you can see here, it lists the number of LVs that got activated. There could also be an error message here, which would be interesting.

Another question: Do you activate lvm2 in initramfs, too? If so, are you using base+udev+lvm2 or systemd+sd-lvm2 hooks?
Comment by Ronald (Rexilion) - Wednesday, 30 April 2014, 11:42 GMT
I have base+udev+lvm2. Can I safely replace those? According to the wiki, the systemd hook does not work.

https://wiki.archlinux.org/index.php/mkinitcpio#Common_hooks

My current layout:

HOOKS="base udev autodetect modconf lvm2 block filesystems build_system keyboard fsck shutdown"

Should that be:

HOOKS="systemd udev autodetect modconf sd-lvm2 block filesystems build_system keyboard fsck shutdown"

???

About the missing install_tmpfiles_configuration line, that was meant as an observation. I hope you did not consider that as offensive. I really appreciate your help on this.
Comment by Thomas Bächler (brain0) - Wednesday, 30 April 2014, 13:12 GMT
You can omit shutdown entirely (it has no effect with systemd, and is no longer needed with base). I have no idea what build_system is, so I can't comment on that (if it uses a run script, it won't work with systemd).

This line, for example, is correct: HOOKS="systemd autodetect modconf sd-lvm2 block filesystems keyboard fsck".
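For anyone following along, switching hook styles comes down to editing the HOOKS array in /etc/mkinitcpio.conf and regenerating the initramfs. A minimal sketch (the exact hook list depends on your setup; extra hooks such as mdadm_udev or keymap may be needed on other systems):

```shell
# /etc/mkinitcpio.conf -- systemd-based initramfs (example hook list only)
HOOKS="systemd autodetect modconf sd-lvm2 block filesystems keyboard fsck"

# Afterwards, rebuild the initramfs for the installed kernel preset:
#   mkinitcpio -p linux
```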

However, I am much more interested in the status output from lvm2-pvscan@.service.
Comment by Ronald (Rexilion) - Wednesday, 30 April 2014, 13:46 GMT
build_system is a script that mounts some of my partitions (/usr, /etc, /opt) from separate btrfs LVs located on the same LVM volume group.

As per the RH bug: only the swap is mounted 'automatically' from an fstab. I mount the other partitions by copying /etc/fstab from the LV containing /etc and then looping over each partition with a mount point located under /root.

So, I am suffering the same bug. I have posted build_system and fstab to help you understand my lousy explanation:

build_system http://pastebin.com/raw.php?i=sLfx1H4b
fstab http://pastebin.com/raw.php?i=CZUP3PMA
Comment by Ronald (Rexilion) - Wednesday, 30 April 2014, 13:51 GMT
This implies I have to write a systemd service that does the same as my build_system script and move that into the initramfs. That should do it, I think.

Yeah, that should work. sd-lvm2 does not have add_runscript (unlike lvm2) because, I take it, systemd does not execute those. However, sd-lvm2 does copy over the *.service files. ... right? :)
Comment by Thomas Bächler (brain0) - Wednesday, 30 April 2014, 22:03 GMT
You should have mentioned this earlier. I looked at the systemctl output from the rh bug report again, and I noticed that _none_ of your LVM devices are registered to systemd.

Can you paste the output of: udevadm info /dev/main/swap
Comment by Ronald (Rexilion) - Thursday, 01 May 2014, 06:34 GMT
Comment by Thomas Bächler (brain0) - Thursday, 01 May 2014, 08:50 GMT
I cannot see anything wrong or missing here, can you check systemctl again for the missing devices?
Comment by Ronald (Rexilion) - Friday, 02 May 2014, 19:04 GMT
This is with your testing package: http://pastebin.com/raw.php?i=PQmY2rBG

sdb .. sde are part of a broken card reader.
Comment by Ronald (Rexilion) - Monday, 05 May 2014, 13:00 GMT
I migrated to a systemd initrd which works around the problem.

It does not appear to have native support for resuming after suspend to disk, so I made something myself that does.
Comment by Thomas Bächler (brain0) - Monday, 05 May 2014, 17:22 GMT
This is interesting. It means the legacy initramfs has some problem that the systemd one doesn't. I have no clue right now what that would be; however, in my tests, the legacy initramfs also initialized properly.
Comment by Ronald (Rexilion) - Monday, 05 May 2014, 18:44 GMT
Yes, my thoughts exactly.

On the other hand: I tried to set up my filesystems using my old build_system script instead of fstab entries with x-initrd.mount. This seemed to work, but then systemd-udevd hangs after switching root. It somehow fails to do an sd_notify.

Currently, it's just fstab with the x-initrd.mount option for the filesystems that I used to mount myself.
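For reference, an fstab entry of the kind described, using x-initrd.mount so systemd mounts the filesystem already from within the initramfs (the VG/LV names and mount point here are hypothetical, not taken from this report):

```shell
# /etc/fstab -- example only; device names are made up
/dev/mapper/main-usr  /usr  btrfs  defaults,x-initrd.mount  0 0
```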

I also looked at dracut to reliably implement resume from disk. Would you like me to provide my findings on how to do this for the mkinitcpio hooks? (in a separate task, of course)
Comment by Wojo (Wojo) - Sunday, 08 June 2014, 13:17 GMT
Has there been any workaround for this issue? Entering the emergency shell drives me crazy (mostly because of the 1:30 min timeout); almost every boot I have to run `vgchange -ay` to boot my system properly. Thanks in advance.
Comment by Ronald (Rexilion) - Sunday, 08 June 2014, 16:55 GMT
Yes, a systemd based initrd appears to work around the issue.
Comment by Wojo (Wojo) - Wednesday, 11 June 2014, 15:38 GMT
Indeed, I switched from:
HOOKS="base udev autodetect modconf block mdadm_udev filesystems keyboard lvm2 usr fsck shutdown"
to:
HOOKS="systemd autodetect modconf sd-lvm2 block filesystems keyboard usr fsck"

and the issue is gone. Thank you.
Comment by Doug Newgard (Scimmia) - Friday, 15 May 2015, 02:23 GMT
Status?
