FS#70830 - [systemd] 248-2 fails to decrypt multiple volumes with same passphrase

Attached to Project: Arch Linux
Opened by Federico (fedev) - Wednesday, 12 May 2021, 23:39 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:14 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Christian Hesse (eworm)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

Description:

This is very similar to https://bugs.archlinux.org/task/70272. The initial difference from that ticket (which may not be so different after all, as I'll explain) is that this LUKS volume is encrypted with a passphrase rather than a key file.

The reason it might not be so different after all is that I have two volumes, and while the first is decrypted, the second isn't. The cause seems to be that systemd temporarily stores the passphrase in a file while it tries the same passphrase against different volumes. This can be seen in the logs:

May 13 07:19:18 archlinux systemd-tty-ask-password-agent[315]: Failed to send: No such file or directory
May 13 07:19:18 archlinux systemd-tty-ask-password-agent[315]: Invalid password file /run/systemd/ask-password/ask.RpMg9u
May 13 07:19:18 archlinux systemd-tty-ask-password-agent[315]: Failed to process password: No such file or directory
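For context: systemd's password agent protocol works through small INI files under /run/systemd/ask-password, each pointing at a reply socket; the "No such file or directory" errors above suggest the ask file or its socket had already been removed by the time the agent tried to answer. A rough illustration of such a file (all field values here are made up, not taken from this system):

```
# /run/systemd/ask-password/ask.RpMg9u  (illustrative reconstruction)
[Ask]
PID=1
Socket=/run/systemd/ask-password/sck.1a2b3c
AcceptCached=1
Echo=0
Message=Please enter passphrase for disk nvme0p3crypt:
NotAfter=0
```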

Additional info:
* package version(s): systemd 248.2-2 (I'll test with the previous version and report back)


Steps to reproduce:

- Power up
- Provide the LUKS passphrase
- Observe the drive failing to decrypt and mount

Closed by: Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:14 GMT
Reason for closing: Moved
Additional comments about closing: https://gitlab.archlinux.org/archlinux/packaging/packages/systemd/issues/6
Comment by Federico (fedev) - Thursday, 13 May 2021, 00:39 GMT
Downgrading systemd (all systemd packages) to 248-5 didn't do the trick (although it was working on that version before). Besides downgrading, I manually removed the following files and forced mkinitcpio to run again, as they were throwing an error after the downgrade:

Skipping "/boot/EFI/systemd/systemd-bootx64.efi", since a newer boot loader version exists already.
Skipping "/boot/EFI/BOOT/BOOTX64.EFI", since a newer boot loader version exists already.

What did work in the end was using snapper to revert to a previous snapshot (still with systemd at 248-5). Curiously, at that point, while an error was still showing:

May 13 08:24:07 archlinux systemd-tty-ask-password-agent[311]: Invalid password file /run/systemd/ask-password/ask.rJg049
May 13 08:24:07 archlinux systemd-tty-ask-password-agent[311]: Failed to process password: Bad message

the system was able to unlock the drive and boot normally. Note that the error is not the same.
Comment by Federico (fedev) - Friday, 14 May 2021, 02:41 GMT
Since my initial report, I tried upgrading back to 248.2-2 to test further. While the error is consistently thrown into the logs:

May 14 09:24:56 archlinux systemd-tty-ask-password-agent[320]: Failed to send: No such file or directory
May 14 09:24:56 archlinux systemd-tty-ask-password-agent[320]: Invalid password file /run/systemd/ask-password/ask.Uelawq
May 14 09:24:56 archlinux systemd-tty-ask-password-agent[320]: Failed to process password: No such file or directory

There are times when all volumes end up mounted as expected (archroot resides on a single PV; archhome is a VG that spans 2 PVs) despite the error. As for the times it did fail, it seems the issue was not really that the volumes failed to decrypt (which is what I initially thought was going on). When it failed and dropped me to the emergency console, all I had to do to complete the boot was:

vgchange -ay
exit

That would mean the volumes were unlocked, just not activated.
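The "unlocked but not activated" state is visible in `lvs` output: the fifth character of the lv_attr field is the activation flag. A tiny illustrative helper (POSIX shell; the attr strings below are examples, not taken from this report):

```shell
# Check the LVM activation flag: the 5th character of lv_attr is 'a' when active
lv_is_active() {
    attr="$1"
    [ "$(printf '%s' "$attr" | cut -c5)" = "a" ]
}

lv_is_active "-wi-a-----" && echo "active"     # an activated LV
lv_is_active "-wi-------" || echo "inactive"   # LUKS unlocked, LV not activated
```

On a live system the attr strings would come from `lvs -o lv_name,vg_name,lv_attr`, and `vgchange -ay` flips inactive volumes to active, which matches the emergency-console fix above.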

So here is a comparison of when it fails against when it works:

FAILS:
May 14 09:20:43 archlinux lvm[473]: pvscan[473] PV /dev/mapper/nvme0p3crypt online, VG archhome incomplete (need 1).
May 14 09:20:48 archlinux lvm[819]: pvscan[819] PV /dev/mapper/nvme1p1crypt online, VG archhome is complete.
May 14 09:20:48 archlinux lvm[819]: pvscan[819] VG archhome run autoactivation.
May 14 09:20:48 archlinux lvm[819]: 0 logical volume(s) in volume group "archhome" now active
May 14 09:20:48 archlinux lvm[819]: archhome: autoactivation failed.

WORKS:
May 14 09:24:58 archlinux lvm[498]: pvscan[498] PV /dev/mapper/nvme0p3crypt online, VG archhome incomplete (need 1).
May 14 09:25:01 archlinux lvm[650]: pvscan[650] PV /dev/mapper/nvme1p1crypt online, VG archhome is complete.
May 14 09:25:01 archlinux lvm[650]: pvscan[650] VG archhome run autoactivation.
May 14 09:25:01 archlinux lvm[650]: 1 logical volume(s) in volume group "archhome" now active

When it does fail, I also see (these lines are not present when it works):

May 14 09:20:48 archlinux systemd[1]: lvm2-pvscan@254:4.service: Main process exited, code=exited, status=5/NOTINSTALLED
May 14 09:20:48 archlinux systemd[1]: lvm2-pvscan@254:4.service: Failed with result 'exit-code'.
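The failing unit's instance name encodes the device-mapper major:minor of the PV that pvscan was handling. A small sketch of decoding it (the unit name is taken from the log above; the `dmsetup` lookup in the comment is illustrative and only meaningful on the affected machine):

```shell
# Extract the dm major:minor from an lvm2-pvscan@ unit instance name
unit="lvm2-pvscan@254:4.service"
inst=${unit#*@}          # -> 254:4.service
inst=${inst%.service}    # -> 254:4
major=${inst%%:*}
minor=${inst##*:}
echo "major=$major minor=$minor"
# On the affected machine, the mapping name could then be resolved with:
#   dmsetup info -c --noheadings -o name -j "$major" -m "$minor"
```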


Let me know if there is anything else I could be looking at or help with.
Comment by Federico (fedev) - Thursday, 03 June 2021, 00:17 GMT
I made a change that seems to be working fine for me. I had been decrypting and activating all volumes by adding them to crypttab.initramfs (this worked fine until I hit the bug in question). With the current bug, only the logical volumes on the same PV as root were decrypted and activated. Since I don't really need the "archhome" VG to be available that early anyway, I moved the decryption of that volume group to crypttab. I no longer get the error, and all volumes are decrypted and activated as expected.
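Concretely, the workaround splits the entries between the two files, along these lines (volume names and UUIDs are placeholders, not taken from this report):

```
# /etc/crypttab.initramfs — only what early userspace must unlock (root's PV)
rootcrypt   UUID=<root-luks-uuid>   none   luks

# /etc/crypttab — the remaining archhome PV, unlocked later in the boot
homecrypt   UUID=<home-luks-uuid>   none   luks
```

With the sd-encrypt hook, crypttab.initramfs is embedded into the initramfs and processed there, while /etc/crypttab is only processed after the root filesystem is mounted.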

I don't see the errors from "systemd-tty-ask-password-agent" anymore.

This sounds more like a workaround than a fix, though.
Comment by Ron Waldon (jokeyrhyme) - Friday, 18 March 2022, 08:39 GMT
I'm booting from an initramfs image created by mkinitcpio 31-2 (with `HOOKS=(... sd-encrypt ...)`), with systemd 250.4-1 as the init process, on linux-5.16.14.arch1-1

I've noticed a very similar issue, and I'm not sure if this is an upstream thing or not

My setup has 2x LUKS volumes that are encrypted with different keys (in terms of cryptsetup keyslots) but these keys are protected with the same passphrase

- the initramfs boots, and prompts me to enter the passphrase

- if I type this correctly, then both volumes are unlocked and the rest of my system boots as expected; I am not prompted for a passphrase for the second volume

- however, at the first passphrase prompt, if I make a mistake, I will get per-volume passphrase prompts, so I will need to type my passphrase correctly two more times (or three more times when there are 3x LUKS volumes)

I guess I would expect to only have to enter the passphrase correctly once, no matter how many volumes I have, and no matter how many previous attempts were incorrect (as the first, successful path seems to suggest that a correct entry is applied to as many matching volumes as possible)

I have already used /etc/crypttab and key files (stored in /root) to defer unlocking of my LUKS volumes until later in the boot process, so I'm aware of this workaround

The 2x volumes I need to unlock during initramfs are my / (root) volume and the swap volume I use for hibernate+resume, so I cannot reduce the number of volumes any further without losing hibernate capability
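The keyfile-based deferral mentioned above looks roughly like this (device path, UUID, and keyfile name are hypothetical):

```
# Enroll a keyfile into a spare LUKS keyslot (done once, as root):
#   cryptsetup luksAddKey /dev/nvme0n1p3 /root/data.key
#
# /etc/crypttab — unlocked after root is mounted, without a prompt
datacrypt   UUID=<data-luks-uuid>   /root/data.key   luks
```

This only helps for volumes that are not needed inside the initramfs, which is why it cannot cover the root and resume/swap volumes described above.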
Comment by soloturn (soloturn) - Wednesday, 12 April 2023, 09:38 GMT
What is the upstream bug report for this, in case this is still a problem?
Comment by Ron Waldon (jokeyrhyme) - Wednesday, 12 April 2023, 09:44 GMT
Here's _an_ upstream bug report for this: https://github.com/systemd/systemd/issues/20455

I can confirm that I was able to reproduce this just last week on systemd 253.2-2-arch and kernel 6.2.9-arch1-1
Comment by Buggy McBugFace (bugbot) - Tuesday, 08 August 2023, 19:11 GMT
This is an automated comment as this bug has been open for more than 2 years. Please reply if you still experience this bug; otherwise this issue will be closed after 1 month.
