Arch Linux

FS#64871 - [mkinitcpio] incomplete upgrade leads to unbootable system

Attached to Project: Arch Linux
Opened by Mikhail Zoryn (TenShiN) - Tuesday, 17 December 2019, 10:01 GMT
Last edited by freswa (frederik) - Friday, 21 February 2020, 14:46 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Assigned
Assigned To: Dave Reisner (falconindy), Giancarlo Razzolini (grazzolini)
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 3
Private: No

Details

Description:
When pacman encounters errors during a system upgrade, it does not run the remaining hooks.
I have hit this situation twice.
I was upgrading the Vivaldi browser, whose package contains the link WidevineCdm -> /opt/google/chrome/WidevineCdm. I had replaced the link with the actual files of upstream Widevine, so the next upgrade spat out some errors about file conflicts. It did install the package, though.
HOWEVER, I was upgrading the linux kernel at the same time. As far as I understand, the hook responsible for removing the initramfs and vmlinuz from /boot was triggered, but because the upgrade did not complete successfully, the hook that regenerates these files never ran. I ended up with an almost empty /boot and a system that no longer worked after the reboot.

Yes, I understand that I should not manipulate files that way myself and that this was the cause of the problem. However, I don't see why a minor error during the installation of a minor userspace package should leave anyone with a broken system.

Comment by Eli Schwartz (eschwartz) - Tuesday, 17 December 2019, 13:58 GMT
Because it's an unrecoverable error.

A better question is why the mkinitcpio hook does risky things like delete the initramfs and vmlinuz in a pretransaction hook on every single kernel upgrade, but it is what it is...
Comment by Giancarlo Razzolini (grazzolini) - Friday, 21 February 2020, 15:05 GMT
I *think* the hook can be changed to posttransaction. But regardless of that, partial upgrades (or broken ones) are not supported either.
Comment by Eli Schwartz (eschwartz) - Friday, 21 February 2020, 15:18 GMT
There's a difference between "not supported" and "aggressively remove all possible recovery methods in order to rub it in the user's face".

I'm *quite* positive that the latter is not an opinion anyone here holds. :)

Just saying "it isn't supported" doesn't seem much like useful data if we are anyways going to make the failure mode degrade in a more graceful manner. No one thought it's a supported configuration; the ticket is about making it easier to recover, e.g. by just re-running mkinitcpio manually.

Moving the hook to post-transaction should, I think, mean (unless one hook ran, but not the other) that an aborted transaction leaves behind the old, working kernel+initramfs, which however cannot modprobe once booted. Moving the kernel handling back into the package as we used to do would mean the user simply re-runs mkinitcpio by hand to recover.
Comment by Giancarlo Razzolini (grazzolini) - Friday, 21 February 2020, 17:00 GMT
@Eli,

You know quite well that nobody is aggressively removing the possibility, so I don't see the need to bring this up. As I've said, I *think* it can be moved to PostTransaction, but, if I recall correctly, there was one use case where it required the hook to run PreTransaction. I will confirm if that's the case.

But even if it's moved to PostTransaction, if any other hook fails or breaks the upgrade, the user might still end up with an unbootable system. Which is why a lot of care must be taken when touching alpm hooks.
Comment by Bartosz Tomczyk (bartekplus) - Tuesday, 03 March 2020, 14:55 GMT
It happened to me some time ago. There was a power outage during the update and I ended up with an empty /boot (no initramfs at all: both the linux-lts and linux initramfs images were missing). I had to use a live CD to chroot into the system and regenerate the initramfs with mkinitcpio.
I have installed two kernels (linux and linux-lts) just to avoid a non-bootable PC, but in this case that is useless.

I think that updates should be atomic. We should avoid situations where we are left with an unbootable PC. Can we generate the new initramfs first and then just do an atomic update (with mv?)?
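
The "generate first, then mv" idea can be sketched as follows. This is an illustration only, written in Python rather than as an actual mkinitcpio or pacman change; the function name `atomic_install` and the file names are hypothetical. It relies on the POSIX guarantee that renaming within one filesystem replaces the destination in a single step, and a FAT-formatted /boot may not give the same guarantee.

```python
import os
import tempfile


def atomic_install(data: bytes, dest: str) -> None:
    """Write data to dest so that dest is always the old or the new file.

    Hypothetical helper for illustration; not part of mkinitcpio or pacman.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # The temporary file must live on the same filesystem as dest,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk first
        # Only now swap the new file into place; the old file stays
        # intact until this single rename succeeds.
        os.replace(tmp_path, dest)
    except BaseException:
        # On any failure, drop the temporary file and leave dest untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this ordering, an interruption before the rename leaves the previous image in place, plus at most a stray temporary file.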
Comment by Giancarlo Razzolini (grazzolini) - Tuesday, 03 March 2020, 15:01 GMT
@Bartosz

This could've happened regardless of whether the kernel is copied by a hook or installed by the package. The only way to solve this would be to have kernel versioning or, at minimum, some kernel dance where the hook copies the old kernel and initramfs to something else and installs the new one, which I don't really like and which could also have the same issue if the interruption happens at the worst time.
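
For reference, the "kernel dance" described here might look roughly like the sketch below. It is written in Python for illustration only; `install_with_fallback`, the `.old` suffix and the staging file names are assumptions, not anything the mkinitcpio hooks actually do. As noted above, it narrows the window rather than closing it: a crash between the two renames leaves the old image both live and as the fallback, with the new one not yet installed.

```python
import os
import shutil


def install_with_fallback(new_image: str, dest: str, suffix: str = ".old") -> None:
    """Keep the previous image as dest + suffix, then install new_image.

    Hypothetical helper for illustration; names and suffix are assumptions.
    """
    if os.path.exists(dest):
        # Copy (rather than move) the current image so that dest never
        # disappears while the fallback is being created.
        backup_tmp = dest + suffix + ".tmp"
        shutil.copy2(dest, backup_tmp)
        os.replace(backup_tmp, dest + suffix)
    # Stage the new image next to dest, then rename it into place.
    staged = dest + ".new"
    shutil.copy2(new_image, staged)
    os.replace(staged, dest)
```

A boot loader entry pointing at the `.old` files would still be needed for the fallback to be usable; that part is outside this sketch.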
Comment by Bartosz Tomczyk (bartekplus) - Tuesday, 03 March 2020, 15:41 GMT
@grazzolini

I got it, but the current design is broken. And having multiple kernel packages doesn't help. I think most people use the -lts kernel as a failsafe fallback, but it doesn't help at all. It would be great to work around the issue somehow (upgrade linux packages one by one, not all at once), or at least print some warning.
Comment by Eli Schwartz (eschwartz) - Tuesday, 03 March 2020, 15:54 GMT
That isn't entirely true, because people did not have these issues with nearly as much regularity back when the kernel was installed by the package and the initramfs was never deleted, only updated in place.

The problem here is that the window of opportunity for breakage increased drastically from "if something goes wrong in between the kernel package stage of the transaction and the mkinitcpio hook" to "if something goes wrong at any point in time". Because the working kernel and working initramfs are deleted before the transaction even starts, for every upgrade.

Furthermore, with the old, correct method of updating the initramfs and only deleting it when the package itself is removed (via a pre/post _remove scriptlet, but Type = Package would work equally well, as would Type = File; Target = /boot/vmlinuz-*), a kernel that had e.g. the in-use filesystem driver for the rootfs as a builtin could boot into at least a recovery shell in the initramfs and might be able to repair things... even if the kernel was updated but pacman (or even the operating system?) crashed before rebuilding the initramfs. Admittedly, a custom kernel that has the filesystem drivers as builtins could ignore all mkinitcpio changes, refuse to "register" itself with the `pkgbase` file, and ship its own alpm hook for mkinitcpio since the one installed by the mkinitcpio package is useless to people who just want to trigger `mkinitcpio -p $preset` and nothing else.
Also, with the old method, since /etc/mkinitcpio.d/*.preset is not deleted, if pacman crashed (for example during the PostTransaction hooks) but the system didn't, you could manually run mkinitcpio -p $preset. In the current state of things, you cannot, because that configuration file is removed on every upgrade too (which incidentally means that it is useless to try modifying it).

I will be happy to see the current core/mkinitcpio hooks improved to be *less* breakable and thus closer to the historic state, but for my own personal use I'm still going to be rolling my own kernel and mkinitcpio package to ensure the rest of the robustness I require... As a user, I have not seen a single advantage this new hook brings, but I've seen many disadvantages.
Comment by Giancarlo Razzolini (grazzolini) - Tuesday, 03 March 2020, 16:32 GMT
@Bartosz

The problem is not the package, but what the hook does and, more importantly, at what *time* in the upgrade process it runs.

@Eli

You have made it abundantly clear so far that you dislike this approach. And yet, I see no patches at all on the mkinitcpio github page. If you had put the time you spent on this gigantic rant into an actual patch, with tests, it would have been much more productive. As I've said already, I'll consider moving the removal hook to posttransaction; that will help with most of the issue here.
