Historical bug tracker for the Pacman package manager.
The pacman bug tracker has moved to gitlab:
https://gitlab.archlinux.org/pacman/pacman/-/issues
This tracker remains open for interaction with historical bugs during the transition period. Any new bug reports will be closed without further action.
FS#73217 - Please consider providing some sort of diff for packages
Attached to Project:
Pacman
Opened by Eric Engestrom (1ace) - Saturday, 01 January 2022, 23:58 GMT
Last edited by Allan McRae (Allan) - Saturday, 12 March 2022, 23:42 GMT
Details
My specific use case might be niche (I sometimes have to work on my phone's connection for a week or two), but I think everyone would benefit from being able to download a binary diff against the previous package version instead of downloading the entire package every time.

This would obviously increase the mirrors' storage usage, but it would also greatly reduce their bandwidth usage, which I believe is where most of the cost lies for a mirror provider, so I expect they would approve. This would need to be confirmed with them, though.

Implementation-wise, I'm thinking that whenever a new package is generated, `makepkg` would uncompress the old & new packages and diff them (using `bsdiff` or equivalent), and that diff would be stored next to the new package. This way, someone who doesn't have the previous package in their cache would download the full package, as before; nothing changes for them. But if `pacman` sees a previous version in its cache (such as when performing an `-Syu` update), it would query the mirror for the diff between that version and the latest, and fall back to downloading the full package if that diff doesn't exist. If it does exist, the diff is applied to the package in the cache (or directly to the local install?).

To avoid overly large diffs (too much storage cost for too little bandwidth saving), a threshold could be added: the diff isn't uploaded unless it is smaller than, say, 50% of the full package size. This would exclude, for instance, packages that contain mostly binary files that are recompiled into something completely different each time. The whole "reproducible builds" effort should also help with this, by avoiding compilation outputs that change from one version to the next when the source file hasn't changed.

In summary:
- The downside is (on top of the implementation effort) a slightly[*] increased storage requirement and a slightly[*] longer package build & upload for package maintainers.
- The upside is faster downloads/updates for users, and less bandwidth consumed for users & mirror providers.

[*] Note that I /assume/ that the build time & storage space costs are small, but I haven't built a prototype of this and run it against the existing repos' packages, which is the only way to actually know.

I'm willing to implement this, but first I want to make sure this idea has a chance of being accepted. I'll also need a contact person (or some access to gitlab.archlinux.org) to discuss the makepkg & pacman implementations :)
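The two decisions described in the proposal (whether a delta is worth publishing, and whether a client should fetch a delta or the full package) can be sketched in a few lines of Python. This is only an illustration of the logic as described above, not code from `makepkg` or `pacman`: the names `should_publish_delta`, `plan_download`, `Plan` and the `available_deltas` mapping are hypothetical, and the 50% threshold is the example value from the report.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Example threshold from the proposal (hypothetical default): a delta is only
# kept if it is smaller than half of the full package.
MAX_DELTA_RATIO = 0.5


def should_publish_delta(delta_size: int, full_size: int,
                         max_ratio: float = MAX_DELTA_RATIO) -> bool:
    """Build side: decide whether a generated delta is worth uploading."""
    if full_size <= 0:
        return False
    return delta_size < full_size * max_ratio


@dataclass
class Plan:
    kind: str  # "delta" or "full"
    size: int  # number of bytes that would be downloaded


def plan_download(cached_version: Optional[str], new_version: str,
                  full_size: int,
                  available_deltas: Dict[Tuple[str, str], int]) -> Plan:
    """Client side: prefer a delta from the cached version, else fall back.

    available_deltas maps (old_version, new_version) to the delta size and
    stands in for "asking the mirror whether the diff exists".
    """
    if cached_version is not None:
        delta_size = available_deltas.get((cached_version, new_version))
        if delta_size is not None:
            return Plan("delta", delta_size)
    # No cached base version, or no matching delta on the mirror.
    return Plan("full", full_size)
```

With this shape, a user with an empty cache always gets `Plan("full", ...)`, which matches the "nothing changes" case in the description.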
Closed by Allan McRae (Allan)
Saturday, 12 March 2022, 23:42 GMT
Reason for closing: None
Additional comments about closing: package diffs will only be reconsidered if someone provides details of an implementation with numbers demonstrating its value
A PoC might involve a mirror that generates zsync files for packages along with a wrapper that uses zsync/zsync2 to download packages to the cache for use by pacman.
Oh, I wasn't aware of that; do you have a link to the discussion that led to this decision?
> One approach would be to use zsync as it doesn't need intermediary deltas (use the cached package as seed, remote package as source)
That would be a diff of the compressed package then, right? If so, the better the compression algorithm, the closer the diff would get to 100% of the package size; I have no idea how the current zstd would fare, but this doesn't sound like a viable long-term solution :/
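The concern about diffing compressed data can be measured rather than guessed. The sketch below is a rough way to do that, under several assumptions: `unmatched_block_fraction` is a hypothetical helper that only approximates block-matching downloaders (it is not the zsync algorithm), `zlib` from the Python standard library stands in for zstd, and the synthetic input says nothing about real packages. It reports what share of fixed-size blocks in the new file cannot be found anywhere in the old one, for raw and for compressed data.

```python
import random
import zlib


def unmatched_block_fraction(seed: bytes, target: bytes, block_size: int) -> float:
    """Fraction of aligned, full-size target blocks that occur nowhere in seed.

    A target block counts as reusable if an identical run of bytes exists at
    any byte offset in the seed. A trailing partial block is ignored.
    """
    blocks = [target[i:i + block_size]
              for i in range(0, len(target) - block_size + 1, block_size)]
    if not blocks:
        return 0.0
    # Every block_size-long window of the seed, at every byte offset.
    windows = {seed[i:i + block_size]
               for i in range(0, len(seed) - block_size + 1)}
    missing = sum(1 for block in blocks if block not in windows)
    return missing / len(blocks)


def demo() -> None:
    # Synthetic, highly compressible input; purely illustrative.
    rng = random.Random(0)
    words = [b"pacman", b"makepkg", b"mirror", b"package", b"delta", b"cache"]
    old = b" ".join(rng.choice(words) for _ in range(4000))
    edited = bytearray(old)
    edited[100:104] = b"XXXX"  # small edit near the start
    new = bytes(edited)
    pairs = (("raw", old, new),
             ("zlib", zlib.compress(old), zlib.compress(new)))
    for label, a, b in pairs:
        frac = unmatched_block_fraction(a, b, block_size=64)
        print(f"{label}: {frac:.0%} of target blocks not found in seed")


if __name__ == "__main__":
    demo()
```

Running the same measurement on real package pairs, compressed and uncompressed, would give the kind of numbers the closing comment asks for.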
I don't know zsync though, I'll have a look; thanks!
https://lists.archlinux.org/pipermail/pacman-dev/2019-March/023211.html
https://lists.archlinux.org/pipermail/pacman-dev/2019-March/023217.html
https://lists.archlinux.org/pipermail/pacman-dev/2019-March/023218.html
It sounds like it was basically just a bad implementation (inefficient & insecure), so it was removed, but another implementation of the same general idea might work, right?
But signing the diff the same way the package is signed would avoid having to do all that, right?
You could solve your issue by not updating when you only have phone access. I think it would be more constructive to work on your update addiction!
Haha, that's fair :)
I'm leaving this task open as I might give it a shot anyway at some point (and measure the actual savings!), or maybe someone else will want to give it a go.
I'll post a link here if/when I start the makepkg/pacman implementations.