Pacman

Historical bug tracker for the Pacman package manager.

The pacman bug tracker has moved to gitlab:
https://gitlab.archlinux.org/pacman/pacman/-/issues

This tracker remains open for interaction with historical bugs during the transition period. Any new bug reports will be closed without further action.

FS#64352 - Minimize amount of data fetched for VCS sources

Attached to Project: Pacman
Opened by Anatol Pomozov (anatolik) - Friday, 01 November 2019, 22:37 GMT
Task Type: Feature Request
Category: makepkg
Status: Unconfirmed
Assigned To: No-one
Architecture: All
Severity: Low
Priority: Normal
Reported Version: 5.2.0
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 2
Private: No

Details

Some packages use VCS sources. 'android-tools' is a good example: upstream does not release tarballs, so packagers have to use the 'git' tool to fetch the data.

One of the problems with this is that the history of all the development branches contains a lot of data that is not needed for the package build. Again, Android is a good example: there are tons of release/development branches, and there is not much sense in fetching all of them if we only want to build one release (e.g. the tag platform-tools-29.0.5).

I see an opportunity to speed up Android builds:

1) Fetch only the branch needed for the build. 'git clone' has a '--single-branch' option that fetches only the one branch that is requested.

2) If a '#tag=' URL fragment is used and the tag has already been fetched, there is no point in re-fetching it. Tags should be considered immutable pointers that never change over time. (A rough sketch of both points follows below.)
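A minimal sketch of what both points would look like in plain git, using the android-tools 'base' repository and the platform-tools-29.0.5 tag as placeholders (this is not how makepkg is implemented today):

    # 1) fetch only the requested branch/tag instead of every ref
    git clone --single-branch --branch platform-tools-29.0.5 \
        https://android.googlesource.com/platform/frameworks/base base

    # 2) on later runs, skip the network entirely if the tag is already present locally
    if git -C base rev-parse -q --verify 'refs/tags/platform-tools-29.0.5^{commit}' >/dev/null; then
        echo 'tag already present, skipping fetch'
    else
        git -C base fetch origin tag platform-tools-29.0.5 --no-tags
    fi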

Comment by Jonas Witschel (diabonas) - Friday, 01 November 2019, 22:56 GMT
Similar approaches have been discussed and rejected in the past (see e.g. FS#34677) because they don't allow operations like cherry-picking from other branches. See e.g. Allan's comment on the aforementioned issue: "Can you clone another branch from this checkout? A different tag or commit? Cherry-pick from another branch? All features I actually use in PKGBUILDs, especially when bisecting."
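For illustration of that objection (repository URL and branch names are placeholders): a '--single-branch' clone narrows the remote's fetch refspec, so the refs you would cherry-pick from are simply not there until the refspec is widened and re-fetched.

    git clone --single-branch --branch master https://example.org/project.git project
    cd project
    git cherry-pick origin/feature-branch    # fails: that ref was never fetched

    # restoring full-clone behaviour means widening the refspec and fetching again
    git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
    git fetch origin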
Comment by Jensen McKenzie (your_doomsday) - Saturday, 02 November 2019, 13:37 GMT
A simple opt-out that allows going back to a full clone would make all of Allan's arguments moot. A shallow clone will be enough for 90%+ of packages. The current state is a big resource waste and contributes to global warming.
Comment by Jonas Witschel (diabonas) - Saturday, 02 November 2019, 13:56 GMT
I'd argue it is not worth the effort, since only the initial clone is expensive; later updates don't take a lot of bandwidth and should be fast. Anyway, there are some possible workarounds to allow shallow clones, see e.g. https://wiki.archlinux.org/index.php/User:Apg#makepkg:_shallow_git_clones (in particular the last link, https://gist.github.com/andrewgregory/770cbedfa2da3817d762483155a330b8).
Comment by Jensen McKenzie (your_doomsday) - Saturday, 02 November 2019, 14:07 GMT
@diabonas you assume that downloaded sources will be kept persistently, which isn't always (mostly?) true, especially if they take several gigabytes of data.
Comment by Eli Schwartz (eschwartz) - Sunday, 03 November 2019, 00:39 GMT
> The current state is a big resource waste and contributes to global warming.

I'm pretty sure that telling us we must implement a conceptually broken feature because we're harming the environment if we don't is not a valid argument to make. So you may take that argument right back home and come back once you have valid arguments based on the technical feasibility of doing this.

> @diabonas you assume that downloaded sources will be kept persistently, which isn't always (mostly?) true, especially if they take several gigabytes of data.

If you're building the same package multiple times, I am pretty sure you're hurting the environment more by downloading multiple GB of data a second/third/fourth time than by the lack of an option to avoid downloading some small portion of that data.

Yes, we assume people keep persistent sources, because that is what people do. For programs where even the tarball is 200+ MB, doing a fresh git clone with --depth=1 is still agonizingly, painfully long. This isn't the 2000s anymore; it is actually hard to buy hard drives that do *not* have at least a few GB of extra space (unless you are using, like, a chromebook, in which case you aren't building gigantic programs in the first place due to RAM and CPU issues, not bandwidth), and most off-the-shelf hard drives have several hundred gigabytes. Using that to cache frequently used resources is just common sense.
Comment by Eli Schwartz (eschwartz) - Sunday, 03 November 2019, 00:49 GMT
@anatolik,

Unless there's something new to be added to a discussion which has happened a bunch of times already and been rejected, I'm not sure what is going to happen here. It seems like a very complicated feature to add, breaking a bunch of assumptions, and even if it could be gotten right it would need per-PKGBUILD configuration and a lot of special handling. Moreover, what happens if someone else's PKGBUILD uses the same repository with a shared SRCDEST, but does want to cherry-pick patches or something? git remembers when it's been shallow-cloned, so a simple git fetch won't cut it.
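(To illustrate that last point, with a placeholder path under SRCDEST: a shallow clone records its cut-off points in .git/shallow, and a plain fetch leaves that boundary in place; getting the full history back requires an explicit, and potentially very large, unshallow fetch.)

    git -C "$SRCDEST/project" rev-parse --is-shallow-repository   # prints "true"
    git -C "$SRCDEST/project" fetch origin                        # the shallow boundary stays in place
    git -C "$SRCDEST/project" fetch --unshallow origin            # re-downloads the full history after all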

Anyway, you can just download the tarballs in these cases. For example, the android-tools ones would be:

https://android.googlesource.com/platform/frameworks/base/+archive/platform-tools-29.0.5.tar.gz

This is still 767.17MB though, so there isn't actually that much comparative savings...
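(If a PKGBUILD went that route, the relevant source entry would look roughly like the following; the checksum handling is a guess, since these archives are generated on the fly and may not hash reproducibly.)

    pkgver=29.0.5
    source=("base-$pkgver.tar.gz::https://android.googlesource.com/platform/frameworks/base/+archive/platform-tools-$pkgver.tar.gz")
    sha256sums=('SKIP')  # generated archives may not produce a stable checksum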
Comment by Anatol Pomozov (anatolik) - Sunday, 03 November 2019, 01:54 GMT
The discussion is circling around git shallow clones, which have their own pros and cons.

But actually my initial proposal was:
- fetch only the current branch
- do not call 'git fetch' if the tag is already present locally

This would reduce the size and time of the initial project clone as well as of subsequent syncs. It is especially useful for iterative development.

Running 'makepkg' iteratively over the 'android-tools' project takes somewhere between 1 and 10 minutes for me (San Francisco area, Comcast internet provider). Most of the time is spent fetching all the git refs for Android's huge 'base' project, even though I already have the platform-tools-29.0.5 tag fetched locally. Git tags are essentially immutable pointers; there is no reason to re-fetch the refs in this case.
Comment by Eli Schwartz (eschwartz) - Sunday, 03 November 2019, 03:29 GMT
Maybe the existing makepkg --holdver feature does what you want? It will avoid downloading updates for VCS repos that are already cloned, and is pretty much meant for the iterative case, even if it doesn't help when bumping to a new version that you haven't previously fetched.

makechrootpkg also respects --holdver, by the way. :)

(This also means that if you've manually fetched the source repository with a refspec that only pulls a given branch or tag, you can prevent makepkg from undoing your manual optimization work.)
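Roughly, that manual optimization plus --holdver would look like this (the bare repository name under SRCDEST is a placeholder for whatever the PKGBUILD's source entry produces):

    # seed/update the cached repo by hand, pulling only the one tag that is needed
    git -C "$SRCDEST/base" fetch origin tag platform-tools-29.0.5 --no-tags

    # then build without letting makepkg refresh the VCS sources
    makepkg --holdver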

> Git tags are immutable pointers essentially. There is no reason to re-fetch the refs in this case.

Regrettably, a number of projects do force-push their tags. Some package maintainers therefore do a bit of extra legwork to find the unique sha1 of the tag object, then use #tag=${tag_sha1} in their sources.

This is actually just a random observation, because in such cases we are kind of screwed whether we fetch updates to the tag, or not. So it doesn't necessarily invalidate the idea of skipping the fetch stage when a tag is already present.
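(For reference, the pinning trick mentioned above looks roughly like this; the repository and tag are placeholders.)

    # resolve the unique sha1 of the (annotated) tag object...
    tag_sha1=$(git rev-parse refs/tags/platform-tools-29.0.5)
    # ...and pin it in the PKGBUILD source array, so a force-pushed tag cannot change what gets built
    source=("base::git+https://android.googlesource.com/platform/frameworks/base#tag=${tag_sha1}")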
