Historical bug tracker for the Pacman package manager.
The pacman bug tracker has moved to gitlab:
https://gitlab.archlinux.org/pacman/pacman/-/issues
This tracker remains open for interaction with historical bugs during the transition period. Any new bug reports will be closed without further action.
FS#64352 - Minimize amount of data fetched for VCS sources
Attached to Project:
Pacman
Opened by Anatol Pomozov (anatolik) - Friday, 01 November 2019, 22:37 GMT
Details
Some packages use VCS sources. 'android-tools' is a good example - upstream does not release any tarballs and packagers have to use the 'git' tool to fetch the data.
One of the problems with this is that the history of all the development branches contains a lot of data that is not needed for the package build. Again, Android is a great example - there are tons of release/development branches, and there is not much sense in fetching all of them if we want to build only one release (e.g. tag platform-tools-29.0.5). I see an opportunity for an Android build speedup (a rough sketch follows this list):
1) Fetch only the branch needed for the build. 'git clone' has a '--single-branch' option that fetches only the one branch that is requested.
2) If a '#tag=' URL is used and the tag has already been fetched, there is no point in re-fetching it. Tags should be considered immutable pointers that never change over time.
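A minimal sketch of what point 1) could look like with plain git, using the android-tools 'base' repository and tag mentioned above (this illustrates the proposal, not current makepkg behaviour):

    # Clone only the history reachable from one ref; --branch also accepts a tag name.
    url=https://android.googlesource.com/platform/frameworks/base
    tag=platform-tools-29.0.5
    git clone --single-branch --branch "$tag" "$url" base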
This task depends upon
Shallow clones have been rejected before (FS#34677) because they don't allow operations like cherry-picking from other branches. See e.g. Allan's comment on the aforementioned issue: "Can you clone another branch from this checkout? A different tag or commit? Cherry-pick from another branch? All features I actually use in PKGBUILDs, especially when bisecting."
I'm pretty sure that trying to tell us we must implement a conceptually broken feature because we're harming the environment if we don't is not a valid argument to make. So you may take that argument right back home and come back once you have valid arguments based on the technical feasibility of doing this.
> @diabonas you assume that downloaded sources will be kept persistently, which isn't always (mostly?) true, especially if they take several gigabytes of data.
If you're building the same package multiple times, I am pretty sure you're hurting the environment by downloading multiple GB of data a second/third/fourth time more than by the lack of an option to avoid downloading some small portion of that data.
Yes, we assume people keep persistent sources, because that is what people do. For programs where even the tarball is 200+ MB, doing a fresh git clone with --depth=1 is still agonizingly, painfully slow. This isn't the 2000s anymore; it is actually hard to buy hard drives which do *not* have at least a few GB of extra space (unless you are using, like, a chromebook, in which case you aren't building gigantic programs in the first place due to RAM and CPU issues, not bandwidth), and most off-the-shelf hard drives have several hundred GB. Using that to cache frequently used resources is just common sense.
Unless there's something new to be added to a discussion which has happened a bunch of times already and been rejected, I'm not sure what is going to happen here. It seems like a very complicated feature to add, breaking a bunch of assumptions, and even if it could be gotten right it would need per-PKGBUILD configuration and a lot of special handling. Moreover, what happens if someone else's PKGBUILD uses the same repository with a shared SRCDEST, but does want to cherry-pick patches or something? git remembers when it's been shallow-cloned, so a simple git fetch won't cut it.
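To illustrate the shallow-clone caveat mentioned above, here is a hedged sketch of how a script might detect and repair a shallow clone before reusing it from a shared SRCDEST (the "$SRCDEST/base" path is hypothetical):

    cd "$SRCDEST/base" || exit 1
    # A plain 'git fetch' will not restore history that was cut off by a shallow
    # clone; the repository has to be unshallowed explicitly.
    if [ "$(git rev-parse --is-shallow-repository)" = true ]; then
        git fetch --unshallow
    fi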
Anyway, you can just download the tarballs in these cases. For example, the android-tools ones would be:
https://android.googlesource.com/platform/frameworks/base/+archive/platform-tools-29.0.5.tar.gz
This is still 767.17 MB though, so the comparative savings aren't actually that large...
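If a package does go the tarball route, the source entry might look roughly like this (a hypothetical PKGBUILD fragment; such on-the-fly archives are not guaranteed to be byte-for-byte reproducible, so the checksum handling is only a guess):

    pkgver=29.0.5
    source=("base-$pkgver.tar.gz::https://android.googlesource.com/platform/frameworks/base/+archive/platform-tools-$pkgver.tar.gz")
    sha256sums=('SKIP')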
But actually my initial proposal was:
- fetch the current branch only
- do not call 'git fetch' if the tag is already present locally
This would reduce the size and time of the initial project clone as well as of subsequent syncs. It is especially useful for iterative development.
Running iterative 'makepkg' over the 'android-tools' project takes somewhere between 1 and 10 minutes for me (San Francisco area, Comcast internet provider). Most of the time is spent fetching all the git refs for Android's huge 'base' project, even though I already have the platform-tools-29.0.5 tag fetched locally. Git tags are immutable pointers essentially. There is no reason to re-fetch the refs in this case.
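A rough sketch of the "skip the fetch when the tag is already present" idea, using the same repository and tag as above (again an illustration of the proposal, not current makepkg behaviour):

    tag=platform-tools-29.0.5
    cd "$SRCDEST/base" || exit 1
    # If the tag already resolves to a commit locally, skip the network round trip.
    if git rev-parse -q --verify "refs/tags/$tag^{commit}" >/dev/null; then
        echo "tag $tag already present locally, skipping fetch"
    else
        git fetch origin "refs/tags/$tag:refs/tags/$tag"
    fi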
makechrootpkg also respects --holdver, by the way. :)
(This also means that if you've manually fetched the source repository with a refspec that only pulls a given branch or tag, you can prevent makepkg from undoing your manual optimization work.)
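One way to do that manual optimization, assuming the remote is called 'origin' and the branch of interest is 'master' (both assumptions):

    # Restrict the remote's fetch refspec to a single branch, so later
    # 'git fetch' calls only pull that branch's refs.
    git remote set-branches origin master
    git fetch origin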
> Git tags are immutable pointers essentially. There is no reason to re-fetch the refs in this case.
Regrettably, a number of projects do force-push their tags. Some package maintainers therefore do a bit of extra legwork to find the unique sha1 of the tag object, then use #tag=${tag_sha1} in their sources.
This is actually just a random observation, because in such cases we are kind of screwed whether we fetch updates to the tag, or not. So it doesn't necessarily invalidate the idea of skipping the fetch stage when a tag is already present.
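For reference, one way a maintainer might pin such a tag by its object sha1 (the lookup uses the android-tools repository from above, and the PKGBUILD line is only a sketch with ${tag_sha1} standing in for the value it prints):

    # Look up the sha1 of the tag object on the remote...
    git ls-remote https://android.googlesource.com/platform/frameworks/base refs/tags/platform-tools-29.0.5
    # ...and pin that sha1 instead of the symbolic tag name:
    source=("base::git+https://android.googlesource.com/platform/frameworks/base#tag=${tag_sha1}")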