FS#16165 - [vim] fetch_patches.sh is very slow, should use gzipped bundles

Attached to Project: Arch Linux
Opened by Devin Cofer (Ranguvar) - Sunday, 13 September 2009, 04:25 GMT
Last edited by Dan Griffiths (Ghost1227) - Saturday, 13 February 2010, 21:01 GMT
Task Type: Feature Request
Category: Packages: Extra
Status: Closed
Assigned To: Tobias Kieslich (tobias)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

The [vim] package downloads patches one by one, which takes a very long time as there are often hundreds of them.

The makers of Vim, fortunately, bundle up each series of 100 patches into a gzipped patch, which is available at, e.g., ftp://ftp.vim.org/pub/vim/patches/7.2/7.2.101-200.gz

The fetch_patches.sh script should use the patch bundles when possible, to minimize makepkg time.
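
A rough sketch of the idea, not the script that was eventually submitted: given a target patch level, list the complete 100-patch bundles first and then the leftover single patches. It assumes every bundle follows the naming of the 7.2.101-200.gz example above and that single patches are named like 7.2.201. The function name plan_downloads is made up for illustration.

```shell
# plan_downloads VERSION LEVEL
# Prints one file name per line: complete bundles first, then single patches.
# Assumption: bundles cover 001-100, 101-200, ... and are named like
# 7.2.101-200.gz; single patches are named like 7.2.201.
plan_downloads() (
    version=$1
    level=$2
    full=$(( level / 100 ))   # number of complete 100-patch bundles

    i=0
    while [ "$i" -lt "$full" ]; do
        printf '%s.%03d-%03d.gz\n' "$version" $(( i * 100 + 1 )) $(( (i + 1) * 100 ))
        i=$(( i + 1 ))
    done

    # Patches past the last complete bundle still have to be fetched singly.
    i=$(( full * 100 + 1 ))
    while [ "$i" -le "$level" ]; do
        printf '%s.%03d\n' "$version" "$i"
        i=$(( i + 1 ))
    done
)
```

For example, `plan_downloads 7.2 245` would list two bundles and 45 single patches, so 47 downloads instead of 245.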


I'll work on a patch when I have some time, but it'd be great if someone else could take a crack at it too, as I am still a Bash novice.

Closed by  Dan Griffiths (Ghost1227)
Saturday, 13 February 2010, 21:01 GMT
Reason for closing:  Fixed
Additional comments about closing:  Not using gzipped bundles, but we have cut the download time down significantly. Take a look at my comment for details.
Comment by Devin Cofer (Ranguvar) - Sunday, 13 September 2009, 04:33 GMT
This also needs to be put in [gvim]
Comment by Tobias Kieslich (tobias) - Sunday, 13 September 2009, 22:18 GMT
I don't think it's too much of a deal. First of all, Arch is a binary based distro after all, so there is no need to rebuild packages. People who do rebuild them usually do that on a regular basis and for that all downloaded patches are cached as long as you have the src cache in makepkg.conf enabled.
The vim build process is already very complex so I'm a little hesitant to make it even more complex. However, I will review the patches and we will go from there.
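
For reference, the source cache mentioned above is controlled by the SRCDEST setting in makepkg.conf. A minimal fragment might look like this; the path is only an example:

```shell
# /etc/makepkg.conf (fragment)
# Cache downloaded sources in one fixed directory so rebuilds can reuse them.
# The path below is only an example; pick any directory you can write to.
SRCDEST=/home/sources
```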
Comment by Devin Cofer (Ranguvar) - Monday, 14 September 2009, 03:19 GMT
Fair enough. I don't think it will make the script too complex, and it should save full minutes of downloading.

I'm rewriting the script now. The first step is just a cleanup that also makes it work with e.g. nonstandard chars in the build path ("$srcdir" as opposed to $srcdir, etc.). Then I'll make a patch on top of THAT to add this feature, so you can apply only the cleanup if you like.
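
To illustrate why the quoting matters, here is a small standalone demo (not taken from fetch_patches.sh). The helper count_args and the build path are hypothetical; the point is only that an unquoted variable is split on whitespace.

```shell
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

srcdir='/tmp/build dir/src'   # hypothetical build path containing a space

# Unquoted on purpose: the shell splits the path at the space into two words.
count_args $srcdir/patches

# Quoted: the path is passed through as a single word.
count_args "$srcdir/patches"
```

With the default IFS, the first call reports 2 arguments and the second reports 1, which is why commands like cd or patch break on such paths unless the variable is quoted.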

This will teach me a fair bit about shell scripting, so it'll be fun.
Comment by Devin Cofer (Ranguvar) - Tuesday, 22 September 2009, 22:33 GMT
Almost done, just fixing bugs. Decided to rewrite, and it's roughly 30 lines longer than before, but I think it's much more understandable overall (and does some nice stuff it didn't before, like warning if patchlevel is too _high_, etc.)
Comment by Devin Cofer (Ranguvar) - Thursday, 24 September 2009, 00:54 GMT
Finished the rewrite. I tested this in every way I could think of, but you may want to do more testing. No changes to the PKGBUILD are needed besides the md5sums change.
The new script is a fair bit longer than the old one, but IMHO more readable and more logical.
Comment by Xavier (shining) - Friday, 25 September 2009, 06:48 GMT
Just for the record, it looks like there was a patch doing that which has been ignored :
http://bugs.archlinux.org/task/12440?getfile=3141

But now that I look a bit closer at that patch and your new script, I think yours is nicer.

Another thing which bothers me. Are we really the only ones doing that ? I would guess virtually any vim packagers would need something like this. So how is everyone else doing it ? Everyone is writing his custom script on his side, without looking at what all the others are doing ?
Comment by Tobias Kieslich (tobias) - Friday, 25 September 2009, 18:28 GMT
Many other distributions have source packages, and their update frequency is much lower. So for rolling an rpm, let's say, it's totally fine to download the needed patches manually, roll them up into a single file, pack it with the source, and voilà.

If you go back in history, that's what we did: download them all, concatenate them, and put that file in the repository, until I got sick of the manual labour.
The current approach was developed under the assumption that people who build it download the stuff once and cache it. And it works fine for me as packager: I only ever have to download about 20 patches and that's it.

I will look through your file when I have some time. Thanks.
Comment by Xavier (shining) - Friday, 25 September 2009, 19:22 GMT
Hm ok.
Anyway, I am probably blaming this crappy upstream distribution model more than the fetch patch script.

Otherwise, as I said in bbs ( http://bbs.archlinux.org/viewtopic.php?id=80886 ) :
I am as old as CVS, and people still don't get the point of Version Control System ?
Why don't we just use vim-cvs / vim-svn / vim-git / whatever instead of manually fetching all these patches ?

That way should reduce the complexity of arch pkgbuild and fetch_patches a lot, by moving the complexity to the vcs.
We would get the same code, easy way to rebuild the same code, easy and efficient way to update. The way I see it, it would be a win-win situation.
Comment by Tobias Kieslich (tobias) - Friday, 25 September 2009, 19:47 GMT
Using SVN/CVS is not a solution at all, because the purpose of this bug is to increase the speed of the download, not to slow it down. Vim's development model is patch-based, and their repositories lag behind the patches, as stated on their website. And as I said, fetching from SVN/CVS is slower, takes more bandwidth, etc., and we would gain nothing from caching over what we have at the moment, because the patches are cached as well.

The fetch_patches framework has come a long way. It's a tribute to Vim's admittedly awkward but functioning development model. It allows people to just enter the version number they want and build the proper package. It might not be pretty, but it works, for the Vim folks and for us.
Comment by Xavier (shining) - Friday, 25 September 2009, 20:08 GMT
Well talking about speed :
cvs -z3 -d:pserver:anonymous@vim.cvs.sf.net:/cvsroot/vim checkout vim7 4,42s user 1,10s system 38% cpu 14,382 total

This is the time makepkg needs just to download the three sources: vim-7.2.tar.bz2 vim-7.2-extra.tar.gz vim-7.2-lang.tar.gz
(maybe -extra and -lang are still needed when using cvs, but anyway it doesn't change much)

And downloading all the patches took ... 6 minutes ! lol !
I suspect that even using the gzipped bundles, it would still be much slower than 14 seconds...
Comment by Tobias Kieslich (tobias) - Friday, 25 September 2009, 20:37 GMT
I have no idea where your numbers come from; I assume from an already checked-out directory. But for a freshly checked-out vim7 it looks more like this:
real 8m6.351s
user 0m3.166s
sys 0m1.097s

and I did use the command you provided.
Comment by Xavier (shining) - Saturday, 26 September 2009, 00:58 GMT
It was a fresh checkout. I checked a second time and got the same time. My bandwidth is between 1 and 2 MB/s.
The resulting tree is 43MB. So since it is compressed (-z3), 14s looks very possible if my bandwidth is used (and apparently it is).

The resulting tree seems to contain vim-7.2.tar.bz2 + vim-7.2-extra.tar.gz + vim-7.2-lang.tar.gz + the hundred patches.
Comment by Xavier (shining) - Wednesday, 30 September 2009, 12:15 GMT
Seems like we are not reaching an agreement any time soon, so I posted a PKGBUILD on AUR:
http://aur.archlinux.org/packages.php?ID=30507
So users who want to rebuild vim could use that alternative.
Comment by Devin Cofer (Ranguvar) - Thursday, 01 October 2009, 21:51 GMT
I like Shining's approach much better, should have looked into it before doing my script :)

IMO Shining's PKGBUILD should be used for the official Vim package, and a similar setup for gVim, etc.
Comment by Tobias Kieslich (tobias) - Thursday, 01 October 2009, 22:14 GMT
While it does look slimmer, there are some downsides to it:
- we don't build files in extra from CVS unless we need features from
development or have other sources available
- the CVS/SVN is behind the patches in time
- the current approach allows caching the source in the makepkg
cachedir instead of the build directory (which is no problem for
normal users but means more work for me as maintainer, who builds
from fresh PKGBUILD checkouts in clean environments)
- while an impressive number for fast download was presented,
I wasn't able to reproduce that speedup over the current approach
on two different machines with two different connections. All together
they clock in about the same.
Comment by Devin Cofer (Ranguvar) - Friday, 02 October 2009, 01:00 GMT
1.) This is a problem? We don't usually, yes, but if there's a good reason to... I propose we not use makepkg's CVS auto-stuff (cvsmod, etc.) but instead use a PKGBUILD that always fetches the specified version of Vim (or make a source tarball and put it on ftp.archlinux.org as is often done).
2.) I don't think that this is a problem at all. Vim's CVS is not far behind at all -- Arch is much farther behind itself usually in its Vim packages.
3.) I don't understand, but OK.
4.) Really? The difference is massive, and should be apparent even without testing, just from looking at the process?
On a Q6600, 6GiB RAM, ext4 7,200rpm drive, 10Mb/s downstream:
(Informal, uses full makepkg time, 'sync' ran in between tests, 'vim-rang' is just normal Vim with the new fetch_patches.sh):

vim:
real 20m16.117s
user 1m33.994s
sys 0m33.162s

vim-rang:
real 5m8.494s
user 0m39.086s
sys 0m29.134s

vim-cvs:
real 1m3.011s
user 0m19.465s
sys 0m9.329s
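
To put the three wall-clock times above side by side, a small helper can turn the `time` output into seconds and print ratios. This is only a convenience sketch; the function names to_seconds and speedup are invented here, and the inputs are the "real" values quoted above.

```shell
# to_seconds converts a `time` value such as 20m16.117s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup SLOW FAST prints how many times faster FAST is than SLOW.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

speedup 20m16.117s 5m8.494s   # stock vim vs. vim-rang
speedup 20m16.117s 1m3.011s   # stock vim vs. vim-cvs
```

On these figures the rewritten script is roughly 4 times faster than the stock package, and the CVS build roughly 19 times faster, keeping in mind that the test was informal and covers the full makepkg run.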
Comment by Kaiting Chen (Phoenixfire159) - Monday, 23 November 2009, 00:52 GMT
I have my own Vim PKGBUILD that I use on my machine. In it I have replaced fetch_patches.sh with a small piece of code that does the same thing but fetches the patches in parallel. I've never looked at the fetch_patches.sh code at all but perhaps this will be useful. I've attached the entire PKGBUILD and the specific part copied out of it that deals with applying the patches.
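
The attachment itself is not reproduced here, so the following is only a guess at the general technique, not Phoenixfire159's code: read URLs on standard input and hand them to a download command with a bounded number of parallel jobs. It assumes an xargs that supports -P (GNU and BSD xargs do); the name fetch_parallel is hypothetical.

```shell
# fetch_parallel JOBS CMD...
# Reads one URL per line on stdin and runs "CMD... URL" for each of them,
# with at most JOBS commands running at the same time.
# Assumption: xargs supports -P, which is an extension, not POSIX.
fetch_parallel() (
    max_jobs=$1
    shift
    xargs -n 1 -P "$max_jobs" "$@"
)
```

A call such as `fetch_parallel 8 curl -fsSO < urls.txt` would then keep up to eight downloads in flight, where urls.txt is a hypothetical file listing one patch URL per line.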
Comment by Dan Griffiths (Ghost1227) - Saturday, 13 February 2010, 21:00 GMT
After discussing with Tobias, we have come to a compromise of sorts. We both believe that packages in extra should avoid the use of revision control systems as much as possible. If you really want a CVS version, put it in AUR. However, the length of time it takes to download the patches was a bit ridiculous. Phoenixfire159's patch is pretty good, but it prevents users who want to recompile at a specific patch level from specifying one. As such, I've rewritten it so that we retain the parallel downloads, cutting the download time down to under a minute, while users can still select the most recent patch they want.
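
The rewritten script is not shown in this task, so as a hedged illustration only: one way to keep the patch level selectable is to generate the list of patch URLs up to the requested level and leave the downloading to a separate step. The base URL is the one from the task description; the function name patch_urls and the 7.2.NNN file naming are assumptions.

```shell
# patch_urls VERSION LEVEL
# Prints the URL of every individual patch from 001 up to LEVEL.
# Assumption: single patches are named VERSION.NNN with three digits.
patch_urls() (
    version=$1
    level=$2
    base="ftp://ftp.vim.org/pub/vim/patches/$version"

    i=1
    while [ "$i" -le "$level" ]; do
        printf '%s/%s.%03d\n' "$base" "$version" "$i"
        i=$(( i + 1 ))
    done
)
```

Piping the result into something like `xargs -n 1 -P 8 wget -q` would fetch the selected range in parallel, while a lower LEVEL simply yields a shorter list.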
