FS#11302 - repo-add should generate the repo.files.tar.gz file

Attached to Project: Pacman
Opened by Gavin Bisesi (Daenyth) - Monday, 25 August 2008, 15:04 GMT
Last edited by Dan McGee (toofishes) - Thursday, 20 January 2011, 21:48 GMT
Task Type Feature Request
Category Scripts & Tools
Status Closed
Assigned To Dan McGee (toofishes)
Architecture All
Severity Low
Priority Low
Reported Version 3.2.0
Due in Version 3.5.0
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

repo-add should generate the repo.files.tar.gz file so that pkgfile can work for any arch repo. (Or users can do it manually, etc)

Closed by  Dan McGee (toofishes)
Thursday, 20 January 2011, 21:48 GMT
Reason for closing:  Implemented
Additional comments about closing:  For 3.5.0 in eda4d9ec00be1108a
Comment by Gavin Bisesi (Daenyth) - Monday, 25 August 2008, 15:04 GMT
Woops, this should have been a feature request.
Comment by Xavier (shining) - Monday, 25 August 2008, 22:14 GMT
This feature would be nice, we just need someone to write a patch :)
Comment by Gavin Bisesi (Daenyth) - Tuesday, 26 August 2008, 00:45 GMT
I would write it myself but I can't find the current code to generate it. Can someone put it up? I looked in the ML from when it was originally implemented, but it has been moved from there since then.
Comment by Xavier (shining) - Tuesday, 26 August 2008, 06:42 GMT
Comment by Allan McRae (Allan) - Tuesday, 26 August 2008, 06:46 GMT
and then pkgfile can make its way into pacman-contrib...
Comment by Aaron Griffin (phrakture) - Thursday, 28 August 2008, 07:15 GMT
Try to keep this fast when adding one package to the DB. If it gets too slow, we'll need to skip this step on gerolde. We need it to be quick.

Additionally, could you make this optional? Something like:

repo-add foo.db.tar.gz blah.pkg.tar.gz #just update the DB
repo-add --files foo.db.tar.gz blah.pkg.tar.gz #update both the DB and the .files.tar.gz
Comment by Xavier (shining) - Thursday, 28 August 2008, 08:39 GMT
What if you execute these two commands in a row:
repo-add foo.db.tar.gz blah1.pkg.tar.gz # add blah1 in db but not in files
repo-add --files foo.db.tar.gz blah2.pkg.tar.gz # add blah2 in both db and files

So the db and files end up inconsistent.

Or maybe this should be an env var like REPOADD_FILES to be more sure we keep a consistent value?
Comment by Gavin Bisesi (Daenyth) - Thursday, 28 August 2008, 15:22 GMT
It's a little more complicated than I thought it would be so I haven't really started work on it yet. I'll definitely try to add it to the package update function... or maybe not. Maybe have its own function that takes all the packages so it can do them in one go... not really sure. I'll try to think of a way to make it optional.

If I split it into a --files thing I see two ways to do it.. I could either update all entries (slow and ugly), or try to somehow figure out which entries are missing and do those. If --files was used with an update arg then it would update the DB and also the file list for that package. I don't think that would allow errors.

Anyone have any opinions/suggestions?
Comment by Xavier (shining) - Thursday, 28 August 2008, 15:31 GMT
Forget about what I said, just keep it simple.
repo-add [--files] foo.db.tar.gz bar.pkg.tar.gz:
if --files is given, remove the old bar/files entry from foo.files.tar.gz and extract the new bar/files one

Note that you don't even have the option to update all entries, you can only choose to update for the packages that are being added.
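
A minimal sketch of that step, assuming the files database has already been extracted into a working directory. The function name `update_files_entry`, its arguments, and the use of plain `tar` with a `grep` filter for top-level dot files (package metadata such as .PKGINFO) are illustrative assumptions, not the actual repo-add code:

```shell
#!/bin/bash
# Hypothetical helper, not part of repo-add.
#   $1 = directory holding the extracted files database
#   $2 = entry name, e.g. bar-1.0-1
#   $3 = package archive, e.g. bar.pkg.tar.gz
update_files_entry() {
    local dbdir=$1 entry=$2 pkgfile=$3

    mkdir -p "$dbdir/$entry"
    # Remove the old file list for this entry, if there is one.
    rm -f "$dbdir/$entry/files"

    # Write the new list: a header line followed by every path in the
    # package, skipping top-level dot files (assumed to be metadata).
    {
        echo "%FILES%"
        tar -tf "$pkgfile" | grep -v '^\.'
    } > "$dbdir/$entry/files"
}
```

Only the entry for the package given as an argument is touched, so the cost stays proportional to the packages being added.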
Comment by Gavin Bisesi (Daenyth) - Thursday, 28 August 2008, 19:16 GMT
I'm leaning toward an implementation that means "repo-add foo.pkg" is the same as "repo-add --files foo.pkg", with the only difference being that file lists are generated. I would then also supply perhaps a "--files all" and "--files auto" to fix missing filelists. All would obviously recreate the entire db from scratch, and auto would try to detect what is missing and update them.
Comment by Xavier (shining) - Thursday, 28 August 2008, 20:18 GMT
I don't like the --files all and --files auto because repo-add is supposed to add the packages specified as arguments.
So if you need to rebuild the whole db, you simply do: repo-add --files *.pkg.tar.gz, and that's all.
Otherwise you would have to "guess" where the packages are, which is not so nice.
Comment by Aaron Griffin (phrakture) - Thursday, 28 August 2008, 20:35 GMT
Daenyth, if you do that, we will be unable to use repo-add to manage the DBs on the main arch server. I would then have to veto the patch.

What we need is for this to be quick and work for a single package at a time. Generating file lists should also be optional, for the sake of people running their own repos.
Comment by Dan McGee (toofishes) - Saturday, 01 November 2008, 01:06 GMT
I had to recreate the testing DB the other day on gerolde for x86_64. I likewise can't take any patch for repo-add that makes it slower in every case.

Having a simple "--files" flag that enables files entries to be created as well would be fine for me, but I don't think we can make that the default.
Comment by Gavin Bisesi (Daenyth) - Saturday, 01 November 2008, 01:53 GMT
What would be your ideal implementation or usage? Have one flag which generates the file list for every package, but is not required? Have one that does it automatically as part of the DB generation for each argument given? Something else?
Comment by Dan McGee (toofishes) - Saturday, 01 November 2008, 01:59 GMT
Ideal would be the same way repo-add works now. Each call to repo-add takes a DB and as many packages as you desire. It only operates on those packages and not anything else in the DB (except old package entries).

So for files, calling:
repo-add --files foo.db.tar.gz foobar.pkg.tar.gz

would only create a filelist for foobar, and not touch anything else. It would remove any existing filelist for foobar.

What I don't know is what you would pass for the DB name- I would assume the files DB (so foo.files.db.tar.gz). In order to "fully" add a package, you would have one call to "repo-add" and one call to "repo-add --files".
Comment by Aaron Griffin (phrakture) - Sunday, 09 November 2008, 02:46 GMT
Additional point. Seeing as we specify the DB filename, we should also have an arg for --files that is the filename for that DB too.

That is:
repo-add --files foo.files.tar.gz foo.db.tar.gz *.pkg.tar.gz

This will future-proof us if we change names of any of these files.
Comment by Xavier (shining) - Sunday, 09 November 2008, 07:06 GMT
Dan proposed to separate the two operations, so I believe we only need to specify one db filename per operation.
1) repo-add foo.db.tar.gz *.pkg.tar.gz
2) repo-add --files foo.files.db.tar.gz *.pkg.tar.gz
Comment by Michael Trunner (trunneml) - Sunday, 26 September 2010, 12:08 GMT
Short question: is anyone still working on this, or how should the files.tar.gz be created now?
Comment by Xavier (shining) - Sunday, 26 September 2010, 14:34 GMT
Very surprisingly, nothing changed, it's still created from dbscripts cron jobs.
http://projects.archlinux.org/dbscripts.git/tree/cron-jobs/create-filelists
Comment by PyroPeter (pyropeter) - Tuesday, 11 January 2011, 20:25 GMT
I implemented the proposal made by toofishes, the patch is attached.

Imho it would make sense to add functionality to create the pkgfile-db and the normal one in one run.
This would only require changes at the end of the script, when the db gets re-archived:
It could very easily be tar'ed a second time with an "--exclude=files" (or similar, I don't know about the syntax of the --exclude switch) directive.

This would also simplify what seems to be the main use case: Modifying both a db.tar.gz and a files.tar.gz
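
A rough sketch of that idea, assuming the database tree has been extracted to a directory in which each package entry holds a `files` file next to its other metadata. The function name `archive_both` and its arguments are hypothetical, and the `--exclude='*/files'` pattern is one spelling that works with GNU tar; other tar implementations may need a different pattern:

```shell
#!/bin/bash
# Hypothetical helper, not part of repo-add.
#   $1 = directory holding the extracted database tree
#   $2 = repo name, e.g. foo
#   $3 = output directory (must be outside $1)
archive_both() {
    local dbdir=$1 repo=$2 outdir=$3

    # Full tree, including every <entry>/files list.
    tar -C "$dbdir" -czf "$outdir/$repo.files.tar.gz" . || return 1

    # Same tree again, this time leaving the file lists out.
    tar -C "$dbdir" --exclude='*/files' -czf "$outdir/$repo.db.tar.gz" .
}
```

Both archives come from the same tree in one run, so they cannot drift apart.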
Comment by Dan McGee (toofishes) - Tuesday, 11 January 2011, 21:23 GMT
Quick comments:
1. You will need doc/repo-add.8.txt updates as well.
2. Please find a way other than awk to include this list- we just had compat issues with it in makepkg (http://projects.archlinux.org/pacman.git/commit/?id=bd98b93a6e161c436c22f6c39d2d6293f420cbcc) and you can do this more simply- do it like lines 76/77 in here: http://projects.archlinux.org/dbscripts.git/tree/cron-jobs/create-filelists
3. Keep the flags in the "Usage" part in alpha order. And why did you add it to repo-remove? That makes no sense the way you set this up, right?
4. Thanks! I know this looks like a lot of whining but I just want to make sure it is done right. And if you would like to be attributed under your full name, make sure you submit the patch that way.
Comment by PyroPeter (pyropeter) - Tuesday, 11 January 2011, 22:49 GMT
> 1. You will need doc/repo-add.8.txt updates as well.
I added that (the delta-feature is missing documentation too, btw.)

> 2. Please find a way other than awk to include this list- we just had compat issues with it in makepkg (http://projects.archlinux.org/pacman.git/commit/?id=bd98b93a6e161c436c22f6c39d2d6293f420cbcc) and you can do this more simply- do it like lines 76/77 in here: http://projects.archlinux.org/dbscripts.git/tree/cron-jobs/create-filelists
It's now using sed and echo.

> 3. Keep the flags in the "Usage" part in alpha order. And why did you add it to repo-remove? That makes no sense the way you set this up, right?
I assumed it may be of use in later stages of development (see my last comment), but it really is unapt.

> 4. Thanks! I know this looks like a lot of whining but I just want to make sure it is done right. And if you would like to be attributed under your full name, make sure you submit the patch that way.
It would have been surprising if my first patch for pacman had been perfect. I appreciate your comments.
I would rather use a pseudonym (I am a bit paranoid).
Comment by Dan McGee (toofishes) - Tuesday, 11 January 2011, 23:13 GMT
Applied locally, with just a few more small changes:
1. Fixed alpha ordering in the docs, thanks for pointing out missing -d/--delta docs there.
2. Removed reference to pkgfile. This is not an official tool, so we shouldn't talk about it and we can be more generic.
3. Removed $startdir prefixing in the files bsdtar call, this behaved nothing like the rest of the metadata reading and we have no right assuming files are in $startdir anyway.
Comment by Pierre Schmitz (Pierre) - Thursday, 13 January 2011, 12:17 GMT
I like this simple solution. The only thing for me to do once this is released is to write a script to add the files entry for all packages. And maybe I need to add a link from $repo.files.tar.gz to $repo.db.tar.gz.
Comment by Allan McRae (Allan) - Thursday, 13 January 2011, 12:30 GMT
We should really consider whether we want to add that as default in Arch Linux. Looking at the extra repo, it will increase the database size by more than 10x. Given pacman does not use that information in any way, it is probably not worth it. Maybe if we can use daily deltas on the db...
Comment by Pierre Schmitz (Pierre) - Thursday, 13 January 2011, 12:51 GMT
The advantage would be that something like -So could be implemented easily. Personally I am neutral about the increased file size. Deltas for the db files seem like overkill to me though.
Comment by Dan McGee (toofishes) - Thursday, 13 January 2011, 14:34 GMT
Uhhhh what? -1000 on making it the default DB. If that happens I'm sorry I ever considered this feature request, that will make the DBs absolutely huge...

And a script? It's one line to our dbscripts (just call repo-add with the -f arg and the files DB instead), and we can kill the entire create-filelists cronjob after that.
Comment by Pierre Schmitz (Pierre) - Thursday, 13 January 2011, 16:15 GMT
No need to have a heart attack about this. ;-) I wasn't implying that this has to be the default at all costs. And I wasn't aware of the file size.

Anyway, the point is that this simplifies the dbscripts and also syncs the db and files.
