FS#7485 - use bsdtar in makepkg

Attached to Project: Pacman
Opened by Baptiste Daroussin (bapt) - Thursday, 21 June 2007, 09:25 GMT
Last edited by Dan McGee (toofishes) - Monday, 09 July 2007, 04:32 GMT
Task Type Feature Request
Category makepkg
Status Closed
Assigned To Aaron Griffin (phrakture)
Dan McGee (toofishes)
Architecture All
Severity Medium
Priority Normal
Reported Version 3.0.5
Due in Version 3.1.0
Due Date Undecided
Percent Complete 100%
Votes 1
Private No


Summary and Info:
pacman depends on libarchive which provides bsdtar.
bsdtar can handle nearly all the formats needed by makepkg and is always installed, since it comes in as a dependency of pacman.

I've made a patch to be able to use bsdtar instead of unzip in makepkg.
I tested it on several packages and it works great.

The benefit is a single tool to rely on for extracting sources and fewer dependencies (gnutar and unzip would no longer be needed). It also removes the unzip hack.

Here are two patches:
one that only removes unzip but keeps gnutar: makepkg-nounzip.patch
and one that only uses bsdtar: makepkg-bsdtar.patch (it also seems to be a little - very little :) - bit faster).

The patches are made against the makepkg shipped with pacman-3.0.5-2.

Closed by  Dan McGee (toofishes)
Monday, 09 July 2007, 04:32 GMT
Reason for closing:  Implemented
Additional comments about closing:  implemented in GIT
Comment by Baptiste Daroussin (bapt) - Thursday, 21 June 2007, 13:38 GMT
In makepkg-bsdtar.patch, I just found that .FILELIST is always created empty. This is because bsdtar writes the file list to stderr (2) when compressing, and its output differs from gnutar's (there is an 'a ' before each file name, which could be solved with cut or awk).
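
A minimal sketch of the prefix cleanup mentioned above, using sed rather than cut or awk. It assumes the verbose lines have the form 'a <path>'; strip_bsdtar_prefix is a hypothetical helper, not part of makepkg, and the sample lines are simulated rather than captured from a real bsdtar run.

```shell
#!/bin/sh
# Hypothetical helper: drop the "a " marker that bsdtar prints in front of
# each path when it creates an archive verbosely.
strip_bsdtar_prefix() {
    sed 's/^a //'
}

# Simulated verbose output (an assumption, not captured from a real run).
sample='a usr/
a usr/bin/
a usr/bin/foo'

cleaned=$(printf '%s\n' "$sample" | strip_bsdtar_prefix)
printf '%s\n' "$cleaned"
```

In a real run the list would first have to be redirected from stderr (for example with 2>&1) before it reaches the filter.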
Comment by Roman Kyrylych (Romashka) - Thursday, 21 June 2007, 13:43 GMT
The patch is against current makepkg.
This is not an issue in recent git versions because .FILELIST is generated by ls now (after "allow it to create an empty package" patch).
Comment by Baptiste Daroussin (bapt) - Thursday, 21 June 2007, 14:11 GMT
Do you want me to recreate the patch against the git version ?
Comment by Roman Kyrylych (Romashka) - Thursday, 21 June 2007, 16:03 GMT
Would be nice.
Also, it seems this solves the complaint about adding unzip to makepkg's depends, because bsdtar handles this as I see.
Comment by Baptiste Daroussin (bapt) - Thursday, 21 June 2007, 16:40 GMT
I'll see what I can do.
Comment by Baptiste Daroussin (bapt) - Friday, 22 June 2007, 08:37 GMT
Here is a new patch against the git version, taken from http://projects.archlinux.org/git/?p=pacman.git;a=blob_plain;f=scripts/makepkg.in;hb=HEAD with the new makepkg.conf.
It seems to work for me, but it needs testing as I don't have the whole pacman git installed (I need to keep the stable one).

All decompression that bsdtar can handle is done with the command bsdtar -x -f (no need for z or j or anything else, as bsdtar automatically detects the archive type, i.e. tar.gz, tar.bz2, cpio, zip).
I removed the unzip hack.
And I use bsdtar instead of tar for creating the package, so makepkg relies only on bsdtar as a (de)compression program.
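
A rough sketch of what such a single extraction path could look like. The helper name extract_source and the sample archive are made up for illustration and are not taken from the patch; the demo only runs where bsdtar is installed.

```shell
#!/bin/sh
# Hypothetical helper: extract any archive bsdtar understands into a directory.
# $1 = archive, $2 = destination. No z/j flag is passed on extraction.
extract_source() {
    mkdir -p "$2" && bsdtar -x -f "$1" -C "$2"
}

# Demo, guarded so the script still runs where bsdtar is missing.
if command -v bsdtar >/dev/null 2>&1; then
    work=$(mktemp -d)
    mkdir -p "$work/src/hello-1.0"
    echo 'hello' > "$work/src/hello-1.0/README"
    # Build a gzip-compressed tarball, then extract it again.
    bsdtar -c -z -f "$work/hello-1.0.tar.gz" -C "$work/src" hello-1.0
    extract_source "$work/hello-1.0.tar.gz" "$work/out"
    cat "$work/out/hello-1.0/README"
else
    echo 'bsdtar not found, skipping demo'
fi
```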

For the record, bsdtar can also handle the following formats: iso9660 and ar files. I think iso9660 is useless for makepkg, but ar could be interesting for Debian packages, i.e. for programs that are only available for download as rpm or deb. For example, we currently have the openoffice langpacks, which use rpm and so depend on rpmextract.sh, but are also available as deb, so we could drop the rpmextract.sh dependency :) But I think this should be a new bug anyway.
Comment by Roman Kyrylych (Romashka) - Friday, 22 June 2007, 08:46 GMT
does bsdtar create exactly the same tarballs as tar? (so xdelta project will work with bsdtar too)
Comment by Baptiste Daroussin (bapt) - Friday, 22 June 2007, 09:09 GMT
Archives produced by bsdtar and gnutar are compatible, but they have different checksums.
Comment by Baptiste Daroussin (bapt) - Friday, 22 June 2007, 09:51 GMT
Is it really a blocking point ?
I mean, as soon as the packages have been rebuilt at least once, xdelta will work again, since xdelta can create patches from, and apply them to, bsdtar archives.
Comment by Roman Kyrylych (Romashka) - Friday, 22 June 2007, 09:56 GMT
ah, yes. ok :)
Comment by Andrew Fyfe (space-m0nkey) - Friday, 22 June 2007, 17:52 GMT
This should be put on hold until we bump to libarchive2; the current permission problems with pacman are related to some problems with libarchive1 (and I assume bsdtar will have the same issues).
Comment by Andrew Fyfe (space-m0nkey) - Friday, 22 June 2007, 18:13 GMT
Just been doing some testing. bsdtar (and find from Roman's empty pkg patch) does not produce the same output as tar...

tar: usr/bin/
bsdtar/find: usr/bin

Some testing needs done with pacman to see if it has any negative effects.
Comment by Andrew Fyfe (space-m0nkey) - Friday, 22 June 2007, 18:46 GMT
I've created a branch for the bsdtar patch on my git repo... http://neptune-one.homeip.net/cgi-bin/gitweb.cgi?p=pacman;a=shortlog;h=bsdtar
Comment by Andrew Fyfe (space-m0nkey) - Friday, 22 June 2007, 21:39 GMT
Directories in .FILELIST must end with a / otherwise pacman thinks they are files and does a conflict check.

.FILELIST should be created with tar; everything else can be done with bsdtar.
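
To make the trailing-slash requirement concrete, here is a small sketch that flags directories listed without a /. check_filelist_dirs and the sample lists are hypothetical and only illustrate the rule; they are not code from makepkg or pacman.

```shell
#!/bin/sh
# Hypothetical checker: report directories listed without a trailing slash.
# $1 = package root, $2 = file list to check. Returns 1 if any were found.
check_filelist_dirs() {
    bad=0
    while IFS= read -r entry; do
        case "$entry" in
            */) ;;                      # already marked as a directory
            *)  if [ -d "$1/$entry" ]; then
                    echo "missing trailing slash: $entry"
                    bad=1
                fi ;;
        esac
    done < "$2"
    return "$bad"
}

root=$(mktemp -d)
mkdir -p "$root/usr/bin"
: > "$root/usr/bin/foo"

printf '%s\n' 'usr/' 'usr/bin/' 'usr/bin/foo' > "$root/good.list"   # tar style
printf '%s\n' 'usr' 'usr/bin' 'usr/bin/foo' > "$root/bad.list"      # bsdtar/find style

check_filelist_dirs "$root" "$root/good.list" && echo 'good.list: ok'
report=$(check_filelist_dirs "$root" "$root/bad.list" || true)
printf '%s\n' "$report"
```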
Comment by Baptiste Daroussin (bapt) - Saturday, 23 June 2007, 08:13 GMT
My thoughts, perhaps I'm wrong, but:
1- pacman 3.0.5 comes with libarchive2, so there is nothing to wait for
2- the .FILELIST in the latest official git is no longer created with tar but with find, so your git doesn't seem up to date.

If there is a problem with the creation of the .FILELIST, I think the best option is to write our own tiny C program to create it, which is better than faking compression. I also know that zsh with print -l **/* gives exactly the same output as tar cvf /dev/null *; perhaps there is something in bash that can do it.

I prefer a homemade tiny program like mkfilelist because it is easy to write and maintain, and we would always be sure of the way .FILELIST is created.
Comment by Andrew Fyfe (space-m0nkey) - Saturday, 23 June 2007, 11:56 GMT
1. Ok at the time of my last post arch hadn't upgraded to libarchive 2 :)

2. The switch to find needs reverted back to tar. I don't see the point in creating a program to create .FILELIST when tar does the job perfectly and is already included in the base system.
Comment by Baptiste Daroussin (bapt) - Saturday, 23 June 2007, 12:19 GMT
I think not using gnutar is simply respecting the KISS philosophy: we don't need two programs that do the same job in base, do we ?

pacman uses libarchive, which comes with bsdtar, so I think we do not need gnutar in base, as bsdtar can do the job.

Regarding the .FILELIST, I want to test some stuff. Which git version should I consider the official one, in case I propose patches for the FILELIST creation ? Currently I use the one at projects.archlinux.org/git, which uses find to create the package list. Will it be reverted to tar, or are you open to a new proposal for this job ? Perhaps there is a better place to discuss it ? I'm quite new to Arch Linux, but I really love it and want to get involved.
Comment by Andrew Fyfe (space-m0nkey) - Saturday, 23 June 2007, 15:42 GMT
Removing gnutar from base isn't an option as bsdtar doesn't produce the same output or accept all the options gnutar does.

With regard to which git repo to use: projects.archlinux is the official one. Dan McGee (one of the pacman devs) has a repo at http://code.toofishes.net/gitweb.cgi?p=pacman.git;a=summary and I've got a repo (I'm not a dev but I do contribute a bit) at http://neptune-one.homeip.net/cgi-bin/gitweb.cgi?p=pacman;a=summary
Comment by Dan McGee (toofishes) - Tuesday, 26 June 2007, 17:00 GMT
OK- I think this could be a decent idea since libarchive/bsdtar is required by pacman anyway, but we need to think a few things out first.

1. What do we lose by removing any dependency on tar? I would rather go 100% or not at all. My first thought of something missing- the xdelta/rsync tar offset stuff that allows these programs to create smaller binary diffs. We don't use this option at the moment, although I think we should.
2. How can we create the filelist without using GNU tar? It is silly to need such a large program to do a small task, when we could easily write one ourselves to create the filelist in a consistent fashion.
Comment by Andrew Fyfe (space-m0nkey) - Tuesday, 26 June 2007, 19:18 GMT
1) My previous comment still stands for removing tar from base, but just replacing tar with bsdtar is possible. xdelta/rsync shouldn't be a problem as long as the old package and new package are created with the same tar program.

cd $pkgdir
find -type d | sed 's#$#&/#' >.FILELIST
find ! -type d >>.FILELIST
sort .FILELIST > .FILELIST-sorted
mv .FILELIST{-sorted,}
Comment by Dan McGee (toofishes) - Tuesday, 26 June 2007, 20:55 GMT
Filelists should be created the same way as before with regard to the prefix: find defaults to prefixing every path with ./, while tar cvf printed the paths without that prefix.

Using find -printf '%P\n' fixes this.
Is there really nothing else simpler than the above command?
Comment by Roman Kyrylych (Romashka) - Wednesday, 27 June 2007, 06:54 GMT
find -printf '%P\n' doesn't work for me.

This works: find * -exec ls -dp {} \; 2>/dev/null
Comment by Xavier (shining) - Wednesday, 27 June 2007, 15:26 GMT
Does find really not have a way to output directories with a trailing /, with all the options it has :d ?
if not, what about this :
find \( -type d -printf '%P/\n' \) , \( ! -type d -printf '%P\n' \) | sort
Comment by Xavier (shining) - Wednesday, 27 June 2007, 15:40 GMT
Sorry, find also prints the pwd '.', and printing it with %P/ resulted in '/'
Anyway looks like the pwd can be removed from find using -mindepth 1 :
find -mindepth 1 \( -type d -printf '%P/\n' \) , \( ! -type d -printf '%P\n' \) | sort
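
To see what this command produces, here is a self-contained sketch on a throwaway tree. It assumes GNU find (-printf is a GNU extension); the directory layout is invented for the demo, and an explicit . start point and LC_ALL=C are added only to keep the result reproducible.

```shell
#!/bin/sh
# Sketch only: assumes GNU find, since -printf is a GNU extension.
pkg=$(mktemp -d)            # stand-in for makepkg's pkg/ directory
mkdir -p "$pkg/usr/bin" "$pkg/usr/share/doc"
: > "$pkg/usr/bin/foo"
: > "$pkg/usr/share/doc/README"

# Directories get a trailing slash, everything else is printed as-is;
# %P drops the leading "./" and -mindepth 1 skips "." itself.
filelist=$(cd "$pkg" && find . -mindepth 1 \
    \( -type d -printf '%P/\n' \) , \( ! -type d -printf '%P\n' \) | LC_ALL=C sort)
printf '%s\n' "$filelist"
```

For this tree the list contains usr/, usr/bin/, usr/bin/foo, usr/share/, usr/share/doc/ and usr/share/doc/README, one per line.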
Comment by Xavier (shining) - Wednesday, 27 June 2007, 15:48 GMT
There is still at least one difference with find compared to tar :
the hidden files in cwd are printed (eg .PKGINFO)
if .FILELIST is always the first hidden file created, it shouldn't matter.
Comment by Baptiste Daroussin (bapt) - Wednesday, 27 June 2007, 16:46 GMT
I still think the best way to create .FILELIST is to create our own tiny program that will exactly fit our needs.
I wrote a small C program that can do it, used like this:
mkfilelist | sort.

I really don't know much C, so it is just missing the sorting.
Because creating the .FILELIST is a small but important task, having such a program (very easy to maintain as it is not complicated) would let us be sure that the .FILELIST is always well formatted.

This program is just an example, as the code is very trivial. I'm sure it could be done better, but anyway it shows that it is easy to do.
A separate program is better for me because we can keep control of it. If in pacman-15.0.40 new information is stored in .FILELIST, we would just have to adapt mkfilelist instead of searching for a new tool that can do the job.
Comment by Xavier (shining) - Wednesday, 27 June 2007, 17:23 GMT
I think find does the job just fine, and it probably isn't going to change.
Even if I didn't get it right the first time, it was quite close, and it didn't require much time.
Anything wrong with it?
find -mindepth 1 \( -type d -printf '%P/\n' \) , \( ! -type d -printf '%P\n' \) | sort

I personally prefer using existing tools when possible, instead of creating new ones.
If FILELIST needs to change in the future, it shouldn't be hard either to find out how to use
the existing tools the way we want.
Btw, that little C prog doesn't work properly yet. I admit it would probably be easy to fix it,
but I'm really not sure it's a good idea.
Anyway, I'm not the one to decide :)
Comment by Dan McGee (toofishes) - Wednesday, 27 June 2007, 18:10 GMT
Cleaned up and works. Use as so:

./mkfilelist <dir to list>

So we would use it like the following:

mkfilelist pkg/

By the way, some timings for these programs were posted on the ML, and this kicks the snot out of them. (Use gcc -O2 -o mkfilelist mkfilelist.c).
Comment by Xavier (shining) - Wednesday, 27 June 2007, 18:38 GMT
Ok, this one works fine.
Though, I don't think performance is the only thing to take into consideration ;)
Building the filelist is very fast anyway, even with a lot of files,
and there wasn't a big difference.
Eg 100 ms for mkfilelist vs 150 ms for find, on a dir with more than 16000 files :)

But well, if you see only advantages about going this way, then why not.
I just find it a bit overkill to have an external and dedicated binary just for this simple task.
Comment by Dan McGee (toofishes) - Wednesday, 27 June 2007, 19:16 GMT
Remember what we already build in src/util/.

vercmp and testpkg live there, and they are on your system.
Comment by Xavier (shining) - Wednesday, 27 June 2007, 20:00 GMT
Ah, good point, I forgot about vercmp, and didn't even know about testpkg.
Though, I find these more useful than this one. For example, I don't think they can be easily emulated in one line using existing tools. ;)
But I can't see any real downsides either going this way, so I guess it's just a matter of preference, nothing important.
Comment by Dan McGee (toofishes) - Wednesday, 27 June 2007, 20:10 GMT
Two downsides:
1. I used a few possibly non-portable functions, although this should be easily fixable.
2. find has built in protection against recursive directory loops, and this simple program does not.
Comment by Baptiste Daroussin (bapt) - Wednesday, 27 June 2007, 22:39 GMT
My thoughts about the downsides:
The first one should be easily fixable; the second is not really a problem for us, since the program is specific to package creation, and a package should not normally contain any directory loops.
Comment by Andrew Fyfe (space-m0nkey) - Wednesday, 27 June 2007, 23:45 GMT
The times posted to the ML were only to highlight the point that using find * -exec ls -dp {} \; wasn't a good idea because it was spawning a call to ls for every file found.

I agree with Chantry on this one: why create a new program when there's already one that does the job? vercmp is simply a wrapper to call the vercmp function from libalpm; it's not possible to use existing programs to create a one-line alternative. Also, the creation of .FILELIST is done only once in makepkg, whereas vercmp is used in several places in makepkg and .INSTALL scripts.
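
On the per-file ls cost mentioned above: as a side note, and not something proposed in this thread, find's + terminator batches many paths into each ls call. The sketch below only checks that both forms list the same entries on an invented tree; it does not measure speed.

```shell
#!/bin/sh
tree=$(mktemp -d)
mkdir -p "$tree/usr/bin"
: > "$tree/usr/bin/foo"
: > "$tree/.PKGINFO"        # hidden, so the * glob never hands it to find

# Form from the thread: one ls process per path found.
per_file=$(cd "$tree" && find * -exec ls -dp {} \; 2>/dev/null | LC_ALL=C sort)

# Batched form: find passes as many paths as fit to each ls call.
batched=$(cd "$tree" && find * -exec ls -dp {} + 2>/dev/null | LC_ALL=C sort)

printf '%s\n' "$batched"
```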
Comment by Dan McGee (toofishes) - Thursday, 28 June 2007, 00:51 GMT
Andrew- I pulled your bsdtar branch into GIT. Marking as needs testing, as it definitely does.
Comment by Roman Kyrylych (Romashka) - Thursday, 28 June 2007, 09:34 GMT
the last commit contains
find -mindepth 1 \( -type d -printf '%P/\n' \) , \( ! -type d -printf '%P\n' \) 2>/dev/null | sort >.FILELIST
which leads to inclusion of .PKGINFO .INSTALL .CHANGELOG into .FILELIST

Can we just modify makepkg to create .PKGINFO .INSTALL .CHANGELOG in $startdir instead of $startdir/pkg ?
Comment by Andrew Fyfe (space-m0nkey) - Thursday, 28 June 2007, 09:52 GMT
Added grep -v '^\.' between find and sort. (http://neptune-one.homeip.net/cgi-bin/gitweb.cgi?p=pacman;a=shortlog;h=ready_to_pull)

We need the dot files inside pkg/ so they can be included in the package file, otherwise you've got to do extra work to get them included into the package.
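
A quick sketch of how the grep -v '^\.' filter behaves, assuming GNU find and an invented pkg/ layout. Because the pattern is anchored at the start of the path, only top-level dot entries are dropped; a dot file deeper in the tree stays in the list.

```shell
#!/bin/sh
pkg=$(mktemp -d)            # stand-in for makepkg's pkg/ directory
mkdir -p "$pkg/usr/bin" "$pkg/etc"
: > "$pkg/usr/bin/foo"
: > "$pkg/etc/.hidden"      # nested dot file, belongs in the list
: > "$pkg/.PKGINFO"         # package metadata, must stay out of the list
: > "$pkg/.INSTALL"

filelist=$(cd "$pkg" && find . -mindepth 1 \
    \( -type d -printf '%P/\n' \) , \( ! -type d -printf '%P\n' \) 2>/dev/null \
    | grep -v '^\.' | LC_ALL=C sort)
printf '%s\n' "$filelist"
```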
Comment by Xavier (shining) - Thursday, 28 June 2007, 09:53 GMT
Are there cases when the pkg/ dir isn't removed, before being created ?
If yes, is there any reason to do so ?

If we always start from a clean pkg/, we won't have all these .* files for creating the filelist.
Comment by Andrew Fyfe (space-m0nkey) - Thursday, 28 June 2007, 11:30 GMT
When using -R (repackaging) the pkg/ dir might already contain dot files.
Comment by Xavier (shining) - Thursday, 28 June 2007, 11:37 GMT
hm ok but isn't -R supposed to recreate the pkg/ dir ?
If that's the case, maybe it could also delete it first.
Sorry if I'm missing something, I don't know makepkg much.
Comment by Andrew Fyfe (space-m0nkey) - Thursday, 28 June 2007, 11:51 GMT
The normal process for makepkg is

- check deps
- download source
- check checksums
- build pkg and install to pkg/
- tidy up the package
- create the package

-R skips the first 4 steps and uses the existing pkg/ directory to create the package; in this case the pkg/ directory may already contain the dot files from the last time makepkg was run.
Comment by Roman Kyrylych (Romashka) - Thursday, 28 June 2007, 12:03 GMT
then -R should remove .PKGINFO .INSTALL .CHANGELOG first.
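
A minimal sketch of that cleanup step. clean_pkg_metadata is a hypothetical helper, not code from makepkg, and removing .FILELIST as well is an extra assumption on top of the three files named above.

```shell
#!/bin/sh
# Hypothetical helper, not taken from makepkg: clear metadata left in pkg/
# by an earlier run so it cannot leak into the new package.
# Removing .FILELIST too is an assumption made for this sketch.
clean_pkg_metadata() {
    rm -f "$1/.PKGINFO" "$1/.INSTALL" "$1/.CHANGELOG" "$1/.FILELIST"
}

pkg=$(mktemp -d)            # stand-in for a pkg/ left over from a previous run
mkdir -p "$pkg/usr/bin"
: > "$pkg/usr/bin/foo"
: > "$pkg/.PKGINFO"
: > "$pkg/.FILELIST"

clean_pkg_metadata "$pkg"
remaining=$(cd "$pkg" && find . -mindepth 1 -name '.*' | wc -l)
echo "dot files left: $remaining"
```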
Comment by Dan McGee (toofishes) - Friday, 29 June 2007, 03:54 GMT
I want this to be sufficiently tested before I close it- someone post here once they have generated a few packages with this makepkg and installed them.
Comment by Baptiste Daroussin (bapt) - Friday, 29 June 2007, 09:46 GMT
Sorry for my newbie question, but is there a way to have pacman-git and the stable official pacman installed at the same time ? It would be easier for me to test it.
Comment by Andrew Fyfe (space-m0nkey) - Friday, 29 June 2007, 10:01 GMT
You can download the latest snapshot of the code from http://neptune-one.homeip.net/~andrew/pacman/, then

./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var

Then you can use src/pacman/pacman.static, you'll also need the config file from etc/pacman.conf (it contains some new options that won't be in your current pacman.conf).

You can then run `pacman.static --config <config file> ...` to test the new pacman.
Comment by Baptiste Daroussin (bapt) - Friday, 29 June 2007, 10:21 GMT

So I tested the latest git from http://code.toofishes.net/gitprojects/pacman.git with all my aur packages (only 5 :)) and it works perfectly. I tested some packages in abs and it works too.
Comment by Xavier (shining) - Saturday, 30 June 2007, 09:21 GMT
I compared the result of makepkg 3.0.5 vs makepkg 3.1 for two packages (comparing the list of files and permissions with tar tzf),
and they looked identical.
Only the order for the second package was a bit different, 3 files in the archive weren't at the same place.
They were at a slightly strange place with bsdtar, but well, I don't see any reason why that would matter, there are many correct orders possible.
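
One way to make such a comparison independent of member order is to sort both listings before comparing. This is only a sketch: sorted_members and the archive names are hypothetical, it uses whatever tar is on the PATH, and it compares names only, not permissions.

```shell
#!/bin/sh
# Hypothetical helper: list archive members in a stable order.
sorted_members() {
    tar -tzf "$1" | LC_ALL=C sort
}

work=$(mktemp -d)
mkdir -p "$work/pkg/usr/bin"
: > "$work/pkg/usr/bin/foo"
: > "$work/pkg/usr/bin/bar"

# Same files, added in a different order.
tar -czf "$work/a.pkg.tar.gz" -C "$work/pkg" usr/bin/foo usr/bin/bar
tar -czf "$work/b.pkg.tar.gz" -C "$work/pkg" usr/bin/bar usr/bin/foo

sorted_members "$work/a.pkg.tar.gz" > "$work/a.list"
sorted_members "$work/b.pkg.tar.gz" > "$work/b.list"

if cmp -s "$work/a.list" "$work/b.list"; then
    verdict='same members'
else
    verdict='members differ'
fi
echo "$verdict"
```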