FS#17280 - [repo] Using XZ rather than gzip

Attached to Project: Arch Linux
Opened by Ronan RABOUIN (DarkBaboon) - Wednesday, 25 November 2009, 15:32 GMT
Last edited by Pierre Schmitz (Pierre) - Monday, 22 February 2010, 02:45 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Jan de Groot (JGC)
Pierre Schmitz (Pierre)
Aaron Griffin (phrakture)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 4
Private No


Arch Linux could use XZ (LZMA) rather than gzip for package compression, which provides smaller package sizes without the memory and CPU penalties associated with bzip2. This would let the Arch Linux repositories use less space and bandwidth (Slackware developers have reported savings of 25% in both space and bandwidth). Since our devs have had to limit each connection to 50 kbytes/s on ftp.archlinux.org, the default mirror for Arch, because it uses too much bandwidth, the benefits would be felt right away.


Closed by  Pierre Schmitz (Pierre)
Monday, 22 February 2010, 02:45 GMT
Reason for closing:  Implemented
Additional comments about closing:  First xz-compressed packages are in [testing].
Comment by Aaron Griffin (phrakture) - Wednesday, 25 November 2009, 15:59 GMT
Assigning to a handful of opinionated people :)
Comment by Allan McRae (Allan) - Wednesday, 25 November 2009, 16:13 GMT
So, being opinionated... I think we would be better off getting deltas working. The support is basically complete in pacman (I think a script is needed to clear unneeded deltas from the repo db, but I have seen a version or two floating around). Of course, xdelta3 is patched for xz support, so these are not mutually exclusive options.
Comment by xduugu (xduugu) - Tuesday, 01 December 2009, 04:22 GMT
I have not looked at the delta implementation yet, but I expect that deltas increase the traffic between mirrors and the required disk space, because of the additional delta files, and only decrease the traffic between end users and their mirrors. On the other hand, xz compression would decrease disk usage and all kinds of traffic, and it is even less complex. Would it not make sense to switch to xz compression and then use delta patches for these smaller packages?
Comment by Pierre Schmitz (Pierre) - Tuesday, 01 December 2009, 10:56 GMT
I think Allan should open his own feature request for this, as it is completely unrelated. :P

For xz: Using this compression decreases upload/download time and disk space significantly. I have added support for xz to our scripts some time ago and it should just work. Pacman itself doesn't need to know about it thanks to libarchive. The problem is the migration step. We have two possibilities:

1) Recompress every package with xz and adjust makepkg.conf to use xz by default. This way we won't need to change any scripts and it should just work. Of course, this would be quite insane, because recompressing the whole repo takes a long time and mirrors would have to resync every single package.

2) Update our scripts to be able to handle both compression methods at the same time and set xz as the default. This way we would migrate slowly with every new package. The downside is that our scripts need some voodoo added, which makes them even more complex. I have had a look at this and it's not as simple as it sounds. So: patches are welcome.
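A minimal sketch of the two ideas above, using Python's standard `tarfile`, `gzip` and `lzma` modules as a stand-in for libarchive (an assumption for illustration; this is not pacman's or db-scripts' code). `recompress_gz_to_xz` shows what option 1 does to each package, and `list_members` shows why readers need not care: opening with mode `"r:*"` auto-detects the compression. The helper names and the in-memory package are hypothetical.

```python
import gzip
import io
import lzma
import tarfile


def make_gz_package(files: dict[str, bytes]) -> bytes:
    """Build a gzip-compressed tarball in memory (hypothetical stand-in for a .pkg.tar.gz)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def recompress_gz_to_xz(gz_bytes: bytes) -> bytes:
    """Option 1: decompress the gzip stream and re-encode the identical tar with xz."""
    return lzma.compress(gzip.decompress(gz_bytes), format=lzma.FORMAT_XZ)


def list_members(pkg_bytes: bytes) -> list[str]:
    """Read a package without knowing its compression; mode 'r:*' auto-detects it."""
    with tarfile.open(fileobj=io.BytesIO(pkg_bytes), mode="r:*") as tar:
        return tar.getnames()


gz = make_gz_package({".PKGINFO": b"pkgname = foo\n", "usr/bin/foo": b"#!/bin/sh\n"})
xz = recompress_gz_to_xz(gz)
print(list_members(gz) == list_members(xz))
```

Recompressing the decompressed tar stream, rather than extracting and re-packing files, keeps the archive contents byte-identical; only the outer compression changes. In a real repository the file would also need the new extension, and the repo DB entry, which records the filename, would need updating.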
Comment by Allan McRae (Allan) - Tuesday, 01 December 2009, 11:19 GMT
Why would we need to have support for both formats? Can we not just declare packages are in xz format from a given time and release a new devtools/db-scripts on that day. If really needed, the db-scripts could probably sit in a separate folder for a while to allow some sort of transition until people update devtools and their makepkg.conf. pacman will handle repos with mixtures of file compression just fine.
Comment by Dan McGee (toofishes) - Tuesday, 01 December 2009, 13:41 GMT
Yeah, there definitely is zero voodoo required on the pacman side... Coding for the future is something we do try to keep in mind, and as every package entry in the database has the filename of the package, we could name them whatever the heck we want and pacman will find them. If dbscripts didn't have this in mind, then I'm with Allan: there isn't really a need to support multiple formats, and even if there were, it can't be that complex to do.
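To illustrate why the extension does not matter to a client that trusts the database: a sync DB entry stores the package filename under a `%FILENAME%` header, as written by repo-add, so the file is looked up by that stored name rather than by a constructed one. The Python sketch below parses such a `desc` entry; the parser and the sample entry are illustrative assumptions, not pacman's implementation.

```python
def parse_desc(text: str) -> dict[str, list[str]]:
    """Parse a repo DB 'desc' entry: '%KEY%' header lines followed by value lines."""
    fields: dict[str, list[str]] = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            key = None  # a blank line ends the current field
        elif line.startswith("%") and line.endswith("%") and len(line) > 2:
            key = line.strip("%")
            fields[key] = []
        elif key is not None:
            fields[key].append(line)
    return fields


def package_filename(desc_text: str) -> str:
    """Return the stored filename, whatever compression extension it carries."""
    return parse_desc(desc_text)["FILENAME"][0]


# Hypothetical sample entry for an xz-compressed package.
sample = "%FILENAME%\nfoo-1.0-1-x86_64.pkg.tar.xz\n\n%NAME%\nfoo\n"
print(package_filename(sample))
```

Because the name comes straight from the entry, a repo that mixes `.pkg.tar.gz` and `.pkg.tar.xz` files needs no special handling on this side.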
Comment by Aaron Griffin (phrakture) - Tuesday, 01 December 2009, 14:17 GMT
The only part of db-scripts that would need looking at is the section that greps the PKGINFO file.
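The point is that reading `.PKGINFO` becomes compression-agnostic once the archive is opened through something that auto-detects the format. Below is a Python sketch of that step, not the actual db-scripts shell code. It assumes the `key = value` line layout written by makepkg, with `#` comments and repeatable keys such as `depend`. The function names are hypothetical.

```python
import io
import tarfile


def parse_pkginfo(text: str) -> dict[str, list[str]]:
    """Parse '.PKGINFO' 'key = value' lines; keys such as 'depend' may repeat."""
    info: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        key, sep, value = line.partition(" = ")
        if sep:
            info.setdefault(key, []).append(value)
    return info


def read_pkginfo(pkg_bytes: bytes) -> dict[str, list[str]]:
    """Extract .PKGINFO from a package of any compression tarfile can detect."""
    with tarfile.open(fileobj=io.BytesIO(pkg_bytes), mode="r:*") as tar:
        member = tar.extractfile(".PKGINFO")  # raises KeyError if absent
        if member is None:
            raise ValueError(".PKGINFO is not a regular file")
        return parse_pkginfo(member.read().decode("utf-8"))
```

Only `read_pkginfo` touches the archive, so it is the single place where gzip versus xz could matter, and `"r:*"` removes even that.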
Comment by Pierre Schmitz (Pierre) - Tuesday, 01 December 2009, 14:29 GMT
Good point, Allan. I didn't really think about it, but accepting only xz for new packages might help here. But there are still scripts left which would have to deal with both, for example the cleanup, move and remove scripts, and probably more.
Comment by Aaron Griffin (phrakture) - Tuesday, 01 December 2009, 17:06 GMT
Actually, let's try to get a review of the dbscripts in. We should try to use the filename from the DB entry whenever possible, and perhaps use PKG-VER-REL-ARCH.* as the file scanning pattern. Does anyone see a problem with that?
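A Python sketch of the proposed PKG-VER-REL-ARCH.* scan follows, assuming the scripts get a plain list of filenames from the repo directory. The helper names are hypothetical. The name parts are escaped with `glob.escape` so that any glob metacharacters in them are matched literally. Only the trailing `.*` acts as a wildcard.

```python
import fnmatch
import glob


def package_pattern(pkgname: str, pkgver: str, pkgrel: str, arch: str) -> str:
    """Build the PKG-VER-REL-ARCH.* pattern with the fixed parts escaped."""
    prefix = f"{pkgname}-{pkgver}-{pkgrel}-{arch}"
    return glob.escape(prefix) + ".*"


def find_package_files(filenames: list[str], pkgname: str, pkgver: str,
                       pkgrel: str, arch: str) -> list[str]:
    """Return every listed file for this exact package, whatever its extension."""
    pattern = package_pattern(pkgname, pkgver, pkgrel, arch)
    return sorted(f for f in filenames if fnmatch.fnmatchcase(f, pattern))


files = [
    "foo-1.0-1-x86_64.pkg.tar.gz",
    "foo-1.0-1-x86_64.pkg.tar.xz",
    "foo-1.0-2-x86_64.pkg.tar.gz",
    "foobar-1.0-1-x86_64.pkg.tar.xz",
]
print(find_package_files(files, "foo", "1.0", "1", "x86_64"))
```

The pattern picks up both the gzip and the xz file for the same package, which is exactly what cleanup or move scripts need during a gradual migration. Deciding which copy to keep is left to the caller.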
Comment by Gavin Bisesi (Daenyth) - Tuesday, 01 December 2009, 20:44 GMT
I don't really see a problem with that, Aaron.

To clarify, the only place where mixed-extension stuff is not supported is within db-scripts and devtools? That seems like the place to target this IMO.
Comment by Allan McRae (Allan) - Monday, 25 January 2010, 08:36 GMT
So, do we have a volunteer to take the lead and make all necessary changes to db-scripts & devtools?

(unassigned myself as I am not interested in doing this)
Comment by Dan McGee (toofishes) - Monday, 25 January 2010, 13:18 GMT
Nor am I, not to be a Debbie Downer.