Pacman

FS#70172 - ParallelDownloads: order downloads by size

Attached to Project: Pacman
Opened by Geert Hendrickx (ghen) - Friday, 26 March 2021, 08:48 GMT
Last edited by Allan McRae (Allan) - Saturday, 04 September 2021, 02:26 GMT
Task Type Feature Request
Category General
Status Closed
Assigned To No-one
Architecture All
Severity Very Low
Priority Normal
Reported Version git
Due in Version 6.0.1
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Summary and Info:

I tested pacman 6.0 alpha1 with the ParallelDownloads feature on a box with many updates to install.
The parallel downloads are working nicely (very cool visually!), but at the end everything was waiting on a single large download.
A simple optimization would be to order downloads by size, starting with the largest, to maximize parallelization until the end.
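To illustrate the effect described above, here is a minimal simulation sketch. It is not pacman code: the `finish_time` helper and the package sizes are hypothetical, and it assumes every download slot runs at the same fixed speed (that is, the link is not saturated), which is the case where ordering matters most.

```python
import heapq


def finish_time(sizes, workers=5):
    """Simulate `workers` download slots that all run at the same fixed speed.

    Downloads are started in list order, each on whichever slot frees up
    first. Returns the time at which the last download finishes, measured
    in the same units as the sizes.
    """
    slots = [0.0] * workers  # time at which each slot becomes free
    heapq.heapify(slots)
    for size in sizes:
        start = heapq.heappop(slots)  # earliest free slot
        heapq.heappush(slots, start + size)
    return max(slots)


# Hypothetical sizes (say, MiB): one big package and sixty tiny ones.
sizes = [100] + [1] * 60

largest_first = finish_time(sorted(sizes, reverse=True))
largest_last = finish_time(sorted(sizes))
print(largest_first, largest_last)
```

Under these assumptions the largest-first order finishes as soon as the big download does, while the largest-last order has to wait for the small packages first and then for the big one on a single slot.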

Closed by  Allan McRae (Allan)
Saturday, 04 September 2021, 02:26 GMT
Reason for closing:  Implemented
Additional comments about closing:  git commit efb714b31cd30c21df179d8cbe3730b05fffd6bd
Comment by Evangelos Foutras (foutrelis) - Thursday, 10 June 2021, 22:53 GMT
A quick *unscientific* benchmark:

- "benchmark" command: sudo pacman -Swdd --noconfirm --logfile /dev/null --cachedir $(mktemp -dp.) <pkgs>
- some tiny packages: perl_a=($(pacman -Slq | grep -m30 perl-))
- more tiny packages: perl_b=($(pacman -Slq | grep perl- | tail -30))

tl;dr: ordering downloads by size slightly improves throughput in the somewhat extreme cases tested (~5% faster if latency isn't very low); maybe worth implementing if the complexity to do so is minimal


63ms ping to HTTPS mirror with HTTP/2 support (best of 3 runs each)
-------------------------------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.86 MiB/s 00:40
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.62 MiB/s 00:41
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.44 MiB/s 00:43

libreoffice-fresh ${perl_a[@]} ${perl_b[@]} => 137.9 MiB 5.71 MiB/s 00:24
${perl_a[@]} libreoffice-fresh ${perl_b[@]} => 137.9 MiB 5.43 MiB/s 00:25
${perl_a[@]} ${perl_b[@]} libreoffice-fresh => 137.9 MiB 5.12 MiB/s 00:27


63ms ping to HTTP mirror (best of 3 runs each)
-------------------------------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.63 MiB/s 00:41
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.49 MiB/s 00:42
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.43 MiB/s 00:43

libreoffice-fresh ${perl_a[@]} ${perl_b[@]} => 137.9 MiB 5.82 MiB/s 00:24
${perl_a[@]} libreoffice-fresh ${perl_b[@]} => 137.9 MiB 5.49 MiB/s 00:25
${perl_a[@]} ${perl_b[@]} libreoffice-fresh => 137.9 MiB 5.21 MiB/s 00:26


19ms ping to HTTP mirror (best of 3 runs each)
----------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.95 MiB/s 00:39
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.93 MiB/s 00:39
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.91 MiB/s 00:39
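One way to put the differences above into numbers is to compare the reported average rate of the largest-first run with the largest-last run in each group. The sketch below only re-uses the MiB/s figures from the tables; the labels and the `relative_gain` helper are made up for illustration, and since the runs are unscientific the percentages should be read as rough indications.

```python
# (largest first, largest last) average rate in MiB/s, copied from the tables above.
runs = {
    "63ms HTTPS, 232.8 MiB": (5.86, 5.44),
    "63ms HTTPS, 137.9 MiB": (5.71, 5.12),
    "63ms HTTP, 232.8 MiB": (5.63, 5.43),
    "63ms HTTP, 137.9 MiB": (5.82, 5.21),
    "19ms HTTP, 232.8 MiB": (5.95, 5.91),
}


def relative_gain(first, last):
    """Percentage by which the largest-first rate exceeds the largest-last rate."""
    return (first - last) / last * 100


gains = {label: round(relative_gain(*pair), 1) for label, pair in runs.items()}
for label, gain in gains.items():
    print(f"{label}: {gain}%")
```

The gain shrinks to almost nothing on the low-latency mirror, which matches the observation that ordering mainly helps when latency isn't very low.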
Comment by Geert Hendrickx (ghen) - Friday, 11 June 2021, 06:52 GMT
Those are just very small differences indeed - what ParallelDownloads value did you use?
Comment by Evangelos Foutras (foutrelis) - Friday, 11 June 2021, 06:56 GMT
5, which is the default if you just uncomment the option in pacman.conf.
Comment by Geert Hendrickx (ghen) - Friday, 11 June 2021, 10:20 GMT
I remember from the first time I tested this, it downloaded a bunch of small packages first and started a big one (the kernel) almost last, so in the end it was bottlenecked by that one big (single-threaded) download.
Starting the big download(s) first and keeping the small ones for last allows for maximal parallelism. But it's plausible that the practical gains are marginal (if your download link is saturated anyway), which you proved here.
On the other hand the download sizes are known in advance (from the sync db) so it's probably not that complex to sort them on size anyway?
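As a rough sketch of how small such a change could be, assuming the download queue is a list of records that already carry the size from the sync db: the `Download` type, its fields, and the byte counts below are hypothetical and are not pacman's actual data structures.

```python
from typing import NamedTuple


class Download(NamedTuple):
    name: str
    size: int  # compressed download size in bytes (made-up values below)


def order_largest_first(queue):
    """Return the queue sorted by download size, largest first.

    sorted() is stable, so packages of equal size keep their original order.
    """
    return sorted(queue, key=lambda d: d.size, reverse=True)


queue = [
    Download("perl-a", 20_000),
    Download("linux", 95_000_000),
    Download("perl-b", 20_000),
    Download("libreoffice-fresh", 140_000_000),
]
ordered = [d.name for d in order_largest_first(queue)]
print(ordered)
```

The sort happens once before any download starts, so its cost is negligible next to the transfers themselves.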
Comment by Charlie Sale (softwaresale) - Thursday, 12 August 2021, 01:31 GMT
Is anyone working on this? I'm interested in contributing to pacman and this seems like a decently easy task.