FS#70172 : ParallelDownloads: order downloads by size

Attached to Project: Pacman
Opened by Geert Hendrickx (ghen) - Friday, 26 March 2021, 08:48 GMT
Last edited by Allan McRae (Allan) - Saturday, 04 September 2021, 02:26 GMT

Task Type	Feature Request
Category	General
Status	Closed
Assigned To	No-one
Architecture	All
Severity	Very Low
Priority	Normal
Reported Version	git
Due in Version	6.0.1
Due Date	Undecided
Percent Complete
Votes	5 Juri Vitali (Juma93) (2021-06-15) ZeDoCaixao (croin) (2021-06-13) Yauhen (actionless) (2021-06-10) Maxim Baz (maximbaz) (2021-06-03) Mark Blakeney (bulletmark) (2021-06-03)
Private	No

Details

Summary and Info:

I tested pacman 6.0 alpha1 with the ParallelDownloads feature on a box with many updates to install.
The parallel downloads work nicely (very cool visually!), but at the end everything was waiting on a single large download.
A simple optimization would be to order downloads by size, starting with the largest, to maximize parallelization until the end.

Closed by Allan McRae (Allan)
Saturday, 04 September 2021, 02:26 GMT
Reason for closing: Implemented
Additional comments about closing: git commit efb714b31cd30c21df179d8cbe3730b05fffd6bd

Comment by Evangelos Foutras (foutrelis) - Thursday, 10 June 2021, 22:53 GMT

A quick *unscientific* benchmark:

- "benchmark" command: sudo pacman -Swdd --noconfirm --logfile /dev/null --cachedir $(mktemp -dp.) <pkgs>
- some tiny packages: perl_a=($(pacman -Slq | grep -m30 perl-))
- more tiny packages: perl_b=($(pacman -Slq | grep perl- | tail -30))

tl;dr: ordering downloads by size slightly improves throughput in the somewhat extreme cases tested (~5% faster if latency isn't very low); maybe worth implementing if the complexity to do so is minimal

63ms ping to HTTPS mirror with HTTP/2 support (best of 3 runs each)
-------------------------------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.86 MiB/s 00:40
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.62 MiB/s 00:41
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.44 MiB/s 00:43

libreoffice-fresh ${perl_a[@]} ${perl_b[@]} => 137.9 MiB 5.71 MiB/s 00:24
${perl_a[@]} libreoffice-fresh ${perl_b[@]} => 137.9 MiB 5.43 MiB/s 00:25
${perl_a[@]} ${perl_b[@]} libreoffice-fresh => 137.9 MiB 5.12 MiB/s 00:27

63ms ping to HTTP mirror (best of 3 runs each):
-------------------------------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.63 MiB/s 00:41
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.49 MiB/s 00:42
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.43 MiB/s 00:43

libreoffice-fresh ${perl_a[@]} ${perl_b[@]} => 137.9 MiB 5.82 MiB/s 00:24
${perl_a[@]} libreoffice-fresh ${perl_b[@]} => 137.9 MiB 5.49 MiB/s 00:25
${perl_a[@]} ${perl_b[@]} libreoffice-fresh => 137.9 MiB 5.21 MiB/s 00:26

19ms ping to HTTP mirror (best of 3 runs each)
----------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.95 MiB/s 00:39
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.93 MiB/s 00:39
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.91 MiB/s 00:39
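To put the tables above in relative terms, here is a small sketch that computes how much faster the "largest first" row is than the "largest last" row in each group. The throughput figures are copied from the tables; the group labels and the `gain_percent` helper are made up for this illustration, and with only best-of-3 runs the results should be read as rough indications, not measurements.

```python
# Throughput in MiB/s copied from the tables above:
# (largest package first, largest package last).
# The dictionary keys are labels invented for this sketch.
runs = {
    "63ms HTTPS+HTTP/2, 232.8 MiB": (5.86, 5.44),
    "63ms HTTPS+HTTP/2, 137.9 MiB": (5.71, 5.12),
    "63ms HTTP, 232.8 MiB": (5.63, 5.43),
    "63ms HTTP, 137.9 MiB": (5.82, 5.21),
    "19ms HTTP, 232.8 MiB": (5.95, 5.91),
}


def gain_percent(first, last):
    """Relative throughput gain of 'largest first' over 'largest last', in percent."""
    return (first / last - 1.0) * 100.0


gains = {name: gain_percent(first, last) for name, (first, last) in runs.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

The gain shrinks to well under 1% on the 19ms mirror, which matches the observation that ordering only matters when latency is not very low.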

Comment by Geert Hendrickx (ghen) - Friday, 11 June 2021, 06:52 GMT

Those are just very small differences indeed - what ParallelDownloads value did you use?

Comment by Evangelos Foutras (foutrelis) - Friday, 11 June 2021, 06:56 GMT

5, which is the default if you just uncomment the option in pacman.conf.

Comment by Geert Hendrickx (ghen) - Friday, 11 June 2021, 10:20 GMT

I remember from the first time I tested this, it downloaded a bunch of small packages first and started a big one (the kernel) almost last, so in the end it was bottlenecked by that one big (single-threaded) download.
Starting the big download(s) first and keeping the small ones for last allows for maximal parallelism. But it's plausible that the practical gains are marginal (if your download link is saturated anyway), which you proved here.
On the other hand the download sizes are known in advance (from the sync db) so it's probably not that complex to sort them on size anyway?
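The bottleneck described above can be illustrated with a toy model. This is only a sketch and not pacman's implementation: the `makespan` function and the package sizes are hypothetical, and it assumes every connection gets the same fixed rate, so it deliberately ignores the saturated-link case in which the gain mostly disappears.

```python
import heapq


def makespan(sizes, slots=5):
    """Time until the last download finishes when downloads start in list order.

    Toy model (an assumption of this sketch): each of the `slots` connections
    transfers 1 size unit per time unit, independent of the others.
    """
    # Min-heap holding the time at which each connection becomes idle.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    finish = 0.0
    for size in sizes:
        start = heapq.heappop(free_at)  # next connection to become idle
        end = start + size
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Hypothetical sizes in MiB: sixty tiny packages and one large one.
small = [0.5] * 60
large = [130.0]

# Large package queued last: it only starts once a connection frees up.
small_first = makespan(small + large)

# Sorted by size, largest first: the large package starts immediately
# and the small ones fill the remaining connections.
large_first = makespan(sorted(small + large, reverse=True))

print(small_first, large_first)
```

In this model the largest-first order finishes as soon as the big download does, while the other order adds the time spent on the small packages before the big one could start.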

Comment by Charlie Sale (softwaresale) - Thursday, 12 August 2021, 01:31 GMT

Is anyone working on this? I'm interested in contributing to pacman and this seems like a decently easy task.

Duplicate tasks of this task (1)
~~FS#71209 - [pacman] When downloading in parallel, files should be stored by~~
