FS#70172 - ParallelDownloads: order downloads by size
Attached to Project:
Pacman
Opened by Geert Hendrickx (ghen) - Friday, 26 March 2021, 08:48 GMT
Last edited by Allan McRae (Allan) - Saturday, 04 September 2021, 02:26 GMT
Details
Summary and Info:
I tested pacman 6.0 alpha1 with the ParallelDownloads feature on a box with many updates to install. The parallel downloads work nicely (very cool visually!), except that at the end everything was waiting on a single large download. A simple optimization would be to order downloads by size, starting with the largest, to maximize parallelization until the end.
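To see why the order matters, here is a minimal bash sketch of a toy model. It assumes a fixed number of download slots that each run at the same constant speed, and that a slot which frees up immediately takes the next file in the list. Bandwidth sharing between connections is ignored, so it overstates the effect on a saturated link. `makespan` is a hypothetical helper, not part of pacman, and sizes are in arbitrary units.

```bash
# makespan SLOTS SIZE...
# Hypothetical helper: hand the downloads, in the given order, to whichever of
# SLOTS parallel slots becomes free first (lowest index on ties), then print
# the time at which the last download finishes.
makespan() {
  local slots=$1; shift
  local -a busy=()
  local i size min max=0
  for ((i = 0; i < slots; i++)); do busy[i]=0; done
  for size in "$@"; do
    # Find the slot that becomes free earliest.
    min=0
    for ((i = 1; i < slots; i++)); do
      (( busy[i] < busy[min] )) && min=$i
    done
    (( busy[min] += size ))
  done
  # The whole batch is done when the busiest slot is done.
  for ((i = 0; i < slots; i++)); do
    (( busy[i] > max )) && max=${busy[i]}
  done
  echo "$max"
}

makespan 2 1 1 1 1 8   # large file queued last: prints 10
makespan 2 8 1 1 1 1   # large file queued first: prints 8
```

With two slots, queuing the large file last leaves one slot idle while it downloads alone; queuing it first lets the small files fill the other slot in the meantime.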
Closed by Allan McRae (Allan)
Saturday, 04 September 2021, 02:26 GMT
Reason for closing: Implemented
Additional comments about closing: git commit efb714b31cd30c21df179d8cbe3730b05fffd6bd
- "benchmark" command: sudo pacman -Swdd --noconfirm --logfile /dev/null --cachedir $(mktemp -dp.) <pkgs>
- some tiny packages: perl_a=($(pacman -Slq | grep -m30 perl-))
- more tiny packages: perl_b=($(pacman -Slq | grep perl- | tail -30))
tl;dr: ordering downloads by size (largest first) slightly improves throughput in the somewhat extreme cases tested (~5% faster if latency isn't very low); maybe worth implementing if the complexity to do so is minimal.
63ms ping to HTTPS mirror with HTTP/2 support (best of 3 runs each)
-------------------------------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.86 MiB/s 00:40
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.62 MiB/s 00:41
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.44 MiB/s 00:43
libreoffice-fresh ${perl_a[@]} ${perl_b[@]} => 137.9 MiB 5.71 MiB/s 00:24
${perl_a[@]} libreoffice-fresh ${perl_b[@]} => 137.9 MiB 5.43 MiB/s 00:25
${perl_a[@]} ${perl_b[@]} libreoffice-fresh => 137.9 MiB 5.12 MiB/s 00:27
63ms ping to HTTP mirror (best of 3 runs each)
-------------------------------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.63 MiB/s 00:41
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.49 MiB/s 00:42
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.43 MiB/s 00:43
libreoffice-fresh ${perl_a[@]} ${perl_b[@]} => 137.9 MiB 5.82 MiB/s 00:24
${perl_a[@]} libreoffice-fresh ${perl_b[@]} => 137.9 MiB 5.49 MiB/s 00:25
${perl_a[@]} ${perl_b[@]} libreoffice-fresh => 137.9 MiB 5.21 MiB/s 00:26
19ms ping to HTTP mirror (best of 3 runs each)
----------------------------------------------
libreoffice-fresh linux ${perl_a[@]} ${perl_b[@]} => 232.8 MiB 5.95 MiB/s 00:39
${perl_a[@]} linux ${perl_b[@]} libreoffice-fresh => 232.8 MiB 5.93 MiB/s 00:39
${perl_a[@]} ${perl_b[@]} libreoffice-fresh linux => 232.8 MiB 5.91 MiB/s 00:39
Starting the big download(s) first and keeping the small ones last allows for maximal parallelism. But it's plausible that the practical gains are marginal (if your download link is saturated anyway), as your benchmark shows.
On the other hand, the download sizes are known in advance (from the sync db), so it's probably not that complex to sort them by size anyway?
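A minimal bash sketch of that ordering, assuming the sizes are already available as `<size> <name>` lines (sizes in bytes). `largest_first` is a hypothetical helper name and the sizes in the example are made up; this only illustrates the ordering, not how pacman implements it internally.

```bash
# largest_first: read "<size> <name>" lines on stdin and print the names
# largest first (numeric, descending), breaking ties by name.
largest_first() {
  sort -k1,1nr -k2,2 | cut -d' ' -f2
}

# Made-up sizes, for illustration only.
printf '%s\n' '12 perl-a' '950 linux' '1200 libreoffice-fresh' '12 perl-b' |
  largest_first
# libreoffice-fresh
# linux
# perl-a
# perl-b
```

On a real system the input could plausibly come from `pacman -Sp --print-format '%s %n' <pkgs>`, since `--print-format` supports `%s` (size) and `%n` (name); that `%s` reports the download size for every package in a sync operation is an assumption worth checking against the pacman man page.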