FS#29293 - Don't retry downloading from a mirror if the mirror is clearly down

Attached to Project: Pacman
Opened by smyrman (smyrman) - Thursday, 05 April 2012, 15:16 GMT
Last edited by Eli Schwartz (eschwartz) - Sunday, 18 April 2021, 16:11 GMT
Task Type Bug Report
Category General
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version 4.0.2
Due in Version 6.0.0
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Summary and Info:
If a mirror is down, pacman tries to sync/download from that mirror again and again, once for every package. When a mirror is often down, this is really annoying!

Example output:
$ pacman -Syu
:: Synchronizing package databases...
error: failed retrieving file 'core.db' from mirror.archlinux.no : connect() timed out!
core 103.5 KiB 547K/s 00:00 [######################################################################################################] 100%
error: failed retrieving file 'extra.db' from mirror.archlinux.no : connect() timed out!
extra 1382.9 KiB 2.72M/s 00:00 [######################################################################################################] 100%
error: failed retrieving file 'community.db' from mirror.archlinux.no : connect() timed out!
community 1639.3 KiB 3.04M/s 00:01 [######################################################################################################] 100%
:: Starting full system upgrade...
resolving dependencies...
looking for inter-conflicts...

Targets (29): binutils-2.22-5 coreutils-8.16-2 expat-2.1.0-1 fftw-3.3.1-1 freetype2-2.4.9-2 gcc-4.7.0-3 gcc-ada-4.7.0-3 gcc-fortran-4.7.0-3 gcc-libs-4.7.0-3 git-1.7.9.6-1 glibc-2.15-10 go-2:1-4 intltool-0.50.2-1 krb5-1.10.1-2 libarchive-3.0.4-1
libltdl-2.4.2-5 libpng-1.5.10-1 libtool-2.4.2-5 linux-api-headers-3.3-1 linux-lts-3.0.26-1 mercurial-2.1.2-1 p11-kit-0.12-1 perl-net-ssleay-1.46-1 phpmyadmin-3.4.10.2-1 ruby-1.9.3_p125-3 tzdata-2012b-3 xorg-server-1.12.0.901-1
xorg-server-common-1.12.0.901-1 xorg-xlsatoms-1.1.1-1

Total Download Size: 120.75 MiB
Total Installed Size: 490.30 MiB
Net Upgrade Size: 15.57 MiB

Proceed with installation? [Y/n] y
:: Retrieving packages from core...
error: failed retrieving file 'linux-api-headers-3.3-1-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
linux-api-headers-3.3-1-x86_64 594.3 KiB 1590K/s 00:00 [######################################################################################################] 100%
error: failed retrieving file 'tzdata-2012b-3-any.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
tzdata-2012b-3-any 135.2 KiB 709K/s 00:00 [######################################################################################################] 100%
error: failed retrieving file 'glibc-2.15-10-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
glibc-2.15-10-x86_64 7.6 MiB 6.72M/s 00:01 [######################################################################################################] 100%
error: failed retrieving file 'binutils-2.22-5-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
binutils-2.22-5-x86_64 3.4 MiB 2.05M/s 00:02 [######################################################################################################] 100%
error: failed retrieving file 'coreutils-8.16-2-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
coreutils-8.16-2-x86_64 1990.4 KiB 3.39M/s 00:01 [######################################################################################################] 100%
error: failed retrieving file 'expat-2.1.0-1-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
expat-2.1.0-1-x86_64 101.4 KiB 536K/s 00:00 [######################################################################################################] 100%
error: failed retrieving file 'gcc-libs-4.7.0-3-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
gcc-libs-4.7.0-3-x86_64 769.3 KiB 1996K/s 00:00 [######################################################################################################] 100%
error: failed retrieving file 'gcc-4.7.0-3-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!

......

:: Retrieving packages from community...
error: failed retrieving file 'go-2:1-4-x86_64.pkg.tar.xz' from mirror.archlinux.no : connect() timed out!
go-2:1-4-x86_64 16.9 MiB 8.47M/s 00:02 [######################################################################################################] 100%
...


Steps to Reproduce:
Run 'pacman -Syu' when the first mirror on your mirrorlist is down.


Proposed solution:
1. When pacman starts it should read in the mirror list, and store it in memory.
2. If a mirror times out more than X times (e.g. two times), it should be removed from that internal mirror list.
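
A minimal sketch of the idea in C (every name here is hypothetical, not pacman's actual internals; X = 2):

    /* Sketch only: count failures per mirror and skip a mirror once it
     * has failed too often. All names are made up for illustration. */
    #include <stddef.h>

    #define MAX_FAILURES 2

    struct mirror {
        const char *url;
        int failures;
    };

    /* Record a failed attempt against a mirror. */
    static void mirror_failed(struct mirror *m)
    {
        m->failures++;
    }

    /* Return the first mirror that has not yet hit the threshold, or
     * NULL when every mirror in the list is considered down. */
    static struct mirror *next_mirror(struct mirror *list, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (list[i].failures < MAX_FAILURES) {
                return &list[i];
            }
        }
        return NULL;
    }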
This task depends upon

Closed by  Eli Schwartz (eschwartz)
Sunday, 18 April 2021, 16:11 GMT
Reason for closing:  Implemented
Additional comments about closing:  https://git.archlinux.org/pacman.git/commit/?id=4bf7aa119d7fc2ace226439407169884e07dbc88
Comment by Barbu Paul - Gheorghe (paullik) - Saturday, 28 July 2012, 08:08 GMT
I am investigating this issue and I'd like to know whether it is desired to allow the mirror to fail X times or to remove the mirror on the first failure (404, timeout, etc.).

I think the remove-on-first-failure approach wouldn't require a new data type (I created a POC just for -Syu and it works), but allowing X failures would, I think, need a data type like "struct server_t" containing two fields, the address of the mirror (server) and its number of failures; alpm_db_t would then hold an alpm_list_t of server_t entries instead of char pointers.

So I'd like to know which method is preferred.
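
The fail-X-times variant might look roughly like this (a sketch; alpm_list_t is libalpm's generic linked list, the struct and field names are hypothetical):

    /* Sketch of the proposed server_t. Today the servers are plain
     * char* entries in the db's alpm_list_t. */
    #include <alpm_list.h>

    typedef struct server_t {
        char *address;  /* the mirror URL */
        int failures;   /* how many times this mirror has failed so far */
    } server_t;

    /* alpm_db_t would then hold an alpm_list_t whose ->data members
     * are server_t* instead of char*, so the failure count travels
     * with the URL. */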
Comment by smyrman (smyrman) - Saturday, 28 July 2012, 10:15 GMT
I have noticed, at least once, that only one package fails for a given set of mirrors, but I would say this situation is rather rare. At least this is the case for me.

I haven't read the pacman source, but if you keep a list of servers in memory, I guess you can duplicate each server entry X times in that list, as an alternative to creating a new data type. It probably isn't the cleanest way to do it though...

I think failing on the first 404 might be acceptable, but I wouldn't say it's ideal.
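
For what it's worth, the duplication trick works because every failure removes one copy of the URL, so a mirror disappears for good after X failures. A rough sketch, assuming libalpm's alpm_list_add and alpm_list_remove_str helpers (X = 2):

    #include <alpm_list.h>

    #define X 2  /* allowed failures per mirror */

    /* Register a mirror by inserting X copies of its URL. */
    static alpm_list_t *add_server(alpm_list_t *servers, char *url)
    {
        for (int i = 0; i < X; i++) {
            servers = alpm_list_add(servers, url);
        }
        return servers;
    }

    /* On failure, drop one copy; once all X copies are gone, the
     * mirror is effectively removed from the rotation. */
    static alpm_list_t *note_failure(alpm_list_t *servers, const char *url)
    {
        char *dropped = NULL;
        return alpm_list_remove_str(servers, url, &dropped);
    }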
Comment by Barbu Paul - Gheorghe (paullik) - Saturday, 28 July 2012, 11:00 GMT
Sindre, I don't understand what you mean by:
"I have noticed at least once, that there is only one package that fails for a given set of mirrors, but I would say this situation is rather rare. At least this is the case for me."

Duplicating the server list IMHO is not an optimal solution, both for memory usage and for execution time.

I like the idea of failing on the first 404, but giving the server a second chance if another kind of error occurred. I have to investigate how I can check curl's error code (probably by looking at pm_errno), but maybe this adds too much unnecessary complexity to the code?
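
For reference, plain libcurl does expose enough to tell the two cases apart; a self-contained sketch (the function name is made up, and this is not pacman's dload.c):

    /* Sketch: telling "file missing" apart from "mirror down" with
     * plain libcurl. CURLOPT_FAILONERROR turns an HTTP status >= 400
     * into CURLE_HTTP_RETURNED_ERROR. */
    #include <curl/curl.h>

    int probe(const char *url)
    {
        CURL *curl = curl_easy_init();
        if (!curl) {
            return -1;
        }
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);
        curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 10L);

        CURLcode res = curl_easy_perform(curl);
        if (res == CURLE_HTTP_RETURNED_ERROR) {
            long status = 0;
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
            /* status == 404: just this file is missing; the mirror
             * itself may be fine */
        } else if (res == CURLE_COULDNT_CONNECT ||
                   res == CURLE_OPERATION_TIMEDOUT) {
            /* the mirror is unreachable: a strong hint to drop it */
        }
        curl_easy_cleanup(curl);
        return (int)res;
    }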

PS:
Dave Reisner, sorry if I have interrupted or tripped up your work on this bug report.
Comment by smyrman (smyrman) - Saturday, 28 July 2012, 13:56 GMT
Sorry if I am unclear.

Usually, if a package cannot be found on a given server, the server is down; at least that is the case when there is a server connection timeout.

But I have experienced that a single package can be missing from a server that is otherwise up and running. In this case you would get a 404 (I guess).
E.g. a couple of days back, the chromium package was missing from the Norwegian servers, a server in Holland, and some other servers. It was still found (in the right version) on some US server, though.

Barbu:
"Duplicating the server list IMHO is not an optimal solution, both for memory usage and for execution time."

For memory usage and execution time, it's hard to imagine that this kind of duplication would even be noticeable. But your idea is better in that it leads to code that is easier to read and more logical.

I believe a good solution would be to allow a couple of errors (at least a couple of 404s), and then remove the server once more than N errors occur. An alternative idea could be to treat different kinds of errors differently: with this approach, you would typically want to allow a 404, but remove the server if you get a connection timeout or a 503, etc. Though it is often simpler to just group similar errors and treat them all the same way; it makes the software more predictable. :-)
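
A sketch of that per-error policy (the names are made up; 404 is tolerated, timeouts and 503 drop the mirror):

    /* Sketch of the "treat errors differently" idea. */
    enum mirror_action { KEEP_MIRROR, DROP_MIRROR };

    static enum mirror_action classify_failure(int timed_out, long http_status)
    {
        if (timed_out || http_status == 503) {
            return DROP_MIRROR;  /* unreachable or overloaded: stop trying it */
        }
        if (http_status == 404) {
            return KEEP_MIRROR;  /* one missing file; the mirror may be fine */
        }
        return KEEP_MIRROR;      /* group everything else with the lenient case */
    }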



Comment by Barbu Paul - Gheorghe (paullik) - Saturday, 28 July 2012, 14:04 GMT
Yeah, I'd go with allowing the server to fail X times, without taking the kind of error into consideration.

But let's see what others say about this approach vs. the single fail approach.

There is one more thing to this bug:
The server removal is made per db, so if I remove the server foo from core, pacman will still try to use foo for community; if it fails there too, it will be removed from community as well, but the following repos aren't affected, just as community wasn't affected by the removal from core.

So after this fix, if one does "pacman -Sy" and a server fails, the user won't feel the difference, for the reason stated above.
But when doing a -S or -Syu (or -Syyu) and a server fails, you won't be bothered again, because the removal took place when syncing the first db/package, and for the following packages from the same db the server stays removed from the internal list.

PS: A patch for the single-fail approach will be available soon.
Comment by Barbu Paul - Gheorghe (paullik) - Thursday, 02 August 2012, 10:34 GMT
Ok, so it seems that I'm making (almost) no progress in getting -S to remove a server from memory.
For -Syu or -Syu <pkg_name> it works fine.

In lib/libalpm/sync.c, in the download_single_file function: when downloading the first package fails and the faulty server is removed, the pointers to the server list for the upcoming packages become NULL, so I get a segfault.

This could be avoided by:
1. Keeping the server list ONLY in alpm_handle_t; don't spread this server list into every structure of the app, as it's hard to maintain...
This would require struct dload_payload to be changed to say which repo the payload should be downloaded from.

2. Building the payloads on the fly: try to download the first payload, and if some servers are removed, build the next payload directly without those servers, so there are no NULL pointers. I'm not really sure whether this would be possible because of download_files and the way it does things.

I would go for 1 (a rough sketch of what I mean is below), but I'm waiting for advice.
In the meantime, should I submit my patch for -Syu?
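
Roughly what option 1 amounts to, as a sketch (all the struct names and fields below are hypothetical; the point is that payloads name their repo instead of caching server pointers):

    #include <alpm_list.h>

    /* Sketch of option 1: the handle owns the single server list,
     * pruned in place as mirrors fail. */
    struct handle_sketch {
        alpm_list_t *servers;   /* server_t* entries */
    };

    /* A payload no longer carries its own server pointers; it only
     * says which repo it belongs to, and the download loop asks the
     * handle for the next live server at download time. */
    struct payload_sketch {
        const char *repo;       /* e.g. "core" */
        const char *filename;   /* e.g. "glibc-2.15-10-x86_64.pkg.tar.xz" */
    };

    /* Because no payload caches a server pointer, pruning
     * handle->servers can never leave a dangling or NULL pointer
     * behind for later packages. */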
