FS#15369 - Pacman should timeout when mirror is not responding

Attached to Project: Pacman
Opened by Tomas Mudrunka (harvie) - Saturday, 04 July 2009, 14:42 GMT
Last edited by Dave Reisner (falconindy) - Monday, 22 August 2011, 02:26 GMT
Task Type Bug Report
Category General
Status Closed
Assigned To Dave Reisner (falconindy)
Architecture All
Severity Medium
Priority Normal
Reported Version 3.2.2
Due in Version 4.0.0
Due Date Undecided
Percent Complete 100%
Votes 12
Private No

Details

Summary and Info:
When a pacman repository mirror drops incoming connections instead of refusing them, or when the server is heavily overloaded or simply down, pacman hangs forever (which is bad when it is launched from cron, etc.). Pacman should have a timeout (e.g. ~20 seconds, like web browsers use). Timing out is not a problem, because pacman can then continue with the next mirror (just as it does when a mirror properly refuses the connection).

Steps to Reproduce:
1.) Set pacman to use some non-existent IP (a non-responding server) as a mirror
2.) pacman -Sy (or -Syuw)

I had this problem yesterday, when my local mirror was not responding properly and pacman (-Syuw from cron) wasn't trying other mirrors; it just hung in the background and I was forced to kill it each time I wanted to use pacman. Thanks.

Closed by  Dave Reisner (falconindy)
Monday, 22 August 2011, 02:26 GMT
Reason for closing:  Implemented
Additional comments about closing:  See comments
Comment by Tomas Mudrunka (harvie) - Sunday, 05 July 2009, 21:54 GMT
Oh, and I have found another similar serious issue: when a mirror stops responding during a package download, there should be a timeout after which pacman restarts the transfer (if possible) or continues with the next mirror.
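For the stalled-download case, one generic POSIX approach is to wait for data with poll() before each read, so a silent mirror produces an error instead of a hang. This is an illustrative sketch only, not pacman's or libfetch's code; the helper name read_with_timeout and the choice to report the stall as ETIMEDOUT are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read. Returns bytes read, 0 on EOF, or -1 with errno set
 * (ETIMEDOUT if nothing arrived within timeout_ms). */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR); /* interrupted: retry, restarting the full wait */

    if (rc < 0)
        return -1;              /* poll itself failed; errno already set */
    if (rc == 0) {
        errno = ETIMEDOUT;      /* stalled: no data arrived in time */
        return -1;
    }
    return read(fd, buf, len);  /* readable (or hung up): read returns data or 0 */
}
```

On a timeout the caller would abort the current transfer and either retry it or move on to the next mirror.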
Comment by SanskritFritz (SanskritFritz) - Wednesday, 12 August 2009, 22:57 GMT
pacman 3.2 behaved correctly, it reported the problem like this:
error: failed retrieving file 'dragonlord.db.tar.gz' from repo.dragonlord.cz : Connection timed out

The new pacman 3.3 just stops at the repo and waits forever.
Comment by Dan McGee (toofishes) - Wednesday, 12 August 2009, 23:42 GMT
Comment by Nagy Gabor (combo) - Thursday, 13 August 2009, 16:16 GMT
http://ring.nict.go.jp/archives/NetBSD/NetBSD-release-4-0/src/external/bsd/fetch/dist/libfetch/common.c
"timeout.tv_sec += fetchTimeout;"

This suggests to me that we should add seconds instead of microseconds.
Comment by Nagy Gabor (combo) - Thursday, 13 August 2009, 16:23 GMT
I mean we should use seconds instead of milliseconds. ;-)
Comment by Dan McGee (toofishes) - Friday, 14 August 2009, 02:05 GMT
Comment by Xavier (shining) - Tuesday, 18 August 2009, 19:02 GMT
I don't think it does. It seems libfetch is stuck on connect (in common.c, fetch_connect()),
and from man connect:
If the connection cannot be established immediately and O_NONBLOCK is not set for the file descriptor for the socket, connect() shall block
for up to an unspecified timeout interval until the connection is established.

"unspecified timeout interval"... very useful...

I tested as described in the original report, by using a random IP:
Server = ftp://129.34.123.2

Do you get different results?
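The usual way around connect()'s "unspecified timeout interval" is the O_NONBLOCK route the man page alludes to: start the connect in non-blocking mode, wait a bounded time for the socket to become writable with poll(), then read the outcome via SO_ERROR. A minimal sketch, assuming the caller has already created a TCP socket; connect_with_timeout is a hypothetical helper, not a libfetch function.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect fd to addr, giving up after timeout_ms.
 * Returns 0 on success, -1 with errno set (ETIMEDOUT if no answer). */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    int rc = connect(fd, addr, addrlen);
    if (rc < 0 && errno == EINPROGRESS) {
        /* Handshake in progress: wait until the socket becomes writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;          /* peer never answered */
            rc = -1;
        } else if (rc > 0) {
            /* Writable: fetch the result of the asynchronous connect. */
            int err = 0;
            socklen_t len = sizeof err;
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
                rc = -1;
            } else if (err != 0) {
                errno = err;            /* e.g. ECONNREFUSED */
                rc = -1;
            } else {
                rc = 0;
            }
        }
    }

    /* Restore the original (blocking) mode, preserving errno. */
    int saved = errno;
    fcntl(fd, F_SETFL, flags);
    errno = saved;
    return rc;
}
```

On ETIMEDOUT or any other error the caller would close the socket and try the next address or mirror, just as it already does for a refused connection.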
Comment by Xavier (shining) - Monday, 07 September 2009, 12:51 GMT
Joerg's answer (libfetch developer) on this:
"This is not a bug, it is normal TCP/IP behavior. If you really insist on
knowing better, you can use an alarm timer and mark the signal as
non-restart. The connect will fail and continue to the next socket.

Joerg"
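Joerg's suggestion can be sketched with sigaction(): install a SIGALRM handler without SA_RESTART and arm alarm(); a connect() still blocked when the alarm fires then fails with errno set to EINTR. This is a hedged illustration of the idea, not libfetch code; connect_with_alarm is a hypothetical name, and the sketch assumes a single-threaded program, since alarm() and signal dispositions are process-wide.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocked connect(). */
static void on_alarm(int sig) { (void)sig; }

/* Hypothetical helper: blocking connect bounded by an alarm after
 * `seconds`. Returns 0 on success, -1 with errno set; errno == EINTR
 * means the alarm fired before the connection was established. */
int connect_with_alarm(int fd, const struct sockaddr *addr,
                       socklen_t addrlen, unsigned seconds)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(seconds);
    int rc = connect(fd, addr, addrlen);
    int saved = errno;
    alarm(0);                        /* cancel the alarm if it has not fired */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous disposition */
    errno = saved;
    return rc;
}
```

Per POSIX, an interrupted connect() does not abort the connection request (it continues asynchronously), so after EINTR the socket should be closed rather than reused before moving on to the next address.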
Comment by Dieter Plaetinck (Dieter_be) - Sunday, 18 April 2010, 21:59 GMT
This ticket should be reopened.
- the issue is still not fixed.
- Timeouts are good: having pacman error out after a certain period is much more friendly towards humans and scripts.
- pacman is a package manager, not a tcp/ip library
- Dan is in favor of fixing this.
Comment by Xavier (shining) - Sunday, 18 April 2010, 22:09 GMT
- pacman is a package manager, not a tcp/ip library
Heh, that's exactly why I closed this bug. pacman is not a tcp/ip library, and I did not insist I knew better than tcp/ip behavior.
Timeouts are obviously good and friendly, nothing to argue here :)
It will also be interesting to see how this can be fixed: whether simply implementing what Joerg suggested works, and how that can be implemented.
Comment by Tomas Mudrunka (harvie) - Monday, 19 April 2010, 13:36 GMT
I've found that there are serious bugs in timing out TCP/IP connections on Linux, and if you are writing an application that makes a lot of connections (e.g. HTTP requests), one day you will run into a situation where a socket does not time out properly and blocks your program forever (or until you kill it).

This affects not only C but also scripting languages like Perl or PHP.
I've used an ugly hack to work around this: you can fopen() a pipe to netcat or curl and handle it in a non-blocking manner, but IMHO it's really ugly and could be done more nicely (using separate threads or something...).
Comment by Dieter Plaetinck (Dieter_be) - Thursday, 09 December 2010, 10:48 GMT
Any progress on this? In AIF I rely on pacman's exit code to tell me "everything was OK", and I rely on pacman aborting when that is appropriate (maybe with a timeout flag?).
As long as these issues are not fixed, the installation procedure exposes various bugs as well (FS#16277 and FS#19342, although the latter might be a different issue).
Comment by mattia (nTia89) - Friday, 11 March 2011, 10:48 GMT
I put a non-existent server in my mirrorlist file: Server = ftp://wertyevteyuibnu.asd/$repo/os/x86_64
and pacman returns the error more or less instantly.

But if I append Server = http://xyz.xyz.xyz.xyz/$repo/os/x86_64
pacman waits and does not time out...

So the bug is still here (pacman-3.4.3-1).
Comment by Dave Reisner (falconindy) - Monday, 22 August 2011, 02:26 GMT
Implemented in http://projects.archlinux.org/pacman.git/commit/?id=8807cac

Connections which fail to reach a speed of 1024 bytes/second for 10 seconds will error out.
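The rule above (below 1024 bytes/second for 10 seconds) is the same rule libcurl exposes as CURLOPT_LOW_SPEED_LIMIT and CURLOPT_LOW_SPEED_TIME; that the commit sets exactly these options is an assumption here, not something stated in this report. The logic itself can be sketched without libcurl: sample the running byte count periodically, remember when the rate first dropped below the limit, and give up once it has stayed there for the whole window. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical tracker for "rate below `limit` B/s for `seconds` s". */
struct low_speed {
    long limit;          /* minimum acceptable rate, bytes/second (e.g. 1024) */
    long seconds;        /* how long the rate may stay below limit (e.g. 10) */
    bool slow;           /* currently inside a slow stretch? */
    time_t slow_since;   /* when the current slow stretch began */
    time_t last_time;    /* time of the previous sample */
    size_t last_total;   /* byte total at the previous sample */
};

void low_speed_init(struct low_speed *ls, long limit, long seconds, time_t now)
{
    ls->limit = limit;
    ls->seconds = seconds;
    ls->slow = false;
    ls->slow_since = now;
    ls->last_time = now;
    ls->last_total = 0;
}

/* Feed the running (monotonically growing) byte total periodically.
 * Returns true once the rate has stayed below limit for `seconds`. */
bool low_speed_expired(struct low_speed *ls, time_t now, size_t total)
{
    double elapsed = difftime(now, ls->last_time);
    if (elapsed <= 0)
        return false;    /* same tick as the last sample: no rate yet */

    double rate = (double)(total - ls->last_total) / elapsed;
    time_t interval_start = ls->last_time;
    ls->last_time = now;
    ls->last_total = total;

    if (rate >= (double)ls->limit) {
        ls->slow = false;                 /* fast enough: stretch is over */
        return false;
    }
    if (!ls->slow) {
        ls->slow = true;                  /* slowness began with this interval */
        ls->slow_since = interval_start;
    }
    return difftime(now, ls->slow_since) >= (double)ls->seconds;
}
```

With limit 1024 and seconds 10, a mirror that sends nothing trips the check on the tenth consecutive one-second sample, while any sample at or above 1024 B/s resets the stretch.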
