FS#20371 - Pacman doesn't close ftp connections

Attached to Project: Pacman
Opened by Guilherme Andrade (gueek) - Wednesday, 04 August 2010, 17:36 GMT
Last edited by Andreas Radke (AndyRTR) - Friday, 10 December 2010, 21:17 GMT
Task Type Bug Report
Category Backend/Core
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version 3.4.0
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Summary and Info:
When downloading packages from an FTP mirror, pacman doesn't close each connection right after the corresponding package finishes downloading. Hence, on mirrors that enforce a built-in "max connections" limit, downloads will start failing after a certain number of downloaded packages (the exact count depends on each mirror's configuration), and pacman will try the next mirror. This doesn't happen if pacman is instructed to use wget (in pacman.conf).

Steps to Reproduce:
1. Choose a mirror with a low "max connection limit" (in this case: ftp://ftp.rnl.ist.utl.pt/pub/archlinux/$repo/os/$arch ; but others will certainly exist (see https://bbs.archlinux.org/viewtopic.php?id=102106 ))
2. Download a reasonable amount of packages (in this case, at least 10 - the 10th package will fail)
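For step 1, one way to pin the rate-limited mirror is to make it the only entry in the mirrorlist (path as on a stock Arch install; the Server line is the mirror named above):

```
# /etc/pacman.d/mirrorlist - keep only the rate-limited mirror
Server = ftp://ftp.rnl.ist.utl.pt/pub/archlinux/$repo/os/$arch
```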
This task depends upon

Closed by  Andreas Radke (AndyRTR)
Friday, 10 December 2010, 21:17 GMT
Reason for closing:  Fixed
Comment by Dan McGee (toofishes) - Wednesday, 04 August 2010, 19:59 GMT
A good starting point here would be to investigate if this is libfetch's fault or ours. We may be forgetting to call some cleanup function, or they may be forgetting to properly close the connection.
Comment by Alexander Duscheleit (jinks) - Wednesday, 18 August 2010, 09:26 GMT
From my mail to arch-general for completeness' sake:
After playing around in a throwaway chroot, the problem seems to be
libfetch >= 2.30. I just modified the PKGBUILD to build different versions
(without replacing or rebuilding pacman at all).

Libfetch 2.26 fetches files without a problem, 2.30+ fails after
downloading 5 files while MaxInstances in proftpd is set to 8. If I set
MaxInstances to 3, downloading fails outright (also only for 2.30+), so
something there seems to consume 3 connections before actually
downloading something at all.

I'd try to break it down further, but NetBSD's CVS server isn't talking
to me at the moment.
Comment by Alexander Duscheleit (jinks) - Wednesday, 18 August 2010, 09:41 GMT
Digging further...

Since NetBSD's CVS wouldn't talk to me, and since I haven't done anything meaningful with CVS since Subversion hit its beta phase, I cloned git://git.dragonflybsd.org/pkgsrc.git, which should be a plain git mirror of NetBSD's CVS according to http://www.dragonflybsd.org/release26/.

Bisecting between 2.26 and 2.30 (365c8abb44ad4871b7ef9b2cc5f8b7c1ba452a34 and 230b2e9177eb84dda30c51eb298a6aaa92d4a6c7) got me to:
5e460432d720265b643baf64c545a368c8425ded is the first bad commit
commit 5e460432d720265b643baf64c545a368c8425ded
Author: Joerg Sonnenberger <joerg@NetBSD.org>
Date: Mon Jan 11 17:23:10 2010 +0000

libfetch-2.27:
The connection sharing didn't handle the case of active transfers
correctly and tried to close the connection in that case (PR 42607).
Correctly check if there is a transfer going on and just leave the
connection alone in that case.

:040000 040000 1f0b961091945c5304f2fc105c5fde7ed94daf84 89b619c8bab0521166eef7dabb4f7624191d87b5 M net

I will attach the whole git show output for that commit.

Judging by the changes, "cached_connection->is_active" is supposed to be false but isn't. So something somewhere is forgetting to mark the ftp connection inactive after use. (Or is not reusing a cached_connection when it could. Is it really necessary to create a new ftp connection for every single file?)
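The reuse failure described here can be modelled with a tiny self-contained cache. The names (`conn`, `is_active`, `conn_acquire`) are illustrative only, not libfetch's real internals; the point is just that a stuck `is_active` flag forces one fresh server connection per file:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Schematic model of a one-slot connection cache. A cached connection
 * can only be reused while it is not marked active. */
struct conn {
    char host[64];
    int  is_active;   /* set while a transfer is in flight */
};

static struct conn cache;        /* the single cached connection */
static int opened_connections;   /* counts freshly opened connections */

/* Acquire a connection for `host`, reusing the cache when possible. */
static struct conn *conn_acquire(const char *host)
{
    if (cache.host[0] && !cache.is_active && strcmp(cache.host, host) == 0) {
        cache.is_active = 1;     /* reuse the cached connection */
        return &cache;
    }
    opened_connections++;        /* cache unusable: open a new connection */
    snprintf(cache.host, sizeof(cache.host), "%s", host);
    cache.is_active = 1;
    return &cache;
}

/* Release after a transfer. The bug corresponds to skipping this step:
 * is_active stays set, so every later transfer opens a new connection,
 * eating into the server's MaxInstances budget. */
static void conn_release(struct conn *c, int buggy)
{
    if (!buggy)
        c->is_active = 0;
}

/* Download `files` files in sequence; return how many new connections
 * were opened along the way. */
static int connections_opened(int files, int buggy)
{
    opened_connections = 0;
    memset(&cache, 0, sizeof(cache));
    for (int i = 0; i < files; i++) {
        struct conn *c = conn_acquire("ftp.example.org");
        conn_release(c, buggy);
    }
    return opened_connections;
}
```

With correct release, ten downloads share one connection; with the buggy path, every download opens a new one.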
Comment by Dan McGee (toofishes) - Wednesday, 18 August 2010, 17:58 GMT
I think we need to figure out how to reuse connections for both the FTP and HTTP transfers - I have the feeling we are not doing that correctly. If you wouldn't mind looking into that a bit more, that would be awesome. At the very least, it sounds like we need to clean up better after we're done with a connection.
Comment by Dan McGee (toofishes) - Wednesday, 18 August 2010, 18:44 GMT
2.28 completely revamped connection sharing in order to support it with both FTP and HTTP, so I'm sure it has something to do with that. If you could work with upstream, that would be awesome - Joerg is pretty responsive at dealing with this type of thing. A short test program would probably be best to show what is going on.

For our latest cut of the code you can go to ftp://ftp.archlinux.org/other/libfetch/.
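A minimal test program of the kind suggested might look like the sketch below. `fetchGetURL` is libfetch's real entry point, but it is stubbed out here so the skeleton compiles without libfetch installed; dropping the stub and linking with -lfetch (and including <fetch.h>) would exercise the real library against a rate-limited mirror:

```c
#include <stdio.h>

/* Stub standing in for libfetch's fetchGetURL(3) so this skeleton
 * compiles anywhere. Delete it and link against the real libfetch
 * to reproduce the connection-limit failure. */
static FILE *fetchGetURL(const char *URL, const char *flags)
{
    (void)URL; (void)flags;
    return fopen("/dev/null", "r");
}

/* Fetch the same URL `n` times in a row. With a libfetch that leaks
 * connections, the i-th request fails once the server's connection
 * limit (e.g. proftpd's MaxInstances) is exhausted; the return value
 * is the number of transfers that succeeded. */
static int fetch_n_times(const char *url, int n)
{
    for (int i = 0; i < n; i++) {
        FILE *f = fetchGetURL(url, "");
        if (f == NULL)
            return i;            /* failed on the (i+1)-th transfer */
        char buf[4096];
        while (fread(buf, 1, sizeof(buf), f) > 0)
            ;                    /* drain the body */
        fclose(f);
    }
    return n;
}
```

With the stub in place all transfers trivially succeed; the skeleton is only meant as a starting point for an upstream report.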
Comment by Alexander Duscheleit (jinks) - Thursday, 19 August 2010, 11:11 GMT
I hope that by "you" you didn't mean me. I haven't touched a line of C in over 15 years, and I wasn't very good to begin with.

I merely poked around with git bisect to spare some *proper* dev the time to hunt for the root cause.

(I even tried to have a look at dload.c from libalpm. You lost me at line 56)
Comment by Xavier (shining) - Thursday, 19 August 2010, 14:22 GMT
Alex, your report was better than we could hope! Very detailed, with all the necessary information to reproduce the problem easily, plus some pointers at what to look for in libfetch, though we are not experts in that code :)
I contacted Joerg, and we nailed the issue down to the fetchStat call in our code, so he very quickly found and fixed the issue in the libfetch code.
The libfetch patch he provided works perfectly for me, so this whole issue will be solved in the next libfetch release or (Arch Linux) package.
Comment by Dan McGee (toofishes) - Monday, 23 August 2010, 21:41 GMT
Fixed in libfetch upstream in 2.33, entering testing now.
