FS#15657 - presence of new db version is not checked when <dbfile>.part exists

Attached to Project: Pacman
Opened by Roman Kyrylych (Romashka) - Thursday, 23 July 2009, 19:49 GMT
Last edited by Allan McRae (Allan) - Saturday, 04 December 2010, 09:18 GMT
Task Type Bug Report
Category General
Status Closed
Assigned To Xavier (shining), Dan McGee (toofishes)
Architecture All
Severity Medium
Priority Normal
Reported Version 3.2.2
Due in Version 3.4.0
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Summary and Info:

When running pacman -Syy some time after a previous download was interrupted:

:: Synchronizing package databases...
debug: using 'testing.db.tar.gz' for download progress
debug: existing file found, using it
debug: HTTP_PROXY: (null)
debug: http_proxy: (null)
debug: FTP_PROXY: (null)
debug: ftp_proxy: (null)
error: failed retrieving file 'testing.db.tar.gz' from dev.archlinux.org : Requested Range Not Satisfiable
debug: failed to sync db: Requested Range Not Satisfiable
error: failed to update testing (Requested Range Not Satisfiable)

Between the interruption of the first download and the start of the new one, the db file was updated on the server and became smaller, so the requested range no longer exists.
I am not sure what would happen if the range were satisfiable; theoretically it should result in a broken db file.

The bug here is that pacman does not seem to even check if the file was changed and goes straight to continuing the download of <dbname>.part
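
To illustrate the failure above, here is a minimal sketch in C. It is not pacman's actual code, and the function name and parameters are hypothetical. A resume request of the form "bytes=N-" can only be satisfied when N is smaller than the current size of the remote file, so a .part file that is at least as large as the new db produces the error in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper (not part of libalpm): would a request to resume
 * at byte offset partial_size be satisfiable for a remote file that is
 * currently remote_size bytes long? */
bool range_is_satisfiable(off_t partial_size, off_t remote_size)
{
    assert(partial_size >= 0 && remote_size >= 0);
    /* The first requested byte must exist in the remote file. */
    return partial_size < remote_size;
}
```

Note that a satisfiable range says nothing about whether the remote content is still the same file, so a size check alone would not prevent a broken db.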


Steps to Reproduce:

You may want to try the attached file:
just put it in /var/lib/pacman and run -Syy

Closed by  Allan McRae (Allan)
Saturday, 04 December 2010, 09:18 GMT
Reason for closing:  Fixed
Additional comments about closing:  was fixed in 3.4.0 with git commit d2dbb04a9a
Comment by Xavier (shining) - Friday, 24 July 2009, 07:30 GMT
And how could pacman check that?
Comment by Dan McGee (toofishes) - Friday, 24 July 2009, 23:53 GMT
We probably shouldn't use partial files at all when it comes to DB downloads. It makes sense for packages where the content of a specific filename is immutable, but it doesn't make sense for something like the database.
Comment by Roman Kyrylych (Romashka) - Saturday, 25 July 2009, 16:10 GMT
> And how could pacman check that?
When starting a new download, check whether the dbfile's last update time, as reported by the server, is newer than that of the local file or the .part file.

It's easier to just go with Dan's idea though.
Then such a check won't be needed (even for packages: I cannot imagine a situation where a package file gets updated without its name changing).
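
The mtime comparison suggested above could look roughly like the following. This is only a sketch: partial_is_stale is a hypothetical helper, not a libalpm function, and it assumes the caller has already obtained the server's modification time by some other means.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns 1 if "<localpath>.part" exists and was
 * last modified before remote_mtime, 0 if it is absent or not older,
 * and -1 if the path does not fit the buffer. */
int partial_is_stale(const char *localpath, time_t remote_mtime)
{
    char partpath[4096];
    struct stat st;
    int n;

    assert(localpath != NULL);
    n = snprintf(partpath, sizeof(partpath), "%s.part", localpath);
    if (n < 0 || (size_t)n >= sizeof(partpath)) {
        return -1;
    }
    if (stat(partpath, &st) != 0) {
        return 0; /* no partial file, nothing to resume */
    }
    /* The server copy changed after the partial file was last written. */
    return st.st_mtime < remote_mtime ? 1 : 0;
}
```

The .part file's mtime comes from the local clock while remote_mtime comes from the server, so clock skew could make this comparison unreliable in practice.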
Comment by Xavier (shining) - Saturday, 25 July 2009, 18:42 GMT
So do we need to add an "int resume" parameter to all of our download functions?

- int _alpm_download_single_file(const char *filename,
      alpm_list_t *servers, const char *localpath,
      time_t mtimeold, time_t *mtimenew)
- handle->fetchcb(url, localpath, mtimeold, mtimenew) (+ the callback itself)
- int _alpm_download_files(alpm_list_t *files,
      alpm_list_t *servers, const char *localpath)
- static int download(const char *url, const char *localpath,
      time_t mtimeold, time_t *mtimenew)
- download_internal(url, localpath, mtimeold, mtimenew)

?
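
One way such a resume flag might behave, sketched under assumptions: resume_offset is a hypothetical helper (not one of the functions listed above) that returns the byte offset to continue from. With resume set to 0, as one might do for databases, any leftover .part file is removed and the download starts from scratch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns the offset at which to continue writing
 * partpath, or -1 on error. */
off_t resume_offset(const char *partpath, int resume)
{
    struct stat st;

    assert(partpath != NULL);
    if (stat(partpath, &st) != 0) {
        /* A missing partial file simply means a fresh download. */
        return errno == ENOENT ? 0 : -1;
    }
    if (resume) {
        return st.st_size; /* continue after the bytes already on disk */
    }
    /* Resuming is not allowed: discard the old partial file. */
    if (remove(partpath) != 0) {
        return -1;
    }
    return 0;
}
```

A caller for package files could pass resume = 1 and keep today's behaviour, while a caller for db files could pass 0.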
Comment by Dan McGee (toofishes) - Saturday, 25 July 2009, 22:24 GMT
That was my initial thought, although now I am looking closer at our download_internal logic. I noticed we don't do a fetchStat operation first, which might allow us to do some more careful things with mtime and size checks that we aren't currently doing.

The funny thing with the mtime stuff right now is that we only use it for databases. If we used it for anything else, our logic would be completely broken, as passing in an identical mtime aborts the download early without any regard for whether the download is finished.

I would have tried to address this before 3.3 if it was an easy problem, but it might have to wait until 3.3.1 or even 3.4 as it will probably require an API change.
Comment by Dan McGee (toofishes) - Monday, 07 September 2009, 14:46 GMT
Xavier- don't know how far you got on this but let me look around for some work I might have started here.
Comment by Xavier (shining) - Monday, 07 September 2009, 14:55 GMT
I did absolutely nothing yet, but I wanted to look into it so I added it to my task list to not forget.
I will wait for your initial work :)
Comment by Ray Rashif (schivmeister) - Monday, 28 September 2009, 18:36 GMT
I should say that this does not apply to the db alone. If a package download is interrupted, the same thing happens. All this while I've been blaming my mobile connection (to an extent it _is_ at fault, because it frequently stops transmission for a short while when it goes from 3G to EDGE and vice-versa).

Workaround there is to remove the cached (partial) download, and restart. If that does not work, wget && pacman -U =p
Comment by Xavier (shining) - Monday, 28 September 2009, 18:53 GMT
Wait, what? I interrupt and resume package downloads all the time, and this works fine. Packages have a unique filename, so they don't change.

However, the database changes all the time while keeping the same name. This is what gives us trouble.
Comment by Ray Rashif (schivmeister) - Monday, 28 September 2009, 19:19 GMT
So do you mean

a) package downloads should never error out like:

error: failed retrieving file 'foobar-1.0-1-i686.pkg.tar.gz' from foo.bar.org : Requested Range Not Satisfiable

-OR-

b) package downloads can error out like that (if an update is interrupted and then resumed with an -Syyu for instance) but it's because the database changes all the time

-OR-

c) I should not stop cursing my ISP
Comment by Xavier (shining) - Monday, 28 September 2009, 19:43 GMT
I mean that:
- this should never happen for package downloads (unless you are hit by some very weird and more serious bug like FS#16359)
- this might happen for database downloads, because of a limitation of pacman (not a bug), which is what this whole report is about

So if your situation is not the same as FS#16359, please open a new bug report with as many details as possible.
Comment by Ray Rashif (schivmeister) - Monday, 28 September 2009, 20:54 GMT
Before that, let me just clarify this. By interruption, I meant network interruption (physical or not; cable unplug, congestion, whatever). A ^C and then resuming will always work (even with db). I simulated a disconnection in the following:

[schiv@v3000 ~]$ sudo pacman -S go-openoffice
resolving dependencies...
looking for inter-conflicts...

Targets (1): go-openoffice-3.1.1.2-1

Total Download Size: 166.99 MB
Total Installed Size: 361.03 MB

Proceed with installation? [Y/n] y
:: Retrieving packages from extra...
^Co-openoffice-3.1.1... 1392.5K 82.6K/s 00:34:12 [---------] 0%
Interrupt signal received

[schiv@v3000 ~]$ sudo pacman -S go-openoffice
resolving dependencies...
looking for inter-conflicts...

Targets (1): go-openoffice-3.1.1.2-1

Total Download Size: 166.99 MB
Total Installed Size: 361.03 MB

Proceed with installation? [Y/n] y
:: Retrieving packages from extra...
error: failed retrieving file 'go-openoffice-3.1.1.2-1-i686.pkg.tar.gz' from mirror.internode.on.net : Requested Range Not Satisfiable
^C
Interrupt signal received

I believe it's related, which then as a by-product is leading to FS#16359. So, in the words of this task's summary:

"The bug here is that pacman does not seem to even check if the file was changed and goes straight to continuing the download of <pkg>.part"

But alas, I wouldn't be able to attach the .part file even if I interrupted it at 10K, because it'd be over 1G by the time I do a ^C! (moving on to FS#16359)
Comment by Xavier (shining) - Monday, 28 September 2009, 21:29 GMT
I still don't get it. You say the file was changed. How was the file changed? A given package never changes on the remote server.

Another thing I don't get: in the first download, how did you simulate the network disconnection? And when you did that simulation, the download did not stop?
And you had to press ctrl+C after the simulated interruption?
Comment by Ray Rashif (schivmeister) - Monday, 28 September 2009, 22:02 GMT
I stand corrected - my issue is actually borked before even resuming. But then again, libfetch (if that's the default downloader) is the problem (I will be commenting on this in the other task), so you may want to watch that thing with regards to this bug.

As for the simulation, simply networkmanager gui > disconnect, or killall $dialer where in my case is pppd. One could simply reproduce this by pulling out the network cable as well, or bringing down the interface, or any other brutal method (hint: please try).

Correct - the download did not stop. It just paused with no visual feedback, but in the background the file gets filled with crap at the speed of sound.
Comment by Michael Witten (mfwitten) - Saturday, 25 December 2010, 14:46 GMT
I request re-opening this task.

Using "Pacman v3.4.1 - libalpm v5.0.1", I recently had the same problem; for each mirror, I was getting "error: failed retrieving file 'extra.db.tar.gz' ... Requested Range Not Satisfiable" and finally "error: failed to update extra (Requested Range Not Satisfiable)".

The solution was to do the following: sudo rm /var/lib/pacman/extra.db.tar.gz.part

After this, everything went swimmingly.
