FS#16359 - Running out of disk space with pacman 3.3.1-1

Attached to Project: Pacman
Opened by Delete Me (steve-e) - Saturday, 26 September 2009, 09:51 GMT
Last edited by Xavier (shining) - Saturday, 03 October 2009, 12:24 GMT
Task Type Bug Report
Category General
Status Closed
Assigned To Xavier (shining)
Dan McGee (toofishes)
Architecture All
Severity High
Priority Normal
Reported Version 3.3.0
Due in Version 3.3.2
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Hello,

I have issues with pacman generating enormous packages.

Pacman sometimes (about every 10th package) fails while downloading an update. It just stops downloading, but keeps showing the current progress and download speed.
In the background, my disk is filling up. The package being downloaded grows and grows until my disk is full. E.g., I just wanted to update to the latest kernel in testing, and my system failed when the downloaded package had grown to 2.6 GiB.

Mind that my normal disk space is absolutely sufficient for the packages I want to download. Furthermore, I am not really downloading 2.6 GiB; it must be some kind of local issue.

I am running latest testing and have the following versions installed:
pacman 3.3.1-1
libarchive 2.7.1-1
libfetch 2.25-11

I know this problem description is very vague, but hopefully someone is able to reproduce it. Find my pacman.conf attached and if you need any more information, just let me know!

Thanks and Best Regards,
steve-e

Closed by  Xavier (shining)
Saturday, 03 October 2009, 12:24 GMT
Reason for closing:  Fixed
Additional comments about closing:  fixed by commit 20392c0a025, will be in pacman 3.3.2
Comment by Xavier (shining) - Saturday, 26 September 2009, 10:59 GMT
Is this a regression in 3.3.1? Did the problem only start after you upgraded to that version?
You cannot reproduce it by going back to 3.3.0?
And what do you mean here:
"It just stops downloading, but keeps showing the current progress and download speed."

You can still see the progress bar and download speed, but they don't change and are stuck, which makes you think the download stopped?
Comment by Anarconda (Anarconda) - Saturday, 26 September 2009, 15:55 GMT
I can confirm this is happening here too. BTW, the problem started with 3.3.0.

In my case the problem is not with every Xth package; it seems to occur only with the big ones (the kernel from testing today, or go-openoffice a few days ago). I mean, it does not always happen; most of the time all is OK.

I guess (I'm not sure) the problem has something to do with losing the server connection. Pacman continues filling the file without receiving an EOF. Every time I had this problem with pacman, I tried to download the same file from the same mirror with Firefox, and I could see the download speed drop to 0.0 several times and then, after a while, the download restart again.

Hope this helps a bit. TIA.
Comment by Xavier (shining) - Monday, 28 September 2009, 14:47 GMT
Answer from Joerg :
"Hm. The only reason I can think of is if the server is not handling the
range header correctly. A strace would be a good start..."

Could you please reproduce that bug while running pacman under strace?
Comment by Anarconda (Anarconda) - Monday, 28 September 2009, 15:03 GMT
Yes, I can reproduce the bug.

[root@Gaia Arch_Packages]# strace -o pacman_trace.txt pacman -Syu
:: Synchronizing package databases...
 testing is up to date
 core is up to date
 extra is up to date
 community is up to date
:: Starting full system upgrade...
warning: transmission-gtk: local (1.75-1) is newer than extra (1.74-1)
resolving dependencies...
looking for inter-conflicts...

Targets (1): go-openoffice-3.1.1.2-1

Total Download Size: 178.83 MB
Total Installed Size: 417.30 MB

Proceed with installation? [Y/n] y
:: Retrieving packages from extra...
error: could not write to file '/var/cache/pacman/pkg/go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz': No space left on device
error: failed retrieving file 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' from mir.archlinux.fr : Requested Range Not Satisfiable
error: failed retrieving file 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' from mir2.archlinuxfr.org : Requested Range Not Satisfiable
error: failed retrieving file 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' from mirror.archlinux.com.ve : Bad Gateway
error: failed retrieving file 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' from archlinux.unixheads.org : Requested Range Not Satisfiable
warning: failed to retrieve some files from extra
error: failed to commit transaction (Requested Range Not Satisfiable)
Errors occurred, no packages were upgraded.
[root@Gaia Arch_Packages]#

[amarcord@Gaia pkg]$ ll
total 1966080
drwxr-xr-x 2 root root 4096 Sep 28 16:54 .
drwxr-xr-x 3 root root 4096 Aug 21 19:25 ..
-rw-r--r-- 1 root root 2013089792 Sep 28 16:58 go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz.part
-rw-r--r-- 1 root root 107452 Sep 28 16:51 strace-4.5.18-1-x86_64.pkg.tar.gz
[amarcord@Gaia pkg]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 4.6G 4.6G 0 100% /
none 1.8G 200K 1.8G 1% /dev
none 1.8G 48M 1.7G 3% /dev/shm
/dev/sda6 47G 34G 14G 72% /home
/dev/sda3 9.4G 642M 8.8G 7% /opt
/dev/sda7 130G 71G 59G 55% /mnt/sda7
/dev/sdb1 230G 198G 21G 91% /mnt/sdb1
[amarcord@Gaia pkg]$


Cheers.
Comment by Anarconda (Anarconda) - Monday, 28 September 2009, 15:13 GMT
If you need help translating the messages, count on me.

TIA.
Comment by Xavier (shining) - Monday, 28 September 2009, 16:38 GMT
Well that was quick, thanks a lot ! I already forwarded this information to Joerg (libfetch developer).

Also, it is stupid that pacman does not display the total download size; it only uses it for the progress bar.
Could you try the simple program attached to display it?
gcc dload.c -o dload -lfetch
./dload

The result for me is: 187 517 348
   dload.c (0.2 KiB)
Comment by Anarconda (Anarconda) - Monday, 28 September 2009, 16:51 GMT
Same for me:

[amarcord@Gaia dload]$ ./dload
187517348
[amarcord@Gaia dload]$
Comment by Delete Me (steve-e) - Monday, 28 September 2009, 20:05 GMT
Also same for me.
Comment by Ray Rashif (schivmeister) - Monday, 28 September 2009, 22:14 GMT
This must be an issue with the default fetching utility, because using wget as the downloader proves to be fine and dandy. Some debunking though:

"In my case the problem is not with every Xth package; it seems to occur only with the big ones (the kernel from testing today, or go-openoffice a few days ago). I mean, it does not always happen; most of the time all is OK.

I guess (I'm not sure) the problem has something to do with losing the server connection. Pacman continues filling the file without receiving an EOF."

Nope. Any package, small or big (depending on your connection speed). Your network being unstable is yet another cause of this problem. Packets can stop abruptly and you won't notice it. Usually, download utilities will resume as soon as the packets start flowing again. However, in this case, that's not happening.

What's happening is that once the network is interrupted, libfetch (or pacman's downloading function around it) is not aware of it and continues writing bogus data. The .part file grows at a rate of around 1-100 MB every second (a guess) regardless of your connection speed. Even if it got stuck at 1 KB, you'd end up with a file of 1 GB and still growing! That is, of course, until you ^C the process.

When this happens with wget, the download speed slowly decreases to 0 and then NIL (--:--). Upon reconnection, it simply resumes from where it was interrupted.

Workaround: XferCommand = /usr/bin/wget --passive-ftp -c -O %o %u
Comment by Curtis (foxcm2000) - Tuesday, 29 September 2009, 02:14 GMT
I'm having the exact same issue... I have over 13 GB free on the partition and the go-openoffice upgrade fills it all up for no reason. This bug appears to be semi-random, since if I retry the upgrade several times it eventually works.
Comment by Dan McGee (toofishes) - Tuesday, 29 September 2009, 02:48 GMT
For anyone posting further comments here, we need:
1. What mirror.
2. Whether you had a partial download file in place. This looks like the crux of the issue.
3. How your download was interrupted (and how you resumed it if you did anything at all).
Comment by Ray Rashif (schivmeister) - Tuesday, 29 September 2009, 08:14 GMT
1. Any mirror (see attached).

2. Yes. The unsuspecting user simply sees "stuck" progress, while the partial download grows bigger and bigger in the background on nothing. So naturally, pacman will not resume it, as its size no longer matches the remote file.

3. The unsuspecting user does not even know there has been an interruption of any sort. He sees "stuck" progress, so eventually he sees the need to terminate the process (CTRL+C). He never finds out about the file sizes thereafter, until he becomes a suspecting user.

We all know that such "stuck" progress in anything network-related is due to an unstable connection (it can be on the host's side, or server congestion). Take embedded Flash players, for example. That is a different story, and there is nothing we can do about it.

What is not a different story is that the partial file begins to grow to an invalid size at an accelerated rate upon meeting this fate. That is the bug.
Comment by Xavier (shining) - Tuesday, 29 September 2009, 17:28 GMT
The bug was actually a big gotcha in the pacman code. Patch attached.

Of course this won't fix all your network problems and download issues, but at least it will not fill up the disk.
And it should resume downloads fine.
Comment by Ray Rashif (schivmeister) - Tuesday, 29 September 2009, 18:04 GMT
Awesome, works :)

Thanks for taking care of this so swiftly even when it's a minority issue.

Anyway, the expected behaviour in case of a network problem:

* pacman will immediately report "failed retrieving file"
* pacman will then try the next mirror and resume

All is well; task resolved:

[schiv@v3000 ~]$ ls -lh /var/cache/pacman/pkg/go-openoffice-3.1.1.2-1-i686.pkg.tar.gz.part
-rw-r--r-- 1 root root 169K 2009-09-30 01:54 /var/cache/pacman/pkg/go-openoffice-3.1.1.2-1-i686.pkg.tar.gz.part
Comment by Xavier (shining) - Tuesday, 29 September 2009, 18:18 GMT
To Ray : thanks for pointing out the cause of network interruption, and for the prompt testing :)
Comment by Nagy Gabor (combo) - Thursday, 01 October 2009, 10:33 GMT
To Xavier: Good job. ;-)
Comment by Guus Snijders (ghs) - Thursday, 01 October 2009, 12:39 GMT
Just being curious: if the file kept growing after the connection was dropped (or interrupted), what data was filling it up? Where did the data come from?
Comment by Dan McGee (toofishes) - Thursday, 01 October 2009, 13:24 GMT
A dump of your process's address space, as we decided to fwrite() 4 GB of data (the value of -1 when converted to an unsigned integer).
Comment by Xavier (shining) - Thursday, 01 October 2009, 13:32 GMT
Oh, thanks for reminding me. I had the same question and forgot to check the code for the answer!

Short answer: we kept rewriting the same data over and over (the last data that was downloaded when libfetch returned an error, plus a part of the stack?)

Long answer:

This is our simplified loop:

    while((nread = fetchIO_read(dlf, buffer, PM_DLBUF_LEN)) > 0) {
        size_t nwritten = 0;
        while(nwritten < nread) {
            nwritten += fwrite(buffer, 1, (nread - nwritten), localf);
            if(ferror(localf)) {
                ret = -1;
                goto cleanup;
            }
        }
        dl_thisfile += nread;
    }

What we missed is that fetchIO_read returns a signed type, because it returns -1 in case of error
(unlike fread, which only returns an unsigned type, and 0 on error).
So we converted that to unsigned, which gave us a huge value: SIZE_MAX = 4294967295 (on 32-bit).

Now, the inner loop calling fwrite is actually bogus. Notice how we never advance the read position in the source? We always write from the start of buffer.
Another question is: when does fwrite write less than asked? According to the man page, only in case of errors.
However, in the bug situation, fwrite was called multiple times, filling the disk. It would be interesting to see whether it wrote more than the buffer length, taking data from the stack as well.

Anyway, in the normal non-bogus case, nread is at most PM_DLBUF_LEN, and it seems that only one fwrite call is needed to write that. And if fwrite does not write all of it, we can handle that as an error, as the man page suggests.
So I actually changed that a while ago (but it's code which will only be used in pacman 3.4):
http://projects.archlinux.org/?p=pacman.git;a=commitdiff;h=3cf0ee98c02e1f40954f0c2c508e16d33ca299ca

    while((nread = fetchIO_read(dlf, buffer, PM_DLBUF_LEN)) > 0) {
        size_t nwritten = 0;
        nwritten = fwrite(buffer, 1, nread, localf);
        if((nwritten != nread) || ferror(localf)) {
            ret = -1;
            goto cleanup;
        }
        dl_thisfile += nread;
    }

Actually, this new code would have been safer. Because fwrite apparently does not write the 4 GB in one call, it would have returned early, and we would not have filled the disk.

To be even safer, I wonder if we should add a sanity check: nread <= PM_DLBUF_LEN?
Comment by Guus Snijders (ghs) - Thursday, 01 October 2009, 15:32 GMT
Thanks Dan & Xavier,
the C code was a bit too much for me personally, but the descriptions are very clear.
Comment by Audren Cezar (audrencezar) - Thursday, 01 October 2009, 22:16 GMT
I had the same problem.

After I applied the patch from Xavier's comment [http://bugs.archlinux.org/task/16359#comment50287] and recompiled pacman 3.3.1-1, it worked for me.

The PKGBUILD of 3.3.1 with the patch is here attached.

Thanks.

Excuse my poor English.
