FS#16359 - Running out of disk space with pacman 3.3.1-1
Attached to Project:
Pacman
Opened by Delete Me (steve-e) - Saturday, 26 September 2009, 09:51 GMT
Last edited by Xavier (shining) - Saturday, 03 October 2009, 12:24 GMT
Opened by Delete Me (steve-e) - Saturday, 26 September 2009, 09:51 GMT
Last edited by Xavier (shining) - Saturday, 03 October 2009, 12:24 GMT
|
Details
Hello,
I have issues with pacman generating enormous packages. Pacman sometimes (about every 10th package) fails when downloading an update. It just stoppes downloading, but keeps showing the current progress and donwload speed. In the background my disk is filling up. The package that is meant to be downloaded is growing and growing until my disk is full. E.g., I just wanted to update to the latest kernel in testing and my system failed when the downloaded package had grown up to 2.6Gig. Mind that my normal diskspace is absolutely sufficient for the packages I want to download. Furthermore, I am not downloading 2.6Gig, but instead it must be some kind of local issue. I' am running latest testing and have the following versions installed: pacman 3.3.1-1 libarchive 2.7.1-1 libfetch 2.25-11. I know this problem description is very vague, but hopefully someone is able to reproduce it. Find my pacman.conf attached and if you need any more information, just let me know! Thanks and Best Regards, steve-e |
This task depends upon
Closed by Xavier (shining)
Saturday, 03 October 2009, 12:24 GMT
Reason for closing: Fixed
Additional comments about closing: fixed by commit 20392c0a025 , will be in pacman 3.3.2
Saturday, 03 October 2009, 12:24 GMT
Reason for closing: Fixed
Additional comments about closing: fixed by commit 20392c0a025 , will be in pacman 3.3.2
You cannot reproduce by going back to 3.3.0 ?
And what do you mean there :
"It just stoppes downloading, but keeps showing the current progress and donwload speed."
You can still see the progress bar and dowload speed, but they don't change and are stuck, which makes you think the download stopped ?
In my case the problem is not every X packages, seems to occur only with the big ones (kernel from testing today or go-openoffice days ago). I mean, not always happens, most of the times all is OK.
I guess (I'm not sure) the problem has something to do with losing server connection or so. Pacman continues filling the file without receive and EOF. Every time I had this problem with pacman I tried to download the same file from the same mirror with firefox and I can see how several times the download speed is 0.0 and, after a while, restart downloading again.
Hope this help a bit. TIA.
"Hm. The only reason I can think of is if the server is not handling the
range header correctly. A strace would be a good start..."
Could you please reproduce that bug while running pacman in strace ?
[root@Gaia Arch_Packages]# strace -o pacman_trace.txt pacman -Syu
:: Sincronizando las bases de datos de paquetes...
testing está actualizado
core está actualizado
extra está actualizado
community está actualizado
:: iniciando actualización completa del sistema...
precaución: transmission-gtk: local (1.75-1) es más nuevo que extra (1.74-1)
resolviendo dependencias...
verificando conflictos...
Objetivos (1): go-openoffice-3.1.1.2-1
Tamaño total de descarga: 178,83 MB
Tamaño total instalado: 417,30 MB
¿Continuar con la instalación? [S/n] s
:: Descargando paquetes desde extra...
error: no se pudo escribir al archivo '/var/cache/pacman/pkg/go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz': No queda espacio en el dispositivo
error: falló al obtener archivo 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' desde mir.archlinux.fr: Requested Range Not Satisfiable
error: falló al obtener archivo 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' desde mir2.archlinuxfr.org: Requested Range Not Satisfiable
error: falló al obtener archivo 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' desde mirror.archlinux.com.ve: Bad Gateway
error: falló al obtener archivo 'go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz' desde archlinux.unixheads.org: Requested Range Not Satisfiable
precaución: falló al descargar algunos archivos desde extra
error: no se pudo iniciar la operación (Requested Range Not Satisfiable)
Ocurrieron errores, no se actualizaron paquetes
[root@Gaia Arch_Packages]#
[amarcord@Gaia pkg]$ ll
total 1966080
drwxr-xr-x 2 root root 4096 sep 28 16:54 .
drwxr-xr-x 3 root root 4096 ago 21 19:25 ..
-rw-r--r-- 1 root root 2013089792 sep 28 16:58 go-openoffice-3.1.1.2-1-x86_64.pkg.tar.gz.part
-rw-r--r-- 1 root root 107452 sep 28 16:51 strace-4.5.18-1-x86_64.pkg.tar.gz
[amarcord@Gaia pkg]$ df -h
S.ficheros Tamaño Usado Disp Uso% Montado en
/dev/sda1 4,6G 4,6G 0 100% /
none 1,8G 200K 1,8G 1% /dev
none 1,8G 48M 1,7G 3% /dev/shm
/dev/sda6 47G 34G 14G 72% /home
/dev/sda3 9,4G 642M 8,8G 7% /opt
/dev/sda7 130G 71G 59G 55% /mnt/sda7
/dev/sdb1 230G 198G 21G 91% /mnt/sdb1
[amarcord@Gaia pkg]$
Cheers.
TIA.
Also it is stupid that pacman does not display the total download size, it only uses it for the progress bar.
Could you try the simple program attached to display that ?
gcc dload.c -o dload -lfetch
./dload
The result for me is : 187 517 348
[amarcord@Gaia dload]$ ./dload
187517348
[amarcord@Gaia dload]$
"In my case the problem is not every X packages, seems to occur only with the big ones (kernel from testing today or go-openoffice days ago). I mean, not always happens, most of the times all is OK.
I guess (I'm not sure) the problem has something to do with losing server connection or so. Pacman continues filling the file without receive and EOF."
Nope. Any package, small or big (depending on your connection speed). Your network is unstable, that is yet another cause of this problem. Packets can stop abruptly and you won't notice it. Usually, download utilities will resume as soon as the packets start flowing again. However, in this case, that's not happening.
What's happening is once the network is interrupted, libfetch (or pacman's downloading function around it) is not aware and continues downloading bogus data. The file.part size grows in the magnitude of around 1-100MB every second (a guess) regardless of your connection speed. Even if it struck at 1KB, you'd end up with a file of 1G and still growing! That is, of course, until you ^C the process.
When this happens with wget, the download speed slowly decreases to 0 and then NIL (--:--). Upon reconnection, it simply resumes from where it was interrupted.
Workaround: XferCommand = /usr/bin/wget --passive-ftp -c -O %o %u
1. What mirror.
2. Whether you had a partial download file in place. This looks like the crux of the issue.
3. How your download was interrupted (and how you resumed it if you did anything at all).
2. Yes. The unsuspecting user simply sees a "stuck" progress, while the partial download is getting bigger and bigger in the background with nothing. So naturally, pacman will not resume this as the size does not match up with remote file.
3. The unsuspecting user does not even know there's been an interruption of any sorts. He sees a "stuck" progress, so eventually he also sees the need to terminate the process (CTRL+C). He never finds out about the sizes thereafter, until he becomes a suspecting user.
We all know that such a "stuck" progress of anything network-related is due to an unstable connection (can be host's, or can be server congestion). Take embedded flash players for example. This is a different story and we have nothing to do about it.
What's not a different story is that the partial file begins to gain invalid size at an accelerated rate upon meeting this fate. This is the bug.
Of course this won't fix up all your network problems and download issues, but at least it will not fill up the disk.
And it should resume download fine.
Thanks for taking care of this so swiftly even when it's a minority issue.
Anyway, the expected behaviour in case of a network problem:
* pacman will immediately report "failed retrieving file"
* pacman will then try the next mirror and resume
All is well; task resolved:
[schiv@v3000 ~]$ ls -lh /var/cache/pacman/pkg/go-openoffice-3.1.1.2-1-i686.pkg.tar.gz.part
-rw-r--r-- 1 root root 169K 2009-09-30 01:54 /var/cache/pacman/pkg/go-openoffice-3.1.1.2-1-i686.pkg.tar.gz.part
Short answer : we kept rewriting the same data over and over (the last data that got downloaded when libfetch returned an error + a part of the stack ?)
Long answer :
This is our simplified loop :
while((nread = fetchIO_read(dlf, buffer, PM_DLBUF_LEN)) > 0) {
size_t nwritten = 0;
while(nwritten < nread) {
nwritten += fwrite(buffer, 1, (nread - nwritten), localf);
if(ferror(localf)) {
ret = -1;
goto cleanup;
}
}
dl_thisfile += nread;
}
What me missed is that fetchIO_read returns a signed type, because it returns -1 in case of error
(unlike fread which only returns unsigned, and 0 for error)
So we converted that to unsigned which gave us a huge value, SIZE_MAX = 4294967295
Now the loop calling fwrite is actually bogus. Notice how we never update the index of the source ? We always use buffer.
Another question is : when does fwrite write less than asked ? According to the man page, only in case of errors.
However, in the bug situation, fwrite was called multiple times, filling the disk. Though it would be interesting to see if it wrote more than the buffer length, taking many data from the stack as well.
Anyway, in the normal non bogus case, nread is at most PM_DLBUF_LEN, and it seems that only one fwrite call is needed to write that. And if fwrite does not write that, we can handle it as an error, as the man page suggests.
So I actually changed that a while ago (but its code which will only be used in pacman 3.4) :
http://projects.archlinux.org/?p=pacman.git;a=commitdiff;h=3cf0ee98c02e1f40954f0c2c508e16d33ca299ca
while((nread = fetchIO_read(dlf, buffer, PM_DLBUF_LEN)) > 0) {
size_t nwritten = 0;
nwritten = fwrite(buffer, 1, nread, localf);
if((nwritten != nread) || ferror(localf)) {
ret = -1;
goto cleanup;
}
dl_thisfile += nread;
}
Actually, this new code looks like it would have been safer. Because fwrite does not apparently write the 4GB in one call, so it would have returned before, and we would not have filled the disk.
To be even safer, I wonder if we should add a sanity check : nread <= PM_DLBUF_LEN ?
the C code was a bit too much for me personally, but the descriptions are very clear.
After I apply the patch from Xavier's Comment [http://bugs.archlinux.org/task/16359#comment50287] and recompile the pacman 3.1.1-1, it worked for me.
The PKGBUILD of 3.3.1 with the patch is here attached.
Thanks.
Excuse me, the my poor English.