FS#1151 - pacman plays too much with HD. in the end system reminds of Window$
Attached to Project:
Pacman
Opened by Nikos Kouremenos (zeppelin) - Monday, 19 July 2004, 01:26 GMT
Last edited by Tobias Powalowski (tpowa) - Friday, 31 March 2006, 17:10 GMT
Details
I have latest stable pacman
[root@Freud peers]# pacman -Syu
:: Synchronizing package databases...
current [#################################################] 100% 37K 5.3K/s 00:00:07
extra [#################################################] 100% 143K 4.0K/s 00:00:35
bfinch [#################################################] 100% 3K 1.5K/s 00:00:02
contrasutra [#################################################] 100% 0K 0.1K/s 00:00:03
deepfreeze [#################################################] 100% 0K 0.3K/s 00:00:02
dp [#################################################] 100% 3K 1.7K/s 00:00:02
hapy [#################################################] 100% 0K 0.4K/s 00:00:01
kritoke [#################################################] 100% 1K 1.0K/s 00:00:01
roberto [#################################################] 100% 0K 0.2K/s 00:00:01
staging [#################################################] 100% 22K 4.6K/s 00:00:04
twm [#################################################] 100% 2K 1.1K/s 00:00:02
whatah [#################################################] 100% 1K 0.7K/s 00:00:02
xentac [#################################################] 100% 0K 0.1K/s 00:00:04
brice [#################################################] 100% 3K 0.8K/s 00:00:04
tpowa [#################################################] 100% 1K 0.6K/s 00:00:01
testing [#################################################] 100% 0K 0.0K/s 00:00:01
link [#################################################] 100% 1K 1.3K/s 00:00:01
:: x-11R6.7.0-1t3: ignoring package upgrade (to be replaced by xorg-11R6.7.0-1)
:: cgoban2-2.5.7-1: ignoring package upgrade (2.6.1-1)
:: gnome-python-2.0.2-3: local version is newer
:: mono-0.96-2: ignoring package upgrade (1.0-1)
:: mplayer-1.0pre4-2: ignoring package upgrade (1.0pre5-1)
:: mysql-4.0.18-3: ignoring package upgrade (4.0.20-2)
:: samba-3.0.3-1: ignoring package upgrade (3.0.4-2)
:: Above packages will be skipped.
To manually upgrade use 'pacman -S <pkg>'
Killed
Luckily I was able to kill it, because it had been thrashing the HD for about 3-4 minutes! I managed to run top (at extremely low speed), and pacman would sometimes even use more memory than X. I came to realize that ArchLinux REALLY NEEDS SWAP (I have 256 MB of RAM); with swap it's somewhat better.
Anyway, this is not exactly a bug, but maybe an extra warning should be put somewhere in the installation steps, saying something like: "unless you have 2 GB of RAM, consider swapping" :P
That's all, Judd. I really like pacman [because these days you also have to say this]
Closed by Simo Leone (neotuli)
Sunday, 15 October 2006, 17:28 GMT
Reason for closing: None
Additional comments about closing: Discuss it on pacman-dev, if any of you are still so inclined.
Without swap, your system seems like it has hung.
There are hacks to get around this, but they are indeed hacks and probably not a good idea to implement in the mainstream. A better solution would be a new database format, but I'm not sold on that idea yet.
Anything that can be done to improve the performance of the package installation should be done.
CHUG-CHUG is the noise of the HD?
I'm not an expert, but what else can be done there? Maybe use sqlite... [would that make it faster than scanning for files in directories?] I don't know :)
Maybe if we soon have a libpacman, the devs of this magnificent tool can focus on making the lib faster and/or making the CLI faster.
Yes, the very audible sound of the hard drive servos moving. The IDE hard drive causes system lag as it labours away, eating up CPU time by waiting for system I/O to complete. Unless you're blessed with SCSI, having the hard drive randomly seek hundreds of small files, which may be anywhere at all on the physical platters, is quite slow and inefficient.
Blocks of one file are very unlikely to be fragmented, when compared to reading in randomly created, individual files.
Using sqlite or even just a big text file (wherein you write changes to a new file, and only then overwrite the old file with the new one, thereby making sure you never lose your whole data file accidentally) would be incredibly faster.
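The "write changes to a new file, then overwrite the old one" idea can be sketched roughly as below. This is only an illustration in Python, not pacman code: the function name `write_db_atomically` and the single-text-file layout are assumptions made for the example.

```python
import os
import tempfile


def write_db_atomically(path, data):
    """Write data to a temporary file next to path, then rename it over path.

    Hypothetical helper: readers see either the complete old file or the
    complete new file, never a half-written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".db-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk first
        os.replace(tmp_path, path)  # swap the new file in over the old one
    except BaseException:
        # On any failure, drop the temp file and leave the old database as is.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the write fails part-way, the old data file is left untouched and only the temporary file is discarded.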
The amount of data is not the problem here at all, it is only a few MB worth of data. If that was unfragmented on the disk (in a line of blocks on the filesystem, one after the other) it would take only a couple seconds to read, none of this 30 or more seconds you often get once you have a lot of installed packages.
This has nothing whatsoever to do with hard drive settings. Arch Linux opens hundreds if not thousands of files when you run the "pacman" command. It is inefficient.
Glenn Matthys:
This isn't NTFS or FAT32 ! ReiserFS and Ext filesystems don't defragment like that. This is inefficient coding.
I'm speaking for my self and not myself. I opened this bug report. I like /trust Judd.
To say that it's "inefficient coding" means you know the parts of the code that need to be fixed. So why don't you be a good boy and send a patch to Judd? Either that, or stop with the "This isn't NTFS or FAT32 [..] This is inefficient coding.", ok?
TY
ps. I don't know you, neither Judd. So PEACE
not myself --> only for myself
I think you were talking to me and not Glenn Matthys, I am the one that said it is inefficiently coded. The thing about filesystems was to someone that said perhaps the drive is fragmented ! Please re-read the posting, I specified who each comment was aimed at.
I am not talking out of my ass when I say that pacman is inefficiently coded. I just checked, and on my system pacman opens up a minimum of over one thousand data files every single time it is run:
strace pacman -Qo /usr/bin/mplayer 2>&1|grep 'open("/var/lib/pacman'|wc --lines
1065
Of course the *second* time you run pacman, it will have those 1000+ files in Linux's RAM cache, so it will be speedy, but the first time you run it, it is quite slow and chugs the hard drive a lot with all the random seeks performed.
Also, it is far slower when installing a large group of packages, as every one of them *repeats* these checks every time ! After extracting these files the RAM cache won't even be there anymore, so it will be reading them off the disk ... again, again, again and again.
Okay, I just downloaded packages from a pacman -Suy without installing any of them, and now I am going to check how many files are opened during an average update:
time strace pacman -Suy --noconfirm 2>&1|grep 'open("'>/Raven/strace.log
My /Raven directory is on a separate partition on a separate IDE cable, so writing out the log there will cause almost no performance hit.
My computer specs:
Athlon XP 1800+, 1625MHz, 256KB cache, 3203 BMIPs
512MB PC3200 DDR SDRAM
Linux kernel 2.6.11.3
Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 128).
PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge (rev 0).
IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 6).
ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South] (rev 0).
Anyhow, it is done now:
real 19m35.412s
user 2m53.153s
sys 3m34.835s
So it took just under 20 minutes to update -- and here is the scary part:
-rw-r--r-- 1 root root 35M Mar 15 17:21 /Raven/strace.log
grep /var/lib/pacman /Raven/strace.log|wc --lines
404501
It opened nearly half a million package description files! Over four hundred thousand random hard drive seeks. My "/" partition is on a fairly modern 7200RPM, 60GB hard drive; if I had a Pentium II system with a 4500RPM drive and 192MB RAM, this update would easily have taken several hours.
It is unthinkable to reload the package dependency data over and over and over like this. Stick it into a data structure in pacman's memory and recall it from RAM. Even if this data has to be swapped out to disk, it will still be read back as a contiguous block in under a second, as opposed to reading over 1000 different parts of the hard drive at random, which takes several seconds.
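A rough sketch of the "load it once, then answer from RAM" approach, in Python rather than pacman's own code. The directory layout (one sub-directory per package holding a `desc` file) and the names `load_package_db` and `PackageDB` are assumptions for illustration only.

```python
import os


def load_package_db(db_root):
    """Read every 'desc' file under db_root once; return {package_dir: text}."""
    cache = {}
    for entry in sorted(os.listdir(db_root)):
        desc_path = os.path.join(db_root, entry, "desc")
        if os.path.isfile(desc_path):
            with open(desc_path, encoding="utf-8") as handle:
                cache[entry] = handle.read()
    return cache


class PackageDB:
    """Hypothetical wrapper that hits the disk once and then serves from memory."""

    def __init__(self, db_root):
        self._db_root = db_root
        self._cache = None
        self.disk_loads = 0  # counts how often the directory tree was scanned

    def describe(self, package_dir):
        # Load lazily on the first lookup; every later lookup is a dict access.
        if self._cache is None:
            self._cache = load_package_db(self._db_root)
            self.disk_loads += 1
        return self._cache.get(package_dir)
```

With something like this, checking dependencies for a large group of packages would scan the directory tree once per pacman run instead of once per package.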
Another unrelated efficiency note:
Pacman does not use a keepalive connection, it severs the HTTP or FTP connection after every file it downloads. This is especially noticeable when downloading 5 or 10 small files -- it takes longer to connect to the server for each file than it takes to download it.
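For comparison, this is roughly what reusing one HTTP connection for several small files looks like. It is a sketch using Python's standard `http.client`, not pacman's downloader; `fetch_all` and its parameters are made up for the example, and it assumes the server supports HTTP/1.1 persistent connections.

```python
import http.client


def fetch_all(host, port, paths):
    """Download several paths over a single HTTP/1.1 connection (hypothetical helper)."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        bodies = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read completely before the same connection
            # can carry the next request.
            bodies.append(response.read())
        return bodies
    finally:
        conn.close()
```

The TCP connection is set up once, so for 5 or 10 small files the per-file cost is one request/response round trip rather than a fresh connect each time.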
Actually, the [very crazy] default for Arch Linux is to use a RAM-based virtual partition for /tmp, so it would be sticking that tar file into RAM or swap anyhow.
tar cf test.tar /var/lib/pacman/{current,extra}
-rw------- 1 raven users 4.4M 2005-03-15 19:32 test.tar
Pacman is already using 20 to 40MB RAM, so another ~5MB to store these data structures is definitely peanuts.
strace pacman -Qo /usr/sbin/httpd 2>&1|grep 'open("/var/lib/pacman'|wc --lines
401
Eugenia (or judd or jan), how soon should we expect this?
I'm separating the KEEP ALIVE thing, which is really really annoying, into a new bug report.
Thank you all {and especially Raven}
Another possibility is to keep the files in the tar, without changing the current db format, and to use libtar's API to access the files.
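To illustrate reading entries straight out of a tar archive without unpacking it, here is a small sketch. It uses Python's standard `tarfile` module as a stand-in for libtar (whose C API is not shown here); the function name `read_descs_from_tar` and the assumption that each package entry is stored as `<repo>/<package>/desc` are illustrative only.

```python
import tarfile


def read_descs_from_tar(tar_path):
    """Return {member_name: text} for every '.../desc' file in the archive."""
    descs = {}
    with tarfile.open(tar_path, "r") as archive:
        # Iterating walks the archive front to back, i.e. one sequential read
        # of a single file instead of one open() per package.
        for member in archive:
            if member.isfile() and member.name.endswith("/desc"):
                data = archive.extractfile(member).read()
                descs[member.name] = data.decode("utf-8")
    return descs
```

The existing per-package files keep their format; only the container changes from a directory tree to one archive.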
As for seeking, consider how much seek times matter. Take an average low-spec drive such as my 7200RPM IBM drive. It happily reads around 40 MB/s in a contiguous block. The average seek is 7.2 ms. 40 MB/s * 7.2 ms = 288 KB per seek! So in about the time of 10 HD seeks we could instead have read the entire pacman database.
So for a pacman database of 3 megabytes, it would take under 100msec to read from disk, and probably add another 50 msec for parsing the data into useful structures. Add some various overhead, and it's remarkable that reading the database ever takes more than half a second.
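The arithmetic above can be checked with a few lines of Python. The figures are the ones quoted in this thread (40 MB/s, 7.2 ms per seek, a 3 MB database, 1065 opened files); the function names are made up, 1 MB is taken as 1000 KB, and "one seek per file" is a simplifying assumption.

```python
def kilobytes_per_seek(throughput_mb_s, seek_ms):
    """KB that could be streamed sequentially in the time of one seek.

    MB/s multiplied by ms gives KB directly (1 MB = 1000 KB, 1 s = 1000 ms).
    """
    return throughput_mb_s * seek_ms


def sequential_read_ms(size_mb, throughput_mb_s):
    """Time in ms to read size_mb as one contiguous block."""
    return size_mb / throughput_mb_s * 1000


def random_read_ms(file_count, seek_ms):
    """Time in ms spent seeking, assuming one seek per small file."""
    return file_count * seek_ms


if __name__ == "__main__":
    print(kilobytes_per_seek(40, 7.2), "KB per seek")
    print(sequential_read_ms(3, 40), "ms to read 3 MB contiguously")
    print(random_read_ms(1065, 7.2), "ms of seeking for 1065 files")
```

Under these assumptions the contiguous read costs tens of milliseconds, while the per-file seeks alone add up to several seconds.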
The database itself is hardly the only I/O hog in pacman, though. I've performed some tests on async I/O, and I estimate the fs-conflicts check could run around 3 times faster on average if it used async I/O and let the disk scheduler work, instead of the current synchronous solution. But that's a completely different topic.
Well, that's my five cents. And sorry for repeating and restating the obvious.
I was in the same trouble, but when I stopped mounting my /var partition (reiser3) with the notail option, pacman's performance improved a lot.