FS#11225 - repo-add creates wrong CSIZE entry
Attached to Project:
Pacman
Opened by Nicolai Lissner (blackpenguin) - Friday, 15 August 2008, 15:08 GMT
Last edited by Dan McGee (toofishes) - Saturday, 23 August 2008, 13:50 GMT
Opened by Nicolai Lissner (blackpenguin) - Friday, 15 August 2008, 15:08 GMT
Last edited by Dan McGee (toofishes) - Saturday, 23 August 2008, 13:50 GMT
|
Details
Description:
repo-add uses an invalid method to create the %CSIZE% entry in *.db.tar.gz /bin/du is NOT a tool to find size of a file, it reports the disk-usage of a file, which is not the same as it depends on the used filesystem. please replace csize=$(du -kL "$deltafile" | awk '{print $1 * 1024}') with csize=$(stat -c %s "$deltafile") and csize=$(du -kL "$pkgfile" | awk '{print $1 * 1024}') with csize=$(stat -c %s "$pkgfile") Thank you :) |
This task depends upon
Closed by Dan McGee (toofishes)
Saturday, 23 August 2008, 13:50 GMT
Reason for closing: Fixed
Additional comments about closing: Commit 282eeadc68fec1da8651d0c65ad0dfebd11a9c7f
Saturday, 23 August 2008, 13:50 GMT
Reason for closing: Fixed
Additional comments about closing: Commit 282eeadc68fec1da8651d0c65ad0dfebd11a9c7f
FS#10459du can also report the size, but the argument doing that is not portable so it was dropped.
And stat is not portable either unfortunately.
FS#10459)@all
However, I can't grasp why there never have been a problem with wrong %CSIZE% fields before. I guess gensync used a working method?
And what is wrong about the way Xilon suggested in http://bugs.archlinux.org/task/10459#comment28616
What exactly is not clean in using $(uname -s) to achieve portability if portability is an issue.
IMHO it's the only clean way, and actually much cleaner than using a method that reports wrong sizes. At least if you want to use coreutils to do it.
I thought you are trying to improve things. Don't make archlinux worse just for BSD portability, please. repo-add is generally a good idea, but it shouldn't break things.
Maybe we could have a check for using a different command for each OS, but I am not even sure how to do that.
BTW, the bug category is wrong, it should be pacman.
2) disk usage and real usage are close enough, the size does not need to be perfectly accurate
3) OS specific checks are never clean
That said, we indeed had to introduce a minor regression in order to avoid an os specific check.
You are welcome to work on a patch using uname -s, possibly with the help of Xilon, and I will support it.
case $(uname -s) in Linux) csize=$(stat -c %s "$pkgname"); ;; BSD) csize=$(stat -f "%z" "$pkgname"); ;; OSX) csize=$(...insert proper OSX command here...); ;;esac
Or, if you don't like this, just adding a line
[[ $(uname -s) == "Linux" ]] && csize=$(stat -c %s "$pkgname")
behind the current line that fills csize would at least report exact sizes when run on linux (I still believe most archlinux users use archLINUX)
Last not least, it would be very easy to write a small helper-tool that just reports the correct filesize based on c function stat() and would be portable (I wouldn't call this more clean, I just add this as an idea if someone doesn't like the "case $(uname -s) in..."
I know pacman doesn't care about filesizes when downloading files (however it actually _should_ care, but that's another issue). An md5sum only (without filesize) is pretty much useless. I also know, in an ideal world we wouldn't need to worry about manipulation of binary packages. Unfortunately we don't live in an ideal world. But we should do what we can do to make it better.
Sure, if you don't care about security at all and you just want the value to report an approx. downloadsize only, you are right, the value doesn't need to be accurate.
OTOH it's easy to implement an accurate version.
And what if the main server is compromised?
If you are concerned about security, you should rather help implementing package signing :
http://bbs.archlinux.org/viewtopic.php?pid=406967#p406967
And even with pacman you can do this very simply, a simple script "syncpkglist" that would change mirror, download *.db.tar.gz, and change mirror back would do it. You are probably right, not very much archlinux users do that. OTOH I guess not very much archlinux users would expect, that pacman doesn't compare filesizes but MD5SUM only. So, these users who get *.db.tar.gz from main-server but packages from other mirrors are even lost if CSIZE is not accurate.
Of course, signed packages would be much better and I would definetely welcome if archlinux would sign packages, too.
+1.
Even if you don't know the exact uname -s output, it does not matter, it can be fixed later.
Note that du is used in 3 different places : http://projects.archlinux.org/?p=pacman.git;a=commitdiff;h=149839c5391e9a93465f86dbb8d095a0150d755d
So the code has to be factorized a bit (at least inside repo-add).
And in makepkg, it is a recursive du (du -s) which is used. So if you want to implement that with stat, you will probably need a combination of find, stat, and bc, a bit like this example from my last comment in the other bug report :
echo $(find . -printf "%s + ") 0 |bc
Or make somehow configurable this get size check.
My main reason: I blame linux and BSD leaders because of this "incompatibility" (I may be wrong here, but this is my standpoint). Implementing two/three/... methods for certain OSs would support this "state", not implement is simply a protest vote from my side.
# Linux / Cygwin
STATCMD="stat -c %s"
# BSD / OSX
#STATCMD="stat -f %z"
Something like that?
If that is ok, we just have to implement this in makepkg and repo-add ...
Yes. IMHO this way is better.
find /tmp -exec stat -f %z '{}' ';' | awk '{sum+=$1} END {printf("%d\n", sum)}'
So we should be fine.
A few thoughts from my end, which unfortunately all is the path to getting code into pacman. :P
1. Md5sums are not for security. Ever. So any post implying this is just dumb. Package signing is the place to look for security.
2. How big of a difference are we talking here on sizes? No one has really identified this yet.
3. Why was more objection not raised when this went into git through the ML...
4. Designing applications to be portable is somewhat fun at least for me, so reverting this change and saying "we using Linux are superior" is a dumb justification to me.
5. I'm not a big fan of uname hacks.
6. If we do this STATCMD thing, it would make more sense for configure to fill this in, just as we do with the makeflags.
1. Right, but archlinux doesn't sign packages yet, does it? And without signatures, the *combination* of md5 and filesize is more secure than nothing, if you get your *db.tar.gz from a different site than the package. Nobody claims it would be 100% secure or eligible to replace signatures. Of course, I prefer signing, too. And if archlinux would have signatures I wouldn never have opened this bug, because I could live with ignoring filesizes then. No need to discuss that. And especially no need to call anyone dumb.
2. Actually most (all?) packages officially released by archlinux seem to match the sizes given in their *.db.tar.gz (Are these *.db.tar.gz actually created with repo-add? I doubt that.) The problem appeared to me first with the kdemod-repository. So for me it has been a new problem.
3. I've neither have read the ML nor do I follow the complete development of pacman. The problem that a downloaded file matched md5sum but didn't match the filesize given in *.db.tar.gz appeared to me the first time exactly one day before reporting this bug. I hunted down the reason for this and then reported it as a bug. So that's the reason why.
4. There's absolutely nothing wrong in designing applications portable. I don't question this. But I question changes of code behaviour to less accurate filesizes than the tools formerly used reported, I still would call this a bug. Btw, who said "we using Linux are superior"? I can't find anything like that and I don't think anyone in this thread really thought something like that. Anyway it's not about Linux. The problem here is GNU vs Unix. not Linux vs BSD or something. However it's senseless to discuss this point, as we know, GNU is not Unix. There _are_ some differences between the systems, and we have to handle this if we want portability. It's absolutely not a question of being superior or something. Differences rarely imply superiority. They are just what they are: differences. And they make the world more colorful. Let's handle this colors in full saturation instead of blurring them.
5. as you have seen - the solution of Xavier doesn't use uname.
6. How is configure supposed to find out the OS? Oh, damn, it is going to use uname ;-) But sure, it should fill this in automatically.
2. The Arch repositories do in fact use repo-add, but the pre-3.2.0 version at the moment which used the old way of 'du -b' to get the size.
3. Knowing that they were the exact size before is helpful, so thanks- I see now that the more important problem lies in repo-add and not makepkg.
4. Filesize was not intended to be a verification utility, and the "close enough" rule got applied here. And I saw this comment which seemed to imply some sense of superiority: "Don't make archlinux worse just for BSD portability, please." Having a slightly-off filesize is a far cry from making Arch terrible, that is what I was trying to convey.
5. I see this, I was just responding to the *entire* thread as I said.
6. But configure is specifically designed to do OS and machine-specific checks, which is a whole lot better than makepkg failing the first time you use it, for instance. Runtime detection is not our best friend. And in this case, should we even abstract it out to makepkg.conf at all, or just leave it in makepkg? That would make more sense.
I'm not trying to be hostile here btw. I'm just getting out my thoughts on this issue. And I've even attached my proposed solution to this whole fun problem. Note that I did not touch makepkg because our size there is not critical- a size to the nearest K is just fine, and switching to a find/stat way of doing it would cause all hard links to get double-counted.
(and du is indeed much more practical for getting the size of a directory)