FS#10459 - du -b not available on BSD

Attached to Project: Pacman
Opened by Xilon (Xilon) - Tuesday, 20 May 2008, 18:19 GMT
Last edited by Dan McGee (toofishes) - Saturday, 31 May 2008, 13:10 GMT
Task Type Bug Report
Category makepkg
Status Closed
Assigned To Dan McGee (toofishes)
Architecture Other
Severity Medium
Priority Normal
Reported Version 3.1.3
Due in Version 3.2.0
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Summary and Info:
makepkg uses "du -b" to find the file size of a package. BSD's du does not have this argument, and I have been unable to find an equivalent.

Steps to Reproduce:
Create a package with makepkg on any BSD system

Closed by  Dan McGee (toofishes)
Saturday, 31 May 2008, 13:10 GMT
Reason for closing:  Fixed
Additional comments about closing:  Fixed in commit 149839c5391e9a93465f86dbb8d095a0150d755d in master
Comment by Dan McGee (toofishes) - Tuesday, 20 May 2008, 19:25 GMT
Hmm. Do we get any benefit out of actually using -b, or can we drop that? The only difference seems to be with sparse file handling unless I am interpreting things wrong, and installing things on any recent filesystem will accommodate sparse files anyway so we should be ok.
Comment by Xavier (shining) - Saturday, 24 May 2008, 21:25 GMT
-b, --bytes
equivalent to '--apparent-size --block-size=1'

--apparent-size
print apparent sizes, rather than disk usage; although the apparent size
is usually smaller, it may be larger due to holes in ('sparse') files,
internal fragmentation, indirect blocks, and the like

There doesn't seem to be any equivalent for that. It does produce a difference on Linux; at least it did on the first directory I tried.
As the man page says, the apparent size is usually smaller. For example, on the firefox3-bin package, I get 25632 kB vs 25001 kB.
But well, it is probably not a big deal, we don't need a perfectly accurate result here.
And maybe the space it occupies on the disk is more interesting than just the apparent size anyway..
So I think we can drop it.

-B, --block-size=SIZE
use SIZE-byte blocks

Now, that option doesn't exist on BSD du either. I find it silly that GNU du prints sizes in kilobytes by default,
and that both GNU and BSD du have options to display kilobytes (-k) and megabytes (-m), but not bytes, which is what we want.
To get the size in bytes on BSD, you apparently need to run: BLOCKSIZE=1 du
Interestingly, this works on Linux as well, but it is not documented in the man page. Can we still use it?

To sum up, I propose to replace du -b by BLOCKSIZE=1 du, which will produce slightly different results but should still be good.
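
The gap between apparent size and disk usage discussed in this comment can be reproduced with a sparse file. The following is only a sketch: the function names and the temporary file path are made up for illustration, and reading the size from the fifth column of `ls -l` assumes user and group names contain no spaces. The apparent size is 1048576 bytes by construction; the allocated size depends on the filesystem.

```shell
#!/bin/sh
# Apparent size in bytes, taken from the fifth column of `ls -l`.
apparent_bytes() {
    ls -l "$1" | awk '{print $5}'
}

# Allocated size in bytes, derived from du's count of 1 KiB blocks.
allocated_bytes() {
    du -k "$1" | awk '{print $1 * 1024}'
}

# Create a 1 MiB file that is one long hole followed by a single written byte.
make_sparse_file() {
    dd if=/dev/zero of="$1" bs=1 count=1 seek=1048575 2>/dev/null
}

# Hypothetical scratch path, removed again at the end.
demo="${TMPDIR:-/tmp}/sparse-demo.$$"
make_sparse_file "$demo"
echo "apparent:  $(apparent_bytes "$demo")"
echo "allocated: $(allocated_bytes "$demo")"
rm -f "$demo"
```

On a filesystem that supports holes, the allocated figure should come out far below the apparent one, which is the case where `du -b` and plain `du` disagree the most.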
Comment by Xilon (Xilon) - Sunday, 25 May 2008, 07:04 GMT
Alternatively we could check which system is running with uname -s and run the appropriate du command for the system.
Comment by Dan McGee (toofishes) - Sunday, 25 May 2008, 08:45 GMT
No, that is not a solution I would ever want to see, that is not clean by any definition.

Xavier's suggestion has been the best so far if it works correctly. We should probably get a proposed patch and some testing for that possible solution.
Comment by Xilon (Xilon) - Monday, 26 May 2008, 09:04 GMT
It doesn't work for me. Running it on FreeBSD 7.0 (setenv BLOCKSIZE 1 && du $file) gives the error "du: minimum blocksize is 512". The same thing occurs on Mac OSX 10.5. It does work on linux, but that's no help.

I looked into using stat, which can produce the exact same result as `du -b` (I haven't tested it much), but it appears that, once again, the parameters are different. On Linux you can use `stat -c "%s"`, on BSD it's `stat -f "%z"`. I guess the easiest way is to use `ls -l` and extract the size from the output. This seems like the most portable and consistent solution. It also appears to have the same output as `du -b`.
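
To make the stat syntax difference concrete, here is a minimal sketch that tries the GNU form first and falls back to the BSD form. The function name `file_size` is hypothetical, and this is not something proposed in the thread; it is a try-and-fall-back variant of per-system handling, so it shares the drawbacks of that approach.

```shell
#!/bin/sh
# Print the size of a single file in bytes.
# GNU stat accepts -c '%s'; BSD stat rejects -c, so the BSD form
# -f '%z' is tried only when the first call fails.
file_size() {
    stat -c '%s' "$1" 2>/dev/null || stat -f '%z' "$1"
}
```

The order matters: GNU stat also accepts `-f`, but there it means "report on the filesystem", so the BSD form must only run after the GNU form has failed.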
Comment by Xavier (shining) - Monday, 26 May 2008, 09:25 GMT
I was thinking about the same thing yesterday: either using stat, which indeed seemed totally non-portable, or ls, with ugly parsing.
But then I realized both solutions were not practical at all anyway. What we want is the size of a directory, which is exactly what du does :P

So my last idea was to just use size in kilo bytes with du -k
When we look at size, we usually look either in MB or kB anyway..
But these sizes in bytes are already in every Arch package, local database and sync database, so converting is totally impractical.
So maybe we can just do this : du -sk | awk '{print $1 * 1024}'
(btw, awk is amazing, I just added the * 1024 like that without knowing anything and it worked :D)
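
Wrapped in a function, the `du -sk` pipeline from this comment could look like the sketch below. The function name `dir_size_bytes` is an assumption for illustration and is not taken from makepkg.

```shell
#!/bin/sh
# Print the disk usage of a directory tree in bytes.
# du -sk reports a single total in 1 KiB units; awk scales it to bytes,
# so the result is always a multiple of 1024.
dir_size_bytes() {
    du -sk "$1" | awk '{print $1 * 1024}'
}
```

Because the figure comes from allocated blocks, it is an estimate of disk usage rather than the exact byte count that `du -b` printed.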
Comment by Xavier (shining) - Monday, 26 May 2008, 09:41 GMT
It would be interesting to compare the results between du -k foo | awk '{print $1 * 1024}' and ls -l foo on freebsd and macos.

Also, du -k and du are equivalent on linux so -k is the default here, which might be a good argument for using it.
Comment by Xilon (Xilon) - Monday, 26 May 2008, 11:02 GMT
I just checked `du -k` and `ls -l` on FreeBSD 7.0 and Mac OSX 10.5. What's interesting is that du on FreeBSD shows a higher value than du on Mac OSX. I only checked one file, so I suppose that's nothing to go by, but in general the values are fairly close:

FreeBSD:
$ du -k libfetch-6.2.0.0.tar.bz2 | awk '{print $1 * 1024}'
245760
$ ls -l libfetch-6.2.0.0.tar.bz2 | awk '{print $5}'
223875

Mac OSX:
$ du -k libfetch-6.2.0.0.tar.bz2 | awk '{print $1 * 1024}'
225280
$ ls -l libfetch-6.2.0.0.tar.bz2 | awk '{print $5}'
223875

Archlinux:
$ du -k libfetch-6.2.0.0.tar.bz2 | awk '{print $1 * 1024}'
225280
$ ls -l libfetch-6.2.0.0.tar.bz2 | awk '{print $5}'
223875

ls still seems to be the most consistent ;). I'll test it more thoroughly, especially on directories, a little later. Seems like a winner though :)
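
The du figures above are what you get when the 223875 bytes reported by ls are rounded up to whole allocation blocks. As a sketch, assuming a 4096-byte block size (the thread does not state the block sizes involved, and the function name is made up for illustration):

```shell
#!/bin/sh
# Round a byte count up to a whole number of blocks of the given size.
round_up_to_block() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block * block ))
}

round_up_to_block 223875 4096   # prints 225280
```

That matches the Mac OSX and Archlinux results. The FreeBSD figure, 245760, is a multiple of 16384 (15 x 16384), which would be consistent with a larger allocation unit plus some overhead, but that is only a guess.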
Comment by Xavier (shining) - Monday, 26 May 2008, 11:29 GMT
Ah yeah, I finally think the apparent size is more interesting because it is consistent across systems and file systems.

I found another way which I was happy with : echo $(find . -printf "%s + ") 0 |bc
But that printf thing is not standard and not supported on bsd :(
We could probably combine find + stat, but then we are back on the stat problem which has a totally different syntax..

Oh well, that du -k solution looks like the simplest one by far, and the result seems to be good enough.
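
For comparison, the find idea can be approximated without `-printf` or stat by letting find hand the files to `ls -ln` and summing the size column. This is only a sketch and was not proposed in the thread: the function name is hypothetical, it counts regular files only (the `-printf "%s"` version also counted directory entries), and it would miscount file names containing newlines.

```shell
#!/bin/sh
# Sum the apparent sizes, in bytes, of all regular files under a directory.
# ls -ln prints numeric owner and group IDs, so the size is always field 5.
tree_apparent_bytes() {
    find "$1" -type f -exec ls -ln {} + | awk '{sum += $5} END {print sum + 0}'
}
```

It is noticeably more involved than `du -sk`, which is part of why the simpler command looks attractive.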
