FS#61717 - Packages not reproducible due to different size
Attached to Project:
Pacman
Opened by Jelle van der Waa (jelly) - Monday, 11 February 2019, 10:25 GMT
Last edited by Allan McRae (Allan) - Friday, 11 October 2019, 10:24 GMT
Details
Summary and Info:
For reproducible builds we started reproducing packages using archlinux-repro (on Github.com/archlinux) and discovered that some packages were not reproducible due to a difference in the size recorded in .PKGINFO. [1] This issue seems to be caused by the size calculation in makepkg, i.e. du -ks --apparent-size reports a different size for files on a compressed btrfs filesystem than on, for example, ext4. [2]

[1] https://gist.github.com/jelly/19edcbe1a9531b6890044adb9adc5152
[2] https://www.mail-archive.com/bug-coreutils@gnu.org/msg30689.html

Steps to Reproduce:
- Create two loopback filesystems, one ext4 and one btrfs
- dd if=/dev/zero of=testbtrfs.img bs=1M count=200
- mkfs.btrfs testbtrfs.img
- dd if=/dev/zero of=testfs.img bs=1M count=10
- mkfs.ext4 testfs.img
- export SOURCE_DATE_EPOCH=1544997248
- Mount each image in turn, always at the same location (same destdir)
- mount -t ext4 testfs.img /mnt/builddir
- makepkg archlinux-keyring in /mnt/builddir
- umount /mnt/builddir
- mount -t btrfs -o loop,compress=zlib testbtrfs.img /mnt/builddir
- makepkg archlinux-keyring
- umount /mnt/builddir
- diffoscope the result

The size differs between the two packages.

Another test case is against tmpfs:
- mkdir ~/builddir
- mount -t tmpfs -o size=1G tmpfs /home/jelle/builddir
- makepkg in builddir
- umount ~/builddir
- makepkg in builddir
- diffoscope the result

Notice that the size is also different!
Closed by Allan McRae (Allan)
Friday, 11 October 2019, 10:24 GMT
Reason for closing: Fixed
Additional comments about closing: git commit f26cb61c
It's most definitely gonna be slower than asking the fs metadata, but will be consistent.
https://git.archlinux.org/pacman.git/commit/scripts/makepkg.sh.in?id=241d6b884a3a6c883b6c61a3b175d17e7d317fc5
Instead of wc -c $file
you could also use
stat -c %s $file
strace shows that both use fstat to read the file metadata, not the content itself.
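A minimal sketch comparing the two commands on a single file, assuming GNU coreutils or busybox `stat` (BSD/Darwin would need `stat -f %z` instead). Since `wc -c $file` prints the file name after the count, the sketch reads the file from stdin to get the bare number; the temporary file and its 13-byte contents are only illustrative.

```shell
#!/bin/sh
# Compare two ways of getting a regular file's size in bytes.
# Assumption: `stat -c` is available (GNU coreutils or busybox).
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

f="$tmpdir/sample"
printf 'hello, world\n' > "$f"   # 13 bytes

# `wc -c FILE` prints "13 FILE"; reading from stdin prints only the count.
# tr strips the leading padding some wc implementations add.
size_wc=$(wc -c < "$f" | tr -d ' ')

# `stat -c %s` prints st_size directly.
size_stat=$(stat -c %s "$f")

echo "wc: $size_wc  stat: $size_stat"
```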
It only differs for directories.
The remaining case is INODECMD, used in zipman.
Since we currently assume that stat -c can format output in most places (it works for GNU and busybox at a minimum), and we fall back to stat -f on BSD/Darwin, our existing assumptions should hold here too. Can we use that instead?
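The fallback described above could look roughly like this sketch. `file_size` and `SIZECMD` are hypothetical names, and probing `stat -c` at run time is an assumption made for illustration; a build system might instead pick the command once, at configure time.

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of each file argument.
# Try GNU/busybox `stat -c %s` first, fall back to BSD/Darwin `stat -f %z`.
set -eu

if stat -c %s / >/dev/null 2>&1; then
    SIZECMD='stat -c %s'
else
    SIZECMD='stat -f %z'
fi

file_size() {
    # Word splitting of $SIZECMD is intentional: it holds a command plus flags.
    $SIZECMD "$@"
}
```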
Darn it.
It's also apparently more reliable not to try calculating the size of a directory entry.
du without --apparent-size has the problem of FS-dependent block sizes and thus rounding.
du, even with --apparent-size, has the problem of FS-dependent directory sizes. We can't tell it to ignore directories.
So using du is right out.
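To illustrate the directory problem, here is a small sketch assuming GNU coreutils (`du -b` is short for `--apparent-size --block-size=1`). It builds a throwaway tree holding 13 bytes of file content and compares du's apparent size, which also includes each directory's st_size, against a sum over regular files only. The exact du figure depends on the filesystem the temporary directory lives on, which is the whole issue.

```shell
#!/bin/sh
# Show that `du --apparent-size` also counts directory entries, whose st_size
# is filesystem-dependent. Assumes GNU coreutils (du -b, stat -c).
set -eu

tree=$(mktemp -d)
trap 'rm -rf "$tree"' EXIT

mkdir -p "$tree/usr/share/doc"
printf '0123456789' > "$tree/usr/share/doc/a"   # 10 bytes
printf 'abc' > "$tree/usr/share/doc/b"          # 3 bytes

# Apparent size of everything under $tree, directories included.
du_bytes=$(du -sb "$tree" | cut -f1)

# Sum of st_size over regular files only.
file_bytes=$(find "$tree" -type f -exec stat -c %s {} + | awk '{ x += $1 } END { print x + 0 }')

echo "du -sb: $du_bytes  regular files only: $file_bytes"
```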
I would really like to avoid pushing all the file contents into a pipe (twice, if we count the actual archive creation).
I've tested
find . -type f -exec stat -c %s {} + | awk '{ x+=$1 } END { print x }'
against
find . -type f -exec cat {} + | wc -c
and both produce the same result for the reproduction case above, but the former is much faster.
The speed test used a directory with 10417 files totaling 335022335 bytes (~30 ms vs ~810 ms).
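A sketch that checks the two pipelines agree on a generated tree of 50 files (sizes 0, 37, ..., 1813 bytes, 45325 bytes in total). It assumes `stat -c %s` and `head -c` are available (true for GNU coreutils and busybox); the tree layout is arbitrary. Prefixing either pipeline with the shell's `time` keyword gives the kind of comparison quoted above, though the numbers will vary with machine and filesystem.

```shell
#!/bin/sh
# Check that the metadata-based sum (stat) and the content-based count
# (cat | wc) agree. Assumes `stat -c %s` and `head -c` (GNU/busybox).
set -eu

tree=$(mktemp -d)
trap 'rm -rf "$tree"' EXIT
cd "$tree"

# 50 files of 0, 37, 74, ... bytes.
mkdir -p sub
i=0
while [ "$i" -lt 50 ]; do
    head -c "$((i * 37))" /dev/zero > "sub/file$i"
    i=$((i + 1))
done

# Metadata only: one stat call per batch of files, summed by awk.
by_stat=$(find . -type f -exec stat -c %s {} + | awk '{ x += $1 } END { print x + 0 }')

# Reads every byte: consistent, but pushes the whole tree through a pipe.
by_cat=$(find . -type f -exec cat {} + | wc -c | tr -d ' ')

echo "stat: $by_stat  cat|wc: $by_cat"
```

The `+ 0` in the awk END block is a small addition so that an empty tree prints 0 instead of an empty line.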
Is `stat -c %s` really not portable enough?
Using POSIX-only commands, as in that commit, one can construct a more complex pipeline:
find . -type f -exec sh -c 'wc -c "$@" | tail -n 1' sh {} + | awk '{ x+=$1 } END { print x }'
Not quite as fast, at ~100ms, but this code is getting unwieldy and I'd prefer the cat|wc again.
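If the cat | wc approach is kept, it can be wrapped in a small helper. `tree_size` is a hypothetical name; the `tr` step is there because some wc implementations (BSD's, for example) pad the count with leading spaces. The commands used are POSIX, but the helper still streams every byte through a pipe, which is the cost discussed above.

```shell
#!/bin/sh
# Hypothetical helper: total size in bytes of all regular files under a
# directory, using only POSIX find/cat/wc.
set -eu

tree_size() {
    (
        cd "$1" &&
        # With no regular files, cat never runs and wc reports 0.
        find . -type f -exec cat {} + | wc -c | tr -d ' \t'
    )
}
```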