FS#77345 - .MTREE file generated in repo packages have wrong checksums for python single-newline files

Attached to Project: Arch Linux
Opened by Mike Kazantsev (mk-fg) - Wednesday, 01 February 2023, 00:14 GMT
Last edited by Toolybird (Toolybird) - Saturday, 27 May 2023, 03:00 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Allan McRae (Allan)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

The .MTREE file in packages such as python-pygments and btrfs-progs (and maybe others) records incorrect md5/sha256 checksums for files that contain only a single newline.

For example, after unpacking python-pygments-2.14.0-1-any.pkg.tar.zst (sha256 e6cfa1010d0e65ca5446ba60488c5d03f5bbd06ec60874d0a6354ce12c38b29a ) into an empty dir:

% tar -xf /var/cache/pacman/pkg/python-pygments-2.14.0-1-any.pkg.tar.zst

% zgrep -e dependency_links.txt -e not-zip-safe .MTREE

./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/dependency_links.txt time=1672687238.0 size=1 md5digest=93b885adfe0da089cdf634904fd59f71 sha256digest=6e340b9cffb37a989ca544e6bb780a2c78901d3fb33738768511a30617afa01d
./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/not-zip-safe time=1672687238.0 size=1 md5digest=93b885adfe0da089cdf634904fd59f71 sha256digest=6e340b9cffb37a989ca544e6bb780a2c78901d3fb33738768511a30617afa01d

% bsdtar -c --format=mtree --options=mtree:md5,mtree:sha256 usr | grep -e dependency_links.txt -e not-zip-safe

./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/dependency_links.txt gname=fraggod uname=fraggod time=1672687238.0 mode=644 gid=1000 uid=1000 type=file size=1 md5digest=68b329da9893e34099c7d8ad5cb9c940 sha256digest=01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/not-zip-safe gname=fraggod uname=fraggod time=1672687238.0 mode=644 gid=1000 uid=1000 type=file size=1 md5digest=68b329da9893e34099c7d8ad5cb9c940 sha256digest=01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b

% find | grep -e dependency_links.txt -e not-zip-safe | xargs md5sum

68b329da9893e34099c7d8ad5cb9c940 ./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/dependency_links.txt
68b329da9893e34099c7d8ad5cb9c940 ./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/not-zip-safe

% find | grep -e dependency_links.txt -e not-zip-safe | xargs sha256sum

01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b ./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/dependency_links.txt
01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b ./usr/lib/python3.10/site-packages/Pygments-2.14.0-py3.10.egg-info/not-zip-safe


So one of the following seems to be happening in the current build process for official Arch Linux packages: something changes these files after the .MTREE file is generated, something changes the .MTREE file after it is generated, or whatever generates it (libarchive/bsdtar) has a bug that produces incorrect hashes.
Another possibility is that pacman, bsdtar and GNU tar all fail to extract those files properly from the package archive, which seems unlikely.

As far as I can tell, the packages in question come from official mirrors and pass all the normal signature checks (see the sha256 of the .pkg.tar.zst above to confirm).
This should be easy to reproduce anywhere with the simple commands above.

On my system, python-pygments-2.14.0-1-any.pkg.tar.zst and btrfs-progs-6.1.2-1-x86_64.pkg.tar.zst are the only two installed packages that are affected, as spotted by a simple script running from crontab ( https://github.com/mk-fg/fgtk/blob/master/arch/pacman-fsck ).

Thanks.

Closed by  Toolybird (Toolybird)
Saturday, 27 May 2023, 03:00 GMT
Reason for closing:  Fixed
Additional comments about closing:  See comments
Comment by Allan McRae (Allan) - Wednesday, 01 February 2023, 00:27 GMT
$ pacman -Qkk btrfs-progs
warning: btrfs-progs: /usr/lib/python3.10/site-packages/btrfsutil-6.1.2-py3.10.egg-info/dependency_links.txt (MD5 checksum mismatch)
warning: btrfs-progs: /usr/lib/python3.10/site-packages/btrfsutil-6.1.2-py3.10.egg-info/dependency_links.txt (SHA256 checksum mismatch)
btrfs-progs: 89 total files, 1 altered file

Confirmed. Will look into this from the pacman end, though I suspect it is a libarchive edge case.
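
To collect such warnings across every installed package rather than one, a filter along these lines might help. It is only a sketch: the function name `mismatched_files` is made up for illustration, and it assumes the English warning format shown above (hence `LC_ALL=C` in the usage comment).

```shell
#!/bin/sh
# mismatched_files: read `pacman -Qkk` output on stdin and print one
# "package path" line per file that has an MD5 or SHA256 checksum mismatch.
# Hypothetical helper; it assumes the warning format quoted in this report.
mismatched_files() {
    sed -n -E 's/^warning: ([^:]+): (.*) \((MD5|SHA256) checksum mismatch\)$/\1 \2/p' |
        sort -u
}

# Usage (stderr is merged into stdout in case the warnings are printed there):
#   LC_ALL=C pacman -Qkk 2>&1 | mismatched_files
```

Each file is listed once even when both digests mismatch, because of `sort -u`.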
Comment by gamezelda (gamezelda) - Wednesday, 01 February 2023, 00:48 GMT
Hi, this is very likely due to this Btrfs kernel bug: https://lore.kernel.org/linux-btrfs/20221223020509.457113-1-joanbrugueram%40gmail.com/ which affected Linux 6.1.0 through 6.1.4 inclusive, plus some -rc releases.
In fact, I found this bug when building some custom Python packages, but now it seems to have propagated to the distro.

In summary: when a package is built on a Btrfs filesystem under an affected kernel, tar (libarchive) generates an invalid checksum for 1-byte files while writing the .MTREE, because Btrfs returns bogus information related to sparseness.
This mostly affects Python packages because many of them ship a `dependency_links.txt` or `not-zip-safe` file containing a single newline, which triggers the problem. It can, however, affect any package containing a 1-byte file.
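
The digests quoted in the report can be checked directly. The sketch below hashes a single newline and a single NUL byte; the helper name `hex_digest` is made up for illustration, and `md5sum`/`sha256sum` from coreutils are assumed to be installed.

```shell
#!/bin/sh
# hex_digest TOOL FORMAT: hash the bytes produced by `printf FORMAT` with TOOL
# and print only the hex digest. Hypothetical helper for this illustration.
hex_digest() {
    printf "$2" | "$1" | cut -d' ' -f1
}

# A single newline: the real content of the files (matches md5sum/sha256sum above).
hex_digest md5sum '\n'       # 68b329da9893e34099c7d8ad5cb9c940
hex_digest sha256sum '\n'    # 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b

# A single NUL byte: the values recorded in the bad .MTREE entries.
hex_digest md5sum '\000'     # 93b885adfe0da089cdf634904fd59f71
hex_digest sha256sum '\000'  # 6e340b9cffb37a989ca544e6bb780a2c78901d3fb33738768511a30617afa01d
```

The second pair matching the .MTREE values is consistent with the file having been read as one zero byte rather than its real content.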

To fix this it should be enough to rebuild the affected packages without changes on a fixed system.
Though I'm not sure if it's worth fixing since aside from the bad MTREE checksums everything else is fine and packages should be correctly rebuilt eventually.

List of affected packages I found by checking all python-* packages:
python-aenum-3.1.11-2
python-bluepy-1.3.0-6
python-cloudpickle-2.2.0-2
python-github3py-3.2.0-1
python-html2text-2020.1.16-6
python-inotify-simple-1.3.5-1
python-markups-3.1.3-2
python-matplotlib-3.6.2-2
python-nbval-0.10.0-1
python-pbr-5.11.1-1
python-perf-2.3.1-2
python-pooch-1.6.0-3
python-pook-1.1.1-1
python-proxmoxer-2.0.1-1
python-pyblake2-1.1.2-7
python-pyfakefs-5.1.0-1
python-pygame-sdl2-1:2.1.0.r419.de82dfb-1
python-pygments-2.14.0-1
python-shellingham-1.5.0.post1-1
python-sphinxext-opengraph-0.7.5-1
python-subunit-1.4.2-1
python-threat9-test-bed-0.6.0+2+g1ed61b3-7
python-torchvision-0.14.1-1
python-torchvision-cuda-0.14.1-1
python-trio-asyncio-0.12.0-5
python-wilderness-0.1.9-1
python-zope-component-5.1.0-1

Procedure used to generate the above list:
# Get links to all python-* Arch packages
curl -s "https://mirror.cmt.de/archlinux/core/os/x86_64/" | grep -oP 'href="\Kpython-[^"]*\.zst(?=")' | sed 's|^|https://mirror.cmt.de/archlinux/core/os/x86_64/|' > all_python_links
curl -s "https://mirror.cmt.de/archlinux/extra/os/x86_64/" | grep -oP 'href="\Kpython-[^"]*\.zst(?=")' | sed 's|^|https://mirror.cmt.de/archlinux/extra/os/x86_64/|' >> all_python_links
curl -s "https://mirror.cmt.de/archlinux/community/os/x86_64/" | grep -oP 'href="\Kpython-[^"]*\.zst(?=")' | sed 's|^|https://mirror.cmt.de/archlinux/community/os/x86_64/|' >> all_python_links
# Download them all (about 2.5GB)
mkdir all_python && cd all_python
wget -i ../all_python_links
# Look for the bad checksum pattern (the MD5 of a 1-byte file containing a single zero byte)
for f in *; do tar --force-local -xOf "$f" .MTREE | zcat | grep -qE "md5digest=93b885adfe0da089cdf634904fd59f71" && echo "$f"; done

Someone with a copy of all Arch packages (I don't want to kill the mirrors) should be able to get a list of all affected packages using the last command.
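
For installed packages, the same pattern can be looked for without downloading anything. This is a rough sketch that assumes pacman keeps a gzip-compressed `mtree` file per package under /var/lib/pacman/local; the function name `list_suspect_packages` is made up for illustration. A package that legitimately ships a file consisting of one NUL byte would show up as a false positive.

```shell
#!/bin/sh
# MD5 of a single NUL byte, the value recorded by the bad .MTREE entries.
BAD_MD5=93b885adfe0da089cdf634904fd59f71

# list_suspect_packages DB_DIR: print the name of each package directory under
# DB_DIR whose gzip-compressed mtree file contains the bad digest.
list_suspect_packages() {
    for m in "$1"/*/mtree; do
        [ -f "$m" ] || continue          # skip the unexpanded glob if nothing matches
        if gzip -dc "$m" | grep -q "md5digest=$BAD_MD5"; then
            basename "$(dirname "$m")"
        fi
    done
}

# Assumed default location of pacman's local database; pass another path to override.
list_suspect_packages "${1:-/var/lib/pacman/local}"
```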
Comment by Allan McRae (Allan) - Wednesday, 01 February 2023, 02:00 GMT
I can run this over all Arch packages and make a TODO list.
Comment by Allan McRae (Allan) - Wednesday, 01 February 2023, 03:44 GMT
BAD /srv/ftp/pool/packages/389-ds-base-2.3.1-3-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/packages/gi-docgen-2023.1-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/bettercap-caplets-v20210412.r372.2d58298-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/borg-1.2.3-1-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/community/img2pdf-0.4.4-2-any.pkg.tar.zst
BAD /srv/ftp/pool/community/pybind11-2.10.3-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-aenum-3.1.11-2-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-bluepy-1.3.0-6-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/community/python-cloudpickle-2.2.0-2-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-github3py-3.2.0-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-html2text-2020.1.16-6-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-inotify-simple-1.3.5-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-markups-3.1.3-2-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-matplotlib-3.6.2-2-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/community/python-nbval-0.10.0-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-pbr-5.11.1-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-perf-2.3.1-2-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-pooch-1.6.0-3-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-pook-1.1.1-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-proxmoxer-2.0.1-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-pyblake2-1.1.2-7-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/community/python-pyfakefs-5.1.0-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-pygame-sdl2-1:2.1.0.r419.de82dfb-1-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/community/python-pygments-2.14.0-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-shellingham-1.5.0.post1-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-sphinxext-opengraph-0.7.5-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-subunit-1.4.2-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-threat9-test-bed-0.6.0+2+g1ed61b3-7-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-torchvision-0.14.1-1-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/community/python-torchvision-cuda-0.14.1-1-x86_64.pkg.tar.zst
BAD /srv/ftp/pool/community/python-trio-asyncio-0.12.0-5-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-wilderness-0.1.9-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/python-zope-component-5.1.0-1-any.pkg.tar.zst
BAD /srv/ftp/pool/community/qspectrumanalyzer-2.2.0-6-any.pkg.tar.zst
BAD /srv/ftp/pool/community/s-tui-1.1.4-2-any.pkg.tar.zst
BAD /srv/ftp/pool/community/soundconverter-4.0.3-2-any.pkg.tar.zst
BAD /srv/ftp/pool/community/speedtest-cli-2.1.3-3-any.pkg.tar.zst
Comment by Allan McRae (Allan) - Wednesday, 01 February 2023, 10:10 GMT
Closing - most have been fixed and the rest are tracked in the todo list:
https://archlinux.org/todo/invalid-mtree-files/
