FS#61785 - [pacman] Allow makepkg to preserve source timestamps

Attached to Project: Pacman
Opened by John Lindgren (jlindgren) - Saturday, 16 February 2019, 18:47 GMT
Last edited by Allan McRae (Allan) - Saturday, 16 February 2019, 22:43 GMT
Task Type: Feature Request
Category: Packages: Core
Status: Unconfirmed
Assigned To: No-one
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 1
Private: No

Details

Description:

makepkg currently rewrites all timestamps in the built package to $SOURCE_DATE_EPOCH if set, otherwise to the current date and time.

This behavior was added in 2017 in an attempt to support reproducible builds:
https://git.archlinux.org/pacman.git/commit/?id=d30878763ce1b5be453b563f2729d7333242e79b

This seems heavy-handed and leads to unnecessary rebuilds when makepkg/pacman is used to install locally-built packages, as discussed here:
https://bbs.archlinux.org/viewtopic.php?id=244252

reproducible-builds.org (which seems to be largely a Debian initiative) recommends a more subtle approach:

"One can reasonably assume that all source timestamps are before SOURCE_DATE_EPOCH and all builds take place after it. This means we can efficiently both preserve source-based timestamps and omit build-specific timestamps, by rewriting timestamps more recent than SOURCE_DATE_EPOCH back to the latter. See for example the --clamp-mtime option to GNU tar."

See: https://reproducible-builds.org/specs/source-date-epoch/
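For illustration, the clamping behavior described in that quote could be sketched roughly as follows. This is only a sketch, not makepkg code: clamp_mtimes is a hypothetical helper name, and it assumes GNU touch (for the "-d @SECONDS" form) and a find that supports -newer.

```shell
# Sketch only: clamp mtimes under a directory to a given epoch.
# clamp_mtimes is a hypothetical helper, not part of makepkg.
# Assumes GNU touch ("-d @SECONDS") and a find with -newer.
clamp_mtimes() {
    clamp_dir=$1
    clamp_epoch=$2
    clamp_ref=$(mktemp) || return 1
    # Reference file whose mtime equals the epoch.
    touch -d "@$clamp_epoch" "$clamp_ref"
    # Rewrite only entries strictly newer than the reference;
    # older (source-derived) timestamps are left untouched.
    find "$clamp_dir" -newer "$clamp_ref" \
        -exec touch -h -d "@$clamp_epoch" {} +
    rm -f "$clamp_ref"
}
```

Called as clamp_mtimes "$pkgdir" "$SOURCE_DATE_EPOCH", this would leave any file older than the epoch alone and pull everything newer back to it, which is the same effect the --clamp-mtime option to GNU tar has at archive time.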

Additional info:
* package version(s)

pacman 5.1.2-2

Steps to reproduce:

Build any package (see the forum post linked earlier for a minimal test-case).

Comment by John Lindgren (jlindgren) - Saturday, 16 February 2019, 18:50 GMT
A partial patch (this doesn't address all source files being touched if SOURCE_DATE_EPOCH is set):

--- makepkg.sh.in.0 2018-12-25 05:22:03.000000000 -0500
+++ makepkg.sh.in 2019-02-16 13:18:17.303427748 -0500
@@ -756,8 +756,15 @@
[[ -f $pkg_file ]] && rm -f "$pkg_file"
[[ -f $pkg_file.sig ]] && rm -f "$pkg_file.sig"

- # ensure all elements of the package have the same mtime
- find . -exec touch -h -d @$SOURCE_DATE_EPOCH {} +
+ # set all build-specific timestamps to the same time
+ # see: https://reproducible-builds.org/specs/source-date-epoch/
+ # "One can reasonably assume that all source timestamps are before
+ # SOURCE_DATE_EPOCH and all builds take place after it. This means
+ # we can efficiently both preserve source-based timestamps and
+ # omit build-specific timestamps, by rewriting timestamps more
+ # recent than SOURCE_DATE_EPOCH back to the latter."
+ touch -d @$SOURCE_DATE_EPOCH .PKGINFO
+ find . -newer .PKGINFO -exec touch -h -d @$SOURCE_DATE_EPOCH {} +

msg2 "$(gettext "Generating .MTREE file...")"
list_package_files | LANG=C bsdtar -cnf - --format=mtree \
Comment by Allan McRae (Allan) - Saturday, 16 February 2019, 23:13 GMT
Doesn't the act of copying the files from the source tree into the package directory adjust the timestamps? I don't see many uses of "install -p" to preserve them.
Comment by John Lindgren (jlindgren) - Saturday, 16 February 2019, 23:16 GMT
I suppose it depends on the package. The package I was dealing with when I noticed this behavior (audacious) does use "install -p".
Comment by Eli Schwartz (eschwartz) - Sunday, 17 February 2019, 01:42 GMT
In addition to not actually working when build systems use install without -p, this completely breaks the reproducibility of packages that use git or other VCS sources, so thoroughly that their reproducibility can never recover. The source file timestamp is completely nondeterministic and based on whenever the most recent git pull had to update the file. In a subset of situations, the git clone will be perhaps a minute or two more recent than SOURCE_DATE_EPOCH, which means it will be reset to SOURCE_DATE_EPOCH either by your proposal or by the final packaging run.

makepkg --nobuild && makepkg --noextract is one case that would totally break, which is expected to be reproducible.

The code we use to unify source file timestamps when $SOURCE_DATE_EPOCH is present in the build environment is necessary for some build systems (like python), and some source=() types (any VCS), and will clobber your timestamps even if we tried to guess at some too-clever timestamp to use in creating the archive.

...

By the way, I'm a member of the reproducible builds organization. I disagree with the Debian initiative -- fortunately that's okay, reproducible-builds.org does not mandate it, merely mentions that it exists and some people think it is useful, so I'm allowed to disagree.

I do not know what Debian tooling does with regard to git sources, so that issue may not affect them. IIUC, Debian still basically depends on packages for the official archives being built using their generated orig.tar.xz sources with stable timestamps.
Comment by Eli Schwartz (eschwartz) - Sunday, 17 February 2019, 01:44 GMT
What packages need this exactly -- why are you rebuilding the same version of a package multiple times? If it is not the same version, then what timestamps do you expect to be preserved?
Comment by John Lindgren (jlindgren) - Sunday, 17 February 2019, 04:04 GMT
In my ideal world, Git and other VCS's would track and control timestamps as reliably as they control file contents. And then source files (e.g. library headers) could be installed, reproducibly, with the same timestamp as they had in VCS. But since we don't live in that world, I can see the benefit of being able to "fix" the problem of random source timestamps by setting them all to $SOURCE_DATE_EPOCH.

> What packages need this exactly -- why are you rebuilding the same version of a package multiple times?

audacious + audacious-plugins, and I'm rebuilding them multiple times because I'm one of the upstream developers. audacious-plugins depends on the public headers installed by audacious. When I make a small change to audacious that doesn't affect those public headers, I want to be able to do a quick delta rebuild of both packages without triggering a recompile of every single source file in audacious-plugins.

I'm sure my use case is a niche one, so I won't be offended if this feature request is rejected. It was just an idea that would make my particular workflow easier, and I may be able to come up with other ways to solve the problem (like building audacious-plugins against the local/uninstalled headers, whose timestamps should be more stable).
Comment by Eli Schwartz (eschwartz) - Sunday, 17 February 2019, 04:59 GMT
Or alternatively if you know the public interface has not changed, you could set SOURCE_DATE_EPOCH for the package. But maybe mtime is not quite so great as you thought: https://apenwarr.ca/log/20181113
Some build systems try to be a little more accurate about this.

Actually I'd probably use ccache, which should return cached compilation artifacts whenever the inputs were identical, regardless of timestamps.
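As a rough sketch of what that could look like: makepkg can be told to use ccache through the BUILDENV array in makepkg.conf, assuming the ccache package is installed. The other entries shown here are assumed defaults and may differ on a given system; only the ccache entry (without a leading "!") matters.

```shell
# /etc/makepkg.conf or ~/.makepkg.conf (excerpt, sketch only)
# Removing the "!" in front of ccache enables it for builds.
# The remaining entries are assumed defaults; keep whatever is already set.
BUILDENV=(!distcc color ccache check !sign)
```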

> In my ideal world, Git and other VCS's would track and control timestamps as reliably as they control file contents. And then source files (e.g. library headers) could be installed, reproducibly, with the same timestamp as they had in VCS.

subversion has this feature: https://stackoverflow.com/a/2172067/9969680
Comment by John Lindgren (jlindgren) - Monday, 18 February 2019, 15:43 GMT
The ccache suggestion is a good idea, thanks. I hadn't thought of that, but someone else mentioned it on the forum as well.

I ended up finding a different solution. Please feel free to close this feature request.
