FS#70954 - [gcc] [toolchain] build order / reproducibility issue
Attached to Project: Arch Linux
Opened by Toolybird (Toolybird) - Thursday, 20 May 2021, 02:32 GMT
Last edited by Toolybird (Toolybird) - Monday, 09 October 2023, 21:23 GMT
Details
(I'm sure "the powers that be" are fully aware of any issues
raised here, but it still might be good to have this
documented somewhere. Feel free to shut this down if not
appropriate for the bug tracker.)
Whenever Arch upgrades the toolchain to a new major GCC version (e.g. GCC 10 -> GCC 11), the toolchain becomes (kind of) unreproducible. As of this writing, this is evidenced by new entries for toolchain components (in particular binutils, gcc and gcc-libs) appearing on the Arch Repro Status Page[1].
Part of the cause is quite obvious when considering the current toolchain build order: glibc startfiles compiled with the previous GCC are linked into the final binaries of both binutils and gcc. For example (as of this writing):
$ strings /usr/bin/ld | grep GCC:
GCC: (GNU) 10.2.0
GCC: (GNU) 11.1.0
$ strings /usr/bin/gcc | grep GCC:
GCC: (GNU) 10.2.0
GCC: (GNU) 11.1.0
Code from the previous toolchain is leaking into the current one. This will of course sort itself out as toolchain components get rebuilt for minor revisions, but it would be nice if everything "just worked" from the get-go.
One way to fix this would be a slight tweak to the current toolchain build order. The status quo has clearly served Arch well over the years (albeit with this tiny flaw), so this is merely a suggestion from the peanut gallery :)
Current: linux-api-headers -> glibc -> binutils -> gcc -> binutils -> glibc
Proposed: linux-api-headers -> glibc -> binutils -> gcc -> glibc -> binutils -> gcc
Because GCC is such a beast to compile, it would make sense (and is perfectly acceptable IMHO) for the first GCC to be compiled with `--disable-bootstrap' (in fact, I would advocate for both GCCs to be non-bootstrapped, but that's a separate, possibly controversial, topic for another day). Part of my thinking is based on experience building cross toolchains.
Sidenote: there is a Python script in the glibc sources, `build-many-glibcs.py', which IMHO represents state-of-the-art methodology for building cross toolchains. If you haven't already, check it out, it's awesome!
Anyway, just throwing it out there for comment.
[1]: https://reproducible.archlinux.org/
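The `strings | grep GCC:` check above can be wrapped in a reusable function. This is a minimal sketch that assumes GNU grep and swaps `strings` for `grep -a -o`. The function names `gcc_tags` and `single_gcc_tag` are hypothetical. A binary that shows more than one distinct tag has picked up objects built by a different GCC.

```bash
# List the distinct GCC identification strings embedded in a file.
# GCC writes a "GCC: (...) version" string into each object's .comment
# section, and the linker merges identical ones. A binary built entirely
# by one GCC should therefore normally show exactly one distinct tag.
gcc_tags() {
    # -a: treat the binary as text; -o: print each match on its own line
    grep -a -o 'GCC: ([^)]*) [0-9][0-9.]*' "$1" | sort -u
}

# Succeeds only if exactly one distinct GCC tag is present.
single_gcc_tag() {
    local n
    n=$(gcc_tags "$1" | wc -l)
    [ "$n" -eq 1 ]
}
```

On an affected system, `single_gcc_tag /usr/bin/ld || gcc_tags /usr/bin/ld` would print the mixed versions.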
Bingo! Because that's *exactly* what happens now in the Arch toolchain bootstrap. There are intermediate packages along the way.
There's got to be a way to express these dependencies properly in terms of the package manager (e.g. glibc-first, binutils-stage1 etc.), but I have yet to come up with a clean solution.
A while ago Allan pointed me to LFS for more details about the sequence, yet they recommend/use what Toolybird is proposing just above.
I dug around a bit and filed an upstream bug report[1].
A fix has been proposed[2].
Until the fix is committed, the patch can be grabbed from here[3].
[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101383
[2]: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574802.html
[3]: https://patchwork.ozlabs.org/project/gcc/patch/p17p7o-28o1-271o-6950-42oq6rnrs42@fhfr.qr/
Sorry about the last link; Flyspray has mangled it.
Filed an upstream bug report[1] but no response so far. Folks interested in repro should take a look.
[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101407
Still mulling on how to fix the "bootstrap whole Arch toolchain reproducibly" issue...
After thinking about this for some time, I have some ideas on how to improve the process. But the only way I can see this working is if we have an official bootstrap script. Is this on the radar? It could maybe live in devtools?
We currently build:
linux-api-headers once
binutils twice
glibc twice
gcc (fat) twice (but actually 6 times !! due to 3-stage bootstrap)
If we borrow some ideas from cross compilation procedures, we could trim this down to:
linux-api-headers once
binutils twice
glibc once
gcc (thin) once
gcc (fat) once (but actually 3 times due to 3-stage bootstrap)
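The two lists above imply a simple tally of how many times the compiler itself gets built. This sketch assumes that a 3-stage bootstrap compiles GCC three times and that a non-bootstrapped ("thin") build compiles it once. The `name:count:passes` encoding and the function name are hypothetical.

```bash
# Each entry: package:times_built:compiler_builds_per_package_build
current=(linux-api-headers:1:1 binutils:2:1 glibc:2:1 gcc-fat:2:3)
proposed=(linux-api-headers:1:1 binutils:2:1 glibc:1:1 gcc-thin:1:1 gcc-fat:1:3)

# Sum the compiler builds contributed by gcc packages only.
count_gcc_builds() {
    local total=0 entry pkg count passes
    for entry in "$@"; do
        IFS=: read -r pkg count passes <<< "$entry"
        [[ $pkg == gcc* ]] || continue
        total=$(( total + count * passes ))
    done
    echo "$total"
}
```

`count_gcc_builds "${current[@]}"` gives 6 and `count_gcc_builds "${proposed[@]}"` gives 4.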
I'm not suggesting we do any actual cross compilation (although I have experimented with it and proved that a cross-compiled glibc can be byte-for-byte identical to a natively compiled one). BTW, I previously mentioned `build-many-glibcs.py'. I've put some introductory usage notes up here [1] for anyone who would like to dabble. Studying the sequence and the log files it produces is an excellent way to learn about the inner workings of toolchains IMHO.
The point of the bootstrap script would be to set an environment variable that the gcc PKGBUILD checks. For example:
(pseudo)
if ARCH_BOOTSTRAP
do thin gcc
else
do full fat gcc
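Here is a minimal bash sketch of that switch. It assumes the variable only changes GCC's configure arguments. The function name and the language lists are illustrative and are not taken from Arch's gcc PKGBUILD. `--enable-languages`, `--disable-bootstrap` and `--enable-bootstrap` are GCC configure options.

```bash
# Print configure arguments, one per line, depending on ARCH_BOOTSTRAP.
gcc_configure_args() {
    if [[ -n ${ARCH_BOOTSTRAP:-} ]]; then
        # thin gcc: C and C++ only, single-stage (non-bootstrapped) build
        printf '%s\n' --enable-languages=c,c++ --disable-bootstrap
    else
        # full fat gcc: all front ends, normal 3-stage bootstrap
        printf '%s\n' --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ \
            --enable-bootstrap
    fi
}
```

A build() function could then collect the arguments with `mapfile -t args < <(gcc_configure_args)` and pass `"${args[@]}"` to configure.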
I know this kind of thing is normally frowned upon in PKGBUILDS, but a special case like this might be acceptable?
Any thoughts? I'm working on a proof of concept..
[1] https://gitlab.com/-/snippets/2250210
We build packages in clean chroots, so passing environment variables is not straightforward. I was thinking of having a PKGBUILD.pass1 and PKGBUILD.pass2 in my build directory, and having a build script symlink one of them to PKGBUILD as needed. Not ideal, as you have code duplication across PKGBUILDs, but I'm not seeing a great solution here.
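The symlink step for the pass1/pass2 approach can be sketched as follows. This assumes both variants sit in the same package directory. `select_pass` is a hypothetical helper.

```bash
# Point PKGBUILD at PKGBUILD.pass<N> inside the given package directory.
select_pass() {
    local dir=$1 pass=$2
    if [[ ! -f $dir/PKGBUILD.pass$pass ]]; then
        echo "no PKGBUILD.pass$pass in $dir" >&2
        return 1
    fi
    # -s: symbolic link; -f: replace an existing PKGBUILD;
    # -n: treat an existing symlink named PKGBUILD as a plain file
    ln -sfn "PKGBUILD.pass$pass" "$dir/PKGBUILD"
}
```

The relative link target means the directory can be moved or copied into a chroot without breaking the link.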
Clean chroot builds shouldn't be a problem, because the env var won't be set, i.e. the conditional code will be bypassed. The env var only takes effect when the Arch toolchain maintainer runs the bootstrap script. Allow me to demonstrate by stealing your build script posted in the forum :) This is just an example:
build linux-api-headers
build glibc --nocheck
build binutils --nocheck
export ARCH_BOOTSTRAP=1
build gcc --nocheck
unset ARCH_BOOTSTRAP
build glibc
build binutils
build gcc
I've already tried the PKGBUILD.pass1 approach but just couldn't stomach it.
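Here is a runnable sketch of the sequence above. It uses a stub `build` that only records what it would do; the stub, `build_log` and `bootstrap_toolchain` are all hypothetical. The sketch shows that only the first gcc pass sees ARCH_BOOTSTRAP. Whether an exported variable actually reaches a clean-chroot build depends on how the real `build` invokes the chroot. Later in this thread, the variable is passed explicitly as `_ARCH_BOOTSTRAP=1`.

```bash
# Hypothetical stub: records each requested build instead of running it.
build_log=()

build() {
    local pkg=$1; shift
    local args="$*"
    build_log+=("$pkg bootstrap=${ARCH_BOOTSTRAP:-0}${args:+ $args}")
}

bootstrap_toolchain() {
    build linux-api-headers
    build glibc --nocheck
    build binutils --nocheck
    export ARCH_BOOTSTRAP=1    # first gcc pass: thin gcc
    build gcc --nocheck
    unset ARCH_BOOTSTRAP       # everything after this is a normal build
    build glibc
    build binutils
    build gcc
}
```

After `bootstrap_toolchain`, `printf '%s\n' "${build_log[@]}"` lists the seven builds in order, with `bootstrap=1` only on the first gcc.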
Do we enable LTO/PGO for anything but the final build? If so what does it bring us - both in terms of performance and build times?
https://gist.github.com/0849a33d8bdcb081f64274e3c6fa31f0
But we don't know for sure because the Arch Reproducible Status page [1] is currently horked WRT GCC.
"fatal: unable to access 'https://github.com/archlinux/svntogit-packages.git/': Could not resolve host: github.com"
(Could someone please fix the Arch rebuilderd instance to get it working again for GCC? Thanks!)
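Until the rebuilderd instance is fixed, a local spot check is to build the same package twice and compare the results. This is a minimal sketch that assumes you have two built package files. `same_build` is a hypothetical helper, and a tool such as diffoscope is still needed to see *what* differs.

```bash
# Returns 0 if the two files are bit-for-bit identical, 1 if they differ,
# and 2 if either file can't be read.
same_build() {
    local a b
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    [[ $a == "$b" ]]
}
```

Comparing digests rather than using `cmp` means the two builds can come from different machines, as long as their hashes are exchanged.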
PGO was previously rejected here [2] and here [3]. What is different now? GCC is arguably *the most important* package that needs to be reproducible. If it pans out that PGO causes GCC to be unreproducible, then I vote we get rid of it. Any thoughts?
[1] https://reproducible.archlinux.org/
[2] FS#49129
[3] FS#56856
It's a tough one because we apparently sacrifice compiler performance for reproducibility. Which is the more important goal? I guess that's a decision for Arch leadership. It appears other distros value the former. Fedora don't seem to care much about repro. Debian do, but their site is a nightmare to navigate. Judging from the above link [1], openSUSE seem to say their GCC is reproducible, but then they ship a profiled GCC in production?
The other interesting thing in that link is the bit about "deterministic filesystem readdir order", which is something I hadn't considered before. It might be a factor in another recent GCC bug I reported upstream [2]. Great, another rabbit hole for me to go down! Anyway, it appears GCC devs do try to improve the reproducibility of profiled builds from time to time, which can only be good.
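Here is a small illustration of the readdir point. `find` lists entries in whatever order the filesystem returns them, so anything that consumes that order directly can vary between build hosts. Sorting with a fixed collation removes that variable. `sorted_files` is a hypothetical helper, and this is a generic sketch, not a diagnosis of the GCC bug in [2].

```bash
# Print every regular file under a directory in a stable order,
# NUL-separated so unusual file names survive.
sorted_files() {
    find "$1" -type f -print0 | LC_ALL=C sort -z
}
```

For example, `sorted_files src | xargs -0 sha256sum` hashes files in the same order on every host. GNU tar offers `--sort=name` for the same reason when creating archives.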
BTW, with bootstrap instead of profiledbootstrap, my GCC build time (--nocheck) goes from 2h:23m down to 1h:51m (Ryzen 2700X, -j16). Without LTO it goes down to about 45 mins IIRC. I need a faster build box :)
[1] https://lists.reproducible-builds.org/pipermail/rb-general/2022-February/002478.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104832
It turns out that passing env vars through to the build environment is fully supported [1]. I've finally started a new toolchain repo [2] where you can see this in action.
When bootstrapping a full toolchain, there is absolutely no need in the *first passes* for LTO, PGO, debug pkgs or, in the case of GCC, libgccjit.
Just omitting these simple things can speed up the build quite a lot. For example, on a beefy cloud VM with plenty of cores, a full bootstrap cycle (including test suites) of the current Arch toolchain takes about 2h:52m:56s. With tweaks as per my repo, it goes down to 2h:21m:35s. That's a fair saving for very little effort. There is *heaps* more low-hanging fruit to optimize this a lot further.
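Here is the arithmetic for these timings, as a bash sketch using integer math only. The function names are hypothetical.

```bash
# Convert "h:m:s" to seconds.
to_secs() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so fields like "08" aren't parsed as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Relative saving from $1 (before) to $2 (after), truncated to one decimal.
# Assumes the second time is not larger than the first.
saving_pct() {
    local before after tenths
    before=$(to_secs "$1")
    after=$(to_secs "$2")
    tenths=$(( (before - after) * 1000 / before ))   # percent x 10
    printf '%d.%d%%\n' $(( tenths / 10 )) $(( tenths % 10 ))
}
```

`saving_pct 2:52:56 2:21:35` prints 18.1%. The earlier bootstrap vs. profiledbootstrap figures (`saving_pct 2:23:00 1:51:00`) print 22.3%.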
Pros: faster toolchain builds
Cons:
1. ((_ARCH_BOOTSTRAP)) && "do this or that" sprinkled throughout the toolchain PKGBUILDs
2. a toolchain build script is mandatory when bootstrapping
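Here is a sketch of what those guards might look like in a gcc-style PKGBUILD. It covers the first-pass items listed above: LTO, PGO, debug packages and libgccjit. The package list, `_make_target` and `_apply_bootstrap_tweaks` are illustrative assumptions, not Arch's actual gcc PKGBUILD. `!lto` and `!debug` are makepkg `options` entries, and `bootstrap`/`profiledbootstrap` are GCC make targets.

```bash
# Hypothetical excerpt: the defaults are for the final (fat) pass.
pkgname=(gcc gcc-libs gcc-fortran libgccjit)
options=()
_make_target=profiledbootstrap   # PGO build

_apply_bootstrap_tweaks() {
    # Arithmetic test: unset or 0 means "not bootstrapping"
    ((_ARCH_BOOTSTRAP)) && options+=('!lto' '!debug')   # no LTO, no -debug package
    ((_ARCH_BOOTSTRAP)) && _make_target=bootstrap        # plain bootstrap, no PGO
    if ((_ARCH_BOOTSTRAP)); then
        # drop libgccjit from the split packages
        local p kept=()
        for p in "${pkgname[@]}"; do
            [[ $p == libgccjit ]] || kept+=("$p")
        done
        pkgname=("${kept[@]}")
    fi
}

_apply_bootstrap_tweaks
```

Keeping the guards in one function limits how far they are "sprinkled", at the cost of one extra indirection.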
I've taken Allan's build script and added stuff for my own purposes. If this idea ever becomes official then an Arch guru could conceivably polish it up for inclusion in devtools.
Regarding reproducibility, I'm still struggling with gccgo / libgo. There is also a crazy binutils issue [3] apparently exposed by the non-PIC libiberty.a oopsie. We could *really* do with a new binutils upload (hint, hint, hi freswa!)
[1] https://bbs.archlinux.org/viewtopic.php?pid=1474035#p1474035
[2] https://gitlab.com/Toolybird/toolchain
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=29042
Precisely what I meant earlier with:
> Do we need to* enable LTO/PGO for anything but the final build?
Glad to see it shaved ~20% off the runtime. Out of curiosity, any reason why you didn't short-circuit the check functions as well?
The script does
build linux-api-headers
build glibc --nocheck
build binutils --nocheck _ARCH_BOOTSTRAP=1
build gcc --nocheck _ARCH_BOOTSTRAP=1