FS#70697 - [clang] Build with -fno-semantic-interposition and -Bsymbolic-function
Attached to Project:
Arch Linux
Opened by Ray Song (MaskRay) - Monday, 03 May 2021, 19:13 GMT
Last edited by Evangelos Foutras (foutrelis) - Sunday, 06 June 2021, 18:36 GMT
Opened by Ray Song (MaskRay) - Monday, 03 May 2021, 19:13 GMT
Last edited by Evangelos Foutras (foutrelis) - Sunday, 06 June 2021, 18:36 GMT
|
Details
[re-posted to
https://bugzilla.redhat.com/show_bug.cgi?id=1956484
; I am an Arch Linux user :)]
As I mentioned on "Re: Very slow clang kernel config .." (https://lore.kernel.org/lkml/20210501235549.vugtjeb7dmd5xell@google.com/), the llvm and clang builds can use -Wl,-Bsymbolic-functions to improve performance. The performance of the built clang will be comparable to a mostly statically linked (libLLVM*.a libclang*.a) PIE clang, as evidenced by my few runs building x86_64 defconfig 'vmlinux' (-j 40): # the compile flags may be very different from the clang builds below. system gcc; LLVM= 1050.15s user 192.96s system 3015% cpu 41.219 total 1055.47s user 196.51s system 3022% cpu 41.424 total clang (libLLVM*.a libclang*.a); LLVM=1 1588.35s user 193.02s system 3223% cpu 55.259 total 1613.59s user 193.22s system 3234% cpu 55.861 total clang (libLLVM.so libclang-cpp.so); LLVM=1 1870.07s user 222.86s system 3256% cpu 1:04.26 total 1863.26s user 220.59s system 3219% cpu 1:04.73 total 1877.79s user 223.98s system 3233% cpu 1:05.00 total 1859.32s user 221.96s system 3241% cpu 1:04.20 total clang (libLLVM.so libclang-cpp.so -fno-semantic-interposition); LLVM=1 1810.47s user 222.98s system 3288% cpu 1:01.83 total 1790.46s user 219.65s system 3227% cpu 1:02.27 total 1796.46s user 220.88s system 3139% cpu 1:04.25 total 1796.55s user 221.28s system 3215% cpu 1:02.75 total clang (libLLVM.so libclang-cpp.so -fno-semantic-interposition -Wl,-Bsymbolic); LLVM=1 1608.75s user 221.39s system 3192% cpu 57.333 total 1607.85s user 220.60s system 3205% cpu 57.042 total 1598.64s user 191.21s system 3208% cpu 55.778 total clang (libLLVM.so libclang-cpp.so -fno-semantic-interposition -Wl,-Bsymbolic-functions); LLVM=1 1617.35s user 220.54s system 3217% cpu 57.115 total -fno-semantic-interposition can avoid some GOT (on x86-64 the optimization is partially covered by R_X86_64_{REX_,}GOTPCRELX) (in clang, access to external linkage default visibility non-comdat definition in the same translation unit). As of gcc 11 and clang 12, this option is a no-op on non-x86, but compilers may improve for other architectures in the future. -Bsymbolic is less safe because it can break interaction with copy relocations (https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected). |
This task depends upon
Closed by Evangelos Foutras (foutrelis)
Sunday, 06 June 2021, 18:36 GMT
Reason for closing: Implemented
Additional comments about closing: llvm 12.0.0-1 / clang 12.0.0-1
Sunday, 06 June 2021, 18:36 GMT
Reason for closing: Implemented
Additional comments about closing: llvm 12.0.0-1 / clang 12.0.0-1
Fwiw my personal inclination is to drop the shared clang/LLVM libraries and static link as the only option.
A number of projects using LLVM explicitly allow only static link, due to the unstable C++ API et al.
Using static has the best performance, as seen with the numbers above - at the expense of:
- increased disk usage
Can be mitigated by building with LTO - eg. currently we package mesa with and shared LLVM. Plus I believe LTO is becoming the default for all packages.
- distribution (rebuild) annoyances
Seemingly a non-issue on Arch - most (all?) packages using shared LLVM are rebuild, even for bugfix LLVM releases.
Using static clang/llvm should reduce the llvm vs llvm10 vs llvm9 etc annoyances.
For example: `crystal` package is the only one using llvm10 and llvm10-libs.
With this example, the disk usage factor even goes in the opposite direction - the total usage is greater as-is, than if static llvm was used.
And no, static linking won't reduce "the llvm/llvm10/llvm9 annoyances" since the shared libraries are the only parts that don't conflict so developers would still be able to only have one installed and simple users don't have any of them installed both before and after this ticket.
I'm not sure what the unstable C++ ABI has to do with this as a distribution.
Crystal is not the only package using llvm10 I bet, not if you look at the AUR too. And just because it is the only one in the repos today, does not mean it will be the only one in the future.
Maybe upstream should add this as the default? Has anyone tried discussing it with the llvm development list?
FS#52021: reducing supported targets (13->4), results in llvm-libs 47.4MiB -> 31.7MiB, llvm 132.6MiB -> 95.1MiBCasual skim through the packages depending on `llvm-libs` shows 25 unique entries, split into two categories - GL/Vulkan/compute drivers and compilers.
Both of these are targets, where personally I would value speed over size.
eschwartz - sounds like I've missed the point, if space and rebuilds aren't the main issues. Any tips/pointers would be appreciated.
What isn't getting cheaper is network bandwidth, where the benefits of a shared framework downloaded once, are very helpful. Furthermore, this particular concern is NOT mitigated by "oh, the package is only used once, if you link it in statically it gets smaller" because, when you count download size, what matters is the combination of every repeated redownload, summed up. Downloading a (random numbers) 30mb package once and a 20mb package 3 times makes for 90mb. If you assume that static linking leaves you with one 40mb package (10mb or 33% savings), 3 downloads is 120mb. Actual numbers will differ greatly, but overall you'll tend to lose.
Note that every update to llvm static requires rebuilding everything that links to it in order to have the update take effect, so you also rebuild more often. In comparison, with dynamic linking you rebuild the applications whenever you normally do, but only rebuild it against the framework when the framework changes ABI/soname. (I do not know why llvm in particular might be having full ecosystem rebuilds on, what, patch releases of llvm? Backported one-liner regression fixes? I am not sure I even believe you.)
On rebuilds:
Static linking makes rebuilds more complex, not less:
- we need to rebuild more often, every time a framework is rebuilt we rebuild the ecosystem to statically link new code, vs. dynamically sharing improvements
- we use tools like sogrep or https://archlinux.org/packages/community/x86_64/crystal/sonames/ to determine what programs link to which library packages and need to be rebuilt on version updates, but we cannot track this for static linkage
On security:
For security updates to a library, enter full on panic mode where every statically linked program is also potentially vulnerable in addition to the library itself, and must be urgently rebuilt just in case, and listed as vulnerable-but-fixed-in-version-X in security bulletins. Figuring out which packages this is, is left as an exercise to the reader^W^W^W^W^W^W haha lolnope abandon hope all ye distro devs and despair. (Okay, given that llvm *specifically* is typically used by leaf packages you can probably just blindly rebuild everything on the invisible list of reverse makedepends, which is shown on archweb but not by pacman -Sii.)
In general -- no, there are in fact good reasons for dynamically linking @world and any exceptions had better be darned good ones, not just "it's the same as -fno-semantic-interposition -Wl,-Bsymbolic-functions but nebulous benefits and tradeoffs for size depending on how many other packages use it". It should generally be a last resort.
If we talk about compiler performance (which is what this bugreport is about) then static vs dynamic linking of clang/llvm itself is what matters and whether it makes sense to statically link something that depends on clang/llvm is matter for different discussion.
Completely missed the network bandwidth side of things, thanks.
As mentioned before - currently packages using the shared library get rebuild, even when the SONAME is unchanged.
Implicitly handling the security argument.
I believe the "Required By" list for llvm-libs and llvm, is of some indication about the current state of affairs - shared vs static.
And yes, the actual numbers are closer to 25 vs 60, as opposed to the 30 vs 113 in the archweb.
Fwiw do I share the sentiments (dependency tracking, rebuilds, security, bandwidth) but I think llvm fits in the exception category, since currently:
- a number of projects explicitly (or solely) support static llvm
- it brings notable performance gains for IMHO critical components - compilers, GL/Vulkan/Compute drivers
- all packages that use it (shared or static) are essentially rebuild
- all packages that use it are essentially leaf
Hope maintainers see merit in my arguments. Either way, I'll stop drawing attention :-)
upgpkg: llvm 11.1.0-1: new upstream release (Feb. 17)
upgpkg: llvm 11.0.1-2: fix shader compilation in blender (Feb. 02)
blender did not get rebuilt at this time :p it did get rebuilt 2 weeks later for cuda though, and the day after the llvm minor release blender got rebuilt for openimagedenoise.
So anecdotally, but without actual analysis, llvm updates do NOT result in dynamically linked reverse dependencies being rebuilt. Not even clang, seemingly, except inasmuch as both llvm/clang seem to get new tagged versions at the same time (clang didn't get rebuilt for the blender-specific patch with updated pkgrel).
...
Anyway, clang is one user of llvm, and I don't see particular justification for statically linking this one user if the LDFLAGS changes suffice to help clang performance.
As for the ticket at hand: @MaskRay, thank you for the suggestions; would it be possible to bring them to upstream's attention? While they do seem safe, I would feel more comfortable to incorporate them after upstream has approved of them and hopefully integrated them into their CMake build config.
Note to LD_PRELOAD: -fno-semantic-interposition is more tricky in clang. It generally doesn't work as you might expect. Clang's longstanding behavior enables interprocedural optimizations (including inlining) on such default visibility external linkage definitions in the same translation unit. My patch has more context on this.
-fno-semantic-interposition + -Bsymbolic-functions: the performance is comparable with libLLVM*.a libclang*.a .
Shared objects aren't that bad.
This task can be closed.
As such, there's no need to wait for a new stable release upstream. We can add the options locally as a non-trivial performance benefit to users today.
foutrelis, over to you. :D
@MaskRay: Thanks for implementing this upstream. 🐈
@MaskRay: Thanks again for getting these changes upstream.