Arch Linux

FS#75104 - [zstd] Arch build of zstd does not scale with thread count due to the build system it uses

Attached to Project: Arch Linux
Opened by Arvid Norlander (VorpalGun) - Saturday, 18 June 2022, 07:13 GMT
Last edited by Antonio Rojas (arojas) - Saturday, 18 June 2022, 16:16 GMT
Task Type Bug Report
Category Packages: Core
Status Assigned
Assigned To Jelle van der Waa (jelly), Maxime Gauduin (Alucryd), Levente Polyak (anthraxx), Giancarlo Razzolini (grazzolini)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 7
Private No



After reading a recent Phoronix benchmark (see reference at the bottom) I decided to
investigate why Arch Linux was so much slower (10-20x) for zstd performance. Here is what I
found, and hopefully this can help improve the performance in the future!

Zstd appears to have more than one build system supported by upstream. Relevant to Linux are:
* Plain Makefile (what Ubuntu uses to build)
* CMake (what Arch uses to build)
* Meson (not sure who uses this but I tested it for completeness)

It turns out that with the CMake and Meson build systems (without any options as well as the
options Arch uses) there is *negative* scaling between -T1 (one thread) and -T6 (the number
of cores my computer has).

However, for the plain Makefile, the expected positive scaling with the number of threads exists.

I included the full performance analysis in the upstream bug report linked below, but depending
on the outcome of that bug, I suggest Arch might want to change which build system it uses.

Additional info:
* package version(s) 1.5.2-7
* link to upstream bug report:

Steps to benchmark:
* To benchmark, run: path/to/zstd/binary -T<num threads> -b4 <path to large file>
* For the large file I used the FreeBSD USB stick image, as this is what Phoronix uses. Phoronix
uses an older version, for which I could not find the download link, but the same general pattern
can be reproduced with the current version.

* (a bit down the page)

Comment by Antonio Rojas (arojas) - Saturday, 18 June 2022, 08:51 GMT
I'm seeing the exact same (bad) behavior when compiling with plain make. Which exact command did you use to compile it?
Comment by Antonio Rojas (arojas) - Saturday, 18 June 2022, 11:12 GMT
The extra slowness with cmake is caused by the -std=c99 flag

Removing it makes the CMake build about as fast as the plain-make one (which is still faster with -T1 than with -T6 on my machine).
Comment by Arvid Norlander (VorpalGun) - Saturday, 18 June 2022, 11:47 GMT
Interesting and weird! Not a flag I expected would have that effect. I have been out of the C/C++ world for a couple of years (used to do it professionally). What are the default C and C++ versions these days?

I will confirm that information then append it to the upstream bug report.

I don't know why make would be slow for you though. I literally just downloaded the upstream release tarball and built it with make, no CFLAGS/CXXFLAGS set. When talking on IRC with another person who had a 10 core CPU they said it only scaled up to 7 threads for them and then started going down again, so there may be memory bandwidth and/or cache effects to take into account. Consider checking if you have a "peak" like that.
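
A minimal sketch of how such a peak could be located, assuming the throughput figures have been copied by hand into lines of `<threads> <MB/s>`. The helper name `peak_threads` and the sample numbers are made up for illustration; they are not measurements from this report.

```shell
# Hypothetical helper: read "<threads> <MB/s>" lines on stdin and print
# the thread count with the highest throughput, followed by that throughput.
peak_threads() {
    awk 'BEGIN { best = -1 }
         $2 + 0 > best { best = $2 + 0; t = $1 }
         END { if (t != "") print t, best }'
}

# Illustrative numbers only, not real benchmark results.
printf '%s\n' '1 100' '2 180' '3 150' | peak_threads
```

If the reported thread count is lower than the number of cores, the scaling peaks early on that machine, as described for the 10-core CPU.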
Comment by urawotlol (urawotlol) - Sunday, 19 June 2022, 01:08 GMT
I am the aforementioned "another person"

64GB DDR4 3200 MT/s dual channel 4 dimms
cpu pinned to 5GHz
gov set to performance
each test run twice, first run discarded for thermal reasons
test file archlinux-2022.04.01-x86_64.iso

zstd packages
core/zstd (cmake) v1.5.2
upstream zstd (make) v1.5.2 (built with makepkg; can provide the PKGBUILD if wanted)

See the attached results file; you can search for specific test results in the format of

Test script:
Cores=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)
Levels=(1 2 3 4 5 6 7 8)
Bins=(/usr/bin/zstd ~/git/zstd/pkg/zstd/usr/bin/zstd)
echo "Cores:Level:Bin"
# Run every level / thread-count / binary combination, labelling each result.
for Level in "${Levels[@]}"; do
    for Core in "${Cores[@]}"; do
        for Bin in "${Bins[@]}"; do
            echo "$Core:$Level:$Bin"
            time "$Bin" -T"$Core" -b"$Level" archlinux-2022.04.01-x86_64.iso
            echo ""
        done
    done
done

   results (67.6 KiB)
Comment by Jan Alexander Steffens (heftig) - Sunday, 19 June 2022, 04:43 GMT
This is just a measurement error in the zstd benchmark mode caused by missing timespec support and not actually a performance issue.
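
One way to cross-check this independently of zstd's built-in benchmark timer is to time a real compression run against an external wall clock. A minimal sketch, assuming GNU `date` (for `%N`); the helper name `wall_seconds` is hypothetical and the file name in the usage lines is a placeholder.

```shell
# Hypothetical helper: run a command, discard its stdout, and print the
# elapsed wall-clock time in seconds. Requires GNU date for %N.
wall_seconds() {
    _ws_start=$(date +%s.%N)
    "$@" >/dev/null
    _ws_end=$(date +%s.%N)
    awk -v s="$_ws_start" -v e="$_ws_end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Example usage (placeholder file name):
#   wall_seconds zstd -T1 -4 -c large.img
#   wall_seconds zstd -T6 -4 -c large.img
```

If the -T6 run finishes sooner by the wall clock while the -b figures say otherwise, that would point at the benchmark's timer rather than at compression itself.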
Comment by urawotlol (urawotlol) - Sunday, 19 June 2022, 07:08 GMT
it appears so
Comment by Firas Khalil Khana (firasuke) - Sunday, 19 June 2022, 13:27 GMT
What's the official upstream build system for `zstd` anyway? Why not use the plain `make` build system and remove the `cmake` (and meson/ninja) dependencies altogether?
Comment by Michel Koss (MichelKoss1) - Sunday, 19 June 2022, 13:35 GMT
According to [1] upstream uses make and the rest are 3rd party contribs which contain many inconsistencies with upstream defaults[2]. Arch switched to cmake just two weeks ago and it only brought problems, first with static libs, now broken benchmarks.

If the Arch maintainers don't want to constantly debug issues that nobody else is paying attention to, then maybe they should switch back to something actually supported.

Comment by Jelle van der Waa (jelly) - Monday, 20 June 2022, 07:40 GMT
zstd's cmake files are required for us to use our packaged zstd in pcsx2 unless they can be generated differently or pcsx2 can use pkgconfig to find them.
Comment by Michel Koss (MichelKoss1) - Monday, 20 June 2022, 12:10 GMT
@jelly were any of those scenarios already tested? Currently the pcsx2 package doesn't depend on zstd, and its version is too old to contain the system libs support added in
Comment by Arvid Norlander (VorpalGun) - Monday, 20 June 2022, 12:14 GMT
1. Is it possible to generate the required cmake scripts for linking with the library without using cmake for the build system? As I understand it, the required file fills a similar role to pkg-config, and should in theory be quite simple and lightweight.
2. If pcsx2 does not support pkg-config, isn't that worth filing an upstream feature request about? I know for sure that CMake supports pkg-config (I have used it myself).
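
As a rough sketch of the pkg-config route, the following checks whether the installed zstd is visible to pkg-config at all. It assumes the package installs a `libzstd` pkg-config module (the name used by upstream's libzstd.pc); the helper name `has_pc_module` is hypothetical.

```shell
# Hypothetical helper: succeed only if pkg-config is installed and
# knows about the named module.
has_pc_module() {
    command -v pkg-config >/dev/null 2>&1 || return 2
    pkg-config --exists "$1"
}

# Assumes the module is called "libzstd".
if has_pc_module libzstd; then
    # Print the compiler and linker flags a consumer would need.
    pkg-config --cflags --libs libzstd
else
    echo "libzstd is not visible to pkg-config" >&2
fi
```

If this prints flags for a make-built zstd, a consumer that can use pkg-config would not need zstd's CMake files; whether pcsx2 can do so is the open question above.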