FS#69021 - Split CUDA package

Attached to Project: Community Packages
Opened by Oliver Mangold (omangold) - Friday, 18 December 2020, 07:12 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Sunday, 03 January 2021, 17:04 GMT
Task Type General Gripe
Category Packages
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Felix Yan (felixonmars)
Konstantin Gizdov (kgizdov)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

CUDA has become quite bloated and is by far the largest package on my system (1.9G compressed, 5G uncompressed). As some (probably many) people need only a small part of it, I think splitting the package makes sense. One might need only the runtime, e.g. for using Blender.

A rough breakdown of the current package looks like this:

shared libs: 1.7G
static libs: 1.4G
Nsight: 1.4G
compiler+everything else: 600M

Of the shared libs, 1.6G goes to cufft, curand, cusparse, cublas, cusolv and npp.

Thus a basic, usable environment could be as small as 700M.

My suggestion would be splitting it into 4 packages like this:

base package: everything not mentioned below
static libs: /opt/cuda/targets/x86_64-linux/lib/*.a
extra libs: /opt/cuda/targets/x86_64-linux/lib/lib{cufft*,curand*,cusparse*,cublas*,cusolv*,npp*}.so*.
nsight: /opt/cuda/nsight*
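
The proposed split can be sketched as a small POSIX shell function that maps an installed path to one of the four groups listed above. This is only an illustration of the globs in the suggestion, not an actual PKGBUILD; the group labels (base, static, extra, nsight) and the function name `classify` are made up for this sketch. Note that `*` in a `case` pattern also matches `/`, so the `*.a` rule covers subdirectories of the lib directory as well.

```shell
# classify PATH: print which of the four proposed groups PATH would fall into.
# Hypothetical helper; the patterns mirror the list in the description.
classify() (
    lib=/opt/cuda/targets/x86_64-linux/lib
    case $1 in
        /opt/cuda/nsight*) echo nsight ;;
        "$lib"/*.a) echo static ;;
        "$lib"/libcufft*.so* | "$lib"/libcurand*.so* | "$lib"/libcusparse*.so* | \
        "$lib"/libcublas*.so* | "$lib"/libcusolv*.so* | "$lib"/libnpp*.so*) echo extra ;;
        *) echo base ;;   # everything not mentioned above
    esac
)

# Example paths (illustrative only):
classify /opt/cuda/bin/nvcc
classify /opt/cuda/targets/x86_64-linux/lib/libcudart.so
classify /opt/cuda/targets/x86_64-linux/lib/libcufft.so
```

Because the static rule is checked before the extra-libs rule, a static archive of one of the large libraries ends up in the static group rather than the extra group.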

Additional info:
* cuda 11.2.0-1

Closed by Sven-Hendrik Haase (Svenstaro)
Sunday, 03 January 2021, 17:04 GMT
Reason for closing: Implemented
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 20 December 2020, 06:27 GMT
Good idea. Splitting out the nsight stuff is something I've been meaning to do. However, I'm not sure about the static libs currently, as Nvidia themselves don't appear to be cutting their packages that way (check for instance their Debian packages: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/). It could confuse software packages that expect both static and dynamic libs to be there, but that might mostly be a packaging problem.
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 20 December 2020, 07:48 GMT
So it turns out that the zstd-compressed packages with and without the nsight parts are about the same size (both roughly 1.9G). Cutting the static libs gets it down to 1.4G. All things considered, I think the disadvantages of more package complexity and a larger combined download size (the separate packages together are much larger than the single package containing everything) outweigh the advantage of a little disk space saved. Curious to hear your thoughts.
Comment by Oliver Mangold (omangold) - Sunday, 20 December 2020, 16:40 GMT
Hmm. Surprised about the finding concerning package size, so I tried it myself.

To be honest, I couldn't figure out how to get the size down to 1.9G in the first place. When I recompress /opt/cuda, I can't find a zstd setting that gets it below 2.2G.

In any case, I tried splitting it using 'zstd -T0 -10' and couldn't reproduce the overhead. I got this:

Full package: 2336694107 bytes
Full package without Nsight: 1729433205 bytes
Nsight: 614246029 bytes
Static libs: 606885854 bytes
Shared libs (cufft, curand, cusparse, cublas, cusolv, npp only): 752247283 bytes
Everything else: 372037424 bytes
Overhead due to splitting into the above 4 packages: 8722483 bytes ^= 0.37%
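
For reference, the overhead figure above can be recomputed from the listed sizes with a few lines of POSIX shell. This is just a sketch of the arithmetic (sum of the four split packages minus the full package); the helper names `split_overhead` and `percent_of` are made up for the example.

```shell
# split_overhead FULL PART...: print (sum of PARTs) - FULL, in bytes.
split_overhead() (
    full=$1
    shift
    sum=0
    for p in "$@"; do
        sum=$((sum + p))
    done
    echo $((sum - full))
)

# percent_of PART WHOLE: print PART as a percentage of WHOLE, two decimals.
percent_of() {
    LC_ALL=C awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f\n", 100 * part / whole }'
}

# Sizes copied from the list above: full package, then Nsight, static libs,
# the six large shared libs, and everything else.
full=2336694107
overhead=$(split_overhead "$full" 614246029 606885854 752247283 372037424)
echo "overhead: $overhead bytes ($(percent_of "$overhead" "$full")%)"
# prints: overhead: 8722483 bytes (0.37%)
```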

If you know what settings need to be used to get the original 1.9G, I can try with that again.
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 20 December 2020, 17:46 GMT
You can check /usr/share/devtools/makepkg-x86_64.conf from the devtools package. It has our official options. In this case, COMPRESSZST=(zstd -c -T0 --ultra -20 -). Try it with that.
Comment by Oliver Mangold (omangold) - Monday, 21 December 2020, 09:12 GMT
Okay, now with the provided setting. I still don't get it down to 1.9G, but it helped quite a bit. I assume the remaining difference is explained by the number of CPU cores not being the same and the order of the files in the archive likely differing as well. In any case, the behavior didn't change significantly: there is no meaningful difference between the sum of the split package sizes and the combined package.

Full package: 2075588523 bytes
Full package without Nsight: 1535375528 bytes
Nsight: 540129603 bytes
Static libs: 540310749 bytes
Shared libs (cufft, curand, cusparse, cublas, cusolv, npp only): 674055326 bytes
Everything else: 325596810 bytes
Overhead due to splitting into the above 4 packages: -4503965 (less than combined package)
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 28 December 2020, 04:48 GMT
Can you attach your PKGBUILD? I'll check it out. Also, please split it like this: -static, nsight and everything else into just "cuda".
Comment by Oliver Mangold (omangold) - Monday, 28 December 2020, 09:52 GMT
Hackish, but should work as a reproducer.
Comment by Sven-Hendrik Haase (Svenstaro) - Tuesday, 29 December 2020, 06:37 GMT
Alright, I like the way you cut it. I'm going to use that almost as is with some cleaning up.
Comment by Sven-Hendrik Haase (Svenstaro) - Tuesday, 29 December 2020, 07:42 GMT
I pushed the new split package to testing. Can you check it out and see whether everything is nice and clean and in working order?
Comment by Sven-Hendrik Haase (Svenstaro) - Wednesday, 30 December 2020, 05:53 GMT
Package seems good and moved to community.
Comment by Jakub Klinkovský (lahwaacz) - Wednesday, 30 December 2020, 19:50 GMT
  • Field changed: Percent Complete (100% → 0%)
Splitting the static libs was probably not such a good idea (or it should be done differently). As of cuda-11.2.0-2, nvcc does not work with the default flags when cuda-static is missing:

$ nvcc hello_world.cu
/usr/bin/ld: cannot find -lcudadevrt
/usr/bin/ld: cannot find -lcudart_static
collect2: error: ld returned 1 exit status

That is because nvcc defaults to linking rt and devrt as static:

--cudart {none|shared|static} (-cudart)
Specify the type of CUDA runtime library to be used: no CUDA runtime library,
shared/dynamic CUDA runtime library, or static CUDA runtime library.
Allowed values for this option: 'none','shared','static'.
Default value: 'static'.

--cudadevrt {none|static} (-cudadevrt)
Specify the type of CUDA device runtime library to be used: no CUDA device
runtime library, or static CUDA device runtime library.
Allowed values for this option: 'none','static'.
Default value: 'static'.
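
Going only by the option help quoted above, a possible stopgap on the user side is to override both defaults when the static libraries are not installed. The following is an untested sketch: the helper name `nvcc_link_flags` is made up, and the library directory is the one mentioned in the task description. Passing `--cudadevrt none` drops the device runtime entirely, so code that actually needs it would presumably still fail to link.

```shell
# nvcc_link_flags LIBDIR: print nvcc options that avoid the static runtime
# libraries which are missing from LIBDIR (prints nothing if both exist).
nvcc_link_flags() (
    flags=
    [ -e "$1/libcudart_static.a" ] || flags="$flags --cudart shared"
    [ -e "$1/libcudadevrt.a" ] || flags="$flags --cudadevrt none"
    echo "${flags# }"
)

# Intended usage (word splitting of the output is deliberate):
#   nvcc $(nvcc_link_flags /opt/cuda/targets/x86_64-linux/lib) hello_world.cu
```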
Comment by Eli Schwartz (eschwartz) - Wednesday, 30 December 2020, 19:56 GMT
Is this something to be solved by e.g. moving nvcc to the static libs package? cudart could be solved by making it try to link shared by default, but cudadevrt does not have that option at all...

Maybe the main cuda package should contain the compiler and static libs, but depend on "cuda-runtime", or "cuda-libs" to mimic the boost/boost-libs split.
Comment by Jakub Klinkovský (lahwaacz) - Wednesday, 30 December 2020, 20:53 GMT
Moving nvcc would also mean moving other binaries (e.g. ptxas, cudafe++ and I don't know what else is needed) as well as some header files which nvcc includes "by default". I don't think that's good material for a *-static or *-libs package.

The simplest solution might be moving the needed static libraries (libcudadevrt.a, libcudart_static.a) back to the main package.

If we want to provide a smaller package to satisfy dependencies of things that do not need the whole toolchain, it makes sense to split the shared libraries into "cuda-libs" (which would be 1.7G after installation). But this may not work in all cases either, because CUDA binaries can contain embedded PTX, which the CUDA runtime can JIT-compile if the binary does not contain pre-compiled code for the specific GPU architecture it was invoked on. In this case the runtime probably invokes ptxas and may crash if it is not found (I did not actually try this; Nvidia obviously does not support it).

Considering that there are 1.7G of shared libraries and 1.4G of static libraries in total, I'm thinking that it would make sense to split just the "extra" libraries - basically all libraries except libcudart, libcudart_static, libcudadevrt and potentially other things that may be needed by default. This would lead to a base "cuda" package providing the whole toolchain and runtime, but still having only around 600M. I think it does not matter if the remaining shared and static libraries are packaged separately or together.
Comment by Sven-Hendrik Haase (Svenstaro) - Thursday, 31 December 2020, 03:24 GMT
I thought about the boost/boost-libs approach, but the additional complexity of the implicit tooling requirements in cuda has me thinking that it might be easier for me and for users to have just two cuda packages in the end: cuda and cuda-tools. That still removes 627M (1470M installed) from cuda, so it's still worthwhile. Thoughts?
Comment by Jakub Klinkovský (lahwaacz) - Thursday, 31 December 2020, 07:57 GMT
I agree, having cuda and cuda-tools is still better than nothing and splitting the rest may not be worth the effort. Random things might silently break when Nvidia changes something.
