FS#69021 - Split CUDA package
Attached to Project: Community Packages
Opened by Oliver Mangold (omangold) - Friday, 18 December 2020, 07:12 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Sunday, 03 January 2021, 17:04 GMT
Details
Description:
CUDA has meanwhile become quite bloated. It is by far the largest package on my system (1.9G compressed, 5G uncompressed). As some (probably many) people need only a small part of it, I think splitting the package makes sense. One might only need the runtime, e.g. for using Blender.

A rough breakdown of the current package looks like this:
* shared libs: 1.7G
* static libs: 1.4G
* Nsight: 1.4G
* compiler+everything else: 600M

Of the shared libs, 1.6G go to cufft, curand, cusparse, cublas, cusolv and npp. Thus a basic, usable environment could be as small as 700M.

My suggestion would be to split it into 4 packages like this:
* base package: everything not mentioned below
* static libs: /opt/cuda/targets/x86_64-linux/lib/*.a
* extra libs: /opt/cuda/targets/x86_64-linux/lib/lib{cufft*,curand*,cusparse*,cublas*,cusolv*,npp*}.so*
* nsight: /opt/cuda/nsight*

Additional info:
* cuda 11.2.0-1
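The four-package split proposed in the description can be sketched as a classifier that maps each installed path to a package. This is a minimal sketch, not the PKGBUILD logic that was eventually used. The package names (`cuda-static`, `cuda-extra-libs`, `cuda-nsight`) and the helper `subpackage_for` are hypothetical, and the example paths are illustrative rather than taken from the real file list.

```python
import posixpath
from fnmatch import fnmatchcase

LIBDIR = "/opt/cuda/targets/x86_64-linux/lib"
# Library name stems from the "extra libs" glob in the proposal.
EXTRA_STEMS = ("cufft", "curand", "cusparse", "cublas", "cusolv", "npp")


def subpackage_for(path: str) -> str:
    """Map an installed file to one of the four proposed packages.

    The package names returned here are hypothetical placeholders.
    """
    # nsight: /opt/cuda/nsight* (fnmatch's '*' also matches '/', so this
    # covers every file below those directories).
    if fnmatchcase(path, "/opt/cuda/nsight*"):
        return "cuda-nsight"
    directory, name = posixpath.split(path)
    # The lib globs only apply directly inside LIBDIR, like a shell glob would.
    if directory == LIBDIR:
        # static libs: lib/*.a (includes the static builds of the extra libs)
        if fnmatchcase(name, "*.a"):
            return "cuda-static"
        # extra libs: lib/lib{cufft*,curand*,...}.so*
        if any(fnmatchcase(name, f"lib{stem}*.so*") for stem in EXTRA_STEMS):
            return "cuda-extra-libs"
    # base package: everything not matched above
    return "cuda"


if __name__ == "__main__":
    # Illustrative paths only; not a real file list from the package.
    for p in (
        "/opt/cuda/bin/nvcc",
        "/opt/cuda/nsight-compute/ncu",
        f"{LIBDIR}/libcudart.so.11.0",
        f"{LIBDIR}/libcudart_static.a",
        f"{LIBDIR}/libcublas.so.11",
    ):
        print(f"{p} -> {subpackage_for(p)}")
```

The classifier checks the nsight and `*.a` rules before the extra-libs rule. That ordering keeps the four sets disjoint, so every file lands in exactly one package.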
Closed by Sven-Hendrik Haase (Svenstaro)
Sunday, 03 January 2021, 17:04 GMT
Reason for closing: Implemented
To be honest, I couldn't figure out how the size got down to 1.9G in the first place. When I recompress /opt/cuda, I can't find a zstd setting that gets it below 2.2G.
In any case, I tried splitting it using 'zstd -T0 -10' and couldn't reproduce the overhead. I got this:
Full package: 2336694107 bytes
Full package without Nsight: 1729433205 bytes
Nsight: 614246029 bytes
Static libs: 606885854 bytes
Shared libs (cufft, curand, cusparse, cublas, cusolv, npp only): 752247283 bytes
Everything else: 372037424 bytes
Overhead due to splitting into the above 4 packages: 8722483 bytes (about 0.37%)
If you know which settings produce the original 1.9G, I can try again with those.
Full package: 2075588523 bytes
Full package without Nsight: 1535375528 bytes
Nsight: 540129603 bytes
Static libs: 540310749 bytes
Shared libs (cufft, curand, cusparse, cublas, cusolv, npp only): 674055326 bytes
Everything else: 325596810 bytes
Overhead due to splitting into the above 4 packages: 4503965 bytes (about 0.22%)
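The overhead is the summed size of the four split packages minus the size of the full package. The following small sketch recomputes it from the numbers listed in the two result sets. It only redoes the arithmetic and measures nothing. The thread does not state the compression settings for the second set, so the sketch simply labels it "second set".

```python
# Compressed sizes in bytes, copied from the two result sets in this thread.
RESULTS = {
    "zstd -T0 -10": (
        2336694107,  # full package
        [614246029, 606885854, 752247283, 372037424],  # Nsight, static, extra shared, rest
    ),
    "second set": (  # compression settings not stated in the thread
        2075588523,
        [540129603, 540310749, 674055326, 325596810],
    ),
}


def split_overhead(full: int, parts: list[int]) -> tuple[int, float]:
    """Bytes (and percent of the full package) that the parts add over it.

    A positive value means the split packages are larger in total.
    """
    extra = sum(parts) - full
    return extra, 100.0 * extra / full


for label, (full, parts) in RESULTS.items():
    extra, pct = split_overhead(full, parts)
    print(f"{label}: {extra:+d} bytes ({pct:+.2f}%)")
```

With the listed numbers, the sketch prints +8722483 bytes (+0.37%) for the first set and +4503965 bytes (+0.22%) for the second.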
$ nvcc hello_world.cu
/usr/bin/ld: cannot find -lcudadevrt
/usr/bin/ld: cannot find -lcudart_static
collect2: error: ld returned 1 exit status
That is because nvcc links the CUDA runtime (cudart) and the CUDA device runtime (cudadevrt) statically by default:
--cudart {none|shared|static} (-cudart)
Specify the type of CUDA runtime library to be used: no CUDA runtime library,
shared/dynamic CUDA runtime library, or static CUDA runtime library.
Allowed values for this option: 'none','shared','static'.
Default value: 'static'.
--cudadevrt {none|static} (-cudadevrt)
Specify the type of CUDA device runtime library to be used: no CUDA device
runtime library, or static CUDA device runtime library.
Allowed values for this option: 'none','static'.
Default value: 'static'.
Maybe the main cuda package should contain the compiler and static libs but depend on "cuda-runtime" or "cuda-libs", to mimic the boost/boost-libs split.
The simplest solution might be to move the static libraries that nvcc links by default (libcudadevrt.a, libcudart_static.a) back to the main package.
If we want to provide a smaller package to satisfy the dependencies of things that do not need the whole toolchain, it makes sense to split the shared libraries into "cuda-libs" (which would be 1.7G after installation). But this may not work in all cases either. CUDA binaries can contain embedded PTX, which the CUDA runtime can JIT-compile if the binary does not contain precompiled code for the GPU architecture it is running on. In that case the runtime probably invokes ptxas and may crash if it is not found (I did not actually try this; Nvidia obviously does not support it).
Considering that there are 1.7G of shared libraries and 1.4G of static libraries in total, I think it would make sense to split out just the "extra" libraries. That means basically all libraries except libcudart, libcudart_static, libcudadevrt and potentially other things that are needed by default. This would leave a base "cuda" package that provides the whole toolchain and runtime but is still only around 600M. I don't think it matters whether the remaining shared and static libraries are packaged separately or together.
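The last suggestion can be sketched as a path classifier. Under this suggestion, only the "extra" libraries are split out, and libcudart, libcudart_static and libcudadevrt stay in the base package. Under the four-package split from the description, libcudart_static.a and libcudadevrt.a match the `*.a` glob and would go to the static package, and that is what breaks a plain `nvcc hello_world.cu`. The check at the end confirms that both archives stay in base here.

This sketch rests on several assumptions, and all of the names in it are hypothetical. The `BASE_LIBS` list stands in for the open-ended "potentially other things". Nsight is kept separate as in the original proposal. The remaining shared and static extras go into one package (`cuda-extra-libs`), which is one of the two options mentioned. The helper `revised_subpackage_for` is also a placeholder name.

```python
import posixpath
from fnmatch import fnmatchcase

LIBDIR = "/opt/cuda/targets/x86_64-linux/lib"

# Libraries that stay in the base package under this suggestion. The thread
# leaves "potentially other things" open, so this tuple is an assumption.
BASE_LIBS = ("libcudart.so*", "libcudart_static.a", "libcudadevrt.a")

# Archives a plain `nvcc hello_world.cu` links by default (-cudart static,
# -cudadevrt static), per the -lcudadevrt / -lcudart_static linker errors.
DEFAULT_LINKED = (f"{LIBDIR}/libcudadevrt.a", f"{LIBDIR}/libcudart_static.a")


def revised_subpackage_for(path: str) -> str:
    """Map a file to a hypothetical package under the "extra libs only" split."""
    if fnmatchcase(path, "/opt/cuda/nsight*"):
        return "cuda-nsight"  # assumption: Nsight stays separate
    directory, name = posixpath.split(path)
    is_lib = fnmatchcase(name, "*.a") or fnmatchcase(name, "*.so*")
    if directory == LIBDIR and is_lib and not any(
        fnmatchcase(name, pattern) for pattern in BASE_LIBS
    ):
        # Remaining shared and static libraries packaged together.
        return "cuda-extra-libs"
    # Toolchain, runtime libraries and everything else stay in base.
    return "cuda"


# The default build must find both archives in the base package.
missing = [p for p in DEFAULT_LINKED if revised_subpackage_for(p) != "cuda"]
print("missing from base:", missing or "none")
```

Anything else that turns out to be needed by default would only require another entry in `BASE_LIBS`.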