FS#75532 - [python-tensorflow-opt-cuda] Package not built with compute capability 5.0 and higher
Attached to Project:
Community Packages
Opened by WhoseTheNerd (WhoseTheNerd) - Sunday, 07 August 2022, 12:02 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Monday, 08 August 2022, 17:11 GMT
Details
Description: Running any TensorFlow application on an Nvidia graphics card with CUDA compute capability 5.0 fails with:
"./tensorflow/core/kernels/random_op_gpu.h:244] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), key, counter, gen, data, size, dist) status: INTERNAL: no kernel image is available for execution on the device"
Earlier in the logs, TensorFlow warns that it wasn't built for compute capability 5.0:
"W tensorflow/core/common_runtime/gpu/gpu_device.cc:1943] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer."
However, JIT compilation doesn't appear to occur: CPU usage graphs show no high utilization and stay low, at about 1%.
Additional info:
* package version(s): 2.9.1-2
Steps to reproduce: Run any TensorFlow application that requires the GPU, such as training a neural network, on a card with compute capability 5.0. I'm using a GTX 750 Ti.
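A likely explanation for why the JIT fallback never starts, sketched in Python: a native `sm_XY` kernel binary only runs on a device with the same major version and an equal or higher minor version, while embedded `compute_XY` PTX can only be JIT-compiled for devices with capability XY or higher. If the build contains neither for a 5.0 card, there is nothing to compile from, which would match the "no kernel image is available" error. The function names and the target list below are hypothetical and for illustration only; they are not taken from the package's PKGBUILD.

```python
def parse_target(target):
    """Split a target such as 'sm_52' or 'compute_86' into (kind, (major, minor))."""
    kind, _, digits = target.partition("_")
    if kind not in ("sm", "compute") or not digits.isdigit() or len(digits) < 2:
        raise ValueError(f"unrecognised CUDA target: {target!r}")
    return kind, (int(digits[:-1]), int(digits[-1]))


def classify(device_cc, build_targets):
    """Return 'native', 'ptx-jit' or 'unsupported' for a device capability tuple."""
    targets = [parse_target(t) for t in build_targets]
    # Native binaries: same major version, minor not newer than the device.
    for kind, (major, minor) in targets:
        if kind == "sm" and major == device_cc[0] and minor <= device_cc[1]:
            return "native"
    # PTX can be JIT-compiled only for devices at or above its capability.
    for kind, cc in targets:
        if kind == "compute" and cc <= device_cc:
            return "ptx-jit"
    return "unsupported"


def report_for_local_gpus():
    """Check the installed TensorFlow build against the local GPUs.

    Assumes the build reports its targets in 'sm_XY' / 'compute_XY' form.
    """
    import tensorflow as tf  # imported lazily so the helpers above work without it

    build_targets = tf.sysconfig.get_build_info().get("cuda_compute_capabilities", [])
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        cc = details.get("compute_capability")
        if cc is not None:
            print(gpu.name, cc, classify(tuple(cc), build_targets))


# Hypothetical target list for illustration; NOT confirmed from the package.
HYPOTHETICAL_TARGETS = ["sm_52", "sm_53", "sm_60", "sm_61", "sm_62", "sm_70",
                        "sm_72", "sm_75", "sm_80", "sm_86", "compute_86"]

if __name__ == "__main__":
    print(classify((5, 0), HYPOTHETICAL_TARGETS))
```

Calling `report_for_local_gpus()` on the affected machine would show whether the installed build has any usable target for the GTX 750 Ti. With a list like the hypothetical one above, a 5.0 device gets neither a native binary nor usable PTX.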
Closed by Sven-Hendrik Haase (Svenstaro)
Monday, 08 August 2022, 17:11 GMT
Reason for closing: Won't fix
Additional comments about closing: We won't support officially deprecated architectures. Users should compile their own versions of tensorflow if that is required.
Your GPU seems older than that: https://developer.nvidia.com/cuda-gpus#compute
The set we build with is the current non-deprecated set for CUDA. Building with older architectures might be possible, but they are deprecated by NVIDIA, so we're keeping to the officially supported architectures. I think your best bet is to compile it yourself and hope it still works if you need to run on that old GPU for some reason.
Wish that was the case...
[whosethenerd@whosethenerd-pc ~/build/svntogit-community/tensorflow/trunk]$ nano PKGBUILD
[whosethenerd@whosethenerd-pc ~/build/svntogit-community/tensorflow/trunk]$ makepkg -si
==> Making package: tensorflow 2.9.1-2 (Sun 07 Aug 2022 19:54:13 EEST)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
-> Downloading tensorflow-2.9.1.tar.gz...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 63.5M 0 63.5M 0 0 5805k 0 --:--:-- 0:00:11 --:--:-- 7335k
-> Found fix-c++17-compat.patch
==> Validating source files with sha512sums...
tensorflow-2.9.1.tar.gz ... Passed
fix-c++17-compat.patch ... Passed
==> Extracting sources...
-> Extracting tensorflow-2.9.1.tar.gz with bsdtar
==> Starting prepare()...
==> Starting build()...
Building without cuda and without non-x86-64 optimizations
You have bazel 5.2.0 installed.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=mkl_aarch64 # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
--config=monolithic # Config for mostly static monolithic build.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v1 # Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
--config=nogcp # Disable GCP support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=236
INFO: Reading rc options for 'build' from /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc:
'build' options: --define framework_shared_object=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library
INFO: Reading rc options for 'build' from /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.tf_configure.bazelrc:
'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python --action_env PYTHON_LIB_PATH=/usr/lib/python3.10/site-packages --python_path=/usr/bin/python --define=with_xla_support=true --action_env TF_SYSTEM_LIBS=boringssl,curl,cython,gif,icu,libjpeg_turbo,lmdb,nasm,png,pybind11,zlib
INFO: Reading rc options for 'build' from /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc:
'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:mkl in file /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc: --define=build_with_mkl=true --define=enable_mkl=true --define=tensorflow_mkldnn_contraction_kernel=0 --define=build_with_openmp=true -c opt
INFO: Found applicable config definition build:linux in file /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1596824487 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
/home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/WORKSPACE:23:14: in <toplevel>
/home/whosethenerd/build/svntogit-community/tensorflow/trunk/src/tensorflow-2.9.1/tensorflow/workspace0.bzl:107:34: in workspace
/home/whosethenerd/.cache/bazel/_bazel_whosethenerd/cb8aecb20a72b139c69318309100993b/external/bazel_toolchains/repositories/repositories.bzl:35:23: in repositories
Repository rule git_repository defined at:
/home/whosethenerd/.cache/bazel/_bazel_whosethenerd/cb8aecb20a72b139c69318309100993b/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed 4 targets (486 packages loaded, 27036 targets configured).
INFO: Found 4 targets...
[0 / 25] [Prepa] Writing file tensorflow/libtensorflow_cc.so.2.9.1-2.params
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.ExceptionInInitializerError
at com.google.devtools.build.lib.actions.ParameterFile.writeContent(ParameterFile.java:118)
at com.google.devtools.build.lib.actions.ParameterFile.writeParameterFile(ParameterFile.java:111)
at com.google.devtools.build.lib.analysis.actions.ParameterFileWriteAction$ParamFileWriter.writeOutputFile(ParameterFileWriteAction.java:170)
at com.google.devtools.build.lib.exec.FileWriteStrategy.beginWriteOutputToFile(FileWriteStrategy.java:58)
at com.google.devtools.build.lib.analysis.actions.FileWriteActionContext.beginWriteOutputToFile(FileWriteActionContext.java:49)
at com.google.devtools.build.lib.analysis.actions.AbstractFileWriteAction.beginExecution(AbstractFileWriteAction.java:66)
at com.google.devtools.build.lib.actions.Action.execute(Action.java:133)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:907)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1076)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1031)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:492)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make java.lang.String(byte[],byte) accessible: module java.base does not "opens java.lang" to unnamed module @3daa422a
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:191)
at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:184)
at com.google.devtools.build.lib.unsafe.StringUnsafe.<init>(StringUnsafe.java:75)
at com.google.devtools.build.lib.unsafe.StringUnsafe.initInstance(StringUnsafe.java:56)
at com.google.devtools.build.lib.unsafe.StringUnsafe.<clinit>(StringUnsafe.java:37)
... 21 more
==> ERROR: A failure occurred in build().
Aborting...
[whosethenerd@whosethenerd-pc ~/build/svntogit-community/tensorflow/trunk]$ pacman -Qs openjdk
local/jdk-openjdk 18.0.2.u9-1
OpenJDK Java 18 development kit
local/jdk11-openjdk 11.0.16.u8-2
OpenJDK Java 11 development kit
local/jre-openjdk 18.0.2.u9-1
OpenJDK Java 18 full runtime environment
local/jre-openjdk-headless 18.0.2.u9-1
OpenJDK Java 18 headless runtime environment
local/jre11-openjdk 11.0.16.u8-2
OpenJDK Java 11 full runtime environment
local/jre11-openjdk-headless 11.0.16.u8-2
OpenJDK Java 11 headless runtime environment
[whosethenerd@whosethenerd-pc ~/build/svntogit-community/tensorflow/trunk]$