FS#62282 - [cuda] $PATH and $LD_LIBRARY_PATH were not updated automatically
Attached to Project:
Community Packages
Opened by Zhen Xi (Mayrixon) - Tuesday, 09 April 2019, 00:12 GMT
Last edited by Konstantin Gizdov (kgizdov) - Thursday, 11 April 2019, 16:50 GMT
Details
Description:
Packages that require cuda, such as python-tensorflow-cuda, cannot find the CUDA *.so libraries automatically.
Additional info:
* package version(s): cuda 10.1.105-6
* config and/or log files etc.
* link to upstream bug report, if any
Steps to reproduce:
This task depends upon
Closed by Konstantin Gizdov (kgizdov)
Thursday, 11 April 2019, 16:50 GMT
Reason for closing: Fixed
Additional comments about closing: glibc-2.28-6
cuda-10.1.105-8
Package versions:
cuda 10.1.105-6
python-tensorflow-cuda 1.13.1-4
Logs:
2019-04-09 01:14:15.966295: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-09 01:14:15.983045: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4009500000 Hz
2019-04-09 01:14:15.983530: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x564380471ef0 executing computations on platform Host. Devices:
2019-04-09 01:14:15.983542: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-04-09 01:14:16.038294: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-09 01:14:16.038739: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56437f2f9fd0 executing computations on platform CUDA. Devices:
2019-04-09 01:14:16.038753: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2019-04-09 01:14:16.038956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.39GiB
2019-04-09 01:14:16.038965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-09 01:14:16.282489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-09 01:14:16.282508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-09 01:14:16.282512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-09 01:14:16.282705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7120 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-09 01:14:17.015797: I tensorflow/stream_executor/dso_loader.cc:142] Couldn't open CUDA library libcublas.so.10.1. LD_LIBRARY_PATH:
2019-04-09 01:14:17.015819: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Failed precondition: could not dlopen DSO: libcublas.so.10.1; dlerror: libcublas.so.10.1: cannot open shared object file: No such file or directory
The bug can be worked around by adding the following commands to .bashrc or .zshrc:
export PATH=/opt/cuda/bin:$PATH
export LD_LIBRARY_PATH=/opt/cuda/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
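The same workaround can be sketched in a slightly more defensive form, so that a shell startup file does not prepend a non-existent directory or leave a trailing colon when the variable is unset (the path is taken from the lines above and may differ between installs):

```shell
# Add the CUDA library directory to the linker search path, but only if
# it actually exists on this machine (path assumed from the workaround
# above; adjust for your install).
cuda_lib=/opt/cuda/targets/x86_64-linux/lib
if [ -d "$cuda_lib" ]; then
    # ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} expands to ":<old value>" only
    # when the variable was already set, avoiding a stray trailing colon.
    export LD_LIBRARY_PATH="$cuda_lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
```

Note that per-user environment tweaks like this are only a stopgap; the proper fix is for the package to register its library directory with the system's dynamic linker.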
Please describe your setup in detail (CPU, GPU, relevant installed packages) and give the full steps to reproduce the issue.
CPU: i7-6700k
GPU: GTX 1080
relevant installed packages:
linux 5.0.7.arch1-1
nvidia 418.56-7
nvidia-utils 418.56-1
cuda 10.1.105-6
cudnn 7.5.0.56-1
python 3.7.3-1
python-tensorflow-cuda 1.13.1-4
relevant settings:
/etc/mkinitcpio.conf
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
HOOKS=(base udev autodetect modconf block filesystems keyboard fsck)
/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet nvidia-drm.modeset=1 nowatchdog"
/etc/pacman.d/hooks
[Trigger]
Operation=Install
Operation=Upgrade
Operation=Remove
Type=Package
Target=nvidia
Target=linux
# Change the linux part above and in the Exec line if a different kernel is used
[Action]
Description=Update Nvidia module in initcpio
Depends=mkinitcpio
When=PostTransaction
NeedsTargets
Exec=/bin/sh -c 'while read -r trg; do case $trg in linux) exit 0; esac; done; /usr/bin/mkinitcpio -P'
Shell variables:
$PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/opt/cuda/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/lib:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/command-not-found:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/fzf:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/tmux:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/git:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/gitignore:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/pip:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/colorize:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/history:/home/zhen/.antigen/bundles/robbyrussell/oh-my-zsh/plugins/thefuck:/home/zhen/.antigen/bundles/Vifon/deer:/home/zhen/.antigen/bundles/supercrabtree/k:/home/zhen/.antigen/bundles/zsh-users/zsh-autosuggestions:/home/zhen/.antigen/bundles/zsh-users/zsh-completions
$LD_LIBRARY_PATH (this variable is not set)
Steps to reproduce:
1. Create a python script minimum_script.py as follows:
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
2. Execute the command python minimum_script.py
3. Logs as follows:
WARNING:tensorflow:From /usr/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-04-10 20:42:50.494741: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-10 20:42:50.515438: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4009500000 Hz
2019-04-10 20:42:50.515766: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55879a667250 executing computations on platform Host. Devices:
2019-04-10 20:42:50.515781: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-04-10 20:42:50.585591: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-10 20:42:50.586063: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x558799e29500 executing computations on platform CUDA. Devices:
2019-04-10 20:42:50.586075: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2019-04-10 20:42:50.586274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 6.12GiB
2019-04-10 20:42:50.586283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-10 20:42:50.842967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-10 20:42:50.842988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-10 20:42:50.842995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-10 20:42:50.843184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5892 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
Epoch 1/5
2019-04-10 20:42:50.989782: I tensorflow/stream_executor/dso_loader.cc:142] Couldn't open CUDA library libcublas.so.10.1. LD_LIBRARY_PATH:
2019-04-10 20:42:50.989802: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Failed precondition: could not dlopen DSO: libcublas.so.10.1; dlerror: libcublas.so.10.1: cannot open shared object file: No such file or directory
[1] 28870 abort (core dumped) python minimum_script.py
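The failure can also be reproduced without TensorFlow by attempting to dlopen the missing library directly with ctypes (a diagnostic sketch, not part of the original report; on an affected system it prints the same dlerror as in the log above):

```python
import ctypes

# Hypothetical quick check: try to dlopen the library that TensorFlow
# failed to load.  If the dynamic linker cannot resolve it, CDLL raises
# OSError carrying the dlerror message.
try:
    ctypes.CDLL("libcublas.so.10.1")
    status = "found"
except OSError as exc:
    status = "not found ({})".format(exc)
print("libcublas.so.10.1:", status)
```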
glibc-2.28-6
cuda-10.1.105-8
The problem has been solved. Thank you!