FS#77381 - [cuda] CUDA Package does not include libcudadebugger.so.1

Attached to Project: Community Packages
Opened by Kai Willett (ProtoByter) - Saturday, 04 February 2023, 09:03 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Wednesday, 08 February 2023, 17:58 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
This package ships `cuda-gdb`, which relies on the shared object `libcudadebugger.so.1`, but the package does not contain that file.
As a result, `cuda-gdb` fails with this very non-descriptive error:
`Could not find CUDA Debugger back-end. Please try upgrading/re-installing the GPU driver`

This issue has also been reported on the NVIDIA Developer forums by another user, where a member of NVIDIA staff has said to copy the shared object from the official `.run` installer:
https://forums.developer.nvidia.com/t/cant-use-cuda-gdb/235380/2

Additional info:
* package version(s)
As far as I can tell, anything above 520.56.06
* GPU being used:
NVIDIA GeForce GTX 1650 SUPER
* NVIDIA Driver tested on:
525.85.05

Steps to reproduce:
Try to use `cuda-gdb` to debug a CUDA application.

Following the steps outlined in the forum post fixed this for me. However, it would be better to fix this in the package itself than to rely on a hacky workaround.
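
Because the error message does not name the missing file, a quick loader check can help confirm the diagnosis. The following is a minimal sketch, not part of the original report. It assumes that `cuda-gdb` finds its back-end through the normal dynamic loader search path, which this ticket does not confirm. The helper name `can_load` is hypothetical.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared object."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the driver library; libcudadebugger.so.1 is the file
    # this report says is missing. Both names are taken from the ticket.
    for soname in ("libcuda.so.1", "libcudadebugger.so.1"):
        status = "found" if can_load(soname) else "MISSING"
        print(f"{soname}: {status}")
```

If `libcuda.so.1` loads but `libcudadebugger.so.1` does not, that would be consistent with the behaviour described above.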

Closed by Sven-Hendrik Haase (Svenstaro)
Wednesday, 08 February 2023, 17:58 GMT
Reason for closing: Fixed
Comment by Kai Willett (ProtoByter) - Saturday, 04 February 2023, 14:40 GMT
I just realised that I was quoting NVIDIA driver version numbers instead of CUDA version numbers. As far as I can tell, this affects CUDA 11.8 onwards (see: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#installing-from-network-repo ), since CUDA 11.8 was the version that introduced this shared library. On that note, this should probably be in the Packages category rather than Packages: Testing, since CUDA 11.8 is in the regular packages. I'm not sure whether I can change that myself, or how, so I have left it as it is.
Comment by Sven-Hendrik Haase (Svenstaro) - Tuesday, 07 February 2023, 01:03 GMT
I can't reproduce. So I'm on cuda 11.8, I run `cuda-gdb /my/binary` and I'm supposed to see some errors? Which shared file is even supposedly missing, anyway? Can you provide a patch for the PKGBUILD that fixes the problem for you?
Comment by Kai Willett (ProtoByter) - Tuesday, 07 February 2023, 13:46 GMT
The file that was missing for me is `libcudadebugger.so.1`, as the forum post says. I can't test it right now (I'll be home in a couple of hours).
Comment by Kai Willett (ProtoByter) - Tuesday, 07 February 2023, 16:49 GMT
Upon further investigation, it appears that `libcuda` etc. are provided by `nvidia-utils`, but `libcudadebugger.so` is not.
I'm not sure why it isn't failing for you: I tested both CUDA 11.8 and CUDA 12, and both fail.
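
One way to see which of these libraries are actually present on disk is to scan the usual library directories. This is a rough sketch under stated assumptions: the helper name `find_library` is hypothetical, and the default directory list (including `/opt/cuda/lib64`) is a guess at where the files could live, not something this ticket confirms.

```python
from pathlib import Path

# Assumed search locations; adjust for the system being inspected.
DEFAULT_DIRS = ("/usr/lib", "/usr/lib64", "/opt/cuda/lib64")


def find_library(prefix, dirs=DEFAULT_DIRS):
    """Return sorted paths in `dirs` whose file name starts with `prefix`."""
    hits = []
    for directory in dirs:
        path = Path(directory)
        if path.is_dir():
            # Matches e.g. libcudadebugger.so, libcudadebugger.so.1, ...
            hits.extend(sorted(str(f) for f in path.glob(prefix + "*")))
    return hits


if __name__ == "__main__":
    for name in ("libcuda.so", "libcudadebugger.so"):
        found = find_library(name)
        print(name, "->", found if found else "not found")
```

On an Arch system, `pacman -Qo` can then show which package owns each path that was found.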

CUDA 12 Log:

kai@kaipc ~/c/o/c/bin> EPOCHS=2 cuda-gdb ~/src/verlet/cmake-build-release-cuda/cudagravity_sim
NVIDIA (R) CUDA Debugger
11.8 release
Portions Copyright (C) 2007-2022 NVIDIA Corporation
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/kai/src/verlet/cmake-build-release-cuda/cudagravity_sim...
(No debugging symbols found in /home/kai/src/verlet/cmake-build-release-cuda/cudagravity_sim)
(cuda-gdb) r
Starting program: /home/kai/src/verlet/cmake-build-release-cuda/cudagravity_sim
BFD: /lib64/ld-linux-x86-64.so.2: unknown type [0x13] section `.relr.dyn'
warning: `/lib64/ld-linux-x86-64.so.2': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
warning: `/lib64/ld-linux-x86-64.so.2': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libm.so.6: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libm.so.6': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libc.so.6: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libc.so.6': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libdl.so.2: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libdl.so.2': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libpthread.so.0: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libpthread.so.0': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/librt.so.1: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/librt.so.1': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Could not find CUDA Debugger back-end. Please try upgrading/re-installing the GPU driver
[New Thread 0x7fffef99e000 (LWP 7432)]
^C
Thread 1 "cudagravity_sim" received signal SIGINT, Interrupt.
0x00007ffff7ae5db5 in clock_nanosleep () from /usr/lib/libc.so.6
(cuda-gdb)

CUDA 11.8 Log:

kai@kaipc ~/c/o/c/bin> EPOCHS=2 ./cuda-gdb ~/src/verlet/cmake-build-release-cuda/cudagravity_sim
NVIDIA (R) CUDA Debugger
11.8 release
Portions Copyright (C) 2007-2022 NVIDIA Corporation
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/kai/src/verlet/cmake-build-release-cuda/cudagravity_sim...
(No debugging symbols found in /home/kai/src/verlet/cmake-build-release-cuda/cudagravity_sim)
(cuda-gdb) r
Starting program: /home/kai/src/verlet/cmake-build-release-cuda/cudagravity_sim
BFD: /lib64/ld-linux-x86-64.so.2: unknown type [0x13] section `.relr.dyn'
warning: `/lib64/ld-linux-x86-64.so.2': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
warning: `/lib64/ld-linux-x86-64.so.2': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libm.so.6: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libm.so.6': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libc.so.6: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libc.so.6': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libdl.so.2: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libdl.so.2': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/libpthread.so.0: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/libpthread.so.0': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
BFD: /usr/lib/librt.so.1: unknown type [0x13] section `.relr.dyn'
warning: `/usr/lib/librt.so.1': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Could not find CUDA Debugger back-end. Please try upgrading/re-installing the GPU driver
[New Thread 0x7fffef99e000 (LWP 7375)]
^C
Thread 1 "cudagravity_sim" received signal SIGINT, Interrupt.
0x00007ffff7ae5db5 in clock_nanosleep () from /usr/lib/libc.so.6
(cuda-gdb)
Comment by Sven-Hendrik Haase (Svenstaro) - Wednesday, 08 February 2023, 04:01 GMT
Thank you for the description; I was able to reproduce the problem with that. I think rel 3 of nvidia-utils fixes it. Please test!
Comment by Kai Willett (ProtoByter) - Wednesday, 08 February 2023, 08:16 GMT
Yep, that fixed it! Thanks!
