FS#72062 - [cuda] nvPTXCompilerCompile segfaults in pthread_mutex_lock

Attached to Project: Community Packages
Opened by George Stelle (stelleg) - Tuesday, 07 September 2021, 12:47 GMT
Last edited by Konstantin Gizdov (kgizdov) - Tuesday, 14 September 2021, 08:59 GMT

Task Type	Bug Report
Category	Packages
Status	Closed
Assigned To	Sven-Hendrik Haase (Svenstaro), Felix Yan (felixonmars), Konstantin Gizdov (kgizdov)
Architecture	All
Severity	Low
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	0
Private	No

Details

Description:
nvPTXCompilerCompile always segfaults when calling pthread_mutex_lock.

Additional info:
* package version(s)
cuda 11.4.1-2
glibc 2.33-5

* config and/or log files etc.
backtrace:
#0 0x00007ffff652d424 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#1 0x0000555555a8a171 in __ptx14510 ()
#2 0x000055555595088c in __cuda_CallJitEntryPoint ()
#3 0x0000555555941cd3 in nvPTXCompilerCompile ()
#4 0x00005555559412ca in main ()
mutex argument pointer is null:
(gdb) p/x $rdi
$3 = 0x0

* link to upstream bug report, if any
https://developer.nvidia.com/nvidia_bug/3374550

Steps to reproduce:
Follow the instructions to build the simple example here: https://docs.nvidia.com/cuda/ptx-compiler-api/index.html#sample-example

Additional thoughts:
I can't find anyone else with this issue. I thought it might be an issue with the glibc version, but according to https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html, glibc 2.33 is supported on Fedora. I've successfully tested this example on a CentOS 7 box with glibc 2.17.

Closed by Konstantin Gizdov (kgizdov)
Tuesday, 14 September 2021, 08:59 GMT
Reason for closing: Works for me

Comment by George Stelle (stelleg) - Wednesday, 08 September 2021, 16:55 GMT

Figured out the issue: lld was doing something wrong. Using gold fixes the problem.
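
For anyone hitting the same crash, a minimal sketch of selecting the linker explicitly when building the sample. The comment above does not give the exact command used, so this is an assumption: the file names and the /opt/cuda prefix are borrowed from the build command posted later in this task, `build_cmd` is a hypothetical helper, and `-fuse-ld=` is the GCC/Clang driver option for choosing the linker. The function only prints the command so it can be inspected first.

```shell
#!/bin/sh
# Sketch: print a build command for the sample with an explicit linker choice.
# Assumption: CUDA lives under /opt/cuda (as in the Arch cuda package) unless
# CUDA_HOME says otherwise; the source file name comes from a later comment.
CUDA_HOME="${CUDA_HOME:-/opt/cuda}"

# build_cmd is a hypothetical helper, not part of any toolkit.
# $1: linker name handed to -fuse-ld (for example gold, bfd or lld)
build_cmd() {
    linker="$1"
    printf '%s\n' "gcc simpleVectorAddition.c -o simpleVectorAddition -fuse-ld=${linker} -I${CUDA_HOME}/include -L${CUDA_HOME}/lib64 ${CUDA_HOME}/lib64/libnvptxcompiler_static.a -lcuda -lm -lpthread -Wl,-rpath,${CUDA_HOME}/lib64"
}

# Print the command that links with gold instead of lld.
build_cmd gold
```

Piping the printed line to `sh` runs the build; swapping `gold` for `lld` or `bfd` gives a quick way to compare linkers on the same source.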

Comment by Konstantin Gizdov (kgizdov) - Tuesday, 14 September 2021, 08:03 GMT

Could you try with cuda 11.4.2-1 and see if that makes a difference?

Comment by Konstantin Gizdov (kgizdov) - Tuesday, 14 September 2021, 08:52 GMT

I am also able to compile and run this code on a GTX 1070Ti with the following changes:
```
--- simpleVectorAddition-old.c 2021-09-14 11:51:02.892151530 +0300
+++ simpleVectorAddition.c 2021-09-14 11:47:24.705485183 +0300
@@ -29,8 +29,8 @@

const char *ptxCode = " \
- .version 7.0 \n \
- .target sm_50 \n \
+ .version 7.4 \n \
+ .target sm_61 \n \
.address_size 64 \n \
.visible .entry simpleVectorAdd( \n \
.param .u64 simpleVectorAdd_param_0, \n \
@@ -127,7 +127,7 @@
char *elf, *infoLog, *errorLog;
unsigned int minorVer, majorVer;

- const char* compile_options[] = { "--gpu-name=sm_70",
+ const char* compile_options[] = { "--gpu-name=sm_61",
"--verbose"
};

```
by issuing the following commands:

```
$ gcc simpleVectorAddition.c -o simpleVectorAddition -I/opt/cuda/include -L/opt/cuda/lib64/ /opt/cuda/lib64/libnvptxcompiler_static.a -lcuda -lm -lpthread -Wl,-rpath,/opt/cuda/lib64
$ ./simpleVectorAddition
Current PTX Compiler API Version : 11.4
Info log: ptxas info : 0 bytes gmem
ptxas info : Compiling entry function 'simpleVectorAdd' for 'sm_61'
ptxas info : Function properties for simpleVectorAdd
ptxas . 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 8 registers, 344 bytes cmem[0]

Result:[0]:0.000000
Result:[1]:3.000000
Result:[2]:6.000000
Result:[3]:9.000000
...
```
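
The `sm_61` values in the diff above match the GTX 1070 Ti, which has compute capability 6.1. As a rough sketch of that mapping for other cards, the helper below turns a capability string into the matching `sm_XY` name. `cap_to_sm` is a hypothetical helper written for illustration and is not part of the CUDA toolkit.

```shell
#!/bin/sh
# Sketch: map a CUDA compute capability such as "6.1" to an "sm_61" target name.
# cap_to_sm is a hypothetical helper, not part of the CUDA toolkit.
cap_to_sm() {
    # Drop the dot between major and minor version, then prefix with sm_.
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# The GTX 1070 Ti used above has compute capability 6.1.
cap_to_sm 6.1
```

The result can be substituted into both the `.target` line of the PTX string and the `--gpu-name=` compile option. Recent drivers can report the capability through `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, though whether that query field is available depends on the driver version, so treat it as an assumption.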
