Arch Linux

FS#68312 - [nvidia] nvidia_uvm uses unknown symbols and breaks CUDA and OpenCL on kernel 5.9.x

Attached to Project: Arch Linux
Opened by Nick Cao (NickCao) - Sunday, 18 October 2020, 02:12 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Wednesday, 04 November 2020, 00:30 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Sven-Hendrik Haase (Svenstaro), Felix Yan (felixonmars)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 17
Private: No

Details

Description:
CUDA and OpenCL do not work on 5.9-series kernels because the nvidia_uvm module fails to load with unknown symbols.
All other functionality is unaffected.

Additional info:
nvidia 455.28-1
linux 5.9.1.zen1-1

dmesg output:
[ 1332.354872] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 1332.355282] nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
[ 1332.355348] nvidia_uvm: Unknown symbol mmu_notifier_unregister (err -2)
[ 1332.355528] nvidia_uvm: Unknown symbol __mmu_notifier_register (err -2)
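To pull the unresolved symbols out of a longer kernel log, a small filter is enough. This is a minimal sketch that is not part of the original report: `unknown_symbols` is a hypothetical shell function that reads dmesg-style lines on stdin and prints each symbol the named module failed to resolve.

```shell
# Hypothetical helper (not from the report): print the unique symbols that a
# given module failed to resolve, reading dmesg-style lines from stdin.
unknown_symbols() {
  local module="$1"
  # Keep only "<module>: Unknown symbol <name>", then print <name> (4th field).
  grep -o "${module}: Unknown symbol [^ ]*" | awk '{ print $4 }' | sort -u
}
```

For example, `sudo dmesg | unknown_symbols nvidia_uvm` should list set_cpus_allowed_ptr, mmu_notifier_unregister and __mmu_notifier_register on an affected system. Root may be needed to read the kernel log, depending on the `kernel.dmesg_restrict` setting.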

Steps to reproduce:
Install the latest kernel and nvidia driver.
Load the nvidia_uvm module; the log lines above appear in dmesg.
Run anything that depends on CUDA or OpenCL (for example, mpv with NVDEC); it fails.

Closed by Sven-Hendrik Haase (Svenstaro)
Wednesday, 04 November 2020, 00:30 GMT
Reason for closing: Fixed
Comment by loqs (loqs) - Sunday, 18 October 2020, 03:58 GMT
Comment by Michael (ZeroBeat) - Sunday, 18 October 2020, 21:58 GMT
I can confirm this. Steps to reproduce:

After updating from nvidia 455.28-4 to nvidia 455.28-7, run hashcat.

Running kernel 5.9.1-arch1-1 and nvidia 455.28-7:

$ hashcat -m 22000 --benchmark
hashcat (v6.1.1-120-g15bf8b730) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

cuInit(): unknown error

clGetPlatformIDs(): CL_PLATFORM_NOT_FOUND_KHR

ATTENTION! No OpenCL-compatible or CUDA-compatible platform found.

You are probably missing the OpenCL or CUDA runtime installation.

* AMD GPUs on Linux require this driver:
"RadeonOpenCompute (ROCm)" Software Platform (3.1 or later)
* Intel CPUs require this runtime:
"OpenCL Runtime for Intel Core and Intel Xeon Processors" (16.1.1 or later)
* NVIDIA GPUs require this runtime and/or driver (both):
"NVIDIA Driver" (440.64 or later)
"CUDA Toolkit" (9.0 or later)

Started: Sun Oct 18 19:11:46 2020
Stopped: Sun Oct 18 19:11:46 2020

$ uname -r
5.9.1-arch1-1

$ pacman -Q | grep nvidia
nvidia 455.28-7
nvidia-settings 455.28-1
nvidia-utils 455.28-1
opencl-nvidia 455.28-1



Running kernel 5.8.14-arch1-1 and nvidia 455.28-4, everything is fine:

$ uname -r
5.8.14-arch1-1

$ pacman -Q | grep nvidia
nvidia 455.28-4
nvidia-settings 455.28-1
nvidia-utils 455.28-1
opencl-nvidia 455.28-1


$ hashcat -m 22000 --benchmark
hashcat (v6.1.1-120-g15bf8b730) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

CUDA API (CUDA 11.1)
====================
* Device #1: GeForce GTX 1080 Ti, 10944/11175 MB, 28MCU

OpenCL API (OpenCL 1.2 CUDA 11.1.96) - Platform #1 [NVIDIA Corporation]
=======================================================================
* Device #2: GeForce GTX 1080 Ti, skipped

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 22000 - WPA-PBKDF2-PMKID+EAPOL (Iterations: 4095)

Speed.#1.........: 631.8 kH/s (89.17ms) @ Accel:64 Loops:128 Thr:1024 Vec:1

Started: Sun Oct 18 19:06:34 2020
Stopped: Sun Oct 18 19:06:50 2020
Comment by Michael (ZeroBeat) - Monday, 19 October 2020, 12:44 GMT
BTW:
How about a short warning on https://www.archlinux.org/ that updating to kernel 5.9 is not recommended for users of NVIDIA OpenCL/CUDA until mid-November, as mentioned here:
https://forums.developer.nvidia.com/t/nvidia-driver-not-yet-supported-for-linux-kernel-5-9/157263/3
Comment by G3ro (G3ro) - Monday, 19 October 2020, 20:04 GMT
I second ZeroBeat's request for a warning.

But I guess the maintainers should have been aware of this for some time, since 5.9 was in the testing repo.
It was also covered by some tech news sites.

Still, they decided against warning users; maybe this should be handled differently in the future.
Comment by Michael (ZeroBeat) - Monday, 19 October 2020, 20:33 GMT
@G3ro: Before updating, I take a look at https://www.archlinux.org/
If everything looks fine, I perform the update. That has always worked well and prevents duplicate bug reports.
Especially in this "special case" (discussed in several forums), a warning would be nice.
I fully agree that the Arch Linux team consists of very good members. That and the excellent wiki made me choose Arch!

BTW:
I develop penetration testing tools (especially for Arch Linux):
https://github.com/ZerBea

And the OpenCL part of them (hcxkeys) took a hard hit.
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 19 October 2020, 21:05 GMT
I proposed a news item.

In the meantime, I'd like everyone to remember that there are only two of us maintaining the nvidia module, and apparently neither of us noticed the breakage. You can all become official testers and help us catch such breakages in the future: https://wiki.archlinux.org/index.php/Arch_Testing_Team
Comment by Michael (ZeroBeat) - Monday, 19 October 2020, 21:35 GMT
@Sven-Hendrik Haase: As mentioned before, you're doing a great job! Thanks for everything!
Comment by G3ro (G3ro) - Monday, 19 October 2020, 21:36 GMT
@Svenstaro: OK, sorry for my harsh words earlier.
Still, you really seem to have a shortage of testers if no one tested the nvidia drivers with kernel 5.9 while it was in testing
(judging from some hardware reports, I would assume that many people use this driver).

I would have opened a report before 5.9 was released, but I thought the situation was so obvious (as I said, it even got tech media coverage) that I didn't.
Now I know better.

Nonetheless, the main party to blame is Nvidia: they saw it coming but didn't react in time.
Comment by Magnus Boman (katt) - Monday, 19 October 2020, 21:40 GMT
@G3ro: Most users won't be impacted by this at all, only those who make use of CUDA and/or OpenCL will be affected.
Comment by Eli Schwartz (eschwartz) - Monday, 19 October 2020, 21:51 GMT
Thoughts on just providing this as a post_upgrade notice? It should only affect specific applications using CUDA and/or OpenCL, so it's purely an advisory/heads-up, and only relevant after a reboot.
Comment by Christopher Snowhill (kode54) - Monday, 19 October 2020, 22:00 GMT
Vulkan ray tracing is apparently also affected, from what I've seen on at least one Discord server discussing the issue. So are NVENC and possibly NVDEC.

I'd link to the discussions, but those Discord servers are also helping themselves by patching the source parts of the Nvidia kernel modules to lie about their licensing, which proves this is more of a legal issue than an actual technical limitation.
Comment by Michael (ZeroBeat) - Tuesday, 20 October 2020, 05:58 GMT
@Svenstaro: Unfortunately, we (OpenCL users) are in an ugly situation, and there is nothing you can do to solve it (don't worry):

AMD only officially supports Ubuntu, CentOS/RHEL and SLES 15 Service Pack 2 (maybe they really think this is Linux)
nouveau doesn't really work with OpenCL
and NVIDIA is in the middle of a "political" discussion with "The Linux Kernel Team"
Comment by Martin Dünkelmann (MartinX3) - Wednesday, 21 October 2020, 06:33 GMT
I hope they will reconsider and open-source more parts of their driver, as AMD and Intel did.
Comment by Donald Munro (hdm) - Friday, 30 October 2020, 19:21 GMT
If you don't need 5.9, booting the linux-lts (5.4.72-1) kernel works:
$ uname -a
Linux home 5.4.72-1-lts #1 SMP Sat, 17 Oct 2020 13:30:57 +0000 x86_64 GNU/Linux
$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2070"
CUDA Driver Version / Runtime Version 11.1 / 11.1
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 7982 MBytes (8369668096 bytes)
(36) Multiprocessors, ( 64) CUDA Cores/MP: 2304 CUDA Cores
GPU Max Clock rate: 1710 MHz (1.71 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Comment by Scritch (Scritch) - Saturday, 31 October 2020, 12:22 GMT
Seems to be working on 5.9.2:

[scritch@scritchpc ~]$ uname -a
Linux scritchpc 5.9.2-arch1-1 #1 SMP PREEMPT Thu, 29 Oct 2020 17:01:28 +0000 x86_64 GNU/Linux

[scritch@scritchpc ~]$ pacman -Qe | grep nvidia
nvidia 455.38-1
nvidia-settings 455.38-1

[scritch@scritchpc ~]$ dmesg | grep nvidia
[ 3.582549] nvidia-gpu 0000:06:00.3: enabling device (0000 -> 0002)
[ 4.423221] nvidia: module license 'NVIDIA' taints kernel.
[ 4.466089] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 4.468474] nvidia 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 4.703690] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 455.38 Thu Oct 22 05:57:59 UTC 2020
[ 4.708535] [drm] [nvidia-drm] [GPU ID 0x00000600] Loading driver
[ 4.708537] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:06:00.0 on minor 0
[ 4.715519] nvidia-gpu 0000:06:00.3: i2c timeout error e0000000
[ 65.425495] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 65.432366] nvidia-uvm: Loaded the UVM driver, major device number 234.

[scritch@scritchpc ~]$ clinfo
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.1.110
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid
Platform Extensions function suffix NV

Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 1650 SUPER
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 455.38
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Topology (NV) PCI-E, 06:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 20
Max clock frequency 1740MHz
Compute Capability (NV) 7.5
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 4098752512 (3.817GiB)
Error Correction support No
Max memory allocation 1024688128 (977.2MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 655360 (640KiB)
Global Memory cache line size 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 268435456 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 32768x32768 pixels
Max 3D image size 16384x16384x16384 pixels
Max number of read image args 256
Max number of write image args 32
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max number of constant args 9
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 3
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) NVIDIA CUDA
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [NV]
clCreateContext(NULL, ...) [default] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) Invalid device type for platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.13
ICD loader Profile OpenCL 3.0
NOTE: your OpenCL library declares to support OpenCL 3.0,
but it seems to support up to OpenCL 2.2 only.
Comment by Michael (ZeroBeat) - Saturday, 31 October 2020, 13:48 GMT
I can confirm that it is working:
$ uname -r
5.9.2-arch1-1

$ hashcat -m 22000 --benchmark
hashcat (v6.1.1-120-g15bf8b730) starting in benchmark mode...
CUDA API (CUDA 11.1)
====================
* Device #1: GeForce GTX 1080 Ti, 10955/11175 MB, 28MCU
OpenCL API (OpenCL 1.2 CUDA 11.1.110) - Platform #1 [NVIDIA Corporation]
========================================================================
* Device #2: GeForce GTX 1080 Ti, skipped
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 22000 - WPA-PBKDF2-PMKID+EAPOL (Iterations: 4095)
Speed.#1.........: 591.8 kH/s (46.65ms) @ Accel:64 Loops:64 Thr:1024 Vec:1
Started: Sat Oct 31 14:43:34 2020
Stopped: Sat Oct 31 14:43:50 2020
Comment by Alexander Popov (AlexWayfer) - Saturday, 31 October 2020, 18:26 GMT
I also confirm that these versions work with OBS (NVENC).
Comment by Sven-Hendrik Haase (Svenstaro) - Wednesday, 04 November 2020, 00:30 GMT
Can confirm, works just fine now.
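Going only by the versions reported in this thread (nvidia 455.28 failed on 5.9.1 kernels, while 455.38 worked on 5.9.2), a rough check for whether a kernel/driver pair falls in the broken range might look like the sketch below. Both helper names are hypothetical, the 5.9 and 455.38 thresholds are inferred from the comments here rather than from any official statement, and GNU `sort -V` is assumed for version ordering.

```shell
# Succeeds if version $1 sorts strictly before version $2 (GNU sort -V).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical check: thresholds come only from versions reported in this
# thread (455.28 broke on 5.9.1; 455.38 worked on 5.9.2).
# Usage: likely_affected <kernel-release> <nvidia-pkgver>
likely_affected() {
  local kernel="$1"
  local nvidia="${2%-*}"   # drop the pkgrel, e.g. 455.28-7 -> 455.28
  # Affected when the kernel is at least 5.9 and the driver is older than 455.38.
  ! version_lt "$kernel" 5.9 && version_lt "$nvidia" 455.38
}
```

For example: `likely_affected "$(uname -r)" "$(pacman -Q nvidia | cut -d' ' -f2)" && echo "nvidia_uvm will probably fail to load"`. On Arch, pacman's own `vercmp` would be the more faithful comparator; `sort -V` is used here only to keep the sketch self-contained.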
