FS#68312 - [nvidia] nvidia_uvm uses unknown symbols and breaks CUDA and OpenCL on kernel 5.9.x
Attached to Project:
Arch Linux
Opened by Nick Cao (NickCao) - Sunday, 18 October 2020, 02:12 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Wednesday, 04 November 2020, 00:30 GMT
Details
Description:
CUDA and OpenCL do not work on 5.9 series kernels because the nvidia_uvm module uses unknown symbols. Other functionality is unaffected.
Additional info:
nvidia 455.28-1
linux 5.9.1.zen1-1
dmesg output:
[ 1332.354872] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 1332.355282] nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
[ 1332.355348] nvidia_uvm: Unknown symbol mmu_notifier_unregister (err -2)
[ 1332.355528] nvidia_uvm: Unknown symbol __mmu_notifier_register (err -2)
Steps to reproduce:
Install the latest kernel and nvidia driver. Load the nvidia_uvm module; the logs above appear in dmesg. Running anything that depends on CUDA or OpenCL, for example mpv with NVDEC, then fails.
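For anyone who wants to check their own logs for the same failure, the dmesg lines in the description can be scanned mechanically. The following is only a minimal sketch: the function name `unknown_symbols` and the `SAMPLE` constant are hypothetical, and the regular expression assumes the exact "Unknown symbol ... (err N)" wording shown in this report.

```python
import re

# Matches lines such as:
# [ 1332.355282] nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
# The pattern is an assumption based on the log lines quoted in this report.
UNKNOWN_SYMBOL = re.compile(
    r"\]\s*(?P<module>[\w-]+): Unknown symbol (?P<symbol>\w+) \(err (?P<err>-?\d+)\)"
)


def unknown_symbols(dmesg_text):
    """Map each module name to the list of symbols it failed to resolve."""
    found = {}
    for line in dmesg_text.splitlines():
        match = UNKNOWN_SYMBOL.search(line)
        if match:
            found.setdefault(match.group("module"), []).append(match.group("symbol"))
    return found


# Sample input copied from the description of this task.
SAMPLE = """\
[ 1332.354872] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 1332.355282] nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
[ 1332.355348] nvidia_uvm: Unknown symbol mmu_notifier_unregister (err -2)
[ 1332.355528] nvidia_uvm: Unknown symbol __mmu_notifier_register (err -2)
"""

if __name__ == "__main__":
    for module, symbols in unknown_symbols(SAMPLE).items():
        print(module, ", ".join(symbols))
```

To run it against a live system, the output of `dmesg` could be passed in as the text instead of `SAMPLE`. An empty result means no unresolved symbols were logged in that wording; it does not by itself prove that the module loaded.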
Closed by Sven-Hendrik Haase (Svenstaro)
Wednesday, 04 November 2020, 00:30 GMT
Reason for closing: Fixed
[1] https://forums.developer.nvidia.com/t/nvidia-driver-not-yet-supported-for-linux-kernel-5-9/157263/3
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=262e6ae7081df304fc625cf368d5c2cbba2bb991
After updating from nvidia 455.28-4 to nvidia 455.28-7, I ran hashcat.
Running kernel 5.9.1-arch1-1 and nvidia 455.28-7:
$ hashcat -m 22000 --benchmark
hashcat (v6.1.1-120-g15bf8b730) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
cuInit(): unknown error
clGetPlatformIDs(): CL_PLATFORM_NOT_FOUND_KHR
ATTENTION! No OpenCL-compatible or CUDA-compatible platform found.
You are probably missing the OpenCL or CUDA runtime installation.
* AMD GPUs on Linux require this driver:
"RadeonOpenCompute (ROCm)" Software Platform (3.1 or later)
* Intel CPUs require this runtime:
"OpenCL Runtime for Intel Core and Intel Xeon Processors" (16.1.1 or later)
* NVIDIA GPUs require this runtime and/or driver (both):
"NVIDIA Driver" (440.64 or later)
"CUDA Toolkit" (9.0 or later)
Started: Sun Oct 18 19:11:46 2020
Stopped: Sun Oct 18 19:11:46 2020
$ uname -r
5.9.1-arch1-1
$ pacman -Q | grep nvidia
nvidia 455.28-7
nvidia-settings 455.28-1
nvidia-utils 455.28-1
opencl-nvidia 455.28-1
Running kernel 5.8.14-arch1-1 and nvidia 455.28-4, everything is fine:
$ uname -r
5.8.14-arch1-1
$ pacman -Q | grep nvidia
nvidia 455.28-4
nvidia-settings 455.28-1
nvidia-utils 455.28-1
opencl-nvidia 455.28-1
$ hashcat -m 22000 --benchmark
hashcat (v6.1.1-120-g15bf8b730) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
CUDA API (CUDA 11.1)
====================
* Device #1: GeForce GTX 1080 Ti, 10944/11175 MB, 28MCU
OpenCL API (OpenCL 1.2 CUDA 11.1.96) - Platform #1 [NVIDIA Corporation]
=======================================================================
* Device #2: GeForce GTX 1080 Ti, skipped
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 22000 - WPA-PBKDF2-PMKID+EAPOL (Iterations: 4095)
Speed.#1.........: 631.8 kH/s (89.17ms) @ Accel:64 Loops:128 Thr:1024 Vec:1
Started: Sun Oct 18 19:06:34 2020
Stopped: Sun Oct 18 19:06:50 2020
How about a small warning that an update to kernel 5.9 is not recommended for users of NVIDIA OpenCL/CUDA until mid-November, posted here:
https://www.archlinux.org/
as mentioned here:
https://forums.developer.nvidia.com/t/nvidia-driver-not-yet-supported-for-linux-kernel-5-9/157263/3
But I guess the maintainers should have noticed it some time ago, while 5.9 was in the testing repo.
It was also covered by some tech news sites.
Still, they decided against warning users; maybe this should be handled differently in the future.
If everything looks fine, I perform the update. That has always proven itself and prevents duplicate bug reports.
Especially in this "special case" (discussed in several forums) a warning would be nice.
I fully agree that the Arch Linux team consists of very good members. That and the excellent wiki made me choose Arch!
BTW:
I develop penetration testing tools (especially for Arch Linux):
https://github.com/ZerBea
And the OpenCL part of them (hcxkeys) got a shell hit.
In the meantime, I'd like everyone to remember that we're only two maintainers here for the nvidia module and apparently neither of us noticed the breakage. You all can become official testers and help us catch those breakages in the future: https://wiki.archlinux.org/index.php/Arch_Testing_Team
Still, you really seem to have a shortage of testers if no one tested the nvidia drivers with kernel 5.9 while it was in testing
(from some hardware reports I would assume that many people use this driver).
I would have opened a report before 5.9 was released, but I thought the situation was so obvious (like I said, even tech media covered it) that I didn't do it.
Now I know better.
Nonetheless, the main one to blame is Nvidia: they saw it coming but didn't react in time.
I'd link to discussions, but these Discord servers are also performing self-help by patching the Nvidia kernel modules' source parts to lie about their licensing, which proves this is more of a legal issue than an actual technical limitation.
AMD only officially supports Ubuntu, CentOS, RHEL and SLES 15 Service Pack 2 (maybe they really think this is Linux),
nouveau isn't really working for OpenCL,
and NVIDIA is in the middle of a "political" discussion with "The Linux Kernel Team".
$ uname -a
Linux home 5.4.72-1-lts #1 SMP Sat, 17 Oct 2020 13:30:57 +0000 x86_64 GNU/Linux
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce RTX 2070"
CUDA Driver Version / Runtime Version 11.1 / 11.1
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 7982 MBytes (8369668096 bytes)
(36) Multiprocessors, ( 64) CUDA Cores/MP: 2304 CUDA Cores
GPU Max Clock rate: 1710 MHz (1.71 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
[scritch@scritchpc ~]$ uname -a
Linux scritchpc 5.9.2-arch1-1 #1 SMP PREEMPT Thu, 29 Oct 2020 17:01:28 +0000 x86_64 GNU/Linux
[scritch@scritchpc ~]$ pacman -Qe | grep nvidia
nvidia 455.38-1
nvidia-settings 455.38-1
[scritch@scritchpc ~]$ dmesg | grep nvidia
[ 3.582549] nvidia-gpu 0000:06:00.3: enabling device (0000 -> 0002)
[ 4.423221] nvidia: module license 'NVIDIA' taints kernel.
[ 4.466089] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 4.468474] nvidia 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 4.703690] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 455.38 Thu Oct 22 05:57:59 UTC 2020
[ 4.708535] [drm] [nvidia-drm] [GPU ID 0x00000600] Loading driver
[ 4.708537] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:06:00.0 on minor 0
[ 4.715519] nvidia-gpu 0000:06:00.3: i2c timeout error e0000000
[ 65.425495] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 65.432366] nvidia-uvm: Loaded the UVM driver, major device number 234.
[scritch@scritchpc ~]$ clinfo
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.1.110
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid
Platform Extensions function suffix NV
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 1650 SUPER
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 455.38
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Topology (NV) PCI-E, 06:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 20
Max clock frequency 1740MHz
Compute Capability (NV) 7.5
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 4098752512 (3.817GiB)
Error Correction support No
Max memory allocation 1024688128 (977.2MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 655360 (640KiB)
Global Memory cache line size 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 268435456 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 32768x32768 pixels
Max 3D image size 16384x16384x16384 pixels
Max number of read image args 256
Max number of write image args 32
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max number of constant args 9
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 3
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) NVIDIA CUDA
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [NV]
clCreateContext(NULL, ...) [default] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) Invalid device type for platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.13
ICD loader Profile OpenCL 3.0
NOTE: your OpenCL library declares to support OpenCL 3.0,
but it seems to support up to OpenCL 2.2 only.
$ uname -r
5.9.2-arch1-1
$ hashcat -m 22000 --benchmark
hashcat (v6.1.1-120-g15bf8b730) starting in benchmark mode...
CUDA API (CUDA 11.1)
====================
* Device #1: GeForce GTX 1080 Ti, 10955/11175 MB, 28MCU
OpenCL API (OpenCL 1.2 CUDA 11.1.110) - Platform #1 [NVIDIA Corporation]
========================================================================
* Device #2: GeForce GTX 1080 Ti, skipped
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 22000 - WPA-PBKDF2-PMKID+EAPOL (Iterations: 4095)
Speed.#1.........: 591.8 kH/s (46.65ms) @ Accel:64 Loops:64 Thr:1024 Vec:1
Started: Sat Oct 31 14:43:34 2020
Stopped: Sat Oct 31 14:43:50 2020
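
The kernel and driver combinations reported in this thread can be summarised as a rough check. This is a sketch under loud assumptions, not an authoritative compatibility rule: the function names are hypothetical, and the 455.38 threshold is inferred only from the reports above (455.28 failing on 5.9.1, 455.38 working on 5.9.2), not from NVIDIA or Arch documentation.

```python
import re


def parse_version(text):
    """Return the leading dotted numeric part of a version string as a tuple of ints.

    For example "5.9.1-arch1-1" yields (5, 9, 1) and "455.28-7" yields (455, 28).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no leading version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def likely_affected(kernel_release, driver_version):
    """Guess whether a kernel/driver pair matches the failing reports in this thread.

    ASSUMPTION: only the 5.9 kernel series is considered, and any driver older
    than 455.38 is treated as affected. Both limits come solely from the
    reports in this thread.
    """
    kernel_series = parse_version(kernel_release)[:2]
    driver = parse_version(driver_version)
    return kernel_series == (5, 9) and driver < (455, 38)


if __name__ == "__main__":
    # Combinations taken from the reports in this thread.
    for kernel, driver in [
        ("5.9.1-arch1-1", "455.28-7"),
        ("5.8.14-arch1-1", "455.28-4"),
        ("5.9.2-arch1-1", "455.38-1"),
    ]:
        print(kernel, driver, likely_affected(kernel, driver))
```

The kernel string is what `uname -r` prints and the driver string is the package version shown by `pacman -Q nvidia`. Package revisions after the hyphen are ignored, so the check cannot tell apart repackaged builds of the same driver version.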