FS#69256 - [linux] kernel NULL pointer dereference

Attached to Project: Arch Linux
Opened by Nelson Balza (KbpG28) - Saturday, 09 January 2021, 10:17 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Wednesday, 17 February 2021, 23:45 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To No-one
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description: The system suddenly crashes, even when I'm not doing anything at all. I had music playing in the background and it kept playing. I tried to switch to a virtual console with Ctrl+Alt+F3 or kill the Xorg server with Ctrl+Alt+Backspace, but nothing happened: the screen stayed frozen on whatever I was doing, with the music still playing in the background, and I had to force a hard power-off. This usually happens several hours after boot, and no particular application seems to be causing the problem. It happened with both GNOME and KDE. It only started after the latest update of both the kernel and the nvidia driver.


Additional info:
* linux-5.10.5.arch1-1
* nvidia-460.32.03

Steps to reproduce:
I haven't found out what pattern exactly causes the problem; it just happens after a few hours.

Closed by Sven-Hendrik Haase (Svenstaro)
Wednesday, 17 February 2021, 23:45 GMT
Reason for closing: Upstream
Comment by loqs (loqs) - Saturday, 09 January 2021, 11:55 GMT
@KbpG28 unfortunately there is absolutely nothing Arch can do to fix issues in the binary blob part of the nvidia kernel driver. There are similar upstream reports, such as [1].
If you cannot find an existing thread on the NVIDIA Linux subforum, I can only suggest that you start a new one; see [2] for what to include.

[1] https://forums.developer.nvidia.com/t/bug-report-455-23-04-kernel-panic-due-to-null-pointer-dereference/155506
[2] https://forums.developer.nvidia.com/t/if-you-have-a-problem-please-read-this-first/27131
Comment by Nelson Balza (KbpG28) - Saturday, 09 January 2021, 12:51 GMT
@loqs thank you, I'll try asking there.
Comment by Max Pray (synthead) - Monday, 18 January 2021, 05:01 GMT
I'm experiencing this, and I'm not using the nvidia kernel module. My symptoms are identical to the original report. Video locks up and I can hear the fans speed up. Mouse and keyboard don't respond, even when I try to switch to a console (e.g. Ctrl+Alt+F1). Music that was playing in a browser continues to play. I haven't tested whether the machine responds to network requests, but I do see my disk activity light flashing. It appears to be a video-only issue.

lspci describes my video card as "Intel Corporation HD Graphics P530 (rev 06)". I am using the standard i915 kernel module.
Comment by Max Pray (synthead) - Monday, 18 January 2021, 11:54 GMT
I downgraded to kernel 5.10.5.arch1-1, and my machine has been up for > 5 hours. I downgraded in steps, and tried 5.10.6.arch1-1 first, but it also crashed within an hour in the same fashion.

Very likely unrelated, but worth mentioning: sound was not being switched back to my speakers when I unplugged my headphones from the headphone jack, so I had to switch it back manually. After downgrading, this started working again.
Comment by CodingCellist (CodingCellist) - Monday, 18 January 2021, 17:59 GMT
Strange. I was having the same issue with the 5.9.16.a-1-hardened kernel, but 5.10.8.a-1-hardened seems to have solved it for me (around 6 hours of uptime now)...
Comment by Max Pray (synthead) - Tuesday, 19 January 2021, 23:47 GMT
Just hit the same issue with 5.10.5-arch1-1. I was on battery when it happened, and I'm running TLP. Maybe this has something to do with it?

The kernel I used before these issues started was 5.9.14.arch1-1. I'm going to downgrade to that and see if the problems go away. This will help determine whether it's an issue with the kernel or something else.
Comment by loqs (loqs) - Wednesday, 20 January 2021, 00:05 GMT
@synthead I suggest you try using the Arch support channels to locate the cause of your problem.
A kernel NULL pointer dereference is a class of error, not a specific issue. Since, as you stated, your system is not using the nvidia modules, it cannot be the same issue.
Comment by Nelson Balza (KbpG28) - Wednesday, 20 January 2021, 00:08 GMT
I should add that I noticed it always happened when I had MATLAB open for some time (sometimes a couple of hours, sometimes only ~15 minutes, and it was just open in the background; I wasn't using it). I don't know if this is relevant. After I stopped using MATLAB (which is not a solution), this behaviour stopped, even with more than 10 hours of uptime. I don't know whether Java applications can cause the problem or MATLAB has some bug that triggers it. I have also upgraded my kernel since then, so that might have fixed it too.
Comment by Eduardo Castillo (arch-newbye) - Thursday, 04 February 2021, 19:03 GMT
I have also had this problem of the system freezing completely. Now something strange is happening: when I start compressing some files, the computer suddenly shuts down. It only happens when I compress files.
Comment by marcus philpott (pha-q-2) - Saturday, 06 February 2021, 01:59 GMT
Did you have Chromium open at the time? For me this always happens when Chromium is open, 4 times in a month so far. My latest:

-- Journal begins at Sun 2015-09-27 21:51:11 BST, ends at Sat 2021-02-06 01:30:59 GMT. --
Feb 06 00:09:40 krabbypatty kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Feb 06 00:09:40 krabbypatty kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Feb 06 00:09:40 krabbypatty kernel: CPU: 2 PID: 986 Comm: irq/39-nvidia Tainted>
Feb 06 00:09:40 krabbypatty kernel: Hardware name: MSI MS-7913/A88XI AC (MS-791>
Feb 06 00:09:40 krabbypatty kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Feb 06 00:09:40 krabbypatty kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 0>
Feb 06 00:09:40 krabbypatty kernel: RSP: 0018:ffffaf5d80b7bc20 EFLAGS: 00010202
Feb 06 00:09:40 krabbypatty kernel: RAX: 0000000000000020 RBX: 0000000000000020>
Feb 06 00:09:40 krabbypatty kernel: RDX: ffff905497ab7c08 RSI: ffffffffffffffff>
Feb 06 00:09:40 krabbypatty kernel: RBP: ffff9054d995d9f0 R08: ffffffffc23d53e0>
Feb 06 00:09:40 krabbypatty kernel: R10: ffff9054d9968008 R11: ffff9054d9969098>
Feb 06 00:09:40 krabbypatty kernel: R13: 0000000000000000 R14: ffff9054d995db58>
Feb 06 00:09:40 krabbypatty kernel: FS: 0000000000000000(0000) GS:ffff9054f6b0>
Feb 06 00:09:40 krabbypatty kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080>
Feb 06 00:09:40 krabbypatty kernel: CR2: 0000000000000020 CR3: 00000001eab48000>
Feb 06 00:09:40 krabbypatty kernel: Call Trace:
Feb 06 00:09:40 krabbypatty kernel: ? _nv030766rm+0x1b/0x90 [nvidia]
Feb 06 00:09:40 krabbypatty kernel: ? _nv026432rm+0x18/0x60 [nvidia]
Feb 06 00:09:40 krabbypatty kernel: ? _nv012979rm+0x13d/0x1c0 [nvidia]
.... etc ....etc
Comment by BR (bartoszer) - Sunday, 07 February 2021, 10:18 GMT
Hi, I hit this bug every time I try to watch a video in Kodi (after a few minutes of playback).

kernel 5.10.13-arch1-1
Comment by Sven-Hendrik Haase (Svenstaro) - Wednesday, 17 February 2021, 23:45 GMT
Sorry, this seems like a bug inside of the nvidia binary blob. I suggest reaching out to official nvidia support channels.
