FS#61277 - [linux] Kernel panics after upgrade to 4.20

Attached to Project: Arch Linux
Opened by Dee (ddifof) - Saturday, 05 January 2019, 01:17 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 10 December 2019, 12:59 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
After upgrading from linux-4.19.9.arch1-1 to 4.20.arch1-1, I see kernel traces in the dmesg log, and the kernel panics about a minute later. See the attached text file for the dmesg kernel traces.

I am not able to save the kernel panic logs to the system, but a picture I took showed the following at the end of the kernel panic:

Kernel panic - not syncing: corrupted stack end detected inside scheduler
Kernel Offset: 0x338000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

Additional info:
The following packages were all upgraded together, and some of them include kernel modules (the VirtualBox set for sure):

[ALPM] upgraded intel-tbb (2019-1 -> 2019.3-1)
[ALPM] upgraded libsecret (0.18.6-1 -> 0.18.7-1)
[ALPM] upgraded linux (4.19.12.arch1-1 -> 4.20.arch1-1)
[ALPM] upgraded virtualbox-host-modules-arch (5.2.22-14 -> 6.0.0-2)
[ALPM] upgraded virtualbox (5.2.22-3 -> 6.0.0-1)
[ALPM] upgraded virtualbox-guest-iso (5.2.22-1 -> 6.0.0-1)
[ALPM] upgraded vulkan-headers (1:1.1.92+111+114c354-1 -> 1:1.1.96+116+f54e45b-1)
[ALPM] upgraded vulkan-icd-loader (1.1.92+2999+abe5c2b3c-1 -> 1.1.96+3009+32d33e965-1)

Hardware:
Gigabyte Z97M-D3H (BIOS F2 04/24/2014)
Intel i5-4590
Corsair 2x4GB DDR3 RAM
AMD RX580 8GB

Steps to reproduce:
Unsure, other than upgrading the kernel to 4.20. This might be specific to my hardware configuration.

Closed by  Andreas Radke (AndyRTR)
Tuesday, 10 December 2019, 12:59 GMT
Reason for closing:  Fixed
Comment by Dee (ddifof) - Saturday, 05 January 2019, 01:19 GMT
Sorry, I just noticed that I mistyped the kernel version in the first line of the description. The upgrade was from linux-4.19.12.arch1-1 to 4.20.arch1-1, NOT from linux-4.19.9.arch1-1.
Comment by German Rios (germanlokura) - Saturday, 05 January 2019, 07:38 GMT
I have installed kernel 4.20.0.1. I had VirtualBox running, and when I opened Thunderbird the entire screen became pixelated. I had the same problem with Firefox.

Motherboard: ASUS P7P55-M
NVIDIA GeForce 630
4 GB RAM
Samsung monitor
Comment by loqs (loqs) - Sunday, 06 January 2019, 22:58 GMT
The error appears to be coming from new code introduced with 4.20.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=533059281ee594f9fbb9e58042aaec77083ef251
If you blacklist the modules provided by virtualbox-host-modules-arch, does the issue persist?
Comment by Dee (ddifof) - Tuesday, 05 February 2019, 01:28 GMT
Sorry for the delay. I tested the following packages:

linux 4.20.6.arch1-1
virtualbox-host-modules-arch 5.2.22-3, 6.0.4-1
virtualbox-host-dkms 5.2.22-3, 6.0.4-1 (module package for LTS kernel but still builds for non-LTS)

In the GRUB boot parameters:
module_blacklist=vboxdrv,vboxnetadp,vboxnetflt,vboxpci,vhba
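
One way to confirm that a blacklist like the one above actually took effect is to compare the kernel command line with the list of loaded modules. The following is a minimal sketch, assuming a Linux system that exposes /proc/cmdline and /proc/modules; the function names are hypothetical helpers written for this illustration and are not part of any Arch or kernel tool.

```python
def parse_module_blacklist(cmdline):
    """Return the set of module names given via module_blacklist= on a kernel command line."""
    names = set()
    for token in cmdline.split():
        if token.startswith("module_blacklist="):
            value = token.split("=", 1)[1]
            names.update(name for name in value.split(",") if name)
    return names


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules text (first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def still_loaded(cmdline, proc_modules_text):
    """Return a sorted list of blacklisted modules that are nevertheless loaded."""
    return sorted(parse_module_blacklist(cmdline) & loaded_modules(proc_modules_text))


def check_running_system():
    """Hypothetical convenience wrapper: run the check against the live system."""
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    with open("/proc/modules") as f:
        modules = f.read()
    return still_loaded(cmdline, modules)
```

Calling check_running_system() after booting with the parameter should return an empty list if none of the listed modules were loaded. If the traces still appear in that state, the VirtualBox modules are less likely to be the cause.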

The system did not seem to panic immediately, but I again saw many errors in the dmesg log, similar to the original attached kernel trace. They looked related to dhcpd and resolvconf, but I had an IP address and was connected, so I do not think those are the culprits.

I am using LTS for now, but I am worried that I will have the same issues when LTS goes to 4.20. I think it is something in my hardware config. Do I maybe need to upgrade the Intel microcode?
Comment by loqs (loqs) - Tuesday, 05 February 2019, 23:46 GMT
ddifof, could you try linux 5.0-rc5 to see whether the issue has already been resolved upstream?
https://wiki.archlinux.org/index.php/Unofficial_user_repositories#miffe contains linux-mainline. Alternatively, you can build the linux-mainline or linux-git packages from the AUR.
Comment by Dee (ddifof) - Friday, 01 March 2019, 03:35 GMT
Sorry for the delay again. I have now tested linux-mainline 5.0rc6-1 and 5.0rc7-1. Both still give me kernel traces, and both show dhcpd and resolvconf as "tainted". I did not wait to see whether the kernel panics (to avoid data corruption), but those errors and the kernel traces have always eventually led to a panic.

I couldn't build linux-git for some reason. I will also test 5.0rc8-1 from AUR soon but I doubt it will change anything.

So I am not sure what to do now. The regular linux kernel is also still doing the same thing (last tested: 4.20.12.arch1-1). Should I submit a bug to the kernel bug tracker?
Comment by loqs (loqs) - Friday, 01 March 2019, 09:21 GMT
Yes, if you can get an untainted backtrace: https://www.kernel.org/doc/html/v4.15/admin-guide/tainted-kernels.html
Possibly file it under Tracing / FTrace, as kprobe_trace_func is the last function call in the backtrace, but I could be completely wrong on that.
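
For context on "untainted": the "Tainted:" field in a kernel trace describes the state of the kernel as a whole, not the process named next to it, and the same state can be read as a bitmask from /proc/sys/kernel/tainted. The sketch below decodes that bitmask. The bit table is written from my reading of the tainted-kernels documentation and should be checked against the documentation for the kernel version in use; decode_taint and read_current_taint are hypothetical helper names.

```python
# Bit position, flag letter, and meaning, as I understand the kernel's
# tainted-kernels documentation; verify against your kernel version.
TAINT_FLAGS = [
    (0, "P", "proprietary module was loaded"),
    (1, "F", "module was force loaded"),
    (2, "S", "kernel running on an out-of-specification system"),
    (3, "R", "module was force unloaded"),
    (4, "M", "processor reported a machine check exception"),
    (5, "B", "bad page referenced or unexpected page flags"),
    (6, "U", "taint requested by userspace application"),
    (7, "D", "kernel died recently (OOPS or BUG)"),
    (8, "A", "ACPI table overridden by user"),
    (9, "W", "kernel issued a warning"),
    (10, "C", "staging driver was loaded"),
    (11, "I", "workaround for a platform firmware bug applied"),
    (12, "O", "externally built (out-of-tree) module was loaded"),
    (13, "E", "unsigned module was loaded"),
    (14, "L", "soft lockup occurred"),
    (15, "K", "kernel has been live patched"),
    (16, "X", "auxiliary taint, defined and used by distributions"),
    (17, "T", "kernel was built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return a list of (letter, meaning) pairs for every bit set in the taint value."""
    return [(letter, meaning) for bit, letter, meaning in TAINT_FLAGS if value & (1 << bit)]


def read_current_taint():
    """Read and decode the taint value of the running kernel (Linux only)."""
    with open("/proc/sys/kernel/tainted") as f:
        return decode_taint(int(f.read().strip()))
```

An out-of-tree module such as the VirtualBox host modules would be expected to set the "O" flag, so a backtrace captured with those modules never loaded in that boot is a better candidate for an upstream report.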
Comment by Dee (ddifof) - Saturday, 18 May 2019, 23:53 GMT
Sorry for the necrobump but it looks like this problem was resolved by upgrading to kernel 5.x.
Is it still worth submitting a kernel bug report for the older 4.20 kernel? Or since it's resolved by upgrading, just leave it?
