FS#67860 - [linux] general protection faults after update to linux-5.8.7.arch1-1

Attached to Project: Arch Linux
Opened by eomanis (eomanis) - Wednesday, 09 September 2020, 22:45 GMT
Last edited by freswa (frederik) - Sunday, 13 September 2020, 14:51 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Levente Polyak (anthraxx)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Updating from linux-5.8.5.arch1-1 to linux-5.8.7.arch1-1 breaks my system due to multiple general protection faults.

It seems the amdgpu kernel module might be involved; this system uses an AMD Radeon RX 480.

The update also upgraded the virtualbox* packages from 6.1.12-4 to 6.1.14-1; however, uninstalling all things virtualbox and disabling loading of the vboxdrv module did not help.
I am attaching a kernel log where virtualbox was already uninstalled.

As a workaround I reverted back to linux-5.8.5.arch1-1.

Steps to reproduce:

Possibly: boot a system that uses an AMD Radeon RX 480 with that kernel package version.
The system runs an AMD Ryzen 5 3600 on an AMD X570 chipset.

Closed by freswa (frederik)
Sunday, 13 September 2020, 14:51 GMT
Reason for closing: Upstream
Additional comments about closing: https://bugzilla.kernel.org/show_bug.cgi?id=209239
Comment by loqs (loqs) - Wednesday, 09 September 2020, 23:24 GMT
There were no amdgpu changes in 5.8.7 [1]. Please test 5.8.8.arch1-1, currently in testing. If the issue is still present, try 5.8.6.arch1-1.
Then bisect to find the causal commit and report the issue upstream if it has not already been reported upstream.

[1] https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.8.7
Comment by eomanis (eomanis) - Thursday, 10 September 2020, 21:31 GMT
For starters, here are the package versions and their states.

* Good: linux-5.8.5.arch1-1
* Good: linux-5.8.6.arch1-1
* Bad: linux-5.8.7.arch1-1
* Bad: linux-5.8.8.arch1-1

Unfortunately, I could not get a trace from linux-5.8.8.arch1-1; it seems journald did not get the chance to write everything down, so the kernel log is truncated.
Comment by Jan Alexander Steffens (heftig) - Thursday, 10 September 2020, 22:04 GMT
You can try booting with slub_debug=FZP. Maybe it will catch some corruption.
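After rebooting with that parameter, it is worth confirming that it actually reached the kernel. The minimal Python sketch below reads the running kernel's command line from /proc/cmdline and decodes the slub_debug value. It assumes the flag letters listed in the kernel's SLUB documentation for 5.8-era trees: F = sanity checks, Z = red zoning, P = poisoning, plus a few others. The function name slub_debug_flags is hypothetical. Any per-cache list after the global flag block is deliberately ignored.

```python
from pathlib import Path
from typing import List, Optional

# Flag letters for slub_debug as listed in the kernel's SLUB documentation
# (Documentation/vm/slub.rst in 5.8-era trees); the kernel matches them
# case-insensitively.
SLUB_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning (object and padding)",
    "U": "user tracking (free and alloc)",
    "T": "trace",
    "A": "failslab filter mark",
    "O": "debugging off for caches that would need higher slab orders",
    "-": "all debugging off",
}


def slub_debug_flags(cmdline: str) -> Optional[List[str]]:
    """Decode the slub_debug option from a kernel command line (hypothetical helper).

    Returns None if slub_debug is absent and ["full debugging"] for a bare
    "slub_debug". Only the global flag block is decoded; any per-cache list
    after ',' (or ';' in newer kernels) is ignored.
    """
    for token in cmdline.split():
        if token == "slub_debug":
            return ["full debugging"]
        if token.startswith("slub_debug="):
            value = token[len("slub_debug="):]
            # Keep only the global flag letters before any cache list.
            flags = value.split(";")[0].split(",")[0]
            return [SLUB_FLAGS.get(c.upper(), f"unknown flag {c!r}") for c in flags]
    return None


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print(slub_debug_flags(proc_cmdline.read_text()))
```

A result of None after the reboot means the bootloader entry did not pass the parameter through.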
Comment by eomanis (eomanis) - Thursday, 10 September 2020, 23:51 GMT
removed (double-post due to browser refresh)
Comment by eomanis (eomanis) - Friday, 11 September 2020, 00:44 GMT
I tried bisecting from 5.8.6-arch1 to 5.8.7-arch1 starting from the repo package [1] as described in the wiki [2].
But, this being my first shot at bisecting anything, I blew it.

On the first attempt it built a 5.8.6 kernel exhibiting the bug -- promising so far.
Unfortunately upon running "git bisect bad" after testing it, I got:

f82a57f3fbc7dc23254d5ac2ab6f62227a8a288e was both good and bad

Here is the output of "git bisect log":

git bisect start
# good: [f82a57f3fbc7dc23254d5ac2ab6f62227a8a288e] Arch Linux kernel v5.8.6-arch1
git bisect good f82a57f3fbc7dc23254d5ac2ab6f62227a8a288e
# bad: [13c2d5cc731142ffd8f22aec0e644dcd99b78940] Arch Linux kernel v5.8.7-arch1
git bisect bad 13c2d5cc731142ffd8f22aec0e644dcd99b78940
# bad: [f82a57f3fbc7dc23254d5ac2ab6f62227a8a288e] Arch Linux kernel v5.8.6-arch1
git bisect bad f82a57f3fbc7dc23254d5ac2ab6f62227a8a288e

I must be missing out on something fundamental here. Shouldn't it have started at a commit smack in the middle between 5.8.6 and 5.8.7?
Also, how come I got a bad kernel, when it was built from the exact same commit of 5.8.6-arch1? I am running the good linux-5.8.6-arch1-1 from the Arch repos as I am typing this.

Edit: I did not use "make clean" in between builds; the first build I did was a 5.8.8-arch1, just to see if the PKGBUILD worked.
This might at least explain the bad 5.8.6.

[1] https://www.archlinux.org/packages/core/x86_64/linux/
[2] https://wiki.archlinux.org/index.php/Bisecting_bugs_with_Git
Comment by loqs (loqs) - Friday, 11 September 2020, 01:41 GMT
I would have expected to see the following (assuming sphinx-workaround.patch was either not applied or was reset, and the docs were not built):
git bisect start
git bisect good v5.8.6-arch1
git bisect bad v5.8.7-arch1
Bisecting: a merge base must be tested
[66534fe2b9400003b0f49cc94686a162132b64e7] Linux 5.8.6
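The expectation of starting "smack in the middle" comes from how bisection works on a purely linear history. It is a binary search: given a known-good and a known-bad end, it tests a commit roughly halfway between them and discards the half that cannot contain the first bad commit. The Python sketch below models only that linear case. It is a simplification, because real git bisect works on the commit graph. As the "a merge base must be tested" line above shows, git first tests the merge base (here Linux 5.8.6) when the good commit is not an ancestor of the bad one. The function name and the c0..c8 commit labels are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def bisect_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, List[str]]:
    """Binary search for the first bad commit in a linear history (sketch).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    Returns the first bad commit and the commits tested, in order.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tested: List[str] = []
    while hi - lo > 1:
        mid = (lo + hi) // 2  # roughly halfway between the good and bad ends
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi], tested


if __name__ == "__main__":
    # Hypothetical history: c0 (good) .. c8 (bad); the bug appears in c5.
    history = [f"c{i}" for i in range(9)]
    first_bad, tested = bisect_first_bad(history, lambda c: int(c[1:]) >= 5)
    print(first_bad, tested)  # c5 ['c4', 'c6', 'c5']
```

Each test halves the remaining range, so n candidate commits need only about log2(n) build-and-boot cycles. That is why an accidentally bad build, such as the one from an unclean tree, derails the whole search.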
Comment by eomanis (eomanis) - Friday, 11 September 2020, 11:10 GMT
I applied Common Sense™ to the problem with some success, I think.

Since nobody else seems to have these errors, chances are they are caused by some particularity of my setup.
There were some HID-core-related fixes between 5.8.6 and 5.8.7, so I figured I should boot up with some of my USB devices unplugged.

I have two USB devices that might not be all that common:

* Sony PS3 Dualshock 3 gamepad
* Apple Magic Trackpad 2

It seems that if I boot up without the trackpad, I don't get the kernel errors with either 5.8.7 or 5.8.8. I'm running linux-5.8.8-arch1-1 now.
Works for me, I don't use the trackpad much anyway, I just wanted to try it as an addition to the mouse.

Guess I'll take this one upstream [1].

[1] https://bugzilla.kernel.org/show_bug.cgi?id=209239
