FS#70110 - [linux] 5.11.8 from testing crashes Gnome after I launch any Qemu related software

Attached to Project: Arch Linux
Opened by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 15:27 GMT
Last edited by Andreas Radke (AndyRTR) - Wednesday, 24 March 2021, 12:57 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: I noticed this bug this morning after I updated the linux kernel to version 5.11.8. When I launch Virt-Manager or Qemu, Gnome crashes and restarts.

The only fix I found was to downgrade to linux 5.11.7 which works flawlessly. Could be related to bug #70091.

Additional info:

My computer is based on an AMD Ryzen 3 2200G with IOMMU disabled. The only KVM-related patches I found in the linux 5.11.8 changelog (https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.11.8) are these:

"commit fae013c419c17ab5a1dac7d95e97c594e20cca82
Author: Sean Christopherson <seanjc@google.com>
Date: Thu Feb 25 12:47:26 2021 -0800

KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled

[ Upstream commit 44ac5958a6c1fd91ac8810fbb37194e377d78db5 ]

Check that PML is actually enabled before setting the mask to force a
SPTE to be write-protected. The bits used for the !AD_ENABLED case are
in the upper half of the SPTE. With 64-bit paging and EPT, these bits
are ignored, but with 32-bit PAE paging they are reserved. Setting them
for L2 SPTEs without checking PML breaks NPT on 32-bit KVM.

Fixes: 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration with PML")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 2c23de8cbf13adbf5f5c3e4997b2c21ec571e49b
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Feb 12 16:50:08 2021 -0800

KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()

[ Upstream commit 2855f98265dc579bd2becb79ce0156d08e0df813 ]

Expand the comment about need to use write-protection for nested EPT
when PML is enabled to clarify that the tagging is a nop when PML is
_not_ enabled. Without the clarification, omitting the PML check looks
wrong at first^Wfifth glance."

I attached the lspci output in case you need any hardware details.

Steps to reproduce:

Update to linux 5.11.8 on a computer with an AMD Ryzen CPU and IOMMU deactivated. Launch Virt-Manager and wait a little. Gnome will crash.

Closed by  Andreas Radke (AndyRTR)
Wednesday, 24 March 2021, 12:57 GMT
Reason for closing:  Duplicate
Additional comments about closing:   FS#70117 
Comment by loqs (loqs) - Sunday, 21 March 2021, 18:42 GMT
Please post the journal from a crash.
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 18:50 GMT
I cannot attach the crash dump directly; it is too big (27 MB).

The journals are too big as well (8 MB each).

Here are some links to the crash and journal:

* Gnome shell core dump -> https://mega.nz/file/O3hWiDTK#_-FeUHbqnTNtUEieW9dZ9G5NhQ0txYrS8sBJYYKxui0
* System journal -> https://mega.nz/file/my5GTRQJ#MHCtBTN22SnuusseVfShhwhAnCQqtCh8baiW2BqaI28
* User journal -> https://mega.nz/file/2qwi1LZA#I9OogcsDDFmKBmdgTa-JgGnMJ9fHP5aNSGGRrIJqm7M

Comment by loqs (loqs) - Sunday, 21 March 2021, 19:06 GMT
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 19:09 GMT
Still crashing. If you need another crash dump, tell me.
Comment by loqs (loqs) - Sunday, 21 March 2021, 19:27 GMT
Please post the kernel messages from the system journal for a boot with a crash. For example, if the last boot had a crash (the output is piped to xz for compression, then written to a file):
# journalctl -b -k | xz -9 - > journal.txt.xz
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 19:41 GMT
Here is the journal you asked for. I noticed that the crashes happen right after I try to launch another program while Qemu is running in Virt-Manager.

Used kernel:

$ uname -a
Linux fredo-arch-gnome 5.11.8-arch1-2 #1 SMP PREEMPT Sun, 21 Mar 2021 18:51:19 +0000 x86_64 GNU/Linux

Side effect: when I log back into Gnome after it has crashed, all my extensions are disabled.

I hope this journal will help.
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 19:54 GMT
I wanted to see whether my Gnome extensions were guilty or not.

I had two other crashes, and I got this in a new journal.txt.xz:

mars 21 20:37:49 fredo-arch-gnome kernel: traps: gnome-shell[1057] general protection fault ip:7f77fe3a548a sp:7ffff9c48b10 error:0 in libmozjs-78.so[7f77fe2c2000+b72000]

mars 21 20:46:35 fredo-arch-gnome kernel: traps: gnome-shell[6087] general protection fault ip:7f7c7401fb07 sp:7ffebfbc3cf0 error:0 in libgjs.so.0.0.0[7f7c73fe7000+84000]

With kernel 5.11.7, I have no problems with my Gnome Shell and its extensions. With kernel 5.11.8, it crashes a lot.

Adding another journal.
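
The `ip` value and the mapping base in the kernel `traps:` lines can be turned into an offset inside the library, which makes it easier to see whether separate crashes hit the same code. This is only a minimal sketch, assuming the journal text is piped on stdin; `fault_offsets` is a hypothetical helper name, not an existing tool:

```shell
# fault_offsets: hypothetical helper. Reads kernel "traps:" lines on stdin
# and prints "<library> <offset>", where offset = ip - mapping base.
fault_offsets() {
    sed -n 's/.* ip:\([0-9a-f]*\) .* in \([^[]*\)\[\([0-9a-f]*\)+[0-9a-f]*].*/\1 \2 \3/p' |
    while read -r ip lib base; do
        # Both values are hexadecimal without a 0x prefix in the log line.
        printf '%s 0x%x\n' "$lib" "$((0x$ip - 0x$base))"
    done
}

# Usage on the affected machine: journalctl -b -k | fault_offsets
```

For the two lines quoted in this comment, this gives 0xe348a for libmozjs-78.so and 0x38b07 for libgjs.so.0.0.0.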
Comment by loqs (loqs) - Sunday, 21 March 2021, 19:55 GMT
Nothing from the kernel shows any issues.
From the system journal I am seeing:
Mar 21 18:19:22 fredo-arch-gnome kernel: traps: gnome-shell[1018] general protection fault ip:7f24706b0b07 sp:7ffe4b2ae3a0 error:0 in libgjs.so.0.0.0[7f2470678000+84000]
Mar 21 18:19:23 fredo-arch-gnome libvirtd[479]: End of file while reading data: Erreur d'entrée/sortie

What is the backtrace from the coredump? [1]

https://drive.google.com/file/d/1zERS0v-jT_N1Wu9pn5KCmXMTGEB-aIKh/view?usp=sharing reverts the two commits you noted in your first post.

[1] https://wiki.archlinux.org/index.php/Core_dump#Examining_a_core_dump
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 20:03 GMT
I will try your kernel, but I think I have found the guilty package.

Here is my pacman.log from this morning's update, when I first got linux 5.11.8:

[2021-03-21T08:49:24+0100] [ALPM] upgraded gjs (2:1.66.2-1 -> 2:1.68.0-1)
[2021-03-21T08:49:24+0100] [ALPM] upgraded libldap (2.4.57-1 -> 2.4.58-1)
[2021-03-21T08:49:24+0100] [ALPM] upgraded gnome-calculator (3.38.2-1 -> 40.0-1)
[2021-03-21T08:49:24+0100] [ALPM] upgraded gnome-screenshot (3.38.0-1 -> 40.0-2)
[2021-03-21T08:49:24+0100] [ALPM] upgraded gnome-system-monitor (3.38.0-1 -> 40.0-1)
[2021-03-21T08:49:24+0100] [ALPM] upgraded imagemagick (7.0.11.3-1 -> 7.0.11.4-1)
[2021-03-21T08:49:24+0100] [ALPM] upgraded libvirt (1:7.0.0-3 -> 1:7.1.0-3)
[2021-03-21T08:49:24+0100] [ALPM] upgraded libvirt-glib (3.0.0-2 -> 4.0.0-1)
[2021-03-21T08:49:24+0100] [ALPM] upgraded libvirt-python (1:6.4.0-3 -> 1:7.1.0-1)
[2021-03-21T08:49:25+0100] [ALPM] upgraded linux (5.11.7.arch1-1 -> 5.11.8.arch1-1)
[2021-03-21T08:49:29+0100] [ALPM] upgraded linux-headers (5.11.7.arch1-1 -> 5.11.8.arch1-1)
[2021-03-21T08:49:29+0100] [ALPM] upgraded virtualbox-host-modules-arch (6.1.18-19 -> 6.1.18-20)
[2021-03-21T08:49:29+0100] [ALPM] upgraded yelp-xsl (3.38.3-1 -> 40.0-1)
[2021-03-21T08:49:29+0100] [ALPM] upgraded yelp-tools (3.38.0-1 -> 40.0-1)

Could it be a bug in gjs?

Anyway, I'll test your kernel asap.
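
When several packages arrive in the same transaction as the kernel, it can help to reduce the log to a plain list of what changed. This is a rough sketch, assuming pacman.log lines in the format shown above are piped on stdin; `list_upgrades` is a hypothetical helper name:

```shell
# list_upgrades: hypothetical helper. Reads pacman.log lines on stdin and
# prints "<package> <old version> <new version>" for each upgrade entry.
list_upgrades() {
    sed -n 's/^\[[^]]*] \[ALPM] upgraded \([^ ]*\) (\([^ ]*\) -> \([^)]*\)).*/\1 \2 \3/p'
}

# Usage, assuming the default log location:
#   grep '^\[2021-03-21' /var/log/pacman.log | list_upgrades
```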
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 20:06 GMT
Adding the info related to the coredump as coredump.log, and the backtrace as coredump-bt.log.
Comment by loqs (loqs) - Sunday, 21 March 2021, 20:15 GMT
Could well be an issue in gjs. You downgraded only the linux package and the issue stopped? So some change in the kernel would seem to be involved.
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 20:20 GMT
With Linux fredo-arch-gnome 5.11.8-arch1-3 #1 SMP PREEMPT Sun, 21 Mar 2021 19:29:52 +0000 x86_64 GNU/Linux, it is still crashing.

Adding a new journal and all the info you'll need: the coredump and the gdb output in two parts.

I noticed that gjs was upgraded this morning along with the kernel. Seeing the crashes in libgjs and libmozjs makes me think the problem could be there. And I reverted all the changes made this morning. So it could be a bug in gjs.

I suspected the kernel rather than another component until I looked at the journalctl output.
Comment by loqs (loqs) - Sunday, 21 March 2021, 20:22 GMT
Revert everything from the last update apart from the kernel packages. Is the issue still present?
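
One way to line up such a revert is to derive the previous package versions from pacman.log while skipping the kernel packages. This is a sketch only, under the assumptions that the old packages are still in the default cache directory /var/cache/pacman/pkg and that log lines are piped on stdin; `old_package_globs` is a hypothetical helper name:

```shell
# old_package_globs: hypothetical helper. Reads pacman.log lines on stdin and
# prints a cache glob for the previous version of every upgraded package,
# except linux and linux-* (the kernel packages stay at the new version).
old_package_globs() {
    sed -n 's/^\[[^]]*] \[ALPM] upgraded \([^ ]*\) (\([^ ]*\) -> [^)]*).*/\1 \2/p' |
    while read -r name old; do
        case "$name" in
            linux|linux-*) continue ;;
        esac
        printf '/var/cache/pacman/pkg/%s-%s-*.pkg.tar.*\n' "$name" "$old"
    done
}
```

The files matched by the printed globs can then be passed to `pacman -U`. Packages tied to a specific kernel build, such as virtualbox-host-modules-arch in the log above, may need to stay in step with the installed kernel instead.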
Edit:
 FS#70117 
Comment by Frederic Bezies (fredbezies) - Sunday, 21 March 2021, 20:31 GMT
I tried downgrading gjs and... it seems to be fixed. What a vicious bug!
