FS#57920 - [intel-ucode] New Intel microcode is causing PTI errors

Attached to Project: Arch Linux
Opened by Martin Rys (C0rn3j) - Thursday, 22 March 2018, 23:14 GMT
Last edited by Christian Hesse (eworm) - Tuesday, 11 December 2018, 09:37 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Christian Hesse (eworm)
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
My machine enters a state where booting an Arch VM via virt-manager ends with a black screen and an error is thrown in dmesg.

This started happening after the new intel-ucode was pushed.

Additional info:
package: intel-ucode 20180312-1
dmesg: [ 0.000000] microcode: microcode updated early to revision 0x24, date = 2018-01-21
CPU: i7-4790K
dmesg excerpt as an attachment.
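
For comparing runs with different intel-ucode packages, the loaded revision can be read from the dmesg line quoted above. This is a minimal sketch, assuming the message keeps exactly that wording; the helper name `parse_microcode_line` is made up for illustration and is not part of any tool.

```python
import re

# Matches the early-load message in the form quoted in this report (assumed stable wording).
MICROCODE_RE = re.compile(
    r"microcode updated early to revision (0x[0-9a-fA-F]+), date = (\d{4}-\d{2}-\d{2})"
)


def parse_microcode_line(line):
    """Return (revision_as_int, date_string) from a dmesg line, or None if it does not match."""
    match = MICROCODE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


line = "[ 0.000000] microcode: microcode updated early to revision 0x24, date = 2018-01-21"
revision, date = parse_microcode_line(line)
print(hex(revision), date)
```

Running the same helper on the dmesg output after a downgrade shows whether the older revision was actually loaded.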

Steps to reproduce:
[wait/somehow trigger the issue]
Boot a KVM VM in virt-manager. It will hang after a moment. It hangs instantly when booting the hardened kernel.

This is hard to troubleshoot because I have no idea what triggers the issue (but it is persistent once it is triggered).

Closed by Christian Hesse (eworm)
Tuesday, 11 December 2018, 09:37 GMT
Reason for closing: Works for me
Comment by loqs (loqs) - Friday, 23 March 2018, 11:33 GMT
Have you reported this upstream to the linux-mm or linux-kernel mailing lists to see if upstream confirms it appears to be a microcode issue rather than a kernel issue?
Comment by Martin Rys (C0rn3j) - Friday, 23 March 2018, 11:37 GMT
Nope, only reported it here.

I was hoping to figure out a way to trigger it reliably and then retest with the older microcode to confirm.
Comment by loqs (loqs) - Friday, 23 March 2018, 15:24 GMT
Does the hardened kernel always trigger it? If so, https://bugs.archlinux.org/task/54700#comment159181 might help in comparing the two kernels.
Comment by Martin Rys (C0rn3j) - Sunday, 25 March 2018, 16:31 GMT
Posting in case this is a kernel issue: it happened again with the new ucode on 4.15.11 (previously on 4.15.10).
I did an -Syu and downgraded the ucode package (intel-ucode-20180108-1), so let's see if this triggers again with the old ucode on 4.15.12.

Once the issue is happening, running the VM and booting the hardened kernel fails 100% of the time, yes.
Comment by Martin Rys (C0rn3j) - Wednesday, 28 March 2018, 21:47 GMT
[ 0.000000] microcode: microcode updated early to revision 0x23, date = 2017-11-20

I got the issue again on 4.15.12 with the old microcode (I can't even attempt to boot a Windows 10 VM), so this is not due to the microcode update.

It looks like another person is having this issue, but in their case it is triggered by closing the laptop lid.

https://bbs.archlinux.org/viewtopic.php?id=234537

How should I proceed with this?
Comment by loqs (loqs) - Friday, 30 March 2018, 21:52 GMT
Either wait for 4.16 to be released, most likely this Sunday, and test whether the issue is still present, or report the issue upstream now to the mailing lists I noted in my first response.
Comment by Martin Rys (C0rn3j) - Tuesday, 11 December 2018, 09:17 GMT
The issue is either gone or occurs far less often. Requested closure.
