FS#61733 - [linux]Xorg freezes on mesa packages up to 18.2.5-1.

Attached to Project: Arch Linux
Opened by Chris (h8h) - Tuesday, 12 February 2019, 18:11 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 01 March 2022, 21:35 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
Xorg freezes with mesa packages newer than 18.2.5-1 (for example 18.3.3-2); 18.2.5-1 itself works.

Additional info:
* package version(s):
```
pacman -Qs mesa
local/glu 9.0.0-5
Mesa OpenGL Utility library
local/lib32-glu 9.0.0-4
Mesa OpenGL utility library (32 bits)
local/lib32-mesa 18.3.3-1
An open-source implementation of the OpenGL specification (32-bit)
local/libva-mesa-driver 18.3.3-2
VA-API implementation for gallium
local/mesa 18.2.5-1
An open-source implementation of the OpenGL specification
local/mesa-demos 8.4.0-1
Mesa demos and tools incl. glxinfo + glxgears
```
* system specs
```
lspci | grep VGA
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c3)
```
```
dmesg | grep VGA
[ 0.974177] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 1.291532] fb0: EFI VGA frame buffer device
[ 1.291909] ACPI: Video Device [VGA] (multi-head: yes rom: no post: no)
[ 16.625767] fb0: switching to amdgpudrmfb from EFI VGA
```
```
Linux version 4.20.7-arch1-1-ARCH (builduser@heftig-13691) (gcc version 8.2.1 20181127 (GCC)) #1 SMP PREEMPT Wed Feb 6 18:42:40 UTC 2019
Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=12bf63f3-878c-48d5-9840-e245b1e456b6 rw acpi_backlight=vendor acpi_osi=Linux amdgpu.dc=1 amd_iommu=pt ivrs_ioapic[32]=00:14.0 cry>
```

* config and/or log files etc.
```
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process Xorg pid 1242 thread Xorg:cs0 pid 1244)
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process Xorg pid 1242 thread Xorg:cs0 pid 1244)
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process Xorg pid 1242 thread Xorg:cs0 pid 1244)
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18
Feb 12 18:31:12 lee485 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C
Feb 12 18:31:12 lee485 kernel: mce: [Hardware Error]: Machine check events logged
Feb 12 18:31:12 lee485 kernel: [Hardware Error]: Deferred error, no action required.
Feb 12 18:31:12 lee485 kernel: [Hardware Error]: CPU:0 (17:11:0) MC20_STATUS[-|-|MiscV|-|AddrV
```

* system: Lenovo E485
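
To check whether another machine shows the same symptom, the kernel log can be filtered for the fault lines quoted above. This is only a minimal sketch: the function names `count_vmc_faults` and `vmc_fault_processes` are made up for illustration, and the patterns assume the log lines look like the ones in this report.

```shell
# Count amdgpu "VMC page fault" lines in kernel log text read from stdin.
count_vmc_faults() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    grep -c 'VMC page fault' || true
}

# List the process names that the page faults are attributed to.
vmc_fault_processes() {
    sed -n 's/.*for process \([^ ]*\) pid.*/\1/p' | sort -u
}

# Possible usage on a running system (kernel messages of the current boot):
#   journalctl -k -b | count_vmc_faults
#   journalctl -k -b | vmc_fault_processes
```

A non-zero count together with `Xorg` as the faulting process would match the log shown in this report.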

Steps to reproduce:
1. Upgrade to mesa 18.3.3-2
2. Start Xorg (startx / xinit)
3. Xorg freezes and does not even write a log. The system has to be cold booted.
4. Downgrade to mesa 18.2.5-1
5. Start Xorg (startx / xinit)
6. Xorg works as expected.

If you need further data, don't hesitate to ask me.

Thank you.

Closed by  Andreas Radke (AndyRTR)
Tuesday, 01 March 2022, 21:35 GMT
Reason for closing:  No response
Comment by loqs (loqs) - Tuesday, 12 February 2019, 18:20 GMT
The dmesg output is similar to https://bugzilla.kernel.org/show_bug.cgi?id=201727. Is the issue present if you test linux 5.0-rc6 or boot with the option amd_iommu=off?
Comment by Chris (h8h) - Tuesday, 12 February 2019, 19:18 GMT
Without the amd_iommu option my computer doesn't boot anyway.

But using "iommu=soft" will do the trick.

I missed that update:
https://wiki.archlinux.org/index.php?title=Laptop%2FLenovo&type=revision&diff=564607&oldid=564605

I think we can close this, but is using iommu=soft the preferred way to run a system?
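
Since both `amd_iommu=` and `iommu=` appear in this thread, it can help to confirm which of them the running kernel was actually booted with. A minimal sketch follows; the helper name `cmdline_param` is hypothetical, and it matches whole parameter names so that `iommu` does not match `amd_iommu`.

```shell
# Print the value of kernel parameter $1 found in command-line string $2.
# Runs in a subshell so that "set -f" does not leak into the caller.
cmdline_param() (
    set -f   # no globbing: parameters such as ivrs_ioapic[32]=... contain brackets
    for word in $2; do
        case $word in
            "$1"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
)

# Possible usage on a running system:
#   cmdline_param iommu "$(cat /proc/cmdline)"
#   cmdline_param amd_iommu "$(cat /proc/cmdline)"
```

An empty result means the parameter is not set on the command line at all.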
Comment by loqs (loqs) - Tuesday, 12 February 2019, 20:15 GMT
Comment by Chris (h8h) - Wednesday, 13 February 2019, 19:25 GMT
I tried, but I couldn't get this to work.

I downloaded https://git.archlinux.org/linux.git/snapshot/linux-4.20.7-arch1.tar.gz, which corresponds to my current kernel "4.20.7-arch1-1-ARCH".

Then I downloaded the patch you mentioned, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=1c1eba86339c8517814863bc7dd21e2661a84e77, and patched the module.

Loading it fails with: insmod: ERROR: could not insert module drivers/gpu/drm/amd/amdgpu/amdgpu.ko: Invalid module format

modinfo:
filename: /home/h8h/kernelbuild/linux-4.20.7-arch1/drivers/gpu/drm/amd/amdgpu/amdgpu.ko
license: GPL and additional rights
description: AMD GPU
author: AMD linux driver team
vermagic: 4.20.7-arch1 SMP preempt mod_unload

I guess the vermagic is wrong, but EXTRAVERSION is already set in the Makefile to "EXTRAVERSION = -arch1".
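
The mismatch can be made explicit by comparing the release part of the module's vermagic with the running kernel release. This is a minimal sketch with a hypothetical helper name, `check_vermagic`. It only compares the first field, while the kernel itself compares the whole vermagic string.

```shell
# Compare the release field of a vermagic string ($1) with a kernel release ($2).
check_vermagic() {
    # The release is everything before the first space in the vermagic string.
    if [ "${1%% *}" = "$2" ]; then
        echo match
    else
        echo mismatch
    fi
}

# Possible usage on a running system, from the top of the kernel source tree:
#   check_vermagic "$(modinfo -F vermagic drivers/gpu/drm/amd/amdgpu/amdgpu.ko)" "$(uname -r)"
```

With the values from this comment (vermagic release 4.20.7-arch1, running kernel 4.20.7-arch1-1-ARCH) the two strings differ, which is consistent with the "Invalid module format" error.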
Comment by loqs (loqs) - Wednesday, 13 February 2019, 19:42 GMT
Thank you for trying to test the patch.
Did you build the whole kernel with the patch applied or just the amdgpu module?
Comment by Chris (h8h) - Wednesday, 13 February 2019, 20:16 GMT
Just the amdgpu module
make M=drivers/gpu/drm/amd/amdgpu
Comment by loqs (loqs) - Wednesday, 13 February 2019, 20:38 GMT
Can you load the module with modprobe --force-vermagic ?
Comment by loqs (loqs) - Wednesday, 13 February 2019, 21:44 GMT
make EXTRAVERSION=-arch1 prepare
make M=drivers/gpu/drm/amd/amdgpu
The above should set EXTRAVERSION back to "-arch1"
Comment by loqs (loqs) - Monday, 18 February 2019, 21:59 GMT
Comment by Chris (h8h) - Wednesday, 20 February 2019, 18:00 GMT
Okay.

Finally I've got the module to fly.

EXTRAVERSION=-arch1-1-ARCH

did the trick.

The patch (https://bugzilla.kernel.org/show_bug.cgi?id=201727) works for me without the kernel flag "iommu=soft".

Neat.
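
For anyone repeating this, the EXTRAVERSION value that worked above is simply the running kernel release with the leading version numbers removed. A minimal sketch, assuming a release of the form X.Y.Z-suffix; the helper name `extraversion_for` is hypothetical.

```shell
# Derive an EXTRAVERSION value from a kernel release string such as
# "4.20.7-arch1-1-ARCH" by stripping the leading X.Y.Z.
extraversion_for() {
    printf '%s\n' "$1" | sed 's/^[0-9]*\.[0-9]*\.[0-9]*//'
}

# Possible usage from the top of the patched kernel source tree, using the
# make targets quoted earlier in this thread:
#   make EXTRAVERSION="$(extraversion_for "$(uname -r)")" prepare
#   make M=drivers/gpu/drm/amd/amdgpu
```

This only covers the naming part. Whether the resulting module loads still depends on the rest of the build configuration matching the running kernel.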
Comment by Chris (h8h) - Wednesday, 20 February 2019, 18:08 GMT
I think it is worth mentioning that I've updated my kernel to 4.20.10-arch1-1-ARCH, but this does not solve the problem. I had to build the module with the patch applied, and then it works.
Comment by Andreas Radke (AndyRTR) - Tuesday, 10 December 2019, 16:22 GMT
Is this still an issue?