FS#72134 - linux 5.14.* gpu bug with i915 DRM

Attached to Project: Arch Linux
Opened by Nicolas Joyard (njoyard) - Tuesday, 14 September 2021, 16:07 GMT
Last edited by Toolybird (Toolybird) - Tuesday, 06 June 2023, 04:01 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Levente Polyak (anthraxx)
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 4
Private No

Details

Bug is present both in linux-5.14.2arch1-2 and linux-5.14.3arch1-1.
Fixed by reverting to linux-5.13.13.arch1-1

Hardware: Dell XPS15 2-in-1 9575 with Intel UHD 630 iGPU and AMD Vega M dGPU

> 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
> 01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 XL [Radeon RX Vega M GL] (rev c0)

Symptoms: on the first reboot after upgrading to 5.14.2, after a few (normal) boot messages, the display flickers with graphical glitches all over the place (mostly a blank screen, with blueish lines sweeping from the top of the screen).

Ruled out plymouth, gdm and sway: same issue with those removed and booting to a tty.
Ruled out a hardware failure: Ubuntu live USB boots fine. Both GPUs are usable.

Boot to tty is possible when using i915.modeset=0 (or nomodeset) but then no DRM is available and I cannot run sway.

The only relevant thing I noticed is the following kernel message (not present with 5.13):

> i915 0000:00:02.0: [drm] *ERROR* CPU pipe A FIFO underrun
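
To check whether a given boot logged this message, the kernel log can be filtered for it. The following is only a minimal sketch, assuming a systemd system where `journalctl` is available; the helper name `count_underruns` is hypothetical and not part of any tool. It counts matching lines read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: count lines containing "FIFO underrun" on stdin.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# helper's exit status at 0 while still printing the count (0).
count_underruns() {
    grep -c 'FIFO underrun' || true
}

# Usage (kernel messages of the current boot):
#   journalctl -k -b | count_underruns
```

A non-zero count on 5.14 and a zero count on 5.13 would match the observation above.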

Attaching the output of lspci -vvnn and a full journalctl dump for the kernel in a failing boot.

Not sure which other details are relevant, feel free to ask for more.

Closed by  Toolybird (Toolybird)
Tuesday, 06 June 2023, 04:01 GMT
Reason for closing:  Fixed
Additional comments about closing:  See comments
Comment by Lee (caramilk) - Sunday, 19 September 2021, 12:51 GMT
I had a very similar problem with Intel UHD 630 as well, but for me it started earlier, with kernel 5.12.0; 5.11.x is fine. Just recently I found a workaround: adding i915.fastboot=0 to the kernel parameters. The garbled display still occurs for a few seconds, but then it clears up and shows the kernel boot messages fine. Wayland and X11 both run fine after booting.
Comment by Nicolas Joyard (njoyard) - Monday, 20 September 2021, 14:16 GMT
Issue is still present with 5.14.6.

@Lee's workaround (adding i915.fastboot=0) fixes the problem.
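
To confirm that the parameter actually reached the running kernel, the command line in `/proc/cmdline` can be inspected. This is a minimal sketch only; the helper name `has_kernel_param` is hypothetical, and how the parameter gets added in the first place depends on the boot loader, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if parameter $2 appears as a whole,
# space-separated word in the kernel command line string $1.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   has_kernel_param "$(cat /proc/cmdline)" i915.fastboot=0 && echo "workaround active"
```

Matching on whole words avoids a false positive from a longer value such as i915.fastboot=01.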
Comment by Bruno Pagani (ArchangeGabriel) - Thursday, 30 September 2021, 18:25 GMT
Ah, so I’m not the only one with this issue. @Nicolas I have a Precision 5530 2-in-1, which is almost the same laptop (but I use neither plymouth, gdm nor sway, in case you were still in doubt).

Until now I had stayed on 5.12.x because of https://bugzilla.kernel.org/show_bug.cgi?id=213823 (do/did you have this issue too?), but I wanted to check whether it might have been fixed at some point. Instead, I ended up with a non-booting 5.14 kernel with exactly the same symptoms as you describe. I’ve moved to LTS for now since I lack time to investigate this.

However, this is not the right place to report this issue; it should be reported at the bug tracker linked above. I’ll try to do this in ~2 weeks if you don’t do it before then.
Comment by s (soshial) - Monday, 11 October 2021, 08:57 GMT
Thank you so much for posting the temporary solution: it worked. I also have an XPS 9575 2-in-1 laptop with Manjaro Linux, and I also experienced the problem starting with kernel v5.14. I am also surprised that I am not the only one still using this Dell model on Linux. Honestly, I was so fed up with Windows' faulty and bloatware behaviour that Linux gave this computer a second chance. I would appreciate it if you submitted a bug report about this driver regression.
Comment by Nicolas Joyard (njoyard) - Monday, 11 October 2021, 09:09 GMT
@Bruno I didn't check, but that would explain some loss of battery life that I had not debugged yet. I'll try to find out.

I wanted to do a git bisect on the kernel but this takes a lot of time, so I haven't got to it yet. Plus I'm less motivated now because my laptop is getting replaced with a 9500 soon (due to keyboard keys breaking every 6 months on the 9575). Maybe I'll get the motivation back and make a nice bug report.
Comment by s (soshial) - Monday, 11 October 2021, 09:18 GMT
@njoyard thank you so much in advance. BTW, I also have problems with keyboard buttons being stuck, but each time I manage to resolve this easily with a special compressed air bottle I bought for around €4.
Comment by Bruno Pagani (ArchangeGabriel) - Monday, 18 October 2021, 09:41 GMT
So after some update (but apparently not the kernel, because downgrading didn’t work), I’m again stuck with the AMD GPU constantly on (you can check that with `cat /sys/bus/pci/devices/0000:01:00.0/power_state`, assuming the GPU is at PCI address 0000:01:00.0). I can work around this with `sudo tee /sys/bus/pci/devices/0000:01:00.0/remove <<< 1`, but I’d like to investigate this further. Do you also have this issue or not?
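
The power state check above can be wrapped in a small function so the same command works for either GPU. This is a minimal sketch only; the helper name `pci_power_state` and its optional second argument (an alternative sysfs root, useful only for testing) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the power state of a PCI device.
#   $1 = PCI address, e.g. 0000:01:00.0
#   $2 = sysfs root (optional, defaults to /sys; the override is an
#        assumption made here for testing, not something sysfs needs)
pci_power_state() {
    cat "${2:-/sys}/bus/pci/devices/$1/power_state"
}

# Usage:
#   pci_power_state 0000:00:02.0   # Intel iGPU in this thread
#   pci_power_state 0000:01:00.0   # AMD dGPU in this thread
```

D0 means the device is fully powered; a dGPU that is idle and properly suspended would be expected to report D3cold instead.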

P.S.: On my side, no issue with the keyboard whatsoever.
Comment by s (soshial) - Tuesday, 19 October 2021, 18:15 GMT
@bruno_pagani. Both cards return D0.
Comment by Bruno Pagani (ArchangeGabriel) - Tuesday, 19 October 2021, 18:21 GMT
So you do have the same issue. Just so we can rule out maybe other components, what DE/WM/display server do you use?
Comment by s (soshial) - Tuesday, 19 October 2021, 19:11 GMT
Can we switch to another bug report, please? Because, I feel that we are spamming other people here with not a directly relevant problem.
Comment by Bruno Pagani (ArchangeGabriel) - Tuesday, 19 October 2021, 19:54 GMT
Well, given the issue is a kernel issue specific to this laptop, it does not look unrelated to me. But feel free to reach me over email. ;)
Comment by s (soshial) - Thursday, 18 November 2021, 05:55 GMT
Hey guys. Has anyone had a chance to look at the bug and dissect it?
Comment by Nicolas Joyard (njoyard) - Thursday, 18 November 2021, 11:41 GMT
I haven't and I won't. Unfortunately I no longer have a laptop where the issue is present.
Comment by heinrich5991 (heinrich5991) - Monday, 18 April 2022, 17:26 GMT
I think this is the upstream bug report: https://gitlab.freedesktop.org/drm/intel/-/issues/4952.
Comment by Bruno Pagani (ArchangeGabriel) - Sunday, 02 October 2022, 17:42 GMT
Apparently fixed in 5.19 according to this link. Will try on next boot.
