FS#64725 - i915, linux: Resetting rcs0 for hang on rcs0
Attached to Project:
Arch Linux
Opened by Robert (fuero) - Wednesday, 04 December 2019, 09:51 GMT
Last edited by freswa (frederik) - Thursday, 30 April 2020, 11:38 GMT
Details
Description:
I'm experiencing hangs several times a day, producing this in dmesg:

[ 1696.869719] i915 0000:00:02.0: GPU HANG: ecode 9:0:0x00000000, hang on rcs0
[ 1696.869736] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1696.869736] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1696.869736] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1696.869737] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1696.869737] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1696.870744] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Messing with i915.* settings didn't help.

Additional info:
* Hardware: HP EliteDesk 800 G4 SFF (2US83AV)

GPU as reported by lshw:
*-display
    description: VGA compatible controller
    product: UHD Graphics 630 (Desktop)
    vendor: Intel Corporation
    physical id: 2
    bus info: pci@0000:00:02.0
    logical name: /dev/fb0
    version: 00
    width: 64 bits
    clock: 33MHz
    capabilities: pciexpress msi pm vga_controller bus_master cap_list rom fb
    configuration: depth=32 driver=i915 latency=0 mode=1920x1080 visual=truecolor xres=1920 yres=1080
    resources: iomemory:400-3ff irq:140 memory:4000000000-4000ffffff memory:d0000000-dfffffff ioport:3000(size=64) memory:c0000-dffff

* package version(s): linux-hardened-5.3.13.a-1, linux-firmware-20191022.2b016af-3
* config and/or log files etc.:

# cat /proc/cmdline
pti=on page_alloc.shuffle=1 BOOT_IMAGE=/vmlinuz-linux-hardened root=UUID=8697eb87-c34f-4f5f-bbfa-bb738086dbee rw quiet apparmor=1 security=apparmor audit=1 intel_iommu=igfx_off i915.modeset=1 i915.enable_rc6=1 i915.enable_fbc=1 i915.enable_guc_loading=1 i915.enable_guc_submission=1 i915.enable_huc=1 i915.enable_psr=1 i915.disable_power_well=0 i915.semaphores=1

see gpu-error.txt for the contents of /sys/class/drm/card0/error
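To check which of the i915.* options on a command line like the one above are actually recognized by the running driver, a sketch along these lines might help. It assumes the loaded module exposes its parameters under /sys/module/i915/parameters (the usual sysfs location); the function names are made up for illustration and are not part of any tool.

```python
from pathlib import Path


def i915_options(cmdline):
    """Return the i915.* options found on a kernel command line as a dict."""
    opts = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            name, value = token[len("i915."):].split("=", 1)
            opts[name] = value
    return opts


def loaded_i915_parameters(sysfs_dir="/sys/module/i915/parameters"):
    """Names of the parameters the loaded module exposes (empty set if absent)."""
    path = Path(sysfs_dir)
    return {p.name for p in path.iterdir()} if path.is_dir() else set()


def unrecognized(opts, known):
    """Options given on the command line that the driver does not expose."""
    return sorted(name for name in opts if name not in known)
```

On the affected machine, `unrecognized(i915_options(Path("/proc/cmdline").read_text()), loaded_i915_parameters())` would list the options the driver does not know about; such options have no effect and can be dropped before testing further.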
The archive contains the backport applied to 5.4.2; please test.
I'll wait for a proper backport to 5.4.
FS#64895 https://lore.kernel.org/stable/20191230111530.3750048-1-chris@chris-wilson.co.uk/
I have installed the linux-mainline package from the AUR to test this, and it works well from version 5.5-rc3 onwards.
Did not yet verify it's fixed, though.
As can be seen on the related tasks, this issue appears to be general, and is reported here:
https://gitlab.freedesktop.org/drm/intel/issues/673#login-pane
At this time, apparently it is fixed in kernel 5.5, but not yet backported to versions 5.4.XX.
The issue continued to be reported as occurring [4] [5] [6] [7] [8] [9] [10].
The first upstream response I can find to a post saying that the backport does not work is [11],
but the issue cannot be reproduced on drm-tip. The commits that add offline error capture
[12] do not apply cleanly to 5.4.Y. Building the kernel with DRM_I915_PREEMPT_TIMEOUT=0
to disable the forced preemption might provide a trace.
[1] https://lore.kernel.org/stable/20191230111530.3750048-1-chris%40chris-wilson.co.uk/
[2] https://git.archlinux.org/linux.git/log/?h=v5.4.7-arch1
[3] https://git.archlinux.org/linux.git/log/?h=v5.4.11-arch1
[4] https://gitlab.freedesktop.org/drm/intel/issues/673#note_373802
[5] https://gitlab.freedesktop.org/drm/intel/issues/673#note_374650
[6] https://gitlab.freedesktop.org/drm/intel/issues/673#note_378360
[7] https://gitlab.freedesktop.org/drm/intel/issues/673#note_381214
[8] https://gitlab.freedesktop.org/drm/intel/issues/673#note_381568
[9] https://gitlab.freedesktop.org/drm/intel/issues/673#note_381639
[10] https://gitlab.freedesktop.org/drm/intel/issues/673#note_382044
[11] https://gitlab.freedesktop.org/drm/intel/issues/1003#note_391081
[12] https://cgit.freedesktop.org/drm-tip/commit/?id=672c368f9398042b629740cc9816e8e939eff2db
[12] https://cgit.freedesktop.org/drm-tip/commit/?id=32ff621fd74496f0c33644125fb69ff175859b1f
[12] https://cgit.freedesktop.org/drm-tip/commit/?id=748317386afb235e11616098d2c7772e49776b58
Been using linux-zen 5.5 since testing, the issue does not occur anymore.
which will include https://cgit.freedesktop.org/drm-tip/commit/?id=5ba32c7be81e53ea8a27190b0f6be98e6c6779af
This is a full kernel, right? Will all my dkms modules work with that or rather not?
Also, what do I need to report if there are any issues?
Yes, report whether the issue is still present when running that kernel. If it is, follow https://gitlab.freedesktop.org/drm/intel/wikis/How-to-file-file-i915-bugs
Is there any way to get this log/errors in a proper way?
[le@y730]: ~>$ journalctl -k -b -1
Data from the specified boot (-1) is not available: No data available
To see if the i915 module is the cause, boot with module_blacklist=i915
Booting with i915 blacklisted fails quite early, while loading the initramfs. It just gets stuck there (no additional log output).
I have a TB3 dock from Lenovo connected to my computer, and the external screen is connected via HDMI on that dock. In this configuration I encounter the lockups as well as the inability to boot the drm-tip kernel.
As soon as this is disconnected (or the TB3 dock is connected w/o HDMI), the device boots. Connecting HDMI directly to the computer also works (with or without the TB3 dock additionally connected).
If I connect TB3 with HDMI attached after booting, I get an instant freeze.
---> The only non-working configuration seems to be HDMI attached to TB3. This is also the configuration where I encountered random lockups before. (stable kernel)
I hope this helps. Let me know if I can provide more detailed information.
3.jpg (539.6 KiB)
https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-file-i915-bugs
Seems incorrect (see screenshot)
Is the correct link as I copy-pasted it; not sure how it got changed.
Edit:
Could be you encountered a more severe version of https://gitlab.freedesktop.org/drm/intel/issues/1141
Then, there are people creating tickets specifying they have the issue only in TB3 mode, and these are being closed as duplicates of "the big ticket".
My impression is that this is definitely not fixed and thus probably not a duplicate of '673'....
So what's the best thing to do now? I would like to help as well as get my problems fixed, but I also don't feel like wasting my own and anyone else's time by creating another duplicate ticket....
Seems like it's time for an upstream ticket.
X.Org X Server 1.20.6
[306118.822] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[306118.822] (II) modeset(0): using drv /dev/dri/card0
Feb 15 20:28:37 blep kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Feb 15 20:28:37 blep kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 15 20:28:37 blep kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 15 20:28:37 blep kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 15 20:28:37 blep kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Feb 15 20:28:37 blep kernel: GPU crash dump saved to /sys/class/drm/card0/error
Feb 15 20:28:37 blep kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 15 20:28:37 blep kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 15 20:28:37 blep kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Feb 15 20:28:37 blep kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 15 20:28:37 blep kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 15 20:28:44 blep kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
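For anyone comparing how often this happens across kernels, the log lines above can be tallied with a small script. This is only a rough sketch: the regular expressions are written against the message wording shown in this report and may not match other kernel versions, and the function name is made up.

```python
import re

# Patterns based on the i915 messages quoted in this report.
HANG_RE = re.compile(r"GPU HANG: ecode ([0-9a-fA-Fx:]+), hang on (\w+)")
RESET_RE = re.compile(r"Resetting (\w+) for hang on (\w+)")
TIMEOUT_RE = re.compile(r"(\w+) reset request timed out")


def summarize(lines):
    """Count hangs, engine resets, full-chip resets and reset timeouts."""
    summary = {"hangs": 0, "engine_resets": 0, "chip_resets": 0, "reset_timeouts": 0}
    for line in lines:
        if HANG_RE.search(line):
            summary["hangs"] += 1
        elif TIMEOUT_RE.search(line):
            summary["reset_timeouts"] += 1
        else:
            match = RESET_RE.search(line)
            if match:
                # "Resetting chip ..." is a full GPU reset, anything else an engine reset.
                key = "chip_resets" if match.group(1) == "chip" else "engine_resets"
                summary[key] += 1
    return summary
```

Fed with the excerpt above, this counts one hang, two engine resets, one chip reset and three reset timeouts. Passing `sys.stdin` to `summarize` and piping `journalctl -k` into the script would give the same counts for a whole boot.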
Edit/Added information (2020-02-16):
Updated kernel on 2020-02-12: linux-lts (4.19.101-1 -> 5.4.18-1). Before updating, I had never experienced this kind of lockup. After updating the kernel, this lockup happened on 2020-02-15. I was watching video at the time, and audio kept playing, but everything on the screen froze. Power button did nothing. VT-switching did nothing. If this happens again, I will try adjusting the volume with the keyboard AND/OR Magic SysRq AND/OR blindly switching console, logging in and powering off with the keyboard.
DP-1 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 725mm x 428mm
2560x1440 59.95*+ 74.97
HDMI-2 connected 1920x1080+2560+0 (normal left inverted right x axis y axis) 531mm x 299mm
1920x1080 60.00*+ 50.00 59.94
CPU: i5-8400
Motherboard: ROG STRIX Z370-G GAMING, BIOS Version: 2401
Can you provide more information? External screen? Docking station? Etc.?
According to the latest comments in https://bugs.archlinux.org/task/65392, some i915 patches have been pushed into the 5.5.4.arch1-1 linux package, which is available in "testing" today (2020-02-15), so you could try that particular version and see if it works for you.
Other online sources indicate that these i915 patches cannot yet be incorporated into earlier kernel releases. This affects linux-lts, currently at version 5.4.19-1, and effectively appears to make the linux-lts 5.4 kernel releases useless if you are using any Intel graphics.
An alternative to the testing linux package is rolling back to the 4.19 series, which was linux-lts until fairly recently. See https://wiki.archlinux.org/index.php/Arch_Linux_Archive if you don't have a locally saved copy of the last 4.19.* version.
Cheers
Paul.
Thanks for this quite accurate summary. This will help people jumping in here at this point. Yes, the new LTS is very frustrating. However, the old LTS is quite old and unfortunately brings other problems for me.
I tested drm-tip some days ago and it made things even worse. HDMI on my dock was 100% unusable.
Is there an 'easy' way to install 5.5.4-arch1-1? The only package I can find is linux-pds, which is probably not what I want, right?
Furthermore, I still see a big difference between using the internal HDMI port and the one on my TB3 dock, and I am quite unsure how related that is and what the best way to contribute here is.
Cheers
Lukas
https://www.archlinux.org/packages/testing/x86_64/linux/
and select "Download From Mirror" from the RHS box.
Good luck!
@paulkerry: I installed "linux 5.5.4.arch1-1" and am running it now. Booted correctly and seems to run without problems. No errors in logs during boot. I'll report in a few days how things've been going. (Though I had the previous buggy kernel running for three days straight and only encountered the bug once, so..my input might be useless (delayed)).
some had been applied upstream but not all of them.
[1] https://github.com/zen-kernel/zen-kernel/commits/v5.5.4-zen1
[2] https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux-lts&id=8903c370bc711fb61b65f6e3b870672fc32487f1
[3] https://git.archlinux.org/linux.git/log/?h=v5.4.15-arch1
5.4, cannot HDMI
5.5 cannot Flutter
https://github.com/flutter/flutter/issues/49185
Thanks for the Links btw :)
Linux hostfd 5.4.20-1-lts #1 SMP Sat, 15 Feb 2020 00:19:19 +0000 x86_64 GNU/Linux
grepping CMDLINE at /etc/default/grub:
GRUB_CMDLINE_LINUX="cryptdevice=/dev/sda2:hostfdmain transparent_hugepage=never"
Using a Dell Latitude E5470
Now waiting for the proper fixes (whatever they are) to be backported to linux-lts. There's a lot going on with some recent bug(fixe)s at the i915 issues list [1].
[1] https://gitlab.freedesktop.org/drm/intel/issues?scope=all&sort=updated_desc&state=all&utf8=%E2%9C%93
Looking at the Arch Linux Archive, the last 4.19 build as linux-lts was linux-lts-4.19.101-2-x86_64.pkg.tar.zst
Cheers
Paul.
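For those without a cached copy, the download link for that package can be put together as in the sketch below. It assumes the Arch Linux Archive keeps its packages/<first letter>/<package name>/ layout on archive.archlinux.org; the helper name and its defaults are only for illustration.

```python
def archive_url(pkgname, version, arch="x86_64", ext="pkg.tar.zst"):
    """Build an Arch Linux Archive URL for one package build.

    Assumes the archive layout packages/<first letter>/<pkgname>/<file>.
    """
    filename = f"{pkgname}-{version}-{arch}.{ext}"
    return f"https://archive.archlinux.org/packages/{pkgname[0]}/{pkgname}/{filename}"


# Last 4.19 build of linux-lts mentioned in this thread.
url = archive_url("linux-lts", "4.19.101-2")
print(f"pacman -U {url}")
```

The printed command would need to be run as root. If DKMS modules are in use, the matching linux-lts-headers build would probably have to be installed the same way.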
@paul cool, that's helpful. thanks for the notification