FS#61950 - Freeze after suspend/resume with kernel 5.0

Attached to Project: Arch Linux
Opened by Simone (tigerjack) - Friday, 08 March 2019, 10:55 GMT
Last edited by Antonio Rojas (arojas) - Sunday, 08 September 2019, 09:19 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
I am not totally sure that the kernel update is the culprit here, but it seems the most likely candidate. Since the update a few days ago to 5.0.0-arch1-1-ARCH, every time I suspend the laptop, the screen is completely frozen on resume and I have to do a hard shutdown.

Steps to reproduce:
Two different ways.
A)
* start xserver with startx
* invoke systemctl suspend
* resume -> screen freezes

B)
* From tty invoke systemctl suspend
* resume
* invoke startx -> screen freezes

What I have tried:
1) Disabling systemd services, namely NetworkManager, wpa_supplicant, acpid, laptop-mode-tools, cpu-power, thermald
2) Completely removed laptop-mode-tools package
3) Deleted user X-related files such as .Xresources, .xinitrc, .xprofile and started with a fresh X environment
4) Used the root user to check whether the behavior is user-config dependent, and tried method B above (log attached)
5) Updated drivers for AMD gpu to latest xf86-video-amdgpu 19.0.0-1
6) Completely removed battery and drained capacitors

The problem still remains

Closed by  Antonio Rojas (arojas)
Sunday, 08 September 2019, 09:19 GMT
Reason for closing:  Fixed
Comment by Simone (tigerjack) - Thursday, 14 March 2019, 10:15 GMT
Still present in 5.0.1-arch1-1-ARCH
Comment by Simone (tigerjack) - Friday, 22 March 2019, 08:16 GMT
Ok, the situation did not change with either 5.0.2 or 5.0.3, but today I found other messages in the log related to pam failing to release the session and drm signalling a hang. The whole log is attached.
Comment by Vladimir (Vlad1m1r) - Tuesday, 30 April 2019, 14:55 GMT
I have the same issue with my HP laptop. Today I tried the 5.0.10 kernel, and the problem still persists.
Comment by Vladimir (Vlad1m1r) - Wednesday, 15 May 2019, 10:12 GMT
Still persists in the 5.0.13 kernel
Comment by Simone (tigerjack) - Thursday, 16 May 2019, 16:03 GMT
Well, 5.1.2 is also affected.
Comment by Brian Fox (foxbrian) - Wednesday, 05 June 2019, 13:10 GMT
I'm also affected on a Dell Inspiron 3180 with AMD A6 and R4 graphics
Comment by Brian Fox (foxbrian) - Wednesday, 05 June 2019, 15:39 GMT
accidental duplicate comment
Comment by Vladimir (Vlad1m1r) - Wednesday, 10 July 2019, 15:49 GMT
5.1.16 kernel is also affected
Comment by irvin hernandez (irum-virus) - Wednesday, 10 July 2019, 21:36 GMT
I have the same issue, and it appears that the problem is still present in kernel 5.2.
Comment by loqs (loqs) - Wednesday, 10 July 2019, 23:42 GMT
Is everyone affected seeing the following after suspend?

[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 647 thread Xorg:cs0 pid 648
[drm] IP block:gfx_v8_0 is hung!
[drm] GPU recovery disabled.

Has anyone affected tried bisecting between 4.20 and 5.0 to find the causal commit?
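
For anyone comparing their own logs against the four lines above, here is a minimal sketch of a checker, assuming the messages keep the wording of the sample quoted in this comment. The names `find_gpu_hang`, `RING_TIMEOUT`, `HUNG_BLOCK` and `SAMPLE` are made up for illustration and are not part of any kernel or Arch tooling.

```python
import re

# Patterns derived only from the sample lines quoted above (an assumption:
# other kernel versions may word these messages differently).
RING_TIMEOUT = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
HUNG_BLOCK = re.compile(r"IP block:(?P<block>\S+) is hung!")


def find_gpu_hang(lines):
    """Collect the amdgpu hang markers found in an iterable of log lines."""
    summary = {"ring_timeouts": [], "hung_blocks": [], "recovery_disabled": False}
    for line in lines:
        m = RING_TIMEOUT.search(line)
        if m:
            summary["ring_timeouts"].append(
                (m["ring"], int(m["signaled"]), int(m["emitted"]))
            )
        m = HUNG_BLOCK.search(line)
        if m:
            summary["hung_blocks"].append(m["block"])
        if "GPU recovery disabled" in line:
            summary["recovery_disabled"] = True
    return summary


# The four lines quoted in this comment, used as sample input.
SAMPLE = [
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3",
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 647 thread Xorg:cs0 pid 648",
    "[drm] IP block:gfx_v8_0 is hung!",
    "[drm] GPU recovery disabled.",
]

if __name__ == "__main__":
    print(find_gpu_hang(SAMPLE))
```

The function accepts any iterable of strings, so it could probably be fed `sys.stdin` with the kernel messages of the current boot piped in from `journalctl -k -b`.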
Comment by Simone (tigerjack) - Thursday, 11 July 2019, 07:34 GMT
@loqs positive, each of these 4 messages appears every 10 seconds in the log after the resume.
By the way, I've attached the latest log related to the suspend/resume cycle: 09.22.xx are the messages appearing after `systemctl suspend`, 09.23.xx those after the resume.
   a.log (24.2 KiB)
Comment by loqs (loqs) - Thursday, 11 July 2019, 11:23 GMT
Attached step by step instructions for bisecting the kernel to try and identify the causal commit.
Comment by Simone (tigerjack) - Sunday, 14 July 2019, 08:52 GMT
@loqs could you clarify what we are supposed to do?
Comment by loqs (loqs) - Sunday, 14 July 2019, 18:34 GMT
[1] explains git bisection which the instructions implement. How far did you get with the attached instructions?

[1] https://wiki.archlinux.org/index.php/Bisecting_bugs_with_Git
Comment by Vladimir (Vlad1m1r) - Sunday, 11 August 2019, 19:20 GMT
I ran a bisect in branch linux-5.0.y between the tags v5.0-rc1 and 4.20. I found that there are two bugs affecting the screen after resuming from suspend. The first bug is the one that is present in current 5.x kernels and as far back as 5.0-rc1; it shows amdgpu_job_timedout in the log. The second is a black screen after resuming from suspend, without the amdgpu_job_timedout error in the log.
The second bug first appears in commit [262485a50fd4532a8d71165190adc7a0a19bcc9e] drm/amd/display: Expand dc to use 16.16 bit backlight. Log - blackscreen.log; bisect log - bisect-blackscreen.log
The first bug with amdgpu_job_timedout first appears in the commit [106c7d6148e5aadd394e6701f7e498df49b869d1] drm/amdgpu: abstract the function of enter/exit safe mode for RLC. Log - amdgpu_error.log
During the bisect searching for the first error, I went through the following stages sequentially: good (resuming from suspend was successful), error 2, good, error 2, error 1.
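
To illustrate why two overlapping regressions complicate a bisect, here is a minimal sketch of the binary search that bisection performs, run on a made-up linear history. The commit names, the positions of the two regressions and the helpers `first_bad`, `has_black_screen` and `has_job_timeout` are all hypothetical; this is not the actual kernel history or the attached bisect log, and it assumes each symptom stays present once it has appeared.

```python
def first_bad(commits, is_bad):
    """Binary-search for the first commit where is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and the result flips from good to bad exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history with two unrelated regressions.
history = [f"c{i}" for i in range(16)]
BLACK_SCREEN_FROM = 5   # stands in for "error 2" (black screen)
JOB_TIMEOUT_FROM = 11   # stands in for "error 1" (amdgpu_job_timedout)


def has_black_screen(commit):
    return history.index(commit) >= BLACK_SCREEN_FROM


def has_job_timeout(commit):
    return history.index(commit) >= JOB_TIMEOUT_FROM


def any_symptom(commit):
    return has_black_screen(commit) or has_job_timeout(commit)


if __name__ == "__main__":
    # Marking every broken resume as "bad" lands on the earliest regression.
    print(first_bad(history, any_symptom))
    # Marking only the symptom of interest as "bad" isolates that regression.
    print(first_bad(history, has_job_timeout))
```

In this toy model, treating any broken resume as "bad" converges on the earlier regression, so each symptom needs its own run with its own good/bad criterion.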
Comment by loqs (loqs) - Tuesday, 13 August 2019, 00:16 GMT
Thank you for taking the time to perform the bisection, Vlad1m1r. Could you please post your findings as a new bug on https://bugs.freedesktop.org (Product: DRI, Component: DRM/AMDgpu)?
Comment by Vladimir (Vlad1m1r) - Wednesday, 14 August 2019, 09:43 GMT
I filed a bug report on freedesktop.org: https://bugs.freedesktop.org/show_bug.cgi?id=111399
Comment by Vladimir (Vlad1m1r) - Wednesday, 14 August 2019, 10:32 GMT
My bug report is a duplicate. A solution was found here: https://bugs.freedesktop.org/show_bug.cgi?id=110258

Fix will be in kernel 5.3
Comment by loqs (loqs) - Wednesday, 14 August 2019, 15:33 GMT
You could cherry-pick 72cda9bb5e219aea0f2f62f56ae05198c59022a7 onto 5.2.8.
If that fixes the issue arch could pick up the patch until 5.3 as it is not marked for stable.
Comment by Vladimir (Vlad1m1r) - Friday, 16 August 2019, 12:56 GMT
I cherry-picked commit 72cda9bb5e219aea0f2f62f56ae05198c59022a7 onto 5.2.8 and that fixed the issue. It would be good if this patch were included in the next kernel release in Arch.
Comment by loqs (loqs) - Friday, 16 August 2019, 17:11 GMT
If you cannot find heftig on IRC, I would suggest emailing him to ask whether he will cherry-pick the commit for the 5.2 series, as it is not marked for stable.
Comment by Vladimir (Vlad1m1r) - Saturday, 07 September 2019, 20:06 GMT
I updated to the 5.2.11 kernel and everything is working well now. This request may be closed.
