FS#69764 - [linux] Upgrade to 5.11 - desktop fails to wake from sleep
Attached to Project:
Arch Linux
Opened by James (thx1138) - Wednesday, 24 February 2021, 20:05 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Wednesday, 09 March 2022, 02:10 GMT
Details
linux 5.11.1.arch1-1
After upgrading from 5.10 to 5.11 on a laptop with an Intel Core2 and ATI Mobility Radeon X1600, running the LXQt desktop, sleep seems to work normally, but on wake from sleep the desktop is frozen. The mouse still works and caps lock still works, but the tray clock is frozen and application windows do not respond. Non-desktop processes still work normally. For instance, ssh recovers from sleep and wake. After sleep and wake, `ps wax` shows a seemingly large number of kworker processes remaining in state I, "Idle kernel thread", of the form `[kworker/u4:21-events_unbound]`. After reverting to the LTS kernel, linux-lts 5.10.18-1, desktop processes work as expected after sleep and wake. Ideas? Suggestions?
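To put a number on "seemingly large", the idle kworker threads can be tallied per workqueue from saved `ps wax` output. This is only a minimal sketch: the sample text below is hypothetical (only the `[kworker/u4:21-events_unbound]` name format comes from the report), and `count_idle_kworkers` is an illustrative helper, not an existing tool.

```python
import re
from collections import Counter

# Hypothetical sample of `ps wax` output. Only the kworker name format is
# taken from the report; the PIDs and the other rows are made up.
SAMPLE_PS_OUTPUT = """\
    PID TTY      STAT   TIME COMMAND
      1 ?        Ss     0:01 /sbin/init
    412 ?        I      0:00 [kworker/u4:21-events_unbound]
    413 ?        I      0:00 [kworker/u4:22-events_unbound]
    420 ?        I<     0:00 [kworker/0:1H-kblockd]
    731 ?        Sl     0:03 lxqt-panel
"""

# Matches "[kworker/<id>]" or "[kworker/<id>-<workqueue>]".
KWORKER = re.compile(r"\[kworker/[^\]\-]+(?:-(?P<queue>[^\]]+))?\]")


def count_idle_kworkers(ps_output):
    """Count kworker threads in state I, grouped by workqueue name."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        stat, command = fields[2], fields[4]
        match = KWORKER.search(command)
        if stat.startswith("I") and match:
            counts[match.group("queue") or "(none)"] += 1
    return counts


counts = count_idle_kworkers(SAMPLE_PS_OUTPUT)
```

Comparing the tally before sleep and after wake would show whether the `events_unbound` workers really pile up, rather than relying on an impression from scrolling through the list.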
Closed by Sven-Hendrik Haase (Svenstaro)
Wednesday, 09 March 2022, 02:10 GMT
Reason for closing: Fixed
Additional comments about closing: 2022-03-03: A task closure has been requested. Reason for request: All three separately mentioned and investigated issues got patched, merged and reported as fixed (by 5.11.12/5.12-rc4; then 5.12.4; then 5.13). Details/links in my last comment.
the screen never turns on after sleep.
doesn't respond to any input.
A USB-connected smartphone seems to establish tethering successfully, but if I select file sharing, the phone stops even charging.
Another LXQt user with a non-Radeon GPU did not see any problem with sleep and wake.
I see that the Ryzen 4500U RedmiBook has the Radeon RX Vega 6 integrated GPU.
So, it seems that this may be a radeon driver issue, which would be consistent with the screen freezing, and most everything else still working.
I sent a note upstream for the radeon driver.
```
$ git bisect bad
0b8793f6e7fc097c112f1848aa7dab60b9ede5a7 is the first bad commit
commit 0b8793f6e7fc097c112f1848aa7dab60b9ede5a7
Author: Christian König <christian.koenig@amd.com>
Date:   Mon Sep 21 13:18:02 2020 +0200

    drm/radeon: switch over to the new pin interface

    Stop using TTM_PL_FLAG_NO_EVICT.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Huang Rui <ray.huang@amd.com>
    Link: https://patchwork.freedesktop.org/patch/391610/?series=81973&rev=1

 drivers/gpu/drm/radeon/radeon.h         |  1 -
 drivers/gpu/drm/radeon/radeon_display.c |  9 ++------
 drivers/gpu/drm/radeon/radeon_object.c  | 37 ++++++++-------------------------
 drivers/gpu/drm/radeon/radeon_object.h  |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c     |  2 +-
 5 files changed, 13 insertions(+), 38 deletions(-)
```
and the system log is showing:
```
kernel: WARNING: CPU: 1 PID: 799 at include/drm/ttm/ttm_bo_api.h:608 radeon_bo_unpin+0x47/0x60 [radeon]
...
kernel: CPU: 1 PID: 799 Comm: kworker/u4:17 Not tainted 5.9.0-rc5-1 #11
kernel: Hardware name: Hewlett-Packard /309F, BIOS 68YAF Ver. F.1D 07/11/2008
kernel: Workqueue: events_unbound async_run_entry_fn
kernel: RIP: 0010:radeon_bo_unpin+0x47/0x60 [radeon]
...
kernel: Call Trace:
kernel: radeon_gart_table_vram_unpin+0x47/0xa0 [radeon]
kernel: r520_resume+0x74/0xb0 [radeon]
kernel: radeon_resume_kms+0x5c/0x350 [radeon]
kernel: ? pci_pm_restore+0xe0/0xe0
kernel: dpm_run_callback+0x4f/0x180
kernel: device_resume+0xa7/0x200
kernel: async_resume+0x19/0x30
kernel: async_run_entry_fn+0x37/0x140
kernel: process_one_work+0x1da/0x3d0
kernel: worker_thread+0x4d/0x3d0
kernel: ? rescuer_thread+0x410/0x410
kernel: kthread+0x133/0x150
kernel: ? __kthread_bind_mask+0x60/0x60
kernel: ret_from_fork+0x22/0x30
kernel: ---[ end trace 8908b03655c5613e ]---
```
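For comparing traces between kernels, it can help to reduce a warning like this to its list of reliable frames. The following is a rough sketch, assuming the `kernel: symbol+0xoff/0xlen [module]` line layout seen in this log; `parse_call_trace` is a hypothetical helper, and frames prefixed with `?` (addresses the unwinder is not sure about) are skipped.

```python
import re

# Excerpt copied from the log above.
LOG = """\
kernel: Call Trace:
kernel: radeon_gart_table_vram_unpin+0x47/0xa0 [radeon]
kernel: r520_resume+0x74/0xb0 [radeon]
kernel: radeon_resume_kms+0x5c/0x350 [radeon]
kernel: ? pci_pm_restore+0xe0/0xe0
kernel: dpm_run_callback+0x4f/0x180
kernel: ---[ end trace 8908b03655c5613e ]---
"""

FRAME = re.compile(
    r"^kernel:\s+(?P<unreliable>\?\s+)?(?P<symbol>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_call_trace(log_text):
    """Return (symbol, module) pairs for the reliable frames of the first trace.

    module is None when the line carries no [module] tag.
    """
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if line.strip().endswith("Call Trace:"):
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME.match(line)
        if match is None:
            break  # first non-frame line ends the trace
        if match.group("unreliable"):
            continue  # "?" frames are guesses by the unwinder
        frames.append((match.group("symbol"), match.group("module")))
    return frames


frames = parse_call_trace(LOG)
```

On this excerpt the first three frames are all in the radeon module, which matches the reading that the warning fires on the radeon resume path (`radeon_resume_kms` → `r520_resume` → `radeon_gart_table_vram_unpin`).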
The commit is one of a series, 08/11, as you can see at the patchwork link. The amdgpu driver is addressed in 09/11. The amdgpu driver has similar functions, amdgpu_bo_unpin() and amdgpu_gart_table_vram_unpin(). I have only the radeon hardware to test. The patch set changes the functions radeon_bo_unpin() and amdgpu_bo_unpin() and changes their return type from `int` to `void`, but amdgpu_object.c still includes the comment:
```
* Returns:
* 0 for success or a negative error code on failure.
```
I will also note that resume from hibernate on 5.11.2 works as expected.
dmesg.log (6.1 KiB)
It landed in the kernel last week:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6c5403173a13a08ff61dbdafa4c0ed4a9dedbfe0
Just needs to go to 5.11 stable.
[1] https://lore.kernel.org/lkml/8c3da8bc-0bf3-496f-1fd6-4f65a07b2d13%40amd.com/
> Am 25.03.21 um 10:01 schrieb Greg KH:
> > On Thu, Mar 25, 2021 at 09:57:04AM +0100, Christian König wrote:
> >> This one here can be kept. It is unrelated to the warning caused by the
> >> other patch.
> > It causes a revert issue with the other patch, which is why I dropped
> > both of them.
>
> Ah, of course.
>
> > I'll gladly take this one, if someone wants to provide a working
> > backport
>
> Going to add that to my TODO list.
This version adds the commit message back. Upstream does not accept anonymous commits, which is perfectly understandable.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/queue-5.11/drm-ttm-make-ttm_bo_unpin-more-defensive.patch?id=9f79ad6b2c278b69fed5ce22955904ebd428ea8e
The patch resolves the wake from sleep issue for the radeon driver and my ATI Mobility Radeon X1600, though there is still the "radeon_bo_unpin" warning, which is incidental and not catastrophic.
Does this update resolve the issue when using the amdgpu driver?
@zaxmyth - is there still an issue when using the Intel UHD Graphics 630? Or is that a completely different issue?
If I understand correctly, the patched source file, "include/drm/ttm/ttm_bo_api.h", is not AMD/ATI-specific.
@alyst, I don't know that it would affect the screen backlight, but then, I don't know that it would not. The Lenovo website says "Intel HD Graphics 520" for the Thinkpad T470s. You might check the 5.11.12 changelog for anything else that might be suspicious - https://lwn.net/Articles/851870/
@thx1138 is the warning present under 5.12-rc6?
I have not checked, but my impression has been that fixing buffer object pinning is something still on the "to-do" list for the radeon driver developers.
Someone having sleep-wake problems with the Intel GPU may need to do a bisect.
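For anyone who has not bisected before, the idea is a binary search over the commit history between a known-good and a known-bad kernel. The sketch below is a simplified illustration on a linear list of commits (real `git bisect` works on the commit graph); the commit names and the `resume_fails` check are hypothetical stand-ins for "build this kernel, boot it, sleep, wake, and see whether the desktop is frozen".

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical history of 16 commits where the regression arrives at "c11".
commits = [f"c{i}" for i in range(16)]
tested = []


def resume_fails(commit):
    # Stand-in for the manual build / boot / sleep / wake test.
    tested.append(commit)
    return int(commit[1:]) >= 11


culprit = first_bad(commits, resume_fails)
```

Each test halves the remaining range, so even the several thousand commits between two kernel releases need only about a dozen or so build-and-test rounds.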
May 10 16:04:46 tempest kernel: pci 0000:00:00.2: can't derive routing for PCI INT A
May 10 16:04:46 tempest kernel: pci 0000:00:00.2: PCI INT A: no GSI
May 10 16:04:46 tempest kernel: nvme nvme0: 15/0/0 default/read/poll queues
May 10 16:04:46 tempest kernel: nvme nvme1: Shutdown timeout set to 8 seconds
May 10 16:04:46 tempest kernel: nvme nvme1: 12/0/0 default/read/poll queues
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
May 10 16:04:46 tempest kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: amdgpu: SMU is resuming...
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: amdgpu: dpm has been disabled
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: amdgpu: SMU is resumed successfully!
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
May 10 16:04:46 tempest kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
May 10 16:04:46 tempest kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x1c0 returns -110
May 10 16:04:46 tempest kernel: amdgpu 0000:05:00.0: PM: failed to resume async: error -110
May 10 16:04:46 tempest kernel: acpi LNXPOWER:08: Turning OFF
May 10 16:04:46 tempest kernel: acpi LNXPOWER:07: Turning OFF
May 10 16:04:46 tempest kernel: acpi LNXPOWER:05: Turning OFF
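The recurring `-110` in these lines is a negated Linux errno value, and 110 is ETIMEDOUT, so the log reads as the sdma0 ring test timing out and the failure propagating up through the resume callbacks. A small sketch for pulling such codes out of a log follows; the errno table is a hand-picked subset of the generic Linux numbering, and `error_codes` is a hypothetical helper.

```python
import re

# Hand-picked subset of the generic Linux errno numbering; extend as needed.
ERRNO_NAMES = {5: "EIO", 12: "ENOMEM", 16: "EBUSY", 22: "EINVAL", 110: "ETIMEDOUT"}

# A negative decimal number preceded by whitespace or "(".
NEGATIVE_CODE = re.compile(r"(?:\(|\s)-(\d{1,4})\b")

# Lines copied from the log above, with the timestamp prefix dropped.
LOG = """\
amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
PM: dpm_run_callback(): pci_pm_resume+0x0/0x1c0 returns -110
amdgpu 0000:05:00.0: PM: failed to resume async: error -110
"""


def error_codes(log_text):
    """Return (code, name) for every negative error code found in the log."""
    found = []
    for line in log_text.splitlines():
        for match in NEGATIVE_CODE.finditer(line):
            code = int(match.group(1))
            found.append((code, ERRNO_NAMES.get(code, "unknown")))
    return found


codes = error_codes(LOG)
```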
I patched the bug and submitted it to the amd-gfx mailing list. I don't know though when it will be merged into mainline.
See Mailing List: https://lists.freedesktop.org/archives/amd-gfx/2021-March/060754.html
Alex Deucher's (one of the AMDGPU Maintainers) drm-next branch: https://gitlab.freedesktop.org/agd5f/linux/-/commit/7df4ceb60fa9a3c5160cfd5b696657291934a2c9
So backporting that might fix the issue.
Also the screen brightness control seems to be restored.
By the way, as for the "old hardware", I am responsible for my fair share of fixes to regressions in the kernel running on that "old hardware". As long as other people are not having problems with their newer hardware, that's great. Software always "just works" - until it doesn't.
- Fix merged to Linux 5.11.12: https://lwn.net/Articles/851870/
- Fix merged to Linux 5.12-rc4: https://lwn.net/Articles/849985/
Next issue (Ryzen 5, amdgpu, linux-5.11.x) @firewalker: Patch "drm/amd/display: check fb of primary plane"
- Fix merged to Linux 5.12.4: https://lwn.net/Articles/856267/
New issue mentioned by @korikori (Ryzen 4500u, amdgpu, linux-5.12) (10 May 2021):
- Reported fixed by Linux 5.13 (7 Jul 2021): https://bbs.archlinux.org/viewtopic.php?id=266108
- Issue linked (fixed, closed): https://gitlab.freedesktop.org/drm/amd/-/issues/1230
All mentioned issues have been patched, merged and reported as fixed.
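As a quick way to check a given kernel against the versions listed above, the comparison can be sketched as below. This is a minimal sketch under stated assumptions: the issue labels are my own shorthand, versions are compared as plain (major, minor, patch) tuples, release candidates are rejected rather than handled, and a result says nothing about whether the kernel was affected in the first place (5.10 LTS, for example, was not) or about backports to other stable branches.

```python
import re

# Fix versions taken from the summary above; labels are informal shorthand.
FIXES = [
    ("radeon wake-from-sleep freeze", (5, 11, 12)),
    ("drm/amd/display: check fb of primary plane", (5, 12, 4)),
    ("amdgpu resume failure (Ryzen 4500u)", (5, 13, 0)),
]


def parse_release(release):
    """Turn '5.11.1.arch1-1' or '5.12.4-arch1-2' into (5, 11, 1) or (5, 12, 4)."""
    if "-rc" in release:
        raise ValueError("release candidates are not handled by this sketch")
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def fixes_expected(release):
    """List the fixes whose listed version is at or below the given release."""
    version = parse_release(release)
    return [name for name, since in FIXES if version >= since]


reported = fixes_expected("5.11.1.arch1-1")
later = fixes_expected("5.12.4-arch1-2")
```

Here `reported` is empty for the kernel in the original report, and `later` contains the first two entries.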