FS#77879 - AMDGPU causing erratic shutdowns

Attached to Project: Arch Linux
Opened by Jonas Jefe (jonaslorincz) - Friday, 17 March 2023, 01:36 GMT
Last edited by Toolybird (Toolybird) - Monday, 17 April 2023, 22:10 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:

The shutdowns are initiated by the kernel.

In the amdgpu driver, the kernel will shut down the system due to a Critical Thermal Fault (CTF). This happens when the junction temperature reported by the GPU exceeds 105C. However, this is problematic on certain hardware.

Specifically, on the ASUS G513QY, the GPU thermal-throttles automatically according to its current temperature and the rate of cooling. It is designed to hold a stable 100C under heavy load. The temperature is not perfectly constant, however: it fluctuates, sometimes anywhere between 95C and 105C, so a brief peak at the top of that range can trip the CTF shutdown. This fluctuation is expected and is within the range of normal operation.
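To check how close the GPU actually gets to the limit described above, the junction temperature can be read from the hwmon files that amdgpu exposes in sysfs. The following is only a rough sketch, not part of the original report: the helper names are made up for illustration, it assumes the card exposes a temp*_label file reading "junction", and the 105C constant is simply the figure quoted in this report rather than a value read from the driver.

```python
from pathlib import Path

# Threshold quoted in this report; an assumption here, not queried from the driver.
CTF_LIMIT_C = 105.0


def millidegrees_to_celsius(raw: str) -> float:
    """hwmon temp*_input files hold an integer in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def find_junction_inputs(root: str = "/sys/class/drm") -> list[Path]:
    """Return temp*_input files whose matching temp*_label reads 'junction'."""
    found = []
    for label in Path(root).glob("card*/device/hwmon/hwmon*/temp*_label"):
        if label.read_text().strip() == "junction":
            found.append(label.with_name(label.name.replace("_label", "_input")))
    return found


def headroom_c(temp_c: float, limit_c: float = CTF_LIMIT_C) -> float:
    """Degrees remaining before the quoted shutdown threshold is reached."""
    return limit_c - temp_c


if __name__ == "__main__":
    for path in find_junction_inputs():
        temp = millidegrees_to_celsius(path.read_text())
        print(f"{path}: {temp:.1f} C, headroom {headroom_c(temp):.1f} C")
```

Running this repeatedly while the GPU is under load should show whether the reading hovers in the 95C-105C band mentioned above.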

Additional info:

kernel 6.2.6

A workaround patch for this issue: https://hst.sh/ufahikadap.patch

It removes the code that triggers the shutdown and relies instead on the GPU's automatic thermal management.

Steps to reproduce:

Use an ASUS G513QY laptop (or another device with similar characteristics) and run the GPU under heavy load (e.g. by running a game).

Closed by  Toolybird (Toolybird)
Monday, 17 April 2023, 22:10 GMT
Reason for closing:  Upstream
Additional comments about closing:  It's in the hands of upstream. Let's hope they get around to addressing it.
Comment by Toolybird (Toolybird) - Friday, 17 March 2023, 20:51 GMT
Removing code that is designed to prevent hardware damage doesn't seem wise. You really need to contact upstream about this. amdgpu issues can be submitted here [1]. Please let us know how you get on.

[1] https://gitlab.freedesktop.org/drm/amd
Comment by Jonas Jefe (jonaslorincz) - Saturday, 18 March 2023, 04:31 GMT
This is the related upstream report: https://gitlab.freedesktop.org/drm/amd/-/issues/1267
