FS#77340 - Linux kernel >=6.1 suffer from AMD fTPM stutter

Attached to Project: Arch Linux
Opened by Jonas Jefe (jonaslorincz) - Tuesday, 31 January 2023, 05:28 GMT
Last edited by Toolybird (Toolybird) - Tuesday, 14 March 2023, 04:04 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To No-one
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 7
Private No

Details

Description:

Linux kernel >=6.1 exhibits a stuttering issue that occurs once every few hours. See https://www.reddit.com/r/archlinux/comments/zvgev0/audio_stuttering_issues_with_kernel_611/ https://www.reddit.com/r/linux_gaming/comments/zzqaf7/having_intermittent_stutters_with_a_ryzen_cpu/ https://bbs.archlinux.org/viewtopic.php?id=282333 for detailed information.

The stutter causes the framerate of the display to decrease dramatically and causes bursts in the audio output.

Additional info:
* linux 6.1.0 or later

Steps to reproduce:
* Use Linux kernel >=6.1
* Use AMD Ryzen CPU with fTPM enabled
* Wait for a few hours

Here is a Rust program that will monitor for a potential stutter:
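```rs
// Requires the `chrono` crate (e.g. `chrono = "0.4"` in Cargo.toml) for timestamps.
use std::{
    thread::sleep,
    time::{Duration, Instant},
};

use chrono::Local;

fn main() {
    // Sleep in 10 ms steps; a healthy system should wake up within 5 ms of that.
    let test = Duration::from_millis(10);
    let expected_offset = Duration::from_millis(5);

    loop {
        let start = Instant::now();
        sleep(test);
        let elapsed = start.elapsed();
        // Waking up later than test + expected_offset suggests the system stalled.
        if elapsed > test + expected_offset {
            println!(
                "[{}] stutter: took {:?}, expected <{:?}",
                // Year-month-day timestamp (the original "%Y-%d-%m" swapped day and month).
                Local::now().format("%Y-%m-%d %H:%M:%S"),
                // Report only the overshoot beyond the requested sleep.
                elapsed - test,
                expected_offset
            );
        }
    }
}
```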

A burst of many lines printed by this program indicates a stutter (the stutter lasts 1-2 seconds, and the program reports about 30 ms of extra delay).
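Because one stutter shows up as a 1-2 second burst of late wakeups, it may be easier to read if the burst is summarized as a single line. The sketch below is a variation on the monitor above, not part of the original report: it uses only the standard library (offsets since start instead of `chrono` timestamps), and the `Episode` type, the `group_episodes` helper, and the 200 ms grouping gap are all hypothetical choices for illustration.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// One contiguous burst of late wakeups (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Episode {
    start: Duration, // offset of the first late wakeup since monitoring began
    end: Duration,   // offset of the last late wakeup in the burst
    worst: Duration, // largest overshoot seen in the burst
    count: usize,    // number of late wakeups in the burst
}

/// Groups late wakeups, given as `(offset since start, overshoot)` in time order,
/// into episodes. A new episode begins whenever the gap since the previous late
/// wakeup exceeds `max_gap`.
fn group_episodes(samples: &[(Duration, Duration)], max_gap: Duration) -> Vec<Episode> {
    let mut episodes: Vec<Episode> = Vec::new();
    for &(at, overshoot) in samples {
        // Extend the current episode if this wakeup is close enough to it.
        if let Some(ep) = episodes
            .last_mut()
            .filter(|ep| at.saturating_sub(ep.end) <= max_gap)
        {
            ep.end = at;
            ep.worst = ep.worst.max(overshoot);
            ep.count += 1;
            continue;
        }
        episodes.push(Episode { start: at, end: at, worst: overshoot, count: 1 });
    }
    episodes
}

fn main() {
    let test = Duration::from_millis(10); // same sleep as the monitor above
    let expected_offset = Duration::from_millis(5); // same tolerance
    let max_gap = Duration::from_millis(200); // assumption: grouping threshold
    let began = Instant::now();
    let mut late: Vec<(Duration, Duration)> = Vec::new();

    loop {
        let start = Instant::now();
        sleep(test);
        let elapsed = start.elapsed();
        if elapsed > test + expected_offset {
            late.push((began.elapsed(), elapsed - test));
        }
        // Once no late wakeup has been seen for `max_gap`, the burst is over: report it.
        let burst_over = late
            .last()
            .map_or(false, |&(last_at, _)| began.elapsed().saturating_sub(last_at) > max_gap);
        if burst_over {
            for ep in group_episodes(&late, max_gap) {
                println!(
                    "stutter at +{:?}: lasted {:?}, {} late wakeups, worst overshoot {:?}",
                    ep.start,
                    ep.end - ep.start,
                    ep.count,
                    ep.worst
                );
            }
            late.clear();
        }
    }
}
```

The grouping is kept in a pure function so it can be checked on synthetic data without waiting hours for a real stutter.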

Closed by Toolybird (Toolybird)
Tuesday, 14 March 2023, 04:04 GMT
Reason for closing: Fixed
Additional comments about closing: linux 6.2.6.arch1-1
Comment by Toolybird (Toolybird) - Tuesday, 31 January 2023, 06:26 GMT
It smells like an issue with the platform itself. Is your firmware up-to-date? Either way, it's an upstream problem, so you'll need to report it to the kernel folks.
Comment by Jonas Jefe (jonaslorincz) - Tuesday, 31 January 2023, 17:49 GMT
Do you know which kernel subsystem is related to this?
Comment by Yuri Cherio (cherio) - Thursday, 02 February 2023, 01:06 GMT
I am so glad to find out this is a global issue, because all this time I thought it was my corrupted setup :)

It is worth mentioning that if the TPM is not used on your system (I suspect that's true of most Arch desktops), you can simply disable it in the BIOS as a workaround. Otherwise, look for a BIOS update that fixes the issue (e.g. https://www.techspot.com/news/94939-bios-update-amd-pcs-fixes-ftpm-related-performance.html).
Comment by Jonas Jefe (jonaslorincz) - Thursday, 02 February 2023, 02:32 GMT
Unfortunately, I set up the TPM to unlock my LUKS keys on startup, and I couldn't find any option in the BIOS to disable the TPM anyway.

My computer's (ASUS G513QY) latest BIOS update also does not contain the fix.

What's strange is that this issue doesn't happen with kernel 6.0.19, which I'm currently using.
Comment by Echo (DodoGTA) - Thursday, 02 February 2023, 19:30 GMT
I wonder whether commits 23393c6461422df5bf8084a086ada9a7e17dc2ba and/or f4cd18c5b2000df0c382f6530eeca9141ea41faf could be the problem (just a rough guess).
Comment by Bell West (BellnPell) - Saturday, 04 February 2023, 12:40 GMT
Found a temporary solution for those who can't disable fTPM: build your own kernel with CONFIG_HW_RANDOM_TPM=n in the build config.
I know we still need to find which part of the code is causing it, but at least now we have a clue.
So what exactly changed from 6.0.x to 6.1.x? This option is enabled in 6.0.x too. I need a bit more time to find out.
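To check whether a given kernel was built with this option, its configuration can be inspected. Note that in a kernel `.config` file, a disabled option is written as the comment line `# CONFIG_HW_RANDOM_TPM is not set` rather than `=n`. The sketch below reads a config from standard input and reports the option's state; it assumes the running kernel exposes its config at `/proc/config.gz` (which requires CONFIG_IKCONFIG_PROC), and the `OptionState` enum and `option_state` function are hypothetical names for illustration.

```rust
use std::io::{self, Read};

/// How a Kconfig option appears in a kernel .config file.
#[derive(Debug, PartialEq, Eq)]
enum OptionState {
    BuiltIn,       // CONFIG_X=y
    Module,        // CONFIG_X=m
    NotSet,        // "# CONFIG_X is not set"
    Value(String), // any other value, e.g. a string or number
    Absent,        // the option does not appear at all
}

/// Looks up `name` (e.g. "CONFIG_HW_RANDOM_TPM") in the text of a .config file.
fn option_state(config: &str, name: &str) -> OptionState {
    let assign = format!("{name}=");
    let not_set = format!("# {name} is not set");
    for line in config.lines() {
        let line = line.trim();
        if line == not_set {
            return OptionState::NotSet;
        }
        // Requiring the '=' right after the name avoids matching longer option names.
        if let Some(value) = line.strip_prefix(assign.as_str()) {
            return match value {
                "y" => OptionState::BuiltIn,
                "m" => OptionState::Module,
                other => OptionState::Value(other.to_string()),
            };
        }
    }
    OptionState::Absent
}

fn main() -> io::Result<()> {
    let mut config = String::new();
    io::stdin().read_to_string(&mut config)?;
    let state = option_state(&config, "CONFIG_HW_RANDOM_TPM");
    println!("CONFIG_HW_RANDOM_TPM: {state:?}");
    Ok(())
}
```

Usage (the binary name is hypothetical): `zcat /proc/config.gz | ./check-config`. A self-built kernel with the workaround applied should report `NotSet`.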
Comment by Echo (DodoGTA) - Saturday, 04 February 2023, 14:45 GMT
@BellnPell Can you try reverting this commit? https://github.com/torvalds/linux/commit/b006c439d58db625318bf2207feabf847510a8a6

I looked around more thoroughly and found this gem 🐸
Comment by Bell West (BellnPell) - Sunday, 05 February 2023, 02:44 GMT
@DodoGTA Thanks! I am reading documentation and articles to find out how this hw_random part works.
Every clue I have found so far points to this random/TPM part.
I will investigate further later; right now I am still running git bisect on the kernel.
Comment by Bell West (BellnPell) - Sunday, 05 February 2023, 07:10 GMT
OK, I found a way to trigger this bug in kernel 6.0.x:
run "sudo cat /dev/hwrng > /dev/null" for around 5-15 minutes, and there you go.
So something in 6.1.x must keep calling the hardware random number generator.
As time goes on, at a certain point, minor errors stack up and lead to another error (some kind of overflow, I guess?).
Plus, if you run the Rust monitoring program, you will notice timeout errors in 6.1.x even when the system is idle. This isn't the case in 6.0.x as long as nothing is reading from hwrng.
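The reproduction above boils down to reading /dev/hwrng continuously. As a sketch only, the program below does roughly what `sudo cat /dev/hwrng > /dev/null` does, but also times each read, which may help show whether individual reads from the hardware RNG are slow while the monitor reports stutters. The `drain_and_time` name, the 64 KiB round size, and the 4 KiB read size are arbitrary choices for illustration; like the `cat` command, it normally needs to run as root.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Reads up to `total` bytes from `src` in reads of at most `chunk` bytes.
/// Returns the number of bytes read and the slowest single read observed.
fn drain_and_time<R: Read>(src: &mut R, total: usize, chunk: usize) -> io::Result<(usize, Duration)> {
    let mut buf = vec![0u8; chunk];
    let mut read_so_far = 0;
    let mut slowest = Duration::ZERO;
    while read_so_far < total {
        let want = chunk.min(total - read_so_far);
        let start = Instant::now();
        let n = src.read(&mut buf[..want])?;
        slowest = slowest.max(start.elapsed());
        if n == 0 {
            break; // EOF: the source has no more data
        }
        read_so_far += n;
    }
    Ok((read_so_far, slowest))
}

fn main() {
    let mut dev = match File::open("/dev/hwrng") {
        Ok(f) => f,
        Err(e) => {
            eprintln!("cannot open /dev/hwrng (are you root?): {e}");
            return;
        }
    };
    // Drain the device in rounds and report the slowest read of each round.
    for round in 0u64.. {
        match drain_and_time(&mut dev, 64 * 1024, 4096) {
            Ok((n, slowest)) => println!("round {round}: read {n} bytes, slowest read {slowest:?}"),
            Err(e) => {
                eprintln!("read error: {e}");
                return;
            }
        }
    }
}
```

Running it next to the monitoring program for the 5-15 minutes mentioned above should make it possible to line up slow reads with reported stutters, though that correlation is an assumption to verify, not a confirmed result.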
Comment by Bell West (BellnPell) - Sunday, 05 February 2023, 10:17 GMT
@DodoGTA You got it right!
The git bisect result points to exactly the same commit!
I am writing an upstream bug report now; I hope this problem can be fixed soon.
Comment by Echo (DodoGTA) - Sunday, 05 February 2023, 16:05 GMT
The bug is now reported upstream: https://bugzilla.kernel.org/show_bug.cgi?id=216989
Comment by loqs (loqs) - Tuesday, 14 February 2023, 21:55 GMT
Comment by loqs (loqs) - Sunday, 12 March 2023, 23:57 GMT
