FS#75624 - HDA ATI HDMI audio amdgpu device missing starting on kernel 5.19.1, still works via radeon module

Attached to Project: Arch Linux
Opened by Swyter (swyter) - Tuesday, 16 August 2022, 20:41 GMT
Last edited by Toolybird (Toolybird) - Thursday, 22 September 2022, 01:01 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

My Kaveri HDMI/DP Audio Controller stopped working after upgrading to Linux 5.19.1; the device is completely missing. I only get a «Dummy Device» in pipewire.

I have been force-loading amdgpu on this integrated CIK GPU via the kernel command line for years with «radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dc=1 amdgpu.powerplay=1».
If I remove those parameters, radeon loads by default and I get HDMI sound back, but I lose Vulkan and the better graphics performance, which are terrible drawbacks.
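To double-check which driver a given command line selects, something like the following could be used. It is only a minimal sketch: the function name is made up for illustration, it looks at nothing but the two cik_support switches quoted above, and it assumes that without them radeon claims the CIK GPU (which matches the behaviour described in this report).

```shell
# Hypothetical helper: report which driver a kernel command line selects
# for a CIK GPU. Only the two cik_support switches are examined.
# The body runs in a subshell so the helper variables do not leak.
cik_driver_from_cmdline() (
    cmdline=" $1 "   # pad with spaces so every parameter is space-delimited
    radeon=1         # assumption: radeon claims CIK parts by default
    amdgpu=0         # assumption: amdgpu leaves CIK parts alone by default
    case "$cmdline" in *" radeon.cik_support=0 "*) radeon=0 ;; esac
    case "$cmdline" in *" amdgpu.cik_support=1 "*) amdgpu=1 ;; esac
    if [ "$radeon" -eq 0 ] && [ "$amdgpu" -eq 1 ]; then
        echo amdgpu
    elif [ "$radeon" -eq 1 ]; then
        echo radeon  # simplification: radeon wins whenever it is still enabled
    else
        echo none    # radeon disabled, amdgpu not enabled for CIK
    fi
)
```

On a running system it would be called as `cik_driver_from_cmdline "$(cat /proc/cmdline)"`; the "Kernel driver in use" line of `lspci -k -s 00:01.0` shows what actually got bound.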


Additional info:

I have bisected the regression as much as possible via precompiled packages, and this was definitely introduced with the 5.19 series, as it was working until a few days ago. I tested the lts and hardened kernels, which are a bit behind, and they also work fine. Then I rolled back to the normal linux-5.18.16 and also confirmed that it was fine. So to sum things up:

* Doesn't work with:
Linux 5.19.1-arch2-1 (latest)

* Works:
Linux 5.18.16-arch1-1
Linux 5.18.17-hardened1-1-hardened
Linux 5.15.60-1-lts

Relevant dmesg output for the working 5.18.16, missing in Linux 5.19.1:
[ 4.440347] snd_hda_intel 0000:00:01.1: Force to non-snoop mode
[ 4.463167] snd_hda_intel 0000:00:01.1: bound 0000:00:01.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 4.464826] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/sound/card0/input11
[ 4.464881] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/sound/card0/input12
[ 4.555371] snd_hda_intel 0000:00:14.2: device 1849:7662 is on the power_save denylist, forcing power_save to 0
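
A quick way to test a kernel log for the "bound" line above, as a rough sketch (the function name is made up, and the pattern is taken from this one log only, so it may not match other hardware):

```shell
# Hypothetical helper: succeeds if the kernel log on stdin shows
# snd_hda_intel binding to amdgpu's audio component.
hdmi_audio_bound() {
    grep -q 'snd_hda_intel .*: bound .*amdgpu_dm_audio_component_bind_ops'
}
```

Usage would be along the lines of `dmesg | hdmi_audio_bound && echo bound` (reading dmesg may need root; `journalctl -k -b` is an alternative source).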


lspci output:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) I/O Memory Management Unit
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 Graphics] (rev d4)
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kaveri HDMI/DP Audio Controller
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 09)
00:10.1 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 09)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 11)
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 11)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 16)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller (rev 01)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] FCH PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 5
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)

$ pactl list short sinks (missing in Linux 5.19.1)
0 alsa_output.pci-0000_00_01.1.hdmi-stereo module-alsa-card.c s16le 2ch 44100Hz SUSPENDED
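
The same check can be scripted against the sink list. This is a sketch only: the function name is hypothetical, and it assumes the sink name (second column) contains "hdmi", as it does in the line above.

```shell
# Hypothetical helper: succeeds if the `pactl list short sinks` output
# on stdin contains a sink whose name (column 2) mentions "hdmi".
has_hdmi_sink() {
    awk '$2 ~ /hdmi/ { found = 1 } END { exit !found }'
}
```

For example: `pactl list short sinks | has_hdmi_sink || echo "HDMI sink missing"`.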

$ pactl list cards (attached for 5.18.16)

Closed by Toolybird (Toolybird) - Thursday, 22 September 2022, 01:01 GMT
Reason for closing: Upstream
Comment by Toolybird (Toolybird) - Friday, 19 August 2022, 07:03 GMT
> I have been force-loading amdgpu

What you are doing sounds a bit non-standard, so there might not be many folks in the same situation, which limits your chances somewhat.

> I have bisected the regression as much as possible via precompiled packages

So the next step is to perform a git bisection [1]. It's a fair bit of work but quite rewarding if you crack the case and find the commit that caused the regression.

[1] https://wiki.archlinux.org/title/Bisecting_bugs_with_Git
Comment by Swyter (swyter) - Sunday, 21 August 2022, 12:45 GMT
Yeah, thanks for the heads up. :)

I started the bisection; now that I get how it works within the Arch package build system, I'm making good progress.
I'll make a copy here of the bisect revisions as I go, just in case:

$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [4b0986a3613c92f4ec1bdc7f60ec66fea135991f] Linux 5.18
git bisect good 4b0986a3613c92f4ec1bdc7f60ec66fea135991f
# status: waiting for bad commit, 1 good commit known
# bad: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect bad 3d7cb6b04c3f3115719235cc6866b10326de34cd
# good: [c011dd537ffe47462051930413fed07dbdc80313] Merge tag 'arm-soc-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good c011dd537ffe47462051930413fed07dbdc80313
Comment by Swyter (swyter) - Sunday, 21 August 2022, 19:28 GMT
Unfortunately, booting is completely borked in the early 5.19-rcX versions; it was only fixed in 5.19-rc3, so I can't narrow this down or bisect it properly any further.

* amdgpu's HDMI doesn't work with these, but the kernel boots normally:
Linux 6.0-rc1
Linux 5.19-rc4 (03c765b0e3b4cb5063276b086c76f7a612856a9a)
Linux 5.19-rc3 (a111daf0c53ae91e71fd2bfe7497862d14132e3e)

* Freezes right on boot, milliseconds after kernel init; with loglevel=7 in the kernel parameters it seems to stop right after initializing the Net subsystem hashtable:
Linux 5.19-rc4
Linux 5.19-rc3
5d4af9c1f04ab0411ba5818baad9a68e87f33099
0737e018a05e2aa352828c52bdeed3b02cff2930
1e308c6fb7127371f48a0fb9770ea0b30a6b5698

* Everything works:
c011dd537ffe47462051930413fed07dbdc80313
bf23729c7a5f44f0e863666b9364a64741fd3241
Linux 5.18 (4b0986a3613c92f4ec1bdc7f60ec66fea135991f)
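
For commits that cannot be tested at all, like the ones that freeze on boot, git has `git bisect skip`. A minimal sketch of how the three outcomes above could map onto bisect verdicts (the function name and the yes/no arguments are made up for illustration):

```shell
# Hypothetical helper: map one test boot to a `git bisect` verdict.
#   $1: "yes" if the kernel booted, anything else otherwise
#   $2: "yes" if the HDMI sink showed up, anything else otherwise
bisect_verdict() {
    if [ "$1" != "yes" ]; then
        echo skip   # untestable commit: let git try a nearby one instead
    elif [ "$2" = "yes" ]; then
        echo good
    else
        echo bad
    fi
}
```

It would be used as `git bisect "$(bisect_verdict yes no)"`, which here runs `git bisect bad`. If the culprit sits inside the skipped range, git can only report a list of candidate commits rather than a single one, so this may not have helped much in this particular case.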
Comment by Swyter (swyter) - Sunday, 21 August 2022, 19:37 GMT
Okay, this is pretty silly, but after a lot of fiddling I managed to get HDMI sound working again on 5.19 and up by simply *not* preloading the `amdgpu` module in mkinitcpio.conf.

I had it like this, which worked right up until 5.18:
MODULES="radeon amdgpu"

Either of these works; as long as amdgpu is no longer preloaded, the HDMI sound sink for my monitor reappears:
MODULES="radeon"
MODULES=""
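
To check whether a given mkinitcpio.conf still preloads amdgpu, a rough sketch follows. The function name is made up, and only uncommented MODULES= lines are considered; it should match both the quoted form shown here and the newer array form.

```shell
# Hypothetical helper: succeeds if the mkinitcpio.conf text on stdin
# lists amdgpu on an uncommented MODULES= line.
preloads_amdgpu() {
    grep -E '^[[:space:]]*MODULES=' | grep -qw 'amdgpu'
}
```

For example: `preloads_amdgpu < /etc/mkinitcpio.conf && echo "amdgpu is still preloaded"`. After editing the file, the initramfs has to be regenerated (e.g. with `mkinitcpio -P`) before the change takes effect.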

I'll probably open a bug report on freedesktop.org to let them know that they changed something by mistake. This is still a bug, and my fix is only a workaround.
Hopefully this helps other people and saves them the hours of excruciating pain and waiting spent recompiling kernels that turned out to be even more broken in the release candidates. ¯\_(ツ)_/¯
Comment by Swyter (swyter) - Sunday, 21 August 2022, 20:06 GMT
Here is the matching bug report on the AMD side, where I tried to summarize the information above: https://gitlab.freedesktop.org/drm/amd/-/issues/2132
Comment by Toolybird (Toolybird) - Thursday, 22 September 2022, 01:01 GMT
There is ongoing activity in the upstream bug report, so I will close this for now.
