Arch Linux


FS#71605 - amdgpu: smu firmware loading failed

Attached to Project: Arch Linux
Opened by Siddharth J Singh (dante666) - Saturday, 24 July 2021, 09:55 GMT
Last edited by Andreas Radke (AndyRTR) - Sunday, 01 August 2021, 08:46 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Levente Polyak (anthraxx)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
After upgrading to linux 5.13.4.arch1-1, I reach the user target and can see the console login prompt for a second before the display becomes unresponsive.

Jul 23 21:14:33 vishwakarma-portable kernel: amdgpu: probe of 0000:03:00.0 failed with error -95
Jul 23 21:14:33 vishwakarma-portable kernel: amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
Jul 23 21:14:33 vishwakarma-portable kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
Jul 23 21:14:33 vishwakarma-portable kernel: amdgpu: smu firmware loading failed
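
The kernel reports probe failures as negated errno values, so "error -95" in the log above means errno 95, which on Linux is EOPNOTSUPP ("Operation not supported"). The following is a minimal Python sketch for pulling the driver, PCI address and errno out of such a line. It assumes it runs on Linux, since errno numbering is platform-specific; the name parse_probe_failure is hypothetical and not part of any tool mentioned in this report.

```python
import os
import re

# Matches kernel lines such as:
#   amdgpu: probe of 0000:03:00.0 failed with error -95
PROBE_RE = re.compile(
    r"(?P<driver>\w+): probe of (?P<device>\S+) failed with error -(?P<errno>\d+)"
)


def parse_probe_failure(line):
    """Return (driver, device, errno) for a probe-failure line, else None."""
    match = PROBE_RE.search(line)
    if match is None:
        return None
    return match["driver"], match["device"], int(match["errno"])


if __name__ == "__main__":
    sample = (
        "Jul 23 21:14:33 vishwakarma-portable kernel: "
        "amdgpu: probe of 0000:03:00.0 failed with error -95"
    )
    driver, device, code = parse_probe_failure(sample)
    # os.strerror returns the platform's message for the given errno number.
    print(f"{driver} {device}: errno {code} ({os.strerror(code)})")
```

The errno alone does not say which check inside the driver rejected the device; it only narrows the failure down to an "unsupported operation" path.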

I've attached the journal, filtered with grep for amdgpu. The first boot in it is from an ISO image and shows:

Jul 24 20:10:08 archiso kernel: amdgpu: Topology: Add dGPU node [0x164c:0x1002]
Jul 24 20:10:08 archiso kernel: amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
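
To compare boots like the two quoted here, the filtered journal can be summarised per hostname. This is only a sketch: it assumes the short journal layout seen in this report (month, day, time, hostname, "kernel:", message) and matches the two SMU messages quoted above verbatim. The function name smu_status_by_host is hypothetical.

```python
def smu_status_by_host(journal_lines):
    """Map each hostname to 'ok' or 'failed' based on amdgpu SMU messages.

    If a host logs both messages, the last one seen wins.
    """
    status = {}
    for line in journal_lines:
        fields = line.split()
        # Expected layout: month, day, time, host, "kernel:", message...
        if len(fields) < 6 or fields[4] != "kernel:":
            continue
        host = fields[3]
        if "SMU is initialized successfully" in line:
            status[host] = "ok"
        elif "smu firmware loading failed" in line:
            status[host] = "failed"
    return status


if __name__ == "__main__":
    sample = [
        "Jul 23 21:14:33 vishwakarma-portable kernel: amdgpu: smu firmware loading failed",
        "Jul 24 20:10:08 archiso kernel: amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!",
    ]
    for host, result in smu_status_by_host(sample).items():
        print(host, result)
```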

Steps to reproduce:
I am using an HP 15s laptop with a Ryzen 5500U and Vega 7 graphics
113-LUCIENNE-014

I had earlier tried the upstream kernel (5.13-rc*) and faced this issue then as well.
   journal.txt (493.9 KiB)

Closed by  Andreas Radke (AndyRTR)
Sunday, 01 August 2021, 08:46 GMT
Reason for closing:  None
Additional comments about closing:  It was not working because of a configuration setting.
Comment by Siddharth J Singh (dante666) - Saturday, 24 July 2021, 14:48 GMT
Jul 24 20:08:39 vishwakarma-portable kernel: [drm] amdgpu: ttm finalized
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_driver_release_kms+0x12/0x30 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_device_fini_sw+0xb6/0x2d0 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: gmc_v9_0_sw_fini+0x3a/0x40 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_ttm_fini+0x9c/0x100 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_gtt_mgr_fini+0x79/0xe0 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: Modules linked in: amdgpu(+) intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm snd_acp3x_pdm_dma snd_soc_dmic snd_acp3x_rn snd_soc_core irqbypass crct10dif_pclmul snd_compress crc32_pclmul ac97_bus hp_wmi ghash_clmulni_intel snd_pcm_dmaengine platform_profile snd_pcm sparse_keymap gpu_sched i2c_algo_bit drm_ttm_helper snd_timer ttm aesni_intel snd rfkill wmi_bmof drm_kms_helper crypto_simd soundcore cryptd cec pcspkr agpgart ccp snd_rn_pci_acp3x syscopyarea k10temp sysfillrect snd_pci_acp3x sysimgblt fb_sys_fops tpm_crb sp5100_tco i2c_piix4 tpm_tis wmi tpm_tis_core tpm rng_core i2c_hid_acpi pinctrl_amd i2c_hid amd_pmc acpi_tad drm fuse crypto_user ip_tables x_tables ext4 crc16 mbcache jbd2 crc32c_intel
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_driver_release_kms+0x12/0x30 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_device_fini_sw+0xb6/0x2d0 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: gmc_v9_0_sw_fini+0x3a/0x40 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_ttm_fini+0x94/0x100 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_vram_mgr_fini+0xe0/0x150 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: Modules linked in: amdgpu(+) intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm snd_acp3x_pdm_dma snd_soc_dmic snd_acp3x_rn snd_soc_core irqbypass crct10dif_pclmul snd_compress crc32_pclmul ac97_bus hp_wmi ghash_clmulni_intel snd_pcm_dmaengine platform_profile snd_pcm sparse_keymap gpu_sched i2c_algo_bit drm_ttm_helper snd_timer ttm aesni_intel snd rfkill wmi_bmof drm_kms_helper crypto_simd soundcore cryptd cec pcspkr agpgart ccp snd_rn_pci_acp3x syscopyarea k10temp sysfillrect snd_pci_acp3x sysimgblt fb_sys_fops tpm_crb sp5100_tco i2c_piix4 tpm_tis wmi tpm_tis_core tpm rng_core i2c_hid_acpi pinctrl_amd i2c_hid amd_pmc acpi_tad drm fuse crypto_user ip_tables x_tables ext4 crc16 mbcache jbd2 crc32c_intel
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_driver_release_kms+0x12/0x30 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_device_fini_sw+0xb6/0x2d0 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: gfx_v9_0_sw_fini+0xc8/0x190 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu_bo_unref+0x1a/0x30 [amdgpu f9e43f4b779c45447d655d5aba951a6b6f83103a]
Jul 24 20:08:39 vishwakarma-portable kernel: Modules linked in: amdgpu(+) intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm snd_acp3x_pdm_dma snd_soc_dmic snd_acp3x_rn snd_soc_core irqbypass crct10dif_pclmul snd_compress crc32_pclmul ac97_bus hp_wmi ghash_clmulni_intel snd_pcm_dmaengine platform_profile snd_pcm sparse_keymap gpu_sched i2c_algo_bit drm_ttm_helper snd_timer ttm aesni_intel snd rfkill wmi_bmof drm_kms_helper crypto_simd soundcore cryptd cec pcspkr agpgart ccp snd_rn_pci_acp3x syscopyarea k10temp sysfillrect snd_pci_acp3x sysimgblt fb_sys_fops tpm_crb sp5100_tco i2c_piix4 tpm_tis wmi tpm_tis_core tpm rng_core i2c_hid_acpi pinctrl_amd i2c_hid amd_pmc acpi_tad drm fuse crypto_user ip_tables x_tables ext4 crc16 mbcache jbd2 crc32c_intel
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu: probe of 0000:03:00.0 failed with error -95
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
Jul 24 20:08:39 vishwakarma-portable kernel: amdgpu: smu firmware loading failed

I enabled some additional logging in 5.14-rc1, and the output above is what shows up.
Comment by Jonathon (jonathon) - Tuesday, 27 July 2021, 12:24 GMT
Could you check with linux-mainline 5.14-rc3? A number of amdgpu-related issues have been resolved in rc3.
Comment by Siddharth J Singh (dante666) - Tuesday, 27 July 2021, 16:07 GMT
I just checked; it gives me the same error.
On 5.12 it works fine. I have also reported this to drm/amd:

https://gitlab.freedesktop.org/drm/amd/-/issues/1662

I'll printk debug this on my end and see what more information I can gather.
Comment by Siddharth J Singh (dante666) - Wednesday, 28 July 2021, 16:01 GMT
I found out why it was not working.

Earlier I was facing a suspend-resume error where suspending and resuming twice would fail to bring the amdgpu module back up.

To work around that, I had set the option amdgpu.dpm=0 so that the GPU would not sleep.

With the newer kernels, this setting makes one condition in the driver source fail, which in turn makes SMU initialization fail.
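
Anyone who sees the same "smu firmware loading failed" message may want to check whether a similar override is still set. The sketch below, in Python, looks for module.param=value tokens on the kernel command line. It assumes the option was passed on the command line (it does not inspect modprobe configuration files), and the helper names are hypothetical.

```python
def find_module_params(cmdline, module):
    """Return {param: value} for 'module.param=value' tokens in a kernel command line."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()


def dpm_disabled(cmdline):
    """True if amdgpu.dpm=0 appears on the given command line."""
    return find_module_params(cmdline, "amdgpu").get("dpm") == "0"
```

On the affected system, dpm_disabled(read_cmdline()) would be expected to return True while the override is in place.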

Resuming still causes some errors, but the GPU is no longer the problem.
The screen comes back on resume, but the X server stops, and I can see some failures while systemd is trying to log output.

I'll try to figure out what's happening now.

Sorry for the false alarm.
