FS#74405 - [kernel][AMD][IOMMU] Bug in AMD IOMMU on Ryzen leads to suspend to RAM not resuming properly
Attached to Project: Arch Linux
Opened by Lahfa Samy (AkechiShiro) - Friday, 08 April 2022, 13:22 GMT
Last edited by Toolybird (Toolybird) - Thursday, 14 September 2023, 07:12 GMT
Details
Description:
Arch Linux does not resume properly after suspend to RAM due to an AMD IOMMU bug/oops related to enabling interrupts.

Additional info:
* linux 5.17.1 mainline kernel
* ThinkPad T495, Ryzen 7 3700U with Radeon Vega RX10 (iGPU)
* I'm planning to report this upstream to the Linux kernel (Bugzilla) against the IOMMU driver.
* This issue started very recently on this kernel. I believe the most recent working kernel was 5.16.16, so the regression may have been introduced in 5.17.

Steps to reproduce:
- Boot on linux 5.17.1
- systemctl suspend
- Push the power button.
- The issue is triggered if any X11 graphical server has been started: the system cannot resume from suspend to RAM (black screen) and a forced reboot is needed.

The dmesg output below was captured with the `no_console_suspend`, `initcall_debug` and `ignore_loglevel` kernel parameters. Here is the relevant output:

[ 82.540316] ACPI: PM: Preparing to enter system sleep state S3
[ 82.547782] ACPI: EC: event blocked
[ 82.547784] ACPI: EC: EC stopped
[ 82.547785] ACPI: PM: Saving platform NVS memory
[ 82.548228] Disabling non-boot CPUs ...
[ 82.550506] smpboot: CPU 1 is now offline
[ 82.553132] smpboot: CPU 2 is now offline
[ 82.555485] smpboot: CPU 3 is now offline
[ 82.557593] smpboot: CPU 4 is now offline
[ 82.559873] smpboot: CPU 5 is now offline
[ 82.561829] smpboot: CPU 6 is now offline
[ 82.563933] smpboot: CPU 7 is now offline
[ 82.565077] ACPI: PM: Low-level resume complete
[ 82.565107] ACPI: EC: EC started
[ 82.565108] ACPI: PM: Restoring platform NVS memory
[ 83.718277] ------------[ cut here ]------------
[ 83.718278] WARNING: CPU: 0 PID: 2572 at drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x34d/0x420
[ 83.718290] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg bnep lm92 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc btusb btrtl btbcm btintel btmtk bluetooth intel_rapl_msr ecdh_generic joydev mousedev intel_rapl_common crc16 edac_mce_amd snd_sof_amd_renoir snd_acp_config kvm_amd iwlmvm snd_sof_amd_acp kvm snd_sof_pci irqbypass snd_sof mac80211 snd_ctl_led snd_soc_acpi crct10dif_pclmul snd_hda_codec_realtek think_lmi crc32_pclmul libarc4 snd_hda_codec_hdmi snd_hda_codec_generic firmware_attributes_class crc32c_intel snd_soc_core ghash_clmulni_intel snd_hda_intel aesni_intel wmi_bmof snd_compress snd_intel_dspcfg iwlwifi snd_intel_sdw_acpi crypto_simd ac97_bus vfat snd_hda_codec snd_pcm_dmaengine cryptd iwlmei fat rapl snd_hda_core snd_pci_acp6x thinkpad_acpi snd_pci_acp5x snd_hwdep tpm_crb ledtrig_audio snd_pcm cfg80211 psmouse sp5100_tco platform_profile snd_rn_pci_acp3x ucsi_acpi zenpower(OE) snd_timer tpm_tis rfkill i2c_piix4
[ 83.718366] typec_ucsi snd ipmi_devintf typec snd_pci_acp3x tpm_tis_core ccp mei ipmi_msghandler r8168(OE) soundcore roles wmi tpm video rng_core i2c_scmi pinctrl_amd mac_hid acpi_cpufreq sg crypto_user acpi_call(OE) fuse bpf_preload ip_tables x_tables usbhid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) serio_raw atkbd libps2 sdhci_pci cqhci sdhci xhci_pci xhci_pci_renesas mmc_core i8042 serio radeon amdgpu gpu_sched drm_ttm_helper ttm
[ 83.718413] CPU: 0 PID: 2572 Comm: systemd-sleep Tainted: P OE 5.17.1-arch1-1 #1 0ea933cb6bfe82a8dc16ab834a4bccdd297f98b7
[ 83.718418] Hardware name: LENOVO 20NKS28F00/20NKS28F00, BIOS R12ET55W(1.25 ) 07/06/2020
[ 83.718421] RIP: 0010:amd_iommu_enable_interrupts+0x34d/0x420
[ 83.718427] Code: ff ff 49 8b 7f 18 89 04 24 e8 9f 36 ee ff 8b 04 24 e9 4b fd ff ff 0f 0b 4d 8b 3f 49 81 ff 50 09 56 99 0f 85 05 fd ff ff eb 96 <0f> 0b 4d 8b 3f 49 81 ff 50 09 56 99 0f 85 f1 fc ff ff eb 82 31 f6
[ 83.718429] RSP: 0018:ffffa787405cbc68 EFLAGS: 00010046
[ 83.718432] RAX: 00000001262cdc89 RBX: 0000000000000000 RCX: 0000000000000000
[ 83.718434] RDX: 000000000000607e RSI: 00000000000059ae RDI: 00000001262c7c0b
[ 83.718436] RBP: 0000000080000000 R08: 0000000000000000 R09: 000000000000000f
[ 83.718437] R10: 0000000079726f6d R11: 000000006d656d20 R12: 000ffffffffffff8
[ 83.718439] R13: 0800000000000000 R14: ffffa787405cbc70 R15: ffff95d48004a800
[ 83.718441] FS: 00007fb3d354fe80(0000) GS:ffff95d76fa00000(0000) knlGS:0000000000000000
[ 83.718443] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 83.718445] CR2: 00007f42204d6ad0 CR3: 000000012dbe8000 CR4: 00000000003506f0
[ 83.718447] Call Trace:
[ 83.718450] <TASK>
[ 83.718455] ? early_enable_iommus+0x1c5/0x300
[ 83.718460] ? enable_iommus_v2+0x8e/0x130
[ 83.718464] syscore_resume+0x4b/0x160
[ 83.718469] suspend_devices_and_enter+0x6d3/0x7d0
[ 83.718476] pm_suspend.cold+0x2fb/0x342
[ 83.718482] state_store+0x71/0xd0
[ 83.718487] kernfs_fop_write_iter+0x11c/0x1b0
[ 83.718493] new_sync_write+0x15c/0x1f0
[ 83.718500] vfs_write+0x1eb/0x280
[ 83.718503] ksys_write+0x67/0xe0
[ 83.718506] do_syscall_64+0x5c/0x80
[ 83.718511] ? do_syscall_64+0x69/0x80
[ 83.718513] ? exc_page_fault+0x72/0x170
[ 83.718517] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 83.718522] RIP: 0033:0x7fb3d3f44257
[ 83.718526] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 83.718528] RSP: 002b:00007ffeda5645a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 83.718531] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fb3d3f44257
[ 83.718532] RDX: 0000000000000004 RSI: 00007ffeda564690 RDI: 0000000000000004
[ 83.718534] RBP: 00007ffeda564690 R08: 000055ba9c2d1230 R09: 0000000000000000
[ 83.718535] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 83.718536] R13: 000055ba9c2cd3c0 R14: 0000000000000004 R15: 00007fb3d403d7a0
[ 83.718540] </TASK>
[ 83.718541] ---[ end trace 0000000000000000 ]---
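
For anyone comparing their own logs against this report, here is a minimal Python sketch that pulls the warning site (source file, line, and symbol) out of saved dmesg text. The names `WARNING_RE` and `find_warnings` are hypothetical helpers written for this illustration, not part of any kernel tooling, and the pattern assumes the WARNING header has the same layout as the one in the trace above.

```python
import re

# Assumption: the WARNING header looks like the one quoted in this report, e.g.
# "[ 83.718278] WARNING: CPU: 0 PID: 2572 at drivers/iommu/amd/init.c:851 symbol+0x.../0x..."
WARNING_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) (?P<symbol>\S+)"
)


def find_warnings(dmesg_text):
    """Return one dict per kernel WARNING header found in the given dmesg text."""
    return [m.groupdict() for m in WARNING_RE.finditer(dmesg_text)]


if __name__ == "__main__":
    # Two lines copied from the report, used as sample input.
    sample = (
        "[ 83.718277] ------------[ cut here ]------------\n"
        "[ 83.718278] WARNING: CPU: 0 PID: 2572 at drivers/iommu/amd/init.c:851 "
        "amd_iommu_enable_interrupts+0x34d/0x420\n"
    )
    for w in find_warnings(sample):
        print(w["file"], w["line"], w["symbol"])
```

Running it over logs from two kernel versions makes it easy to see whether the warning still fires in the same function but at a different line or offset.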
Closed by Toolybird (Toolybird)
Thursday, 14 September 2023, 07:12 GMT
Reason for closing: Fixed
Additional comments about closing: See comments
https://forums.lenovo.com/searchpage/tab/posts?fid=27&q=resume&sort=date&matedata=&page=1
Edit: See also FS#74285
I've attached the new logs; the oops still seems to be in the same function, but at a different location.