FS#64064 - AMDGPU oopses with kernel 5.3.5

Attached to Project: Arch Linux
Opened by Martin (mort96) - Tuesday, 08 October 2019, 14:32 GMT
Last edited by freswa (frederik) - Friday, 21 February 2020, 22:03 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: After upgrading to kernel 5.3.5, the amdgpu driver oopses and stops working. I assume this is an upstream issue; I may test upstream kernels when I have time. This happens on an Intel i7-8705G, a CPU package with hybrid graphics: integrated Intel HD graphics plus AMD Vega M graphics. The Intel graphics still work.

The full dmesg output is attached, but here's the interesting part:

[ 148.325147] amdgpu 0000:01:00.0: GPU pci config reset
[ 150.078566] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 150.317303] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 150.317329] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
[ 150.317352] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
[ 150.334132] [drm] schedsdma0 is not ready, skipping
[ 150.334132] [drm] schedsdma1 is not ready, skipping
[ 150.352651] Move buffer fallback to memcpy unavailable
[ 150.352691] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
[ 150.353143] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 150.353145] #PF: supervisor read access in kernel mode
[ 150.353146] #PF: error_code(0x0000) - not-present page
[ 150.353147] PGD 0 P4D 0
[ 150.353149] Oops: 0000 [#1] PREEMPT SMP PTI
[ 150.353153] CPU: 7 PID: 2230 Comm: Renderer Tainted: G OE 5.3.5-arch1-1-ARCH #1
[ 150.353154] Hardware name: Dell Inc. XPS 15 9575/0C32VW, BIOS 1.5.1 03/25/2019
[ 150.353228] RIP: 0010:amdgpu_vm_sdma_commit+0x4e/0x120 [amdgpu]
[ 150.353229] Code: 00 00 48 89 44 24 08 31 c0 48 8b 47 08 48 c7 04 24 00 00 00 00 4c 8b a2 88 01 00 00 4c 8b a8 80 00 00 00 48 8b 80 c8 00 00 00 <4c> 8b 70 08 41 8b 44 24 08 4d 8d 7e 88 85 c0 0f 84 a2 55 1f 00 49
[ 150.353231] RSP: 0018:ffff93a884bafae8 EFLAGS: 00010246
[ 150.353232] RAX: 0000000000000000 RBX: ffff93a884bafb30 RCX: 0000000000119c00
[ 150.353233] RDX: ffff8d6e575ac800 RSI: ffff93a884bafbb8 RDI: ffff93a884bafb30
[ 150.353233] RBP: ffff93a884bafbb8 R08: 0000000000001000 R09: 000000000000009d
[ 150.353234] R10: 000000000000009b R11: 0000000000000099 R12: ffff8d6e575ac9f8
[ 150.353235] R13: ffff8d6f2a2ea000 R14: ffff8d6f2a2e8800 R15: ffff8d6f3f7b6900
[ 150.353236] FS: 00007f24868d5700(0000) GS:ffff8d6f5f5c0000(0000) knlGS:0000000000000000
[ 150.353237] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 150.353237] CR2: 0000000000000008 CR3: 0000000419146006 CR4: 00000000003606e0
[ 150.353238] Call Trace:
[ 150.353290] amdgpu_vm_bo_update_mapping+0xcd/0xe0 [amdgpu]
[ 150.353332] amdgpu_vm_clear_freed+0xc0/0x190 [amdgpu]
[ 150.353358] amdgpu_gem_va_ioctl+0x47a/0x5a0 [amdgpu]
[ 150.353386] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 150.353395] drm_ioctl_kernel+0xb8/0x100 [drm]
[ 150.353401] drm_ioctl+0x23d/0x3d0 [drm]
[ 150.353427] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 150.353451] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 150.353454] do_vfs_ioctl+0x43d/0x6c0
[ 150.353457] ? syscall_trace_enter+0x1f2/0x2e0
[ 150.353458] ksys_ioctl+0x5e/0x90
[ 150.353460] __x64_sys_ioctl+0x16/0x20
[ 150.353461] do_syscall_64+0x5f/0x1c0
[ 150.353464] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 150.353465] RIP: 0033:0x7f249cdf425b
[ 150.353467] Code: 0f 1e fa 48 8b 05 25 9c 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f5 9b 0c 00 f7 d8 64 89 01 48
[ 150.353467] RSP: 002b:00007f24868cf808 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 150.353468] RAX: ffffffffffffffda RBX: 00007f24868cf850 RCX: 00007f249cdf425b
[ 150.353469] RDX: 00007f24868cf850 RSI: 00000000c0286448 RDI: 000000000000002d
[ 150.353470] RBP: 00000000c0286448 R08: 0000000117c00000 R09: 000000000000000e
[ 150.353470] R10: 000000000000001a R11: 0000000000000246 R12: 0000000000000000
[ 150.353471] R13: 000000000000002d R14: 0000000000000002 R15: 00007f24516fd550
[ 150.353472] Modules linked in: uhid ccm rfcomm fuse ftdi_sio cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc snd_hda_codec_hdmi snd_hda_codec_realtek hid_multitouch snd_hda_codec_generic hid_sensor_incl_3d hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_rotation hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_hub intel_ishtp_loader intel_ishtp_hid cros_ec_ishtp cros_ec_core mousedev joydev wacom usbhid iTCO_wdt hid_generic iTCO_vendor_support mei_wdt mei_hdcp uvcvideo wmi_bmof dell_wmi intel_rapl_msr videobuf2_vmalloc x86_pkg_temp_thermal videobuf2_memops videobuf2_v4l2 intel_powerclamp intel_wmi_thunderbolt coretemp videobuf2_common dell_laptop msr ledtrig_audio kvm_intel videodev mc dell_smbios ath10k_pci kvm dell_wmi_descriptor snd_hda_intel amdgpu ath10k_core irqbypass dcdbas snd_hda_codec ath crct10dif_pclmul snd_hda_core crc32_pclmul mac80211
[ 150.353491] ghash_clmulni_intel nls_iso8859_1 snd_hwdep nls_cp437 squashfs aesni_intel vfat snd_pcm fat aes_x86_64 crypto_simd cryptd i915 glue_helper loop snd_timer intel_cstate intel_uncore intel_rapl_perf snd gpu_sched input_leds pcspkr cfg80211 i2c_i801 soundcore ttm i2c_algo_bit drm_kms_helper mei_me rtsx_pci_ms rfkill memstick drm libarc4 mei idma64 intel_gtt agpgart intel_lpss_pci intel_lpss ucsi_acpi processor_thermal_device syscopyarea intel_ish_ipc typec_ucsi intel_rapl_common i2c_hid sysfillrect intel_pch_thermal sysimgblt intel_ishtp intel_soc_dts_iosf fb_sys_fops typec wmi hid battery tpm_crb int3403_thermal int340x_thermal_zone evdev intel_vbtn mac_hid soc_button_array tpm_tis tpm_tis_core tpm int3400_thermal acpi_thermal_rel intel_hid rng_core sparse_keymap ac vboxnetflt(OE) vboxnetadp(OE) vboxpci(OE) vboxdrv(OE) sg scsi_mod crypto_user acpi_call(OE) ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_pci_sdmmc serio_raw mmc_core atkbd xhci_pci libps2 xhci_hcd
[ 150.353558] crc32c_intel rtsx_pci i8042 serio
[ 150.353564] CR2: 0000000000000008
[ 150.353566] ---[ end trace 01b69332417a38b4 ]---
[ 150.353621] RIP: 0010:amdgpu_vm_sdma_commit+0x4e/0x120 [amdgpu]
[ 150.353623] Code: 00 00 48 89 44 24 08 31 c0 48 8b 47 08 48 c7 04 24 00 00 00 00 4c 8b a2 88 01 00 00 4c 8b a8 80 00 00 00 48 8b 80 c8 00 00 00 <4c> 8b 70 08 41 8b 44 24 08 4d 8d 7e 88 85 c0 0f 84 a2 55 1f 00 49
[ 150.353624] RSP: 0018:ffff93a884bafae8 EFLAGS: 00010246
[ 150.353626] RAX: 0000000000000000 RBX: ffff93a884bafb30 RCX: 0000000000119c00
[ 150.353628] RDX: ffff8d6e575ac800 RSI: ffff93a884bafbb8 RDI: ffff93a884bafb30
[ 150.353629] RBP: ffff93a884bafbb8 R08: 0000000000001000 R09: 000000000000009d
[ 150.353630] R10: 000000000000009b R11: 0000000000000099 R12: ffff8d6e575ac9f8
[ 150.353632] R13: ffff8d6f2a2ea000 R14: ffff8d6f2a2e8800 R15: ffff8d6f3f7b6900
[ 150.353634] FS: 00007f24868d5700(0000) GS:ffff8d6f5f5c0000(0000) knlGS:0000000000000000
[ 150.353635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 150.353636] CR2: 0000000000000008 CR3: 0000000419146006 CR4: 00000000003606e0
   dmesg.log (99.2 KiB)
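
For readers triaging the excerpt above, here is a minimal Python sketch that pulls out the three facts that matter most: the error codes on the `*ERROR*` lines, the faulting address, and the symbol in the kernel `RIP` line. On Linux, -110 is ETIMEDOUT and -19 is ENODEV, so the log reads as a gfx ring test timing out during resume, followed by a NULL pointer dereference at offset 0x8 in `amdgpu_vm_sdma_commit`. The helper name `summarize`, the regular expressions, and the two-entry errno table are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Linux errno numbers (include/uapi/asm-generic/errno*.h), hard-coded so the
# sketch gives the same answer regardless of the platform it runs on.
LINUX_ERRNO = {19: "ENODEV", 110: "ETIMEDOUT"}

# First negative number after an "*ERROR*" marker, e.g. "(-110)" or "-19!".
ERROR_RE = re.compile(r"\*ERROR\*.*?-(\d+)")
# Kernel-mode RIP line: "RIP: 0010:symbol+0xoff/0xlen".
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+")
FAULT_RE = re.compile(r"NULL pointer dereference, address: ([0-9a-f]+)")


def summarize(log):
    """Collect error codes, fault address, and faulting symbol from a dmesg excerpt."""
    summary = {"errors": [], "fault_address": None, "rip": None}
    for line in log.splitlines():
        m = ERROR_RE.search(line)
        if m:
            code = int(m.group(1))
            entry = (code, LINUX_ERRNO.get(code, "unknown"))
            if entry not in summary["errors"]:
                summary["errors"].append(entry)
        m = FAULT_RE.search(line)
        if m and summary["fault_address"] is None:
            summary["fault_address"] = int(m.group(1), 16)
        m = RIP_RE.search(line)
        if m and summary["rip"] is None:
            summary["rip"] = (m.group(1), int(m.group(2), 16))
    return summary


# A few lines copied from the excerpt in this report.
SAMPLE = """\
[ 150.317303] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 150.317329] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
[ 150.352691] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
[ 150.353143] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 150.353228] RIP: 0010:amdgpu_vm_sdma_commit+0x4e/0x120 [amdgpu]
[ 150.353465] RIP: 0033:0x7f249cdf425b
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The user-space `RIP: 0033:...` line is deliberately not matched, since it carries no kernel symbol.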

Closed by  freswa (frederik)
Friday, 21 February 2020, 22:03 GMT
Reason for closing:  None
Additional comments about closing:  This seems pretty stalled to me. If it's still an issue, please file a re-open request. Thank you :)
Comment by loqs (loqs) - Tuesday, 08 October 2019, 16:16 GMT
Appears to be the same as https://bbs.archlinux.org/viewtopic.php?id=249438
Please report upstream to https://bugs.freedesktop.org product DRI component DRM/AMDgpu
Please also check whether the issue is present in https://aur.archlinux.org/packages/linux-amd-staging-drm-next-git/
If the issue was not present in an older kernel please consider bisecting the kernel to find the causal commit.
See also https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html
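
To illustrate what bisecting involves, the sketch below shows the binary search that `git bisect` automates: given an ordered history with a known-good start and a known-bad end, it halves the range until it reaches the first bad commit. The function `first_bad`, the commit labels, and the `is_bad` callback are hypothetical stand-ins; in a real bisection each check means building and booting a kernel and testing whether amdgpu oopses.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: labels only, not real kernel commits.
    history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
    broken = {"c5", "c6", "c7"}  # pretend the regression landed in c5
    print(first_bad(history, lambda c: c in broken))
```

The number of builds grows with the logarithm of the range, which is why bisecting stays practical even across the thousands of commits between two kernel releases.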
