FS#51478 - [linux] drm/radeon/radeonsi: atom execute table_locked stuck - can't use dedicated gpu anymore

Attached to Project: Arch Linux
Opened by Alif (alive4ever) - Friday, 21 October 2016, 11:39 GMT
Last edited by Eli Schwartz (eschwartz) - Friday, 14 July 2017, 01:53 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
After upgrading to linux 4.8.2, the radeon DRM driver stopped working: the ring 0 test fails with 0xCAFEDEAD when the dedicated GPU resumes. As a result, PRIME GPU offloading fails to run.

Additional info:
* package version(s): linux-4.8.2
* config and/or log files etc.

[ 74.632315] alive-pc kernel: [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[ 74.632321] alive-pc kernel: [drm] PCIE gen 2 link speeds already enabled
[ 74.639212] alive-pc kernel: [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
[ 74.639373] alive-pc kernel: radeon 0000:03:00.0: WB enabled
[ 74.639378] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8801bdbebc00
[ 74.639381] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8801bdbebc04
[ 74.639383] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8801bdbebc08
[ 74.639386] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8801bdbebc0c
[ 74.639388] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8801bdbebc10
[ 75.224011] alive-pc kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[ 75.224040] alive-pc kernel: [drm:si_resume [radeon]] *ERROR* si startup failed on resume
[ 96.563925] alive-pc kernel: [drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 96.563941] alive-pc kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 6FD8 (len 237, WS 0, PS 4) @ 0x6FE6
[ 96.563954] alive-pc kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 6860 (len 94, WS 12, PS 8) @ 0x68A9
[ 96.573131] alive-pc kernel: [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[ 96.573135] alive-pc kernel: [drm] PCIE gen 2 link speeds already enabled
[ 96.951370] alive-pc kernel: radeon 0000:03:00.0: Wait for MC idle timedout !
[ 97.140284] alive-pc kernel: radeon 0000:03:00.0: Wait for MC idle timedout !
[ 97.146234] alive-pc kernel: [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
[ 97.146413] alive-pc kernel: radeon 0000:03:00.0: WB enabled
[ 97.146416] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8801bdbebc00
[ 97.146418] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8801bdbebc04
[ 97.146419] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8801bdbebc08
[ 97.146421] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8801bdbebc0c
[ 97.146422] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8801bdbebc10
[ 97.731392] alive-pc kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[ 97.731415] alive-pc kernel: [drm:si_resume [radeon]] *ERROR* si startup failed on resume
[ 121.235280] alive-pc kernel: [drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 121.235304] alive-pc kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 6FD8 (len 237, WS 0, PS 4) @ 0x6FE6
[ 121.235318] alive-pc kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 6860 (len 94, WS 12, PS 8) @ 0x68A9
[ 121.244211] alive-pc kernel: [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[ 121.244214] alive-pc kernel: [drm] PCIE gen 2 link speeds already enabled
[ 121.622297] alive-pc kernel: radeon 0000:03:00.0: Wait for MC idle timedout !
[ 121.811953] alive-pc kernel: radeon 0000:03:00.0: Wait for MC idle timedout !
[ 121.817901] alive-pc kernel: [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
[ 121.818100] alive-pc kernel: radeon 0000:03:00.0: WB enabled
[ 121.818103] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8801bdbebc00
[ 121.818105] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8801bdbebc04
[ 121.818106] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8801bdbebc08
[ 121.818107] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8801bdbebc0c
[ 121.818109] alive-pc kernel: radeon 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8801bdbebc10
[ 122.403198] alive-pc kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[ 122.403221] alive-pc kernel: [drm:si_resume [radeon]] *ERROR* si startup failed on resume
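
The failing cycle above repeats with the same four error messages each time. A minimal sketch for pulling just those lines out of a kernel log, assuming a systemd machine where `journalctl -k -b` prints the kernel messages of the current boot; the function name `radeon_errors` is made up for illustration and the patterns are copied from the messages quoted above:

```shell
# Hypothetical helper: filter a kernel log on stdin down to the radeon
# failure signatures seen in this report.
radeon_errors() {
    grep -E 'ring [0-9]+ test failed|startup failed on resume|atombios stuck|Wait for MC idle timedout'
}

# Typical use (current boot, kernel messages only):
#   journalctl -k -b | radeon_errors
```

If the filter prints nothing after running a program with DRI_PRIME=1, the dedicated GPU most likely resumed without these errors.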


Steps to reproduce:
1. Update to linux 4.8.2 on a laptop that has hybrid AMD/Intel graphics.
2. Enable PRIME GPU offloading on X11[1] or use a Wayland compositor.
3. Test whether the dedicated GPU works by running 'DRI_PRIME=1 glxinfo'. glxinfo is from the mesa-demos package.

[1] https://wiki.archlinux.org/index.php/PRIME#PRIME_GPU_offloading

Downgrading to linux 4.7.6 temporarily fixes the issue for me, i.e. the dgpu works with prime gpu offloading.
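
One way to see at a glance whether offloading works is to compare the renderer reported with and without DRI_PRIME=1. This is only a sketch: it assumes glxinfo (mesa-demos) is installed and a graphical session is running, and the helper name `renderer` is made up for illustration.

```shell
# Hypothetical helper: extract the renderer name from glxinfo output on stdin.
renderer() {
    grep -m1 'OpenGL renderer string' | sed 's/^OpenGL renderer string: *//'
}

# Only run the comparison when glxinfo is actually available.
if command -v glxinfo >/dev/null 2>&1; then
    echo "default: $(glxinfo 2>/dev/null | renderer)"
    echo "offload: $(DRI_PRIME=1 glxinfo 2>/dev/null | renderer)"
fi
```

On a working setup the two lines should name different GPUs. On the affected kernel the second command is expected to fail or stall while the log fills with the errors quoted above.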

Closed by  Eli Schwartz (eschwartz)
Friday, 14 July 2017, 01:53 GMT
Reason for closing:  Not a bug
Additional comments about closing:  Seems to be a configuration issue
Comment by Alif (alive4ever) - Monday, 23 January 2017, 16:22 GMT
Here is the kernel portion of the log with radeon blacklisted on kernel 4.9.5 (testing):

[ 13.750736] alive-pc kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
[ 13.750787] alive-pc kernel: IP: [<ffffffffa082b211>] gmc_v6_0_hw_init+0x1f1/0x660 [amdgpu]
[ 13.750873] alive-pc kernel: PGD 0
[ 13.750884] alive-pc kernel:
[ 13.750898] alive-pc kernel: Oops: 0000 [#1] PREEMPT SMP
[ 13.750919] alive-pc kernel: Modules linked in: ip6t_REJECT ipt_REJECT nf_reject_ipv6 nf_reject_ipv4 snd_hda_codec_hdmi ghash_clmulni_intel aesni_intel nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_defrag_ipv4 aes_x86_64 lrw xt_conntrack nf_conntrack arc4 gf128mul ath3k btusb amdkfd ath9k btrtl glue_helper btbcm btintel amd_iommu_v2 snd_hda_codec_conexant snd_hda_codec_generic ablk_helper ath9k_common cryptd ath9k_hw snd_hda_intel snd_soc_rt5640 intel_cstate intel_rapl_perf ath bluetooth amdgpu(+) ttm mac80211 snd_hda_codec psmouse ip6table_filter i915 snd_soc_rl6231 r8169 snd_soc_core ideapad_laptop sparse_keymap drm_kms_helper snd_compress wmi snd_pcm_dmaengine ac97_bus drm mii i2c_hid iptable_filter ip6_tables hid cfg80211 rfkill elan_i2c 8250_dw video battery i2c_i801 snd_hda_core intel_gtt snd_hwdep
[ 13.751419] alive-pc kernel: snd_pcm snd_timer snd syscopyarea i2c_designware_platform sysfillrect soundcore sysimgblt snd_soc_sst_acpi snd_soc_sst_match fb_sys_fops mei_me fjes mei parport_pc parport shpchp lpc_ich soc_button_array i2c_smbus i2c_designware_core spi_pxa2xx_platform i2c_algo_bit evdev input_leds tpm_tis tpm_tis_core tpm ac mac_hid button sch_fq_codel sg ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom sd_mod rtsx_usb_sdmmc rtsx_usb serio_raw atkbd libps2 xhci_pci xhci_hcd ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common i8042 serio sdhci_acpi sdhci led_class mmc_core
[ 13.751823] alive-pc kernel: CPU: 1 PID: 197 Comm: systemd-udevd Not tainted 4.9.5-1-ARCH #1
[ 13.751856] alive-pc kernel: Hardware name: LENOVO 20369/Lancer 4A2, BIOS 9ACN32WW 07/20/2015
[ 13.751889] alive-pc kernel: task: ffff8801c5485580 task.stack: ffffc900010d8000
[ 13.751917] alive-pc kernel: RIP: 0010:[<ffffffffa082b211>] [<ffffffffa082b211>] gmc_v6_0_hw_init+0x1f1/0x660 [amdgpu]
[ 13.752006] alive-pc kernel: RSP: 0018:ffffc900010db8f8 EFLAGS: 00010246
[ 13.752032] alive-pc kernel: RAX: 0000000000000000 RBX: ffff8801be9d0000 RCX: 0000000000000000
[ 13.752064] alive-pc kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff8801be9d0000
[ 13.752097] alive-pc kernel: RBP: ffffc900010db940 R08: 0000000000000001 R09: 0000000000000000
[ 13.752130] alive-pc kernel: R10: ffffffff811c8901 R11: 0000000000000000 R12: 0000000000000bc5
[ 13.752163] alive-pc kernel: R13: 0000000000000018 R14: 0000000000000000 R15: ffff8801c5f7ac00
[ 13.752197] alive-pc kernel: FS: 00007ff3bcba9400(0000) GS:ffff8801cf240000(0000) knlGS:0000000000000000
[ 13.753514] alive-pc kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.754925] alive-pc kernel: CR2: 0000000000000000 CR3: 00000001c48d0000 CR4: 00000000001406e0
[ 13.757205] alive-pc kernel: Stack:
[ 13.759439] alive-pc kernel: ffff8801c5f7ac00 ffff8801be9d2588 0000000000000018 00000000ea229640
[ 13.761632] alive-pc kernel: ffff8801be9d0000 0000000000000005 0000000000000018 0000000000000000
[ 13.763823] alive-pc kernel: ffff8801c5f7ac00 ffffc900010db9d8 ffffffffa07d7397 0000000000000000
[ 13.766021] alive-pc kernel: Call Trace:
[ 13.768236] alive-pc kernel: [<ffffffffa07d7397>] amdgpu_device_init+0x1017/0x15b0 [amdgpu]
[ 13.770458] alive-pc kernel: [<ffffffff811aaf7e>] ? kmalloc_order_trace+0x2e/0xf0
[ 13.772723] alive-pc kernel: [<ffffffffa07d9c3b>] amdgpu_driver_load_kms+0x5b/0x1f0 [amdgpu]
[ 13.774962] alive-pc kernel: [<ffffffffa04dcac7>] drm_dev_register+0xa7/0xd0 [drm]
[ 13.777197] alive-pc kernel: [<ffffffffa04de630>] drm_get_pci_dev+0xe0/0x1d0 [drm]
[ 13.779474] alive-pc kernel: [<ffffffffa07d44ac>] amdgpu_pci_probe+0xbc/0xe0 [amdgpu]
[ 13.781713] alive-pc kernel: [<ffffffff81359eb5>] local_pci_probe+0x45/0xa0
[ 13.783955] alive-pc kernel: [<ffffffff81359e00>] ? pci_match_device+0xe0/0x110
[ 13.786197] alive-pc kernel: [<ffffffff8135b0c9>] pci_device_probe+0x109/0x160
[ 13.788443] alive-pc kernel: [<ffffffff81450053>] driver_probe_device+0x223/0x430
[ 13.790681] alive-pc kernel: [<ffffffff8145033f>] __driver_attach+0xdf/0xf0
[ 13.792921] alive-pc kernel: [<ffffffff81450260>] ? driver_probe_device+0x430/0x430
[ 13.795163] alive-pc kernel: [<ffffffff8144db8c>] bus_for_each_dev+0x6c/0xc0
[ 13.797384] alive-pc kernel: [<ffffffff8144f79e>] driver_attach+0x1e/0x20
[ 13.799578] alive-pc kernel: [<ffffffff8144f1c0>] bus_add_driver+0x170/0x270
[ 13.801753] alive-pc kernel: [<ffffffffa0943000>] ? 0xffffffffa0943000
[ 13.803904] alive-pc kernel: [<ffffffff81450d00>] driver_register+0x60/0xe0
[ 13.806033] alive-pc kernel: [<ffffffffa0943000>] ? 0xffffffffa0943000
[ 13.808140] alive-pc kernel: [<ffffffff8135965c>] __pci_register_driver+0x4c/0x50
[ 13.810250] alive-pc kernel: [<ffffffffa04de80b>] drm_pci_init+0xeb/0x100 [drm]
[ 13.812362] alive-pc kernel: [<ffffffff8144844a>] ? vga_switcheroo_register_handler+0x6a/0x90
[ 13.814423] alive-pc kernel: [<ffffffffa0943000>] ? 0xffffffffa0943000
[ 13.816443] alive-pc kernel: [<ffffffffa0943095>] amdgpu_init+0x95/0xa8 [amdgpu]
[ 13.818361] alive-pc kernel: [<ffffffff81002190>] do_one_initcall+0x50/0x180
[ 13.820210] alive-pc kernel: [<ffffffff811ca521>] ? __vunmap+0x81/0xd0
[ 13.821983] alive-pc kernel: [<ffffffff811ca5de>] ? vfree+0x2e/0x70
[ 13.823678] alive-pc kernel: [<ffffffff8117de2e>] do_init_module+0x5f/0x1f1
[ 13.825304] alive-pc kernel: [<ffffffff8110cbd4>] load_module+0x2384/0x2a50
[ 13.826856] alive-pc kernel: [<ffffffff81109ab0>] ? symbol_put_addr+0x50/0x50
[ 13.828336] alive-pc kernel: [<ffffffff811c997a>] ? vmap_page_range_noflush+0x25a/0x350
[ 13.829804] alive-pc kernel: [<ffffffff8110d414>] SyS_init_module+0x174/0x190
[ 13.831256] alive-pc kernel: [<ffffffff81003b64>] do_syscall_64+0x54/0xc0
[ 13.832687] alive-pc kernel: [<ffffffff81608e6b>] entry_SYSCALL64_slow_path+0x25/0x25
[ 13.834096] alive-pc kernel: Code: 85 c0 74 0f 48 8b 3b 48 c7 c6 40 19 91 a0 e8 87 0c c2 e0 48 8d 75 c0 48 89 df e8 1b fc ff ff 48 8b 83 98 28 00 00 31 f6 48 89 df <ff> 10 f6 43 5a 02 75 5e 48 8b 83 d8 08 00 00 48 85 c0 0f 84 18
[ 13.837362] alive-pc kernel: RIP [<ffffffffa082b211>] gmc_v6_0_hw_init+0x1f1/0x660 [amdgpu]
[ 13.838948] alive-pc kernel: RSP <ffffc900010db8f8>
[ 13.840470] alive-pc kernel: CR2: 0000000000000000
[ 13.842022] alive-pc kernel: ---[ end trace 6b516140e9f1dc2c ]---
Comment by Alif (alive4ever) - Wednesday, 05 April 2017, 14:11 GMT
It seems that I need to add 'options radeon runpm=0' to /etc/modprobe.d/99-amd.conf to get the radeon driver working.
Closing.
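
For anyone landing here with the same symptoms, the workaround can be sketched as below. The runpm parameter controls the radeon driver's runtime power management, and 0 disables it. The file name is the one used in the comment above; the function name `write_runpm_conf` is made up for illustration, and the commands need root.

```shell
# Hypothetical helper: write the module option to the file given as $1.
write_runpm_conf() {
    printf '%s\n' 'options radeon runpm=0' > "$1"
}

# As root:
#   write_runpm_conf /etc/modprobe.d/99-amd.conf
# After a reboot, the active value can be read back with:
#   cat /sys/module/radeon/parameters/runpm
```

If radeon is loaded from the initramfs, the image may also need to be regenerated before the option takes effect. That depends on the local setup and was not mentioned in this report.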
