FS#53042 - [linux-firmware] 20170217.12987ca-1 breaks AMD Fury power management and potentially others
Attached to Project:
Arch Linux
Opened by Joshua Gwinn (JDGBOLT) - Tuesday, 21 February 2017, 21:39 GMT
Last edited by Laurent Carlier (lordheavy) - Thursday, 09 March 2017, 15:34 GMT
Details
Description: The new linux-firmware version 20170217.12987ca-1
breaks the amdgpu IB ring tests and prevents the card from
entering lower power states or lowering its core clock on a
Sapphire AMD Fury 4GB.
Additional info:
* package version(s): 20170217.12987ca-1
* config and/or log files etc.
During bootup the kernel log spits out this message:
[ 7.738946] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 7.739108] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 1 (-110).
[ 8.352971] [drm] RC6 on
[ 8.752283] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 8.752445] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 2 (-110).
[ 9.765625] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 9.765786] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 3 (-110).
[ 10.778960] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 10.779122] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 4 (-110).
[ 11.792299] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 11.792459] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 5 (-110).
[ 12.805643] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 12.805806] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 6 (-110).
[ 13.818979] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 13.819142] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 7 (-110).
[ 14.832324] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 14.832486] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 8 (-110).
[ 14.832606] [drm] ib test on ring 9 succeeded
[ 14.832633] [drm] ib test on ring 10 succeeded
[ 14.834199] [drm] ib test on ring 11 succeeded
[ 14.835054] [drm] ib test on ring 12 succeeded
[ 14.835130] [drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed (-110).
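For anyone comparing their own boot log against the one above, here is a minimal Python sketch that pulls out which rings failed the IB test and which passed. The function name `summarize_ib_tests` and the two regular expressions are illustrative assumptions modelled on the message format shown in this report; they are not part of the amdgpu driver or any tool. On Linux, error code -110 corresponds to ETIMEDOUT, which matches the "IB test timed out" messages.

```python
import re

# Patterns modelled on the log lines quoted in this report (an assumption:
# other kernel versions may word these messages differently).
FAILED_RING = re.compile(r"failed testing IB on ring (\d+) \((-?\d+)\)")
PASSED_RING = re.compile(r"ib test on ring (\d+) succeeded")


def summarize_ib_tests(log_text):
    """Return (failed, passed).

    failed maps ring number -> error code; passed is a sorted list of rings.
    """
    failed = {int(ring): int(code) for ring, code in FAILED_RING.findall(log_text)}
    passed = sorted(int(ring) for ring in PASSED_RING.findall(log_text))
    return failed, passed


# Three lines taken from the log above, used as sample input.
sample = """\
[ 7.739108] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 1 (-110).
[ 8.752445] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 2 (-110).
[ 14.832606] [drm] ib test on ring 9 succeeded
"""

failed, passed = summarize_ib_tests(sample)
print("failed:", failed)
print("passed:", passed)
```

The same function should work on text captured from `dmesg` or `journalctl -b`, since it only looks at the message part of each line.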
Steps to reproduce: This may only affect my hardware, as I have only this one AMD card to test with, but downgrading to linux-firmware-20161222.4b9559f-2 fixes the issue, even with the 4.9.11 kernel. Perhaps the AMDGPU firmware expects a 4.10 kernel? Buggy firmware is also possible.
I can use radeon-profile to change the clock mode from auto to low, which brings the core clock down to 300 MHz, but if left on auto it runs at the full 1050 MHz, which makes my card idle in the mid-70s °C. So something broke with the new firmware.
I was able to get it working again by copying the /usr/lib/firmware/amdgpu/fiji* files from the 20161222.4b9559f-2 package, but other cards may be affected too. I am limited in how I can test, as I have only the one AMD card.
It looks like most of the AMDGPU firmware files were updated in these commits: https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=7a110b85a46d7f884f4ac712ff52e02ed57234bd , https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=51911f57cda2ce6f290123974bfbe872d1f9dd65 . So it's possible that the new firmware wants functionality from the amdgpu kernel module that is not present in the 4.9.11 one. I would recommend reverting those firmware files until a 4.10 kernel is in stable, just to be safe. There is surely some correlation: bad firmware, missing functionality, or perhaps something else. I have also uploaded the two kernel logs, one with the old firmware and one with the new.
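To see which of the fiji* files actually changed between the two packages before copying anything, the extracted firmware directories can be compared by checksum. This is a rough Python sketch, not an official tool: the helper name `changed_firmware` is hypothetical, and the demo runs on throwaway directories with placeholder file contents rather than real firmware blobs.

```python
import hashlib
import tempfile
from pathlib import Path


def changed_firmware(old_dir, new_dir, pattern="fiji*"):
    """Return sorted names of files matching `pattern` that differ between
    the two directories, including files present in only one of them."""

    def digests(directory):
        return {
            path.name: hashlib.sha256(path.read_bytes()).hexdigest()
            for path in Path(directory).glob(pattern)
            if path.is_file()
        }

    old, new = digests(old_dir), digests(new_dir)
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


# Demo on temporary directories; file names and contents are placeholders.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    (Path(old) / "fiji_ce.bin").write_bytes(b"old microcode")
    (Path(new) / "fiji_ce.bin").write_bytes(b"new microcode")
    (Path(old) / "fiji_me.bin").write_bytes(b"unchanged")
    (Path(new) / "fiji_me.bin").write_bytes(b"unchanged")
    (Path(new) / "tonga_ce.bin").write_bytes(b"ignored by the pattern")
    diff = changed_firmware(old, new)

print(diff)
```

In real use, `old_dir` would point at the amdgpu directory extracted from the 20161222.4b9559f-2 package and `new_dir` at /usr/lib/firmware/amdgpu.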
Closed by Laurent Carlier (lordheavy)
Thursday, 09 March 2017, 15:34 GMT
Reason for closing: Fixed
Additional comments about closing: linux-firmware 20170227.5abb924-1
[imbjr@pc ~]$ sudo journalctl -b | grep drm:
Feb 21 21:39:50 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:50 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 1 (-110).
Feb 21 21:39:51 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:51 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 2 (-110).
Feb 21 21:39:52 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:52 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 3 (-110).
Feb 21 21:39:53 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:53 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 4 (-110).
Feb 21 21:39:54 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:54 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 5 (-110).
Feb 21 21:39:55 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:55 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 6 (-110).
Feb 21 21:39:56 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:56 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 7 (-110).
Feb 21 21:39:57 pc kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 21 21:39:57 pc kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 8 (-110).
Feb 21 21:39:57 pc kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed (-110).
Feb 21 21:39:59 pc kernel: [drm:dce_v11_0_afmt_setmode [amdgpu]] *ERROR* Couldn't read SADs: 0
I have an RX 480 card.
I also have a Radeon HD 7770 installed (radeon driver), but it was not a problem previously.
EDIT: Should a kernel bug report probably be filed for this? I'm not sure of the proper procedure here, given that linux-firmware is apparently a git tree only (I'm surprised there are no stable releases matched to a kernel for something as crucial as this). I'm not sure what can be done other than reverting the one commit (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu/carrizo_ce.bin?id=7a110b85a46d7f884f4ac712ff52e02ed57234bd) on affected kernel versions.
EDIT2: Fedora Rawhide's using 20170213-71.git6d3bc888.fc26 which is before the git commit in question that breaks AMD stuff. Might be a good revision to use instead.
Feb 22 02:00:21 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:21 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 1 (-110).
Feb 22 02:00:22 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:22 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 2 (-110).
Feb 22 02:00:23 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:23 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 3 (-110).
Feb 22 02:00:24 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:24 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 4 (-110).
Feb 22 02:00:25 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:25 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 5 (-110).
Feb 22 02:00:26 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:26 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 6 (-110).
Feb 22 02:00:27 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:27 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 7 (-110).
Feb 22 02:00:28 westlake kernel: [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
Feb 22 02:00:28 westlake kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 8 (-110).
Feb 22 02:00:28 westlake kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed (-110).
Started occurring after updating to linux 4.9.11-1 and linux-firmware 20170217.12987ca-1. Downgrading to linux 4.9.9-1 did not work, but downgrading to linux-firmware 20161222.4b9559f-2 worked.
Downgrading to linux-firmware 20161222.4b9559f-2 and keeping linux 4.9.11-1 worked, but downgrading to linux-4.9.9-1 and keeping the latest linux-firmware did not work.
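The downgrade results reported above can be written down as a small table to check which package the outcome tracks. The Python sketch below is only an illustration of that reasoning; the `reports` list restates the three combinations from this comment, and the helper names are made up for the example.

```python
# (kernel, firmware, works) as reported in the comment above.
reports = [
    ("4.9.11-1", "20170217.12987ca-1", False),
    ("4.9.9-1", "20170217.12987ca-1", False),
    ("4.9.11-1", "20161222.4b9559f-2", True),
]


def outcomes_by(column, rows):
    """Group the set of observed outcomes by the value in the given column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[column], set()).add(row[2])
    return grouped


def explains(column, rows):
    """True if every value in the column always produced the same outcome."""
    return all(len(seen) == 1 for seen in outcomes_by(column, rows).values())


kernel_explains = explains(0, reports)
firmware_explains = explains(1, reports)
print("kernel version explains the outcome:", kernel_explains)
print("firmware version explains the outcome:", firmware_explains)
```

Kernel 4.9.11-1 appears with both a working and a failing result, so the kernel version alone cannot account for the breakage, while each firmware version maps to a single outcome.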
dmesg | grep -i amdgpu
[ 2.654580] [drm] amdgpu kernel modesetting enabled.
[ 2.666883] fb: switching to amdgpudrmfb from VESA VGA
[ 2.667192] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[ 2.675184] amdgpu 0000:01:00.0: VRAM: 4096M 0x0000000000000000 - 0x00000000FFFFFFFF (4096M used)
[ 2.675185] amdgpu 0000:01:00.0: GTT: 8002M 0x0000000100000000 - 0x00000002F42587FF
[ 2.675198] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.675199] [drm] amdgpu: 8002M of GTT memory ready.
[ 2.676497] amdgpu 0000:01:00.0: amdgpu: using MSI.
[ 2.676514] [drm] amdgpu: irq initialized.
[ 2.676629] amdgpu: powerplay initialized
[ 2.676781] [drm] AMDGPU Display Connectors
[ 2.682141] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000100000008, cpu addr 0xffff88043dcde008
[ 2.682177] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000100000018, cpu addr 0xffff88043dcde018
[ 2.682205] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000100000028, cpu addr 0xffff88043dcde028
[ 2.682231] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000100000038, cpu addr 0xffff88043dcde038
[ 2.682260] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000100000048, cpu addr 0xffff88043dcde048
[ 2.682289] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000100000058, cpu addr 0xffff88043dcde058
[ 2.682317] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000100000068, cpu addr 0xffff88043dcde068
[ 2.682345] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000100000078, cpu addr 0xffff88043dcde078
[ 2.682375] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000100000088, cpu addr 0xffff88043dcde088
[ 2.683387] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x0000000100000098, cpu addr 0xffff88043dcde098
[ 2.683427] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000001000000a8, cpu addr 0xffff88043dcde0a8
[ 2.685277] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x000000000103e420, cpu addr 0xffffc9000225a420
[ 2.686334] amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x00000001000000c8, cpu addr 0xffff88043dcde0c8
[ 2.686376] amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x00000001000000d8, cpu addr 0xffff88043dcde0d8
[ 3.077677] fbcon: amdgpudrmfb (fb0) is primary device
[ 3.482899] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[ 4.509501] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 4.509648] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 1 (-110).
[ 5.522803] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 5.522934] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 2 (-110).
[ 6.536128] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 6.536259] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 3 (-110).
[ 7.549463] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 7.549537] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 4 (-110).
[ 8.562862] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 8.562993] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 5 (-110).
[ 9.576135] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 9.576284] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 6 (-110).
[ 10.589449] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 10.589581] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 7 (-110).
[ 11.602805] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 11.602936] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 8 (-110).
[ 11.606134] [drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed (-110).
[ 11.607067] [drm] Initialized amdgpu 3.8.0 20150101 for 0000:01:00.0 on minor 0
Edit: Comment in question: https://bugs.freedesktop.org/show_bug.cgi?id=99907#c10
Sapphire Nitro+ RX480 4GB (so polaris 10 is affected too)
Linux 4.9.9 & 4.9.11 with firmware pkg 20161222.4b9559f-2 is OK.
cat /sys/class/drm/card0/device/pp_dpm_*
0: 300Mhz *
1: 1750Mhz
0: 2.5GB, x8 *
1: 5.0GB, x16
0: 300Mhz *
1: 608Mhz
2: 910Mhz
3: 1077Mhz
4: 1145Mhz
5: 1191Mhz
6: 1236Mhz
7: 1306Mhz
idle temp was 31C.
Linux 4.9.11 with firmware pkg 20170217.12987ca-1
cat /sys/class/drm/card0/device/pp_dpm_*
0: 300Mhz *
1: 1750Mhz
0: 2.5GB, x8
1: 5.0GB, x16 *
0: 300Mhz
1: 608Mhz
2: 910Mhz
3: 1077Mhz
4: 1145Mhz
5: 1191Mhz
6: 1236Mhz
7: 1306Mhz *
idle temp hit 66C.
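The asterisk in each pp_dpm_* listing marks the currently selected state, so the two outputs above can be compared mechanically. Here is a small Python sketch that extracts the active state from such a listing. The function name `active_state` is hypothetical, and the sample strings are the last block of each output above, which presumably comes from pp_dpm_sclk (the core clock).

```python
import re

# One state per line: "<index>: <label>" with an optional trailing "*".
STATE = re.compile(r"^(\d+):\s*(.+?)\s*(\*)?\s*$")


def active_state(pp_dpm_text):
    """Return (index, label) of the state marked with '*', or None."""
    for line in pp_dpm_text.splitlines():
        match = STATE.match(line.strip())
        if match and match.group(3):
            return int(match.group(1)), match.group(2)
    return None


# Core clock states at idle with firmware 20161222.4b9559f-2.
old_firmware = """\
0: 300Mhz *
1: 608Mhz
2: 910Mhz
3: 1077Mhz
4: 1145Mhz
5: 1191Mhz
6: 1236Mhz
7: 1306Mhz
"""

# Core clock states at idle with firmware 20170217.12987ca-1.
new_firmware = old_firmware.replace("300Mhz *", "300Mhz").replace("1306Mhz", "1306Mhz *")

old_active = active_state(old_firmware)
new_active = active_state(new_firmware)
print(old_active, new_active)
```

With the old firmware the card idles in state 0, while with the new firmware it stays pinned at state 7, which is consistent with the jump in idle temperature from 31C to 66C.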
Had similar IB test fails as above but there was no crashing and everything seemed to work fine.
I had to downgrade to linux-firmware 20161222.4b9559f-2 and Linux 4.9.9 for the system to be stable again.
There were no IB test failures at all, but the system would hang randomly: sometimes while sddm was starting, sometimes when waking the monitor from standby, and other times when just starting an application.
I could connect with SSH, there were no errors shown whatsoever, but radeontop showed 100% usage on the card.
I am running Linux 4.9.8; does 20170217.12987ca-2 require a more recent kernel?