FS#79815 - [python-pytorch-rocm-2.0.1-10] has support for gfx906 only

Attached to Project: Arch Linux
Opened by c (grinness) - Friday, 29 September 2023, 08:35 GMT
Last edited by Toolybird (Toolybird) - Friday, 29 September 2023, 08:48 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Starting with python-pytorch-rocm-2.0.1-9, and still in python-pytorch-rocm-2.0.1-10, the package has support for gfx906 only. Any other GPU (gfx10XX and gfx11XX tested) will segfault.

With an RX 6800 (gfx1030), PyTorch segfaults with the following error:

:1:hip_code_object.cpp :505 : 0281284311 us: 4168 : [tid:0x7fc6a466e740] hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp :507 : 0281284321 us: 4168 : [tid:0x7fc6a466e740] Devices:
:1:hip_code_object.cpp :509 : 0281284323 us: 4168 : [tid:0x7fc6a466e740] amdgcn-amd-amdhsa--gfx1030 - [Not Found]
:1:hip_code_object.cpp :514 : 0281284324 us: 4168 : [tid:0x7fc6a466e740] Bundled Code Objects:
:1:hip_code_object.cpp :530 : 0281284326 us: 4168 : [tid:0x7fc6a466e740] host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :527 : 0281284328 us: 4168 : [tid:0x7fc6a466e740] hipv4-amdgcn-amd-amdhsa--gfx906 - [code object targetID is amdgcn-amd-amdhsa--gfx906]
:1:hip_code_object.cpp :534 : 0281284330 us: 4168 : [tid:0x7fc6a466e740] hipErrorNoBinaryForGpu: Unable to find code object for all current devices! - 209
:1:hip_fatbin.cpp :267 : 0281284333 us: 4168 : [tid:0x7fc6a466e740] hipErrorNoBinaryForGpu: Couldn't find binary for current devices! - 209
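
To make the mismatch in the log above easier to spot, here is a minimal sketch that pulls the device targets and the bundled code object targets out of an AMD_LOG_LEVEL=1 log and reports which GPU targets have no matching code object. The helper name summarize_hip_log and the regular expression are hypothetical and assume the log layout seen in this ticket; other ROCm versions may format these lines differently.

```python
import re

# Assumed target ID layout, taken from the log in this ticket,
# e.g. "amdgcn-amd-amdhsa--gfx1030".
TARGET_RE = re.compile(r"amdgcn-amd-amdhsa--(gfx[0-9a-z]+)")


def summarize_hip_log(log_text):
    """Return (device_targets, bundled_targets) as sets of gfx names."""
    devices, bundled = set(), set()
    section = None
    for line in log_text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("Devices:"):
            section = "devices"
            continue
        if stripped.endswith("Bundled Code Objects:"):
            section = "bundled"
            continue
        match = TARGET_RE.search(stripped)
        if match is None:
            continue  # e.g. the host-x86_64 entry or the error summary lines
        if section == "devices":
            devices.add(match.group(1))
        elif section == "bundled":
            bundled.add(match.group(1))
    return devices, bundled


# Excerpt copied from the log in this report.
SAMPLE = """\
:1:hip_code_object.cpp :507 : 0281284321 us: 4168 : [tid:0x7fc6a466e740] Devices:
:1:hip_code_object.cpp :509 : 0281284323 us: 4168 : [tid:0x7fc6a466e740] amdgcn-amd-amdhsa--gfx1030 - [Not Found]
:1:hip_code_object.cpp :514 : 0281284324 us: 4168 : [tid:0x7fc6a466e740] Bundled Code Objects:
:1:hip_code_object.cpp :530 : 0281284326 us: 4168 : [tid:0x7fc6a466e740] host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :527 : 0281284328 us: 4168 : [tid:0x7fc6a466e740] hipv4-amdgcn-amd-amdhsa--gfx906 - [code object targetID is amdgcn-amd-amdhsa--gfx906]
"""

devices, bundled = summarize_hip_log(SAMPLE)
missing = sorted(devices - bundled)
print("GPU targets without a code object:", missing)
```

For the excerpt above this reports gfx1030 as missing, since the only bundled target is gfx906.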


Additional info:
* package version(s)
python-pytorch-rocm-2.0.1-10

ROCm stack 5.6.1

Steps to reproduce:

The following code:
----
#!/usr/bin/env python
# coding: utf-8

import torch


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
else:
    print('NO GPU!')
----

Produces:
----
> AMD_LOG_LEVEL=1 python ./test-init-torch.py
:1:hip_code_object.cpp :505 : 0800220342 us: 4710 : [tid:0x7f4208097740] hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp :507 : 0800220352 us: 4710 : [tid:0x7f4208097740] Devices:
:1:hip_code_object.cpp :509 : 0800220354 us: 4710 : [tid:0x7f4208097740] amdgcn-amd-amdhsa--gfx1030 - [Not Found]
:1:hip_code_object.cpp :514 : 0800220356 us: 4710 : [tid:0x7f4208097740] Bundled Code Objects:
:1:hip_code_object.cpp :530 : 0800220358 us: 4710 : [tid:0x7f4208097740] host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :527 : 0800220360 us: 4710 : [tid:0x7f4208097740] hipv4-amdgcn-amd-amdhsa--gfx906 - [code object targetID is amdgcn-amd-amdhsa--gfx906]
:1:hip_code_object.cpp :534 : 0800220362 us: 4710 : [tid:0x7f4208097740] hipErrorNoBinaryForGpu: Unable to find code object for all current devices! - 209
:1:hip_fatbin.cpp :267 : 0800220365 us: 4710 : [tid:0x7f4208097740] hipErrorNoBinaryForGpu: Couldn't find binary for current devices! - 209
AMD Radeon RX 6800
----

See also discussion in:
https://bugs.archlinux.org/task/79725#comment222303

Closed by Toolybird (Toolybird)
Friday, 29 September 2023, 08:48 GMT
Reason for closing: Duplicate
Additional comments about closing: How many tickets do we need? Please use the existing ticket FS#79725. If any edits are needed, please mention in the comments.
