FS#77594 - [python-pytorch-opt-rocm] crash when creating a gpu tensor

Attached to Project: Community Packages
Opened by Jitao Lu (dianlujitao) - Tuesday, 21 February 2023, 08:20 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 22 February 2023, 20:55 GMT
Task Type: Bug Report
Category: Packages
Status: Closed
Assigned To: Torsten Keßler (tpkessler)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

My GPU is a Radeon RX 580 2048SP. When I create a GPU tensor in PyTorch, Python crashes, and the stack trace points into the ROCm HIP runtime (`libamdhip64.so.5`).

Additional info:
* package version(s): python-pytorch-opt-rocm 1.13.1-1
* config and/or log files etc:
`coredumpctl gdb` output is as follows:
```
#0 0x00007f1c234d9cd8 in () at /opt/rocm/hip/lib/libamdhip64.so.5
#1 0x00007f1c234a9d5f in () at /opt/rocm/hip/lib/libamdhip64.so.5
#2 0x00007f1c235fb2a3 in () at /opt/rocm/hip/lib/libamdhip64.so.5
#3 0x00007f1c235db3f2 in () at /opt/rocm/hip/lib/libamdhip64.so.5
#4 0x00007f1c235dd1bb in hipLaunchKernel () at /opt/rocm/hip/lib/libamdhip64.so.5
#5 0x00007f1c25399e07 in void at::native::gpu_kernel_impl<at::native::AbsFunctor<float> >(at::TensorIteratorBase&, at::native::AbsFunctor<float> const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#6 0x00007f1c2538f5b4 in at::native::abs_kernel_cuda(at::TensorIteratorBase&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#7 0x00007f1c5183032e in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#8 0x00007f1c26bafeb2 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#9 0x00007f1c51dbfc48 in at::_ops::abs_out::call(at::Tensor const&, at::Tensor&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007f1c5182fa1a in at::native::abs(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007f1c52430116 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007f1c51d60889 in at::_ops::abs::redispatch(c10::DispatchKeySet, at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#13 0x00007f1c54334f9e in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#14 0x00007f1c54335698 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#15 0x00007f1c51db49f8 in at::_ops::abs::call(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#16 0x00007f1c517afcd9 in at::native::isfinite(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#17 0x00007f1c525f78c6 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#18 0x00007f1c5209a7b8 in at::_ops::isfinite::call(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#19 0x00007f1c64777df8 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#20 0x00007f1c70555e21 in () at /usr/lib/libpython3.10.so.1.0
#21 0x00007f1c7054f4eb in _PyObject_MakeTpCall () at /usr/lib/libpython3.10.so.1.0
#22 0x00007f1c7054a8ee in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.10.so.1.0
......
```
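Read from the bottom up, the trace shows `isfinite` calling `abs`, which launches a HIP kernel, and the crash lands inside `libamdhip64.so.5`. A minimal sketch for summarizing such a trace follows, assuming each frame has the `#N 0xADDR in SYMBOL () at PATH` layout shown above. The names `FRAME_RE` and `parse_frames` are hypothetical helpers, and `SAMPLE` is a subset of frames copied from the trace.

```python
import os
import re

# Assumed frame layout: "#N 0xADDR in SYMBOL () at PATH" (SYMBOL may be empty).
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+\s+in\s+(.*?)\s*\(\)\s+at\s+(\S+)$")


def parse_frames(backtrace):
    """Return (frame number, symbol or None, library file name) per frame."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            number = int(match.group(1))
            symbol = match.group(2) or None  # unresolved frames have no symbol
            library = os.path.basename(match.group(3))
            frames.append((number, symbol, library))
    return frames


# A few frames copied from the trace above.
SAMPLE = """\
#0 0x00007f1c234d9cd8 in () at /opt/rocm/hip/lib/libamdhip64.so.5
#4 0x00007f1c235dd1bb in hipLaunchKernel () at /opt/rocm/hip/lib/libamdhip64.so.5
#6 0x00007f1c2538f5b4 in at::native::abs_kernel_cuda(at::TensorIteratorBase&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#18 0x00007f1c5209a7b8 in at::_ops::isfinite::call(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
"""

frames = parse_frames(SAMPLE)
crash_library = frames[0][2]  # library containing the innermost frame
named = [symbol for _, symbol, _ in frames if symbol]

print("crash in:", crash_library)
print("resolved symbols:", named)
```

The innermost frame is the one to look at first; the frames without symbols would need debug information for the ROCm libraries to resolve.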

Steps to reproduce:

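```python
import torch

a = torch.rand(5, 3)  # 5x3 tensor of uniform random floats, created on the CPU
a.cuda()              # copy to the GPU (the HIP device on ROCm builds)
```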
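Because the failure is a hard crash rather than a Python exception, a `try`/`except` around the snippet cannot catch it. It may help to collect build and device facts first, without launching any GPU kernel. This is only a sketch: `describe_torch_gpu` is a hypothetical helper, and it assumes `torch.version.hip` is set on ROCm builds and `None` otherwise.

```python
import importlib.util


def describe_torch_gpu():
    """Collect PyTorch build and device facts without launching a GPU kernel."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}  # PyTorch is not installed in this environment

    import torch

    info = {
        "torch": torch.__version__,
        # Assumption: set to a version string on ROCm builds, None otherwise.
        "hip": getattr(torch.version, "hip", None),
        "available": torch.cuda.is_available(),
    }
    if info["available"]:
        # Queries device properties only; no kernel is launched here.
        info["device"] = torch.cuda.get_device_name(0)
    return info


report = describe_torch_gpu()
for key, value in report.items():
    print(f"{key}: {value}")
```

The printed values can be attached to the report next to the package version.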

Closed by Toolybird (Toolybird)
Wednesday, 22 February 2023, 20:55 GMT
Reason for closing: Fixed
Additional comments about closing: python-pytorch-opt-rocm 1.13.1-2

Comment by Torsten Keßler (tpkessler) - Wednesday, 22 February 2023, 08:38 GMT
Support for Polaris GPUs was added to pytorch-rocm in [testing]. Could you please try this package?

Comment by Jitao Lu (dianlujitao) - Wednesday, 22 February 2023, 09:02 GMT
Just tried and it works. Thanks for the update.
