FS#77594 - [python-pytorch-opt-rocm] crash when creating a gpu tensor
Attached to Project:
Community Packages
Opened by Jitao Lu (dianlujitao) - Tuesday, 21 February 2023, 08:20 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 22 February 2023, 20:55 GMT
Details
Description:
My GPU is Radeon RX 580 2048SP. When creating a GPU tensor in pytorch, python crashes with the stack trace pointing to a rocm function.

Additional info:
* package version(s): python-pytorch-opt-rocm 1.13.1-1
* config and/or log files etc: `coredumpctl gdb` output is as follows:

```
#0 0x00007f1c234d9cd8 in () at /opt/rocm/hip/lib/libamdhip64.so.5
#1 0x00007f1c234a9d5f in () at /opt/rocm/hip/lib/libamdhip64.so.5
#2 0x00007f1c235fb2a3 in () at /opt/rocm/hip/lib/libamdhip64.so.5
#3 0x00007f1c235db3f2 in () at /opt/rocm/hip/lib/libamdhip64.so.5
#4 0x00007f1c235dd1bb in hipLaunchKernel () at /opt/rocm/hip/lib/libamdhip64.so.5
#5 0x00007f1c25399e07 in void at::native::gpu_kernel_impl<at::native::AbsFunctor<float> >(at::TensorIteratorBase&, at::native::AbsFunctor<float> const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#6 0x00007f1c2538f5b4 in at::native::abs_kernel_cuda(at::TensorIteratorBase&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#7 0x00007f1c5183032e in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#8 0x00007f1c26bafeb2 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#9 0x00007f1c51dbfc48 in at::_ops::abs_out::call(at::Tensor const&, at::Tensor&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007f1c5182fa1a in at::native::abs(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007f1c52430116 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007f1c51d60889 in at::_ops::abs::redispatch(c10::DispatchKeySet, at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#13 0x00007f1c54334f9e in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#14 0x00007f1c54335698 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#15 0x00007f1c51db49f8 in at::_ops::abs::call(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#16 0x00007f1c517afcd9 in at::native::isfinite(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#17 0x00007f1c525f78c6 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#18 0x00007f1c5209a7b8 in at::_ops::isfinite::call(at::Tensor const&) () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#19 0x00007f1c64777df8 in () at /usr/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#20 0x00007f1c70555e21 in () at /usr/lib/libpython3.10.so.1.0
#21 0x00007f1c7054f4eb in _PyObject_MakeTpCall () at /usr/lib/libpython3.10.so.1.0
#22 0x00007f1c7054a8ee in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.10.so.1.0
......
```

Steps to reproduce:

```python
import torch
a = torch.rand(5, 3)
a.cuda()
```
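When reproducing a crash like this, it can help to record which PyTorch build and GPU are in use before running the failing steps. The following is a minimal sketch, not part of the original report: `collect_rocm_info` is a hypothetical helper name, and the assumption is that querying these values does not itself launch a GPU kernel, so it should be safe to run even on an affected system.

```python
def collect_rocm_info():
    """Gather basic PyTorch/ROCm facts for a bug report (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.hip is None on builds without ROCm/HIP support.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_rocm_info().items():
        print(f"{key}: {value}")
```

Pasting the printed values next to the package version makes it easier to tell whether the installed build was compiled with HIP support and which device it detects.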
Closed by Toolybird (Toolybird)
Wednesday, 22 February 2023, 20:55 GMT
Reason for closing: Fixed
Additional comments about closing: python-pytorch-opt-rocm 1.13.1-2
Comment by Torsten Keßler (tpkessler) - Wednesday, 22 February 2023, 08:38 GMT
Support for Polaris GPUs was added to pytorch-rocm in [testing].
Could you please try this package?
Comment by Jitao Lu (dianlujitao) - Wednesday, 22 February 2023, 09:02 GMT
Just tried and it works. Thanks for the update.