FS#77609 - [python-pytorch-rocm] "from torch.utils.cpp_extension import ROCM_HOME" returns empty string
Attached to Project:
Community Packages
Opened by wuxxin (wuxxin) - Tuesday, 21 February 2023, 23:04 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:05 GMT
Details
* This breaks detection of the ROCm version in some packages that depend on torch,
e.g.: https://github.com/microsoft/DeepSpeed/blob/b5750b64497cacbaf469f02db9cd76d9eda495c9/op_builder/builder.py#L203

Steps to reproduce:
```
python -c "from torch.utils.cpp_extension import ROCM_HOME; print(ROCM_HOME)"
```
Closed by Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:05 GMT
Reason for closing: Moved
Additional comments about closing: https://gitlab.archlinux.org/archlinux/packaging/packages/python-pytorch/issues/1
This applies to the current community version of python-pytorch-rocm, 1.13.1-1,
and most probably also to 1.13.1-2 in community-testing,
because the only difference is one additional GPU build architecture, which I am not using.
```ipython
In [30]: from torch.utils.cpp_extension import ROCM_HOME
...: print("ROCM_HOME:'{}', type(ROCM_HOME): '{}'".format(ROCM_HOME, type(ROCM_HOME)))
ROCM_HOME:'', type(ROCM_HOME): '<class 'str'>'
```
I looked at the 1.13.1 source of the function `_find_rocm_home() -> Optional[str]`,
https://github.com/pytorch/pytorch/blob/49444c3e546bf240bed24a101e747422d1f8a0ee/torch/utils/cpp_extension.py#L119
and replicated the path that triggers an empty string: L125-L133.
If ROCM_HOME is not set as an environment variable, the function tries to infer it from the real path of hipcc,
located with `which`. If hipcc is not in PATH, this does not throw an exception but returns ''.
Lines 125-133:
```
try:
    pipe_hipcc = subprocess.Popen(
        ["which hipcc | xargs readlink -f"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    hipcc, _ = pipe_hipcc.communicate()
    # this will be either <ROCM_HOME>/hip/bin/hipcc or <ROCM_HOME>/bin/hipcc
    rocm_home = os.path.dirname(os.path.dirname(hipcc.decode(*SUBPROCESS_DECODE_ARGS).rstrip('\r\n')))
    if os.path.basename(rocm_home) == 'hip':
        rocm_home = os.path.dirname(rocm_home)
except Exception:
```
So if ROCM_HOME or ROCM_PATH is set in the environment, the function takes the value from there.
If not, it tries to find the path of hipcc, and if that raises an exception it uses the hardcoded "/opt/rocm".
But not finding hipcc does not trigger an exception, so the function returns '' instead.
The code path is still present in pytorch@master, and blame shows the corresponding lines were edited on Apr 15, 2021.
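
To make the failure mode concrete, here is a minimal sketch that replays only the path arithmetic from the excerpt above. The helper name `rocm_home_from_hipcc_output` is made up for illustration and is not part of PyTorch. It assumes the shell pipeline writes nothing to stdout when hipcc is missing, which is what I would expect from `which` plus `readlink -f` without an operand.

```python
import os


def rocm_home_from_hipcc_output(raw: bytes) -> str:
    """Replay the dirname/basename steps of the quoted excerpt (hypothetical helper)."""
    hipcc = raw.decode().rstrip('\r\n')
    # <ROCM_HOME>/bin/hipcc or <ROCM_HOME>/hip/bin/hipcc -> strip two path components
    rocm_home = os.path.dirname(os.path.dirname(hipcc))
    if os.path.basename(rocm_home) == 'hip':
        rocm_home = os.path.dirname(rocm_home)
    return rocm_home


print(repr(rocm_home_from_hipcc_output(b"/opt/rocm/bin/hipcc\n")))      # -> '/opt/rocm'
print(repr(rocm_home_from_hipcc_output(b"/opt/rocm/hip/bin/hipcc\n")))  # -> '/opt/rocm'
# Empty stdout (hipcc not found): no exception is raised, the result is just ''
print(repr(rocm_home_from_hipcc_output(b"")))                            # -> ''
```

Because `os.path.dirname('')` is `''`, nothing in the `try` block fails, so the `except` branch with the "/opt/rocm" fallback is never reached.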
Workaround: setting either ROCM_HOME or ROCM_PATH as an environment variable works for me.
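
For scripts that cannot rely on the shell environment, the same workaround can be applied from Python. This is only a sketch: `ensure_rocm_home` is a hypothetical helper, and the default "/opt/rocm" is an assumption taken from the hardcoded fallback mentioned above, so adjust it to the actual install location. It has to run before `torch.utils.cpp_extension` is first imported, because `ROCM_HOME` is computed once at import time.

```python
import os


def ensure_rocm_home(default: str = "/opt/rocm") -> bool:
    """Set ROCM_HOME if neither ROCM_HOME nor ROCM_PATH is set (hypothetical helper).

    Returns True if a ROCm location is available in the environment afterwards.
    """
    if os.environ.get("ROCM_HOME") or os.environ.get("ROCM_PATH"):
        return True  # already configured, leave it alone
    if os.path.isdir(default):
        os.environ["ROCM_HOME"] = default  # assumed install prefix
        return True
    return False


ensure_rocm_home()
# Import only after the environment is prepared, for example:
# from torch.utils.cpp_extension import ROCM_HOME
```

Exporting the variable in the shell (or in the service unit that starts the program) has the same effect and does not depend on import order.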
Upstream issue:
https://github.com/pytorch/pytorch/issues/95633
and a pull request:
https://github.com/pytorch/pytorch/pull/95634
The same patch would probably apply to all forthcoming pytorch versions
until pytorch@master makes its way to stable.
The commit
https://github.com/pytorch/pytorch/commit/e140c9cc92c4ea279649a784f8bf5feabc627260
is included in 2.1.0:
https://github.com/pytorch/pytorch/releases/tag/v2.1.0
which was released on 4 Oct 2023.
Once the Arch package python-pytorch-rocm is updated to 2.1.0, this can be closed.