FS#77609 - [python-pytorch-rocm] "from torch.utils.cpp_extension import ROCM_HOME" returns empty string

Attached to Project: Community Packages
Opened by wuxxin (wuxxin) - Tuesday, 21 February 2023, 23:04 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:05 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Torsten Keßler (tpkessler)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

* This breaks detection of the ROCm version in some packages that depend on torch.

e.g.: https://github.com/microsoft/DeepSpeed/blob/b5750b64497cacbaf469f02db9cd76d9eda495c9/op_builder/builder.py#L203

Steps to reproduce:
```
python -c "from torch.utils.cpp_extension import ROCM_HOME; print(ROCM_HOME)"
```

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:05 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/packaging/packages/python-pytorch/issues/1
Comment by Toolybird (Toolybird) - Wednesday, 22 February 2023, 01:26 GMT
You haven't specified the pkg version number. It's the *most* important detail in a bug report! (there's also a version in [testing])
Comment by wuxxin (wuxxin) - Wednesday, 22 February 2023, 01:42 GMT
Sorry, my bad. I also thought this would be caught by the package-to-bugtracker "add bug" link.

It applies to the current community version of python-pytorch-rocm, 1.13.1-1,
and most probably also to 1.13.1-2 in community-testing,
because the only difference is one additional GPU build architecture, which I am not using.

Comment by Torsten Keßler (tpkessler) - Thursday, 23 February 2023, 18:39 GMT
What's the output of the command in your case? For me, it prints /opt/rocm.
Comment by wuxxin (wuxxin) - Thursday, 23 February 2023, 19:30 GMT
For me, it prints an empty string:


```ipython
In [30]: from torch.utils.cpp_extension import ROCM_HOME
...: print("ROCM_HOME:'{}', type(ROCM_HOME): '{}'".format(ROCM_HOME, type(ROCM_HOME)))
ROCM_HOME:'', type(ROCM_HOME): '<class 'str'>'
```

I looked at the function `_find_rocm_home() -> Optional[str]` in the 1.13.1 source,
https://github.com/pytorch/pytorch/blob/49444c3e546bf240bed24a101e747422d1f8a0ee/torch/utils/cpp_extension.py#L119

and reproduced the code path that returns an empty string (lines 125-133 of that file).

So if ROCM_HOME is not set as an environment variable, the function tries to infer it from the real path of hipcc,
found via `which`. But if hipcc is not in PATH, this does not raise an exception; it returns '' instead.
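This behaviour can be reproduced without torch. The following is a minimal sketch that assumes a POSIX shell. In place of `hipcc`, it uses a deliberately nonexistent command name, `hipcc-missing-example`, which is hypothetical. Running the same `which ... | xargs readlink -f` pipeline shows three things: nothing is raised, stdout is empty, and the two `os.path.dirname` calls turn that empty string into `''`:

```python
import os
import subprocess

# Same shape of pipeline as in torch 1.13.1, but with a command name that is
# assumed not to exist on PATH (hypothetical stand-in for a missing hipcc).
proc = subprocess.run(
    "which hipcc-missing-example | xargs readlink -f",
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
# No exception: a non-zero exit status is only recorded in proc.returncode.
hipcc = proc.stdout.decode().rstrip("\r\n")

# dirname('') is '', so stripping "<ROCM_HOME>/bin/hipcc" yields ''.
rocm_home = os.path.dirname(os.path.dirname(hipcc))
if os.path.basename(rocm_home) == "hip":
    rocm_home = os.path.dirname(rocm_home)

print(repr(hipcc), repr(rocm_home))  # '' ''
```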

Lines 125-133:
```python
try:
    pipe_hipcc = subprocess.Popen(
        ["which hipcc | xargs readlink -f"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    hipcc, _ = pipe_hipcc.communicate()
    # this will be either <ROCM_HOME>/hip/bin/hipcc or <ROCM_HOME>/bin/hipcc
    rocm_home = os.path.dirname(os.path.dirname(hipcc.decode(*SUBPROCESS_DECODE_ARGS).rstrip('\r\n')))
    if os.path.basename(rocm_home) == 'hip':
        rocm_home = os.path.dirname(rocm_home)
except Exception:
    # abridged: falls back to the hardcoded default path
    rocm_home = '/opt/rocm'
```

So if ROCM_HOME or ROCM_PATH is set, the function takes the value from there.
If not, it tries to find the path of hipcc, and if that fails, it uses the hardcoded "/opt/rocm".

However, not finding hipcc does not raise an exception: the pipeline just produces empty output, so the function returns '' instead.
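For comparison, here is a hypothetical lookup that keeps the same three-step order: environment variable, then hipcc, then `/opt/rocm`. It differs in using `shutil.which`, so a missing hipcc is detected explicitly instead of silently yielding `''`. This is a sketch under those assumptions, not the code from the upstream pull request. The function name `find_rocm_home_sketch` and its `env`/`fallback` parameters are illustrative only:

```python
import os
import shutil
from typing import Mapping, Optional


def find_rocm_home_sketch(env: Mapping[str, str],
                          fallback: str = "/opt/rocm") -> Optional[str]:
    """Hypothetical ROCm-home lookup that returns a path or None, never ''."""
    # 1. An explicit environment variable wins.
    rocm_home = env.get("ROCM_HOME") or env.get("ROCM_PATH")
    if rocm_home:
        return rocm_home

    # 2. Derive it from hipcc, but only if hipcc is actually on PATH.
    #    shutil.which returns None (it does not raise) when nothing is found.
    hipcc = shutil.which("hipcc", path=env.get("PATH", ""))
    if hipcc is not None:
        # either <ROCM_HOME>/bin/hipcc or <ROCM_HOME>/hip/bin/hipcc
        rocm_home = os.path.dirname(os.path.dirname(os.path.realpath(hipcc)))
        if os.path.basename(rocm_home) == "hip":
            rocm_home = os.path.dirname(rocm_home)
        return rocm_home

    # 3. Hardcoded default, used only if it exists; otherwise "not found".
    return fallback if os.path.isdir(fallback) else None
```

Typical use would be `find_rocm_home_sketch(os.environ)`. Returning `None` rather than `''` lets callers test `is None` reliably.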

The code path is still present in pytorch@master, and git blame shows the corresponding lines were last edited on Apr 15, 2021.


+ Workaround: setting either ROCM_HOME or ROCM_PATH as an environment variable works for me.
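The same workaround can also be applied from inside a Python script. The following is a minimal sketch built on two assumptions. First, it assumes ROCm is installed under `/opt/rocm`. Second, it assumes `ROCM_HOME` is computed once, when `torch.utils.cpp_extension` is first imported, so the variable must be set before that import. The final check treats `None` and `''` alike:

```python
import os

# Assumption: ROCm lives in /opt/rocm. setdefault keeps any value the user
# already exported, and must run before torch.utils.cpp_extension is imported.
os.environ.setdefault("ROCM_PATH", "/opt/rocm")

try:
    from torch.utils.cpp_extension import ROCM_HOME
except ImportError:  # torch is not installed in this environment
    ROCM_HOME = None

# Treat None and '' alike: both mean "no usable ROCm path was found".
if not ROCM_HOME:
    print("ROCm installation not detected")
else:
    print(f"ROCM_HOME = {ROCM_HOME}")
```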
Comment by wuxxin (wuxxin) - Thursday, 23 February 2023, 19:33 GMT
Comment by Torsten Keßler (tpkessler) - Friday, 24 February 2023, 16:17 GMT
Oh! /opt/rocm/bin is part of my PATH, which is why the command worked in my case. This is actually a problem that should be reported upstream.
Comment by wuxxin (wuxxin) - Monday, 27 February 2023, 19:01 GMT
Ack, I opened an upstream bug for pytorch@master:

https://github.com/pytorch/pytorch/issues/95633

and a pull request:

https://github.com/pytorch/pytorch/pull/95634
Comment by wuxxin (wuxxin) - Wednesday, 28 June 2023, 20:13 GMT
The pull request got merged.
The same patch would probably apply to all forthcoming PyTorch versions
until pytorch@master makes its way into a stable release.
Comment by wuxxin (wuxxin) - Saturday, 07 October 2023, 02:04 GMT
Fix:
https://github.com/pytorch/pytorch/commit/e140c9cc92c4ea279649a784f8bf5feabc627260

It is included in 2.1.0, which was released on 4 Oct 2023:
https://github.com/pytorch/pytorch/releases/tag/v2.1.0

Once the Arch package python-pytorch-rocm is updated to 2.1.0, this can be closed.
