Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#77609 - [python-pytorch-rocm] "from torch.utils.cpp_extension import ROCM_HOME" returns empty string
Attached to Project:
Community Packages
Opened by wuxxin (wuxxin) - Tuesday, 21 February 2023, 23:04 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 22 February 2023, 01:57 GMT
Details
* this breaks detection of the ROCm version in some packages that depend on torch,
e.g.: https://github.com/microsoft/DeepSpeed/blob/b5750b64497cacbaf469f02db9cd76d9eda495c9/op_builder/builder.py#L203

Steps to reproduce:
```
python -c "from torch.utils.cpp_extension import ROCM_HOME; print(ROCM_HOME)"
```
this applies to the current community version of python-pytorch-rocm, 1.13.1-1,
and most probably also to 1.13.1-2 in community testing,
because the only difference is one additional GPU build architecture, which i am not using.
```ipython
In [30]: from torch.utils.cpp_extension import ROCM_HOME
...: print("ROCM_HOME:'{}', type(ROCM_HOME): '{}'".format(ROCM_HOME, type(ROCM_HOME)))
ROCM_HOME:'', type(ROCM_HOME): '<class 'str'>'
```
i looked at the 1.13.1 source of the function `_find_rocm_home() -> Optional[str]`,
https://github.com/pytorch/pytorch/blob/49444c3e546bf240bed24a101e747422d1f8a0ee/torch/utils/cpp_extension.py#L119
and traced the path that produces an empty string: L125-L133
if ROCM_HOME is not set as an environment variable, it tries to infer the location from the real path of hipcc,
found via `which`. but if hipcc is not in PATH, it won't throw an exception; it returns '' instead.
Lines 125-133:
```
try:
    pipe_hipcc = subprocess.Popen(
        ["which hipcc | xargs readlink -f"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    hipcc, _ = pipe_hipcc.communicate()
    # this will be either <ROCM_HOME>/hip/bin/hipcc or <ROCM_HOME>/bin/hipcc
    rocm_home = os.path.dirname(os.path.dirname(hipcc.decode(*SUBPROCESS_DECODE_ARGS).rstrip('\r\n')))
    if os.path.basename(rocm_home) == 'hip':
        rocm_home = os.path.dirname(rocm_home)
except Exception:
```
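to make the empty-string path concrete, here is a minimal sketch that repeats only the path arithmetic of the excerpt above on a given `which` output. the helper name `rocm_home_from_hipcc_output` is made up for illustration, and it assumes (as described in this report) that the shell pipeline simply produces empty stdout when hipcc is not in PATH. plain UTF-8 decoding stands in for `SUBPROCESS_DECODE_ARGS`.

```python
import os


def rocm_home_from_hipcc_output(stdout: bytes) -> str:
    """Hypothetical helper: apply the excerpt's dirname logic to raw stdout."""
    hipcc = stdout.decode().rstrip("\r\n")
    # Two dirname() calls strip "bin/hipcc" from the resolved path.
    rocm_home = os.path.dirname(os.path.dirname(hipcc))
    # Handle the <ROCM_HOME>/hip/bin/hipcc layout.
    if os.path.basename(rocm_home) == "hip":
        rocm_home = os.path.dirname(rocm_home)
    return rocm_home


# hipcc found: both layouts resolve to the install prefix.
print(rocm_home_from_hipcc_output(b"/opt/rocm/bin/hipcc\n"))
print(rocm_home_from_hipcc_output(b"/opt/rocm/hip/bin/hipcc\n"))
# hipcc not found (assumed empty stdout): no exception, just ''.
print(repr(rocm_home_from_hipcc_output(b"")))
```

nothing in this chain raises on empty input, since `os.path.dirname('')` is simply `''`, so the `except` branch is never reached.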
so if you have set ROCM_HOME or ROCM_PATH, the function takes the value from there;
if not, it tries to find the path of hipcc, and if that fails it uses the hardcoded "/opt/rocm".
but not finding hipcc does not trigger an exception, so '' is returned instead.
the code path is still present in pytorch@master, and blame shows the corresponding code lines were edited on Apr 15, 2021.
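one way the lookup could avoid returning '' is to check whether hipcc was found before doing any path arithmetic. the following is only a sketch of that idea, not the upstream code or the submitted fix: the function name `find_rocm_home_sketch`, its injectable parameters, and the "return None if the fallback directory does not exist" behaviour are assumptions made for illustration.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_rocm_home_sketch(
    environ: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    fallback: str = "/opt/rocm",
) -> Optional[str]:
    """Hypothetical lookup that returns a path or None, never ''."""
    # Guess 1: explicit environment variables.
    rocm_home = environ.get("ROCM_HOME") or environ.get("ROCM_PATH")
    if rocm_home:
        return rocm_home

    # Guess 2: derive the prefix from hipcc, but only if it was found.
    hipcc = which("hipcc")
    if hipcc:
        # Either <ROCM_HOME>/hip/bin/hipcc or <ROCM_HOME>/bin/hipcc.
        rocm_home = os.path.dirname(os.path.dirname(os.path.realpath(hipcc)))
        if os.path.basename(rocm_home) == "hip":
            rocm_home = os.path.dirname(rocm_home)
        return rocm_home

    # Guess 3: hardcoded default, used only if it exists (assumption).
    return fallback if os.path.exists(fallback) else None
```

`shutil.which` returns None when the executable is missing, so the "not found" case is explicit instead of being hidden in an empty string.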
Workaround: setting either ROCM_HOME or ROCM_PATH as an environment variable works for me
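from inside a Python script, the same workaround could look roughly like this. it is a sketch under two assumptions: that ROCM_HOME is computed when `torch.utils.cpp_extension` is first imported, so the variable has to be set before that import, and that "/opt/rocm" is the right prefix on your system. `ensure_rocm_env` is a made-up helper name.

```python
import os
from typing import MutableMapping


def ensure_rocm_env(
    environ: MutableMapping[str, str] = os.environ,
    prefix: str = "/opt/rocm",  # assumed install prefix, adjust as needed
) -> str:
    """Set ROCM_PATH to `prefix` unless ROCM_HOME or ROCM_PATH is already set."""
    current = environ.get("ROCM_HOME") or environ.get("ROCM_PATH")
    if not current:
        environ["ROCM_PATH"] = prefix
        current = prefix
    return current


# Intended use, before the first import of torch.utils.cpp_extension:
#   ensure_rocm_env()
#   from torch.utils.cpp_extension import ROCM_HOME
```

an already set ROCM_HOME or ROCM_PATH is left untouched, so the helper does not override a deliberate configuration.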
Upstream issue:
https://github.com/pytorch/pytorch/issues/95633
and a pull request:
https://github.com/pytorch/pytorch/pull/95634