FS#65202 - [python-pytorch-opt-cuda] incompatible nccl
Attached to Project:
Community Packages
Opened by Yuxin Wu (ppwwyyxx) - Sunday, 19 January 2020, 08:41 GMT
Last edited by Konstantin Gizdov (kgizdov) - Monday, 27 January 2020, 19:57 GMT
Details
Description:
Cannot import torch.
Steps to reproduce:
Install python-pytorch-opt-cuda 1.4.0-1 from testing. Run
```
$ python -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.8/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: /usr/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t
```
Closed by Konstantin Gizdov (kgizdov)
Monday, 27 January 2020, 19:57 GMT
Reason for closing: Fixed
Additional comments about closing: python-pytorch 1.4.0-4
```
~ » python
Python 3.8.1 (default, Jan 8 2020, 23:09:20)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: /usr/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t
```
It works fine on the non-cuda version, but this is pretty useless for most of our purposes.
torch_cuda_api.patch (1.2 KiB)
@hgaiser, the pip version will work because it uses the built-in nccl, and thus it never needs to export the symbols correctly. This actually proves the bug.
@otaj, good catch - I was creating the patch from the wrong branch. However, this proves the 1.4.0 release does indeed have a bug where the symbol is not exported properly. I have updated the patch (attached), and it will soon be in the repo.
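For anyone trying to read the mangled name in the traceback: below is a minimal sketch of a demangler for this one shape of symbol. It assumes the Itanium C++ ABI mangling that GCC uses and handles only a nested name (`_ZN...E`) followed by plain named parameter types; the helper name `demangle_simple` is made up for illustration and is not part of PyTorch or any toolchain. For anything more complex, a real demangler such as `c++filt` is the right tool.

```python
def demangle_simple(symbol: str) -> str:
    """Demangle a tiny subset of Itanium C++ ABI names.

    Handles only: _ZN <len><id> ... E <len><id> ...
    i.e. a nested (namespaced) function name followed by parameter
    types that are plain named types. Templates, builtin type codes,
    pointers, references and substitutions are NOT supported.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...E) are handled")

    def read_source_name(pos: int):
        # A source name is a decimal length followed by that many characters.
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported encoding at offset {start}")
        length = int(symbol[start:pos])
        name = symbol[pos:pos + length]
        if len(name) != length:
            raise ValueError("truncated symbol")
        return name, pos + length

    pos = 3  # skip "_ZN"
    scope = []
    while pos < len(symbol) and symbol[pos] != "E":
        name, pos = read_source_name(pos)
        scope.append(name)
    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator for nested name")
    pos += 1  # skip "E"

    params = []
    while pos < len(symbol):
        name, pos = read_source_name(pos)
        params.append(name)

    return "::".join(scope) + "(" + ", ".join(params) + ")"


missing = "_ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t"
print(demangle_simple(missing))
# torch::cuda::nccl::detail::throw_nccl_error(ncclResult_t)
```

So the loader is looking for `torch::cuda::nccl::detail::throw_nccl_error(ncclResult_t)`, which `libtorch_python.so` references but which is not resolved at import time.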