FS#65176 - [python-pytorch] "nn.DataParallel" causes "NCCL Error 4: invalid argument"
Attached to Project:
Community Packages
Opened by Cat (lasercat) - Thursday, 16 January 2020, 05:49 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Wednesday, 22 January 2020, 02:34 GMT
Details
Description:
nn.DataParallel does not work with NCCL 2.5.6. According to the upstream PR, this seems to be fixed in 1.4, which was released 6 hours ago: https://github.com/pytorch/pytorch/releases/tag/v1.4.0

Additional info:
* package version(s): python-pytorch-cuda 1.3.1-7
* config and/or log files etc.
* link to upstream bug report, if any: also reported in upstream pull request https://github.com/pytorch/pytorch/pull/29014

Steps to reproduce:
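No reproduction steps were given in the report. As an illustration only, the sketch below shows the kind of check one might run: it compares a package version string against the 1.4.0 release named above, and (only when called explicitly) pushes one batch through nn.DataParallel. The helper names `parse_version`, `predates_fix`, and `try_data_parallel` are hypothetical, the layer and batch sizes are arbitrary, and it is an assumption that a single forward pass on two or more GPUs is enough to trigger the error.

```python
import re

# Release named in the report as containing the fix.
FIXED_IN = (1, 4, 0)


def parse_version(text):
    """Extract the leading 'X.Y.Z' of a version string as a tuple of ints.

    Anything after the third number (for example the '-7' package
    release suffix) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("not an X.Y.Z version: %r" % text)
    return tuple(int(group) for group in match.groups())


def predates_fix(version_text):
    """Return True if the version is older than the 1.4.0 release."""
    return parse_version(version_text) < FIXED_IN


def try_data_parallel():
    """Run one forward pass through nn.DataParallel.

    Requires torch and at least two CUDA devices; returns the output
    shape, or a short message when the machine cannot run the check.
    """
    import torch
    from torch import nn

    if torch.cuda.device_count() < 2:
        return "skipped: fewer than two CUDA devices"
    model = nn.DataParallel(nn.Linear(8, 4).cuda())
    output = model(torch.randn(16, 8).cuda())
    return tuple(output.shape)


if __name__ == "__main__":
    # Version of the package named in this report.
    print(predates_fix("1.3.1-7"))
```

On an affected installation, calling `try_data_parallel()` would be expected to raise the NCCL error from the title rather than return a shape; that expectation has not been verified here.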
Closed by Sven-Hendrik Haase (Svenstaro)
Wednesday, 22 January 2020, 02:34 GMT
Reason for closing: Fixed
It should be "nn.DataParallel" causes "NCCL Error 4: invalid argument"