FS#60814 - [tensorflow] different behavior between Archlinux version and original
Attached to Project:
Community Packages
Opened by Paolo Galeone (pgaleone) - Thursday, 15 November 2018, 09:54 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Monday, 10 December 2018, 22:54 GMT
Details
Description:
TensorFlow behaves differently in the original, officially supported build (Python 3.6, CUDA 9, cuDNN 7) and in the build shipped in Arch Linux (compiled with CUDA 10, cuDNN 7.3 and Python 3.7). In particular, the Arch Linux build has a memory leak in `tf.gfile` that is not present in the original one. Moreover, the training behavior of certain models differs between the two builds: for instance, a generative model collapses when trained with the Arch Linux build, while it trains correctly with the original one. For additional information and the code to reproduce the issue, see: https://github.com/tensorflow/tensorflow/issues/23733
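For anyone who wants to compare the two builds without the full reproduction from the linked issue, a minimal sketch of a leak check follows. It is not the reproduction code from the upstream report: the helper name `peak_rss_after_each_round` and the placeholder `suspect_call` are hypothetical, and the approach (sampling the process peak RSS from the standard `resource` module on a Unix system) is only one assumed way to observe growth in native memory.

```python
import resource


def peak_rss_after_each_round(fn, rounds=5, calls_per_round=1000):
    """Call fn repeatedly and record the process peak RSS after each round.

    ru_maxrss is a high-water mark (reported in kilobytes on Linux), so the
    samples never decrease. A leak shows up as a value that keeps climbing
    round after round instead of levelling off.
    """
    samples = []
    for _ in range(rounds):
        for _ in range(calls_per_round):
            fn()
        samples.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return samples


def suspect_call():
    # Hypothetical placeholder: replace the body with the tf.gfile
    # operation under suspicion before running the check.
    pass


if __name__ == "__main__":
    for i, rss in enumerate(peak_rss_after_each_round(suspect_call), start=1):
        print(f"round {i}: peak RSS = {rss}")
```

Running the same script under both builds and comparing how the samples evolve would give a rough indication of whether the leak is specific to one of them.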
Closed by Sven-Hendrik Haase (Svenstaro)
Monday, 10 December 2018, 22:54 GMT
Reason for closing: Upstream
Comment by Sven-Hendrik Haase (Svenstaro) - Thursday, 15 November 2018, 11:03 GMT
Can you try recompiling tensorflow yourself against cuda 10
with python 3.7 and check whether you get the same problem?
Comment by Paolo Galeone (pgaleone) - Thursday, 15 November 2018, 11:13 GMT
Right now I can't, since I had to downgrade cuda and cudnn and work
inside a virtualenv to continue working. I may be able to give it a
try over the weekend, but I can't guarantee it.
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 09 December 2018, 09:40 GMT
Does this still exist now?
Comment by Paolo Galeone (pgaleone) - Monday, 10 December 2018, 17:10 GMT
Yes, but it's probably not your fault. In fact, I found the same bug
/ memory leak when using python 3.6 and tensorflow installed
using conda, in a conda environment. More info here:
https://github.com/tensorflow/tensorflow/issues/23733
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 10 December 2018, 22:53 GMT
Alright, thanks! In that case, I don't think it makes much sense
to track this bug here downstream, as this is just one bug of many
and it's nothing that we can fix here.