FS#60814 - [tensorflow] different behavior between Archlinux version and original

Attached to Project: Community Packages
Opened by Paolo Galeone (pgaleone) - Thursday, 15 November 2018, 09:54 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Monday, 10 December 2018, 22:54 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

The behavior of TensorFlow differs between the official, supported build (Python 3.6, CUDA 9, cuDNN 7) and the one shipped in Archlinux (compiled with CUDA 10, cuDNN 7.3 and Python 3.7).

In particular, the Archlinux version has a memory leak in `tf.gfile` that is not present in the official one.

Moreover, the training behavior of certain models differs between the Archlinux and the official version (for instance, a generative model collapses when trained with the Archlinux version, while it trains correctly with the official version).

For additional information and the code to reproduce the issue see: https://github.com/tensorflow/tensorflow/issues/23733

Closed by  Sven-Hendrik Haase (Svenstaro)
Monday, 10 December 2018, 22:54 GMT
Reason for closing:  Upstream
Comment by Sven-Hendrik Haase (Svenstaro) - Thursday, 15 November 2018, 11:03 GMT
Can you try recompiling tensorflow by yourself against cuda 10 with python 3.7 and check whether you get the same problem?
Comment by Paolo Galeone (pgaleone) - Thursday, 15 November 2018, 11:13 GMT
Right now I can't, since I had to downgrade CUDA and cuDNN and work inside a virtualenv to keep working. I may be able to try over the weekend, but I can't guarantee it.
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 09 December 2018, 09:40 GMT
Does this still exist now?
Comment by Paolo Galeone (pgaleone) - Monday, 10 December 2018, 17:10 GMT
Yes, but it's probably not your fault. In fact, I found the same bug / memory leak when using Python 3.6 and TensorFlow installed with conda, in a conda environment. More info here: https://github.com/tensorflow/tensorflow/issues/23733
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 10 December 2018, 22:53 GMT
Alright, thanks! In that case, I don't think it makes much sense to track this bug here downstream as this is just one bug of many and it's nothing that we can fix here.
