FS#66503 - [python-tensorflow-cuda] Python package should be named tensorflow not tensorflow-gpu

Attached to Project: Community Packages
Opened by Eric Langlois (elanglois) - Saturday, 02 May 2020, 02:08 GMT
Last edited by freswa (frederik) - Monday, 04 May 2020, 22:45 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Konstantin Gizdov (kgizdov)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
As of version 2.0, both the CPU and GPU versions of tensorflow are distributed under the python package name "tensorflow".
See https://www.tensorflow.org/install/pip

The current behaviour of naming the python package "tensorflow-gpu" causes pip's dependency checks to fail for any other package that depends on "tensorflow".
It also means that this package doesn't fully provide "python-tensorflow", since pip treats the two names as different distributions (although "import tensorflow" still works).
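The mismatch comes from pip tracking distribution names rather than import names. As a minimal sketch (assuming Python 3.10+ for `importlib.metadata.packages_distributions()`; the helper name `distributions_providing` is hypothetical), this lists which installed distributions ship the `tensorflow` module:

```python
import importlib.metadata
from typing import List, Mapping, Optional, Sequence


def distributions_providing(
    import_name: str,
    mapping: Optional[Mapping[str, Sequence[str]]] = None,
) -> List[str]:
    """Return the pip distribution names that ship the top-level module `import_name`."""
    if mapping is None:
        # Maps top-level import names to the distributions providing them;
        # importlib.metadata.packages_distributions() needs Python 3.10+.
        mapping = importlib.metadata.packages_distributions()
    # `import tensorflow` can work even when no distribution named
    # "tensorflow" exists, because pip tracks distributions, not modules.
    return sorted(mapping.get(import_name, []))
```

On an affected system, `distributions_providing("tensorflow")` would be expected to return a name such as `tensorflow-gpu`, which is consistent with `pip show tensorflow` finding nothing.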

I notice that there is still a tensorflow-gpu pip package, but given that it is not mentioned in the official installation instructions and causes dependency trouble, I think it would be best to use the "tensorflow" name.

Additional info:
* Package version: 2.2.0rc3-2

Steps to reproduce:
pacman -S python-tensorflow-cuda
pip show tensorflow
> WARNING: Package(s) not found: tensorflow
pip show tensorflow-gpu
[package info]

pacman -S python-tensorflow-serving-api
pip check
> tensorflow-serving-api 2.1.0 requires tensorflow, which is not installed.

Closed by freswa (frederik)
Monday, 04 May 2020, 22:45 GMT
Reason for closing: Upstream
Additional comments about closing: Eric: "Upstream problem (really, setup.py problem for not having a provides= functionality)"
Comment by Konstantin Gizdov (kgizdov) - Saturday, 02 May 2020, 17:54 GMT
Hi, this is a non-issue for us. We are not naming the package anything and the package name in the repos has nothing to do with what pip sees.

On Arch, you should not use pip for system packages as it conflicts and messes up the system. In fact, if you've used it before, your system is basically in a non-supported state. This is all documented in the wiki.

I can confidently say that the pip name of tensorflow is solely determined by the way it's built: if you enable cuda, it calls itself tensorflow-gpu, otherwise it calls itself tensorflow. This behaviour is completely controlled by upstream. If what you describe is indeed an issue, then it's an issue with upstream, and you should open an issue with them.
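To illustrate the behaviour described above, here is a hypothetical sketch of a build script that fixes the pip distribution name from a build flag. The `--gpu` flag and the `pick_project_name` helper are assumptions for illustration, not a copy of TensorFlow's actual build scripts.

```python
import argparse
from typing import List


def pick_project_name(argv: List[str]) -> str:
    """Hypothetical: the pip distribution name is fixed by a build flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", action="store_true", help="build with CUDA support")
    args = parser.parse_args(argv)
    # The chosen name ends up in the wheel's metadata; pip only ever sees this
    # name, never the name of the pacman package that installed the files.
    return "tensorflow-gpu" if args.gpu else "tensorflow"
```

Because the name is baked into the package metadata at build time, a packager can only influence it through whatever option the upstream build exposes; the repo package name has no effect on what pip sees.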

Unless you've found that something has changed and there is another way of specifying how tensorflow should be seen by pip; in that case you should let us know. Cheers.
Comment by Eric Langlois (elanglois) - Saturday, 02 May 2020, 18:26 GMT
Thanks for your reply, I'll look into the upstream behaviour.

I do want to clarify that I am not using pip for installing system packages but I do use `pip install --user` and virtual environments with visible system packages and for both of those the system package names are relevant. Installing a package that depends on `tensorflow` to a user directory causes pip to download and install a second gpu-enabled version of tensorflow even though one is already present on the system.
Comment by Konstantin Gizdov (kgizdov) - Monday, 04 May 2020, 10:56 GMT
Well, I guess we can patch it to provide both; however, that might have its side effects too. I am also wondering about this whole thing, because tensorboard specifically behaves differently depending on whether it sees `tensorflow` or `tensorflow-gpu` installed and provided by pip. Could you be more specific about which packages require what? It might be an issue with those packages instead. Thanks.
Comment by Eric Langlois (elanglois) - Monday, 04 May 2020, 18:00 GMT
I surveyed the official packages that depend on tensorflow and a handful of AUR packages based on how they deal with the tensorflow requirement upstream.

# Missing Requirement
In these packages, setup.py does not list `tensorflow` or `tensorflow-gpu` as
requirements despite being necessary. Documentation often instructs the user to install tensorflow.
Most major packages take this approach because of all the names tensorflow
can be installed under using pip: tensorflow/tensorflow-gpu/tf_nightly/...

* tensorboard: https://www.archlinux.org/packages/community/x86_64/tensorboard/
> Searching the repo only shows a mention of tensorflow-gpu in `diagnose_tensorboard.py`: https://github.com/tensorflow/tensorboard/search?q=tensorflow-gpu&unscoped_q=tensorflow-gpu
* python-tflearn: https://aur.archlinux.org/packages/python-tflearn/
* python-sonnet: https://aur.archlinux.org/packages/python-sonnet-cuda-git/
* python-tslearn: https://aur.archlinux.org/packages/python-tslearn-git/
* python-tensorflow-probability: https://aur.archlinux.org/packages/python-tensorflow-probability/
* python-tensorflow-compression: https://aur.archlinux.org/packages/python-tensorflow-compression-git/
> uses `build_pip_package.py` instead of `setup.py`
* python-keras: https://aur.archlinux.org/packages/python-keras/
> Uses either tensorflow, torch or CNTK
> setup.py does not list any of these as dependencies
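The "missing requirement" approach usually pairs an `install_requires` list without TensorFlow with a check at import time. A minimal sketch, assuming a hypothetical package whose `__init__.py` calls `ensure_tensorflow()`; the function name and error text are illustrative only:

```python
import importlib
from types import ModuleType
from typing import Callable


def ensure_tensorflow(
    import_module: Callable[[str], ModuleType] = importlib.import_module,
) -> ModuleType:
    """Import TensorFlow or fail with install instructions.

    Any distribution that provides the module (tensorflow, tensorflow-gpu,
    tf_nightly, ...) passes, because only the import name is checked.
    """
    try:
        return import_module("tensorflow")
    except ImportError as exc:
        raise ImportError(
            "This package needs TensorFlow but does not declare it as a pip "
            "requirement; install tensorflow (or a variant such as tensorflow-gpu) first."
        ) from exc
```

The trade-off is that `pip check` never complains, but it also cannot tell the user that TensorFlow is missing until the package is imported.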

# Dynamic checks
* python-tensorflow-serving-api: https://www.archlinux.org/packages/community/any/python-tensorflow-serving-api/
> dynamically requires `tensorflow` or `tensorflow-gpu` based on whether it's being built under the name `tensorflow-serving-api-gpu`
> So far this is the only one I found that would actually have a broken non-optional dependency if TF were packaged as `tensorflow` rather than `tensorflow-gpu`. It's unfortunate that this occurs in one of the packages officially distributed by the tensorflow team.
> I don't know if there are any functionality differences between `tensorflow-serving-api` and `tensorflow-serving-api-gpu`.
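The name-dependent requirement described above can be sketched as follows. This is a hypothetical reconstruction of the pattern, not tensorflow-serving-api's actual setup.py; `serving_requirements` and `unmet` are illustrative helpers, and names are assumed to be already normalized.

```python
from typing import Iterable, List


def serving_requirements(project_name: str) -> List[str]:
    """Hypothetical: choose the TensorFlow requirement from the name being built."""
    # A "-gpu" build asks for the GPU distribution name; every other build asks
    # for plain "tensorflow", so the installed TF must carry the matching name.
    if project_name.endswith("-gpu"):
        return ["tensorflow-gpu"]
    return ["tensorflow"]


def unmet(requirements: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Requirements whose exact (already normalized) name is not installed."""
    present = set(installed)
    return [req for req in requirements if req not in present]
```

With only a `tensorflow-gpu` distribution present, `unmet(serving_requirements("tensorflow-serving-api"), {"tensorflow-gpu"})` gives `["tensorflow"]`, which matches the `pip check` output in the report.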

* python-gpflow: https://aur.archlinux.org/packages/python-gpflow/
> setup.py dynamically decides whether to require `tensorflow` based on whether `import tensorflow` works and on its version
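The install-time check described for python-gpflow can be sketched roughly as below. This is an illustration of the idea, not gpflow's setup.py: the helper names, the `>=2.1` floor and the version parsing are assumptions.

```python
import importlib
import re
from typing import Callable, List, Optional, Tuple


def installed_tf_version(
    import_module: Callable[[str], object] = importlib.import_module,
) -> Optional[str]:
    """Return tensorflow.__version__ if TensorFlow is importable, else None."""
    try:
        return getattr(import_module("tensorflow"), "__version__", None)
    except ImportError:
        return None


def tensorflow_requirements(
    version: Optional[str], minimum: Tuple[int, int] = (2, 1)
) -> List[str]:
    """Only declare tensorflow as a requirement when no suitable version is importable."""
    if version is not None:
        # Compare major.minor only, so suffixes like "2.2.0rc3" are tolerated.
        match = re.match(r"(\d+)\.(\d+)", version)
        if match and (int(match.group(1)), int(match.group(2))) >= minimum:
            return []
    return [f"tensorflow>={minimum[0]}.{minimum[1]}"]
```

The result depends on the machine that runs setup.py, which is why it can differ between a user build and a pre-built distribution package.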

* openai-baselines: https://github.com/openai/baselines/blob/master/setup.py
> setup.py does not list `tensorflow` or `tensorflow-gpu` as requirements,
> has assertion check that tensorflow can be imported and version is high enough

# Optional requirements
* python-sonnet: https://aur.archlinux.org/packages/python-sonnet-cuda-git/
> Optional requirements on `tensorflow` and `tensorflow-gpu`

* python-pymanopt: https://aur.archlinux.org/packages/python-pymanopt/
> Optional requirement on `tensorflow`

# Requires `tensorflow`
These packages likely currently trigger some dependency check failures or duplicate TF installs for a certain fraction of upstream users who installed tensorflow-gpu/tf_nightly/...
While I don't know the authors' motivation, I suspect some combination of: targeting TF>=2, doing things the "normal" pip way, expecting users to ignore/modify the checks if using a different TF package, lack of awareness, or expecting installs in a controlled environment (conda, virtualenv, ...).

* python-ludwig: https://aur.archlinux.org/packages/python-ludwig/
* python-neupy: https://aur.archlinux.org/packages/python-neupy/
* spinningup: (not on AUR) https://github.com/openai/spinningup/
* garage: (not on AUR) https://github.com/rlworkgroup/garage/

So the summary is:
* most packages don't care because they under-report tensorflow, either omitting it from their requirements or marking it optional
* some packages decide dynamically during build, which is fine for user-built packages but might cause check failures when distributing pre-built packages
* some packages depend on `tensorflow`; I think it would be nice if the pip dependency checks passed for them on Arch.
* `tensorflow-serving-api-gpu` is the only package I've found to absolutely require `tensorflow-gpu`; I don't know how it differs from `tensorflow-serving-api` (which also gets installed)

It's not really clear to me whether the TF>=2.1 [1] use of the name `tensorflow` for both CPU & GPU is an intentional effort to resolve the dependency mess or not. Only mentioning `pip install tensorflow` seems like a sign in favour.
I hope that it is, and that upstream packages from now on can depend on `tensorflow` and have things work, in which case I think Arch should name the package `tensorflow` (or provide both) even if upstream packages haven't yet removed their work-arounds.
If not, then presumably the dependency problems will continue upstream and a setup.py requirement on `tensorflow` could be considered an upstream bug.

I didn't check whether any packages had runtime behaviour dependencies on tensorflow vs tensorflow-gpu but it seems more likely to me that they would use runtime device availability checks rather than package name checks. Do you have any more details about how tensorboard depends on the packages? I didn't find any significant mention of `tensorflow-gpu` in the tensorboard source but I might have missed something.
In terms of build requirements, I didn't find anything that would have a problem with both tensorflow and tensorflow-gpu being available (and only `tensorflow-serving-api-gpu` would have a problem if only `tensorflow` were available, but both the cpu and gpu versions are installed on Arch, so at least one of them has pip check failures right now).

[1]: Turns out it was actually the 2.1 release that included GPU in the `tensorflow` package https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0
Comment by Konstantin Gizdov (kgizdov) - Monday, 04 May 2020, 20:02 GMT
So if I understand correctly, the problem is solely with `python-tensorflow-serving-api`. It's worth having a look at what can be done there. For the rest, it seems to be a problem with their dependency implementation, and if you use them, it's worth messaging their maintainers to see what they can offer as advice.

Are you OK with me closing this issue, and you opening a smaller, more specific one for `python-tensorflow-serving-api` referencing the dependency mismatch?
Comment by Eric Langlois (elanglois) - Monday, 04 May 2020, 22:19 GMT
I'm not sure that the other packages that just depend on "tensorflow" are necessarily in the wrong, because as of >=2.1 that works for installing them using pip with either CPU or GPU support. So I think ideally that would work for installing on Arch too. (In my commentary on why they're like that I was thinking more about the fact that they also existed pre-2.1)

I agree that there's also an issue with python-tensorflow-serving-api, so I'll report that.

I'm OK with closing this issue with the argument being that TF calls itself tensorflow-gpu when compiled as GPU so it's an upstream problem.
This is not a huge issue on the Arch side and if TF wants to really resolve the dependency issues then it should commit to always using "tensorflow".

My understanding of the issue:
1. TF python package names were a mess before 2.1, so python packages effectively cannot declare tensorflow as a python dependency
2. TF upstream has potentially fixed this in 2.1 by including GPU support in "tensorflow" and recommending "pip install tensorflow" instead of "pip install tensorflow-gpu", thereby allowing all packages to depend on "tensorflow"
3. However, TF upstream has not committed to this fix: they still provide "tensorflow-gpu", the GPU build apparently still names itself "tensorflow-gpu", and at least one other official TF package (dynamically) depends on "tensorflow-gpu"
4. This package consequently ends up on the wrong side (IMHO) by building "tensorflow-gpu", so a user installing any python package that requires "tensorflow" (which now works in pip-land for installing with GPU support) will get dependency errors or duplicate installs on Arch.
Comment by Eric Langlois (elanglois) - Monday, 04 May 2020, 22:34 GMT
I discovered that on the TF Repo readme [1] they suggest installing "tensorflow-cpu" for CPU-only support so it looks like there is no end in sight for dependency name mismatches and simply requiring "tensorflow" is _not_ going to be a correct pip-land solution in general, which means the current behaviour of this package is fine. (If only setup.py had a provides= section like PKGBUILDs...)

[1]: https://github.com/tensorflow/tensorflow
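pip resolves requirements by distribution name only, so nothing like a PKGBUILD `provides=` entry lets `tensorflow-gpu` satisfy `tensorflow`. Purely to illustrate what such an alias would mean, here is a hypothetical check; the `TF_ALIASES` table and `satisfied` helper are assumptions, and neither pip nor setuptools behaves this way. Names are compared after PEP 503 normalization.

```python
import re
from typing import Iterable, Mapping, Set

# Hypothetical alias table: distribution names treated as "providing" tensorflow.
TF_ALIASES: Mapping[str, Set[str]] = {
    "tensorflow": {"tensorflow", "tensorflow-gpu", "tensorflow-cpu", "tf-nightly"},
}


def normalize(name: str) -> str:
    """PEP 503 name normalization (tensorflow_gpu == tensorflow-gpu)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def satisfied(
    requirement: str,
    installed: Iterable[str],
    aliases: Mapping[str, Set[str]] = TF_ALIASES,
) -> bool:
    """True if `requirement` or any distribution aliased to it is installed."""
    wanted = normalize(requirement)
    acceptable = {normalize(name) for name in aliases.get(wanted, {wanted})}
    acceptable.add(wanted)
    return any(normalize(dist) in acceptable for dist in installed)
```

The alias is one-directional: an installed `tensorflow` would not satisfy a `tensorflow-gpu` requirement, which is the case `tensorflow-serving-api-gpu` would still hit.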