FS#79523 - [tensorflow] tensorflow_cc links against invalid version of openmp

Attached to Project: Arch Linux
Opened by Arsen (menkaur) - Thursday, 31 August 2023, 00:52 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:19 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Konstantin Gizdov (kgizdov)
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:


Additional info:
* package version(s)
tensorflow-opt-cuda 2.13.0-1


Description:
When compiling C++ code that links against the libtensorflow_cc.so library, the following linker errors occur:
/usr/bin/ld: /usr/lib/libtensorflow_cc.so: undefined reference to `omp_in_parallel@VERSION'
/usr/bin/ld: /usr/lib/libtensorflow_cc.so: undefined reference to `GOMP_barrier@VERSION'
/usr/bin/ld: /usr/lib/libtensorflow_cc.so: undefined reference to `omp_get_max_threads@VERSION'
/usr/bin/ld: /usr/lib/libtensorflow_framework.so: undefined reference to `kmp_set_blocktime@VERSION'
/usr/bin/ld: /usr/lib/libtensorflow_cc.so: undefined reference to `omp_get_num_threads@VERSION'
/usr/bin/ld: /usr/lib/libtensorflow_cc.so: undefined reference to `omp_get_thread_num@VERSION'
/usr/bin/ld: /usr/lib/libtensorflow_cc.so: undefined reference to `GOMP_parallel@VERSION'
/usr/bin/ld: /usr/lib/libtensorflow_framework.so: undefined reference to `omp_set_num_threads@VERSION'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/my_project.dir/build.make:99: my_project] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/my_project.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

These errors are new; they did not occur with the previous version of the package I had installed.

I wrote a script to search for libraries referencing one of these symbols:
File: /usr/lib/libcaffe2_detectron_ops_gpu.so
109: 0000000000000000 0 FUNC GLOBAL DEFAULT UND omp_in_parallel@OMP_1.0 (22)
---------
File: /usr/lib/libdnnl.so
68: 0000000000000000 0 FUNC GLOBAL DEFAULT UND omp_in_parallel@OMP_1.0 (21)
---------
File: /usr/lib/libgomp.so
208: 0000000000017140 24 FUNC GLOBAL DEFAULT 15 omp_in_parallel@@OMP_1.0
---------
File: /usr/lib/libtensorflow_cc.so
2702: 0000000000000000 0 FUNC GLOBAL DEFAULT UND omp_in_parallel@VERSION (42)
---------
File: /usr/lib/libtensorflow.so
1561: 0000000000000000 0 FUNC GLOBAL DEFAULT UND omp_in_parallel@VERSION (11)
---------
File: /usr/lib/libtorch_cpu.so
213: 0000000000000000 0 FUNC GLOBAL DEFAULT UND omp_in_parallel@OMP_1.0 (29)
---------
It looks like the symbol version recorded in the tensorflow libraries is invalid for some reason (@VERSION, where @OMP_1.0 would probably work).
This doesn't interfere with Python code loading the tensorflow libraries, but C++ code can no longer be linked.
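
The search script itself is not included in this report. A minimal sketch of how such a search could be done is shown below, assuming binutils `readelf` is installed; the function names `parse_row`, `scan_library` and `find_symbol` are hypothetical and chosen only for illustration.

```python
import subprocess
from pathlib import Path


def parse_row(line):
    """Parse one symbol row of `readelf -W --dyn-syms` output.

    Returns a dict with the symbol name, its version (or None) and whether
    the symbol is defined in the library, or None for non-symbol lines.
    """
    parts = line.split()
    # A symbol row starts with "<number>:" and has at least 8 columns.
    if len(parts) < 8 or not parts[0].endswith(":") or not parts[0][:-1].isdigit():
        return None
    ndx, full_name = parts[6], parts[7]
    # "name@VER" is a reference, "name@@VER" is the default definition.
    name, _, version = full_name.partition("@")
    version = version.lstrip("@")
    return {"name": name, "version": version or None, "defined": ndx != "UND"}


def scan_library(path):
    """Return all parsed dynamic symbols of one shared library."""
    result = subprocess.run(
        ["readelf", "-W", "--dyn-syms", str(path)],
        capture_output=True, text=True, check=False,
    )
    return [row for row in map(parse_row, result.stdout.splitlines()) if row]


def find_symbol(lib_dir, symbol):
    """Yield (library path, row) for every library that mentions `symbol`."""
    for lib in sorted(Path(lib_dir).glob("*.so")):
        for row in scan_library(lib):
            if row["name"] == symbol:
                yield lib, row

# Example call (not executed here):
#   for lib, row in find_symbol("/usr/lib", "omp_in_parallel"):
#       print(lib, row)
```

Rows with `defined` set to False and a version of `VERSION` correspond to the suspicious references listed above.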

Steps to reproduce:
main.cpp:
#include <iostream>
#include <memory>
#include <vector>

#include <tensorflow/cc/client/client_session.h>
#include <tensorflow/cc/ops/standard_ops.h>
#include <tensorflow/core/framework/tensor.h>
#include <tensorflow/core/public/session.h>

using namespace tensorflow;
using namespace tensorflow::ops;

int main() {
  // Create a root scope.
  Scope root = Scope::NewRootScope();

  // Create a constant tensor of shape {2, 2}.
  auto a = Const(root, {{1.f, 2.f}, {3.f, 4.f}});

  // Create a graph.
  GraphDef graph_def;
  TF_CHECK_OK(root.ToGraphDef(&graph_def));

  // Create a session and associate the graph with it.
  SessionOptions session_options;
  std::unique_ptr<Session> session(NewSession(session_options));
  TF_CHECK_OK(session->Create(graph_def));

  std::vector<Tensor> outputs;

  // Evaluate the tensor `a`.
  TF_CHECK_OK(
      session->Run({/* No inputs */}, {a.node()->name()}, {}, &outputs));

  // Print the result.
  LOG(INFO) << outputs[0].matrix<float>();

  // Close the session.
  auto status = session->Close();
  if (!status.ok()) {
    std::cerr << "Error closing session: " << status.ToString()
              << std::endl;
  }

  return 0;
}

CMakeLists.txt:
cmake_minimum_required(VERSION 3.8)
project(my_project)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# tensorflow so files
find_library(TENSORFLOW_CC
  NAMES tensorflow_cc
  HINTS "/usr/lib/"
)
find_library(TENSORFLOW_FRAMEWORK
  NAMES tensorflow_framework
  HINTS "/usr/lib/"
)

add_executable(my_project main.cpp)

# adding library to the target in a way that doesn't generate warnings
target_include_directories(my_project SYSTEM PRIVATE /usr/include/tensorflow)

target_link_libraries(my_project
  ${TENSORFLOW_CC}
  ${TENSORFLOW_FRAMEWORK}
)

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:19 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/packaging/packages/tensorflow/issues/2
Comment by Toolybird (Toolybird) - Thursday, 31 August 2023, 02:25 GMT
openmp was recently upgraded to 16.0.6 in line with the rest of the LLVM suite. I guess tensorflow might need a rebuild..
Comment by Arsen (menkaur) - Thursday, 31 August 2023, 20:44 GMT
Quick update: while experimenting with patching the .so files, I found a few more inconsistencies in the symbols they reference:
75: 0000000000000000 0 FUNC GLOBAL DEFAULT UND kmp_set_blocktime@VERSION (20)
This symbol is not defined in any of the libraries in /usr/lib/. I'm not sure where it should be imported from; it looks like it comes from Intel's OpenMP runtime, which I could not find in any repository or on the AUR.
The following functions also can't be resolved by simply changing VERSION to OMP_1.0, because libgomp defines them under different version nodes:
File: /usr/lib/libgomp.so
246: 0000000000016de0 143 FUNC GLOBAL DEFAULT 15 GOMP_parallel@@GOMP_4.0
File: /usr/lib/libgomp.so
151: 000000000000d910 33 FUNC GLOBAL DEFAULT 15 GOMP_barrier@@GOMP_1.0
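
To illustrate why a blanket rename of VERSION to OMP_1.0 is not enough, the sketch below maps each undefined symbol to the version node that a provider actually defines. The table only contains the three libgomp definitions quoted in this report; the helper name `plan_version_fixes` is hypothetical.

```python
# Version nodes of the libgomp definitions quoted in this report.
LIBGOMP_DEFINED = {
    "omp_in_parallel": "OMP_1.0",
    "GOMP_parallel": "GOMP_4.0",
    "GOMP_barrier": "GOMP_1.0",
}


def plan_version_fixes(undefined, defined):
    """Map each undefined symbol to the version node its provider uses.

    A value of None means no known provider defines the symbol at all.
    """
    return {name: defined.get(name) for name in undefined}


def needs_other_than(plan, version):
    """Return the symbols that would NOT be fixed by using `version` everywhere."""
    return sorted(name for name, node in plan.items() if node != version)
```

With the symbols from this report, `needs_other_than(plan, "OMP_1.0")` lists GOMP_barrier, GOMP_parallel and kmp_set_blocktime, the last one because no definition was found for it.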
Comment by Arsen (menkaur) - Friday, 01 September 2023, 00:41 GMT
Another issue I've noticed is that the packaged tensorflow and keras have different versions, which is likely the cause of the following exception when using LSTM models:
AttributeError: Exception encountered when calling layer "lstm" (type LSTM).

module 'tensorflow.compat.v2.__internal__.function' has no attribute 'defun_with_attributes'

>>> print(tf.__version__, keras.__version__)
2.13.0 2.12.0
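
A small check along these lines could catch the mismatch early. It is only a sketch, and it assumes that TensorFlow and Keras are expected to share the same major.minor version, which this report suggests but does not confirm; the helper names are hypothetical.

```python
def major_minor(version):
    """Return the (major, minor) part of a dotted version string."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(tf_version, keras_version):
    """True if both version strings share the same major.minor prefix."""
    return major_minor(tf_version) == major_minor(keras_version)

# Example call with the installed modules (not executed here):
#   import tensorflow as tf, keras
#   print(versions_match(tf.__version__, keras.__version__))
```

For the versions printed above, `versions_match("2.13.0", "2.12.0")` returns False.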
Comment by loqs (loqs) - Friday, 01 September 2023, 21:05 GMT
> openmp was recently upgraded to 16.0.6 in line with the rest of the LLVM suite. I guess tensorflow might need a rebuild..
Unfortunately tensorflow does not currently build. Issue seems to be https://github.com/tensorflow/tensorflow/issues/60398 which may be fixed by https://github.com/tensorflow/tensorflow/commit/86daa4eef029512503d50af8c1fdf99bd87827e9 (not yet tested)
Comment by loqs (loqs) - Saturday, 02 September 2023, 09:58 GMT
https://github.com/tensorflow/tensorflow/commit/86daa4eef029512503d50af8c1fdf99bd87827e9 fixed the build failure. The produced opt-cuda packages are linked below:
https://drive.google.com/file/d/1q_AMvbk0CZ73RQm-YZGr0Wv326U7HlCa/view?usp=sharing tensorflow-opt-cuda-2.13.0-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1-ts6hNMKxCWSxxTOx7PBkmWtNn06kyO7/view?usp=sharing python-tensorflow-opt-cuda-2.13.0-1-x86_64.pkg.tar.zst

@menkaur do these packages resolve the issue?
Comment by Wolfgang Seifert (wolfseifert) - Wednesday, 22 November 2023, 15:44 GMT
I had the same issue and found this: https://stackoverflow.com/questions/77381561/tensorflow-cc-results-in-undefined-reference-to-omp-in-parallelversion

To solve the problem I installed intel-oneapi-openmp and intel-oneapi-compiler-shared-runtime-libs. Then I undid the bad patches from PKGBUILD:

$ sudo patchelf --replace-needed libomp.so libiomp5.so libtensorflow_cc.so.2.13.0
$ sudo patchelf --replace-needed libomp.so libiomp5.so libtensorflow.so.2.13.0
$ sudo patchelf --replace-needed libomp.so libiomp5.so libtensorflow_framework.so.2.13.0

And now it works again.

@Maintainers: please remove the patchelf from PKGBUILD and add dependencies to intel-oneapi-openmp and intel-oneapi-compiler-shared-runtime-libs.
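
For anyone applying the same workaround to several files, the three patchelf invocations above can be generated rather than typed by hand. This is only a sketch: it uses the `--replace-needed` option exactly as in the commands above, and the helper names are hypothetical. Running it still requires patchelf and sufficient permissions on the files.

```python
import subprocess


def replace_needed_commands(libraries, old="libomp.so", new="libiomp5.so"):
    """Build one `patchelf --replace-needed` command per library file."""
    return [["patchelf", "--replace-needed", old, new, lib] for lib in libraries]


def apply_commands(commands, dry_run=True):
    """Print the commands; execute them only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

# Example call (dry run, nothing is modified):
#   apply_commands(replace_needed_commands([
#       "libtensorflow_cc.so.2.13.0",
#       "libtensorflow.so.2.13.0",
#       "libtensorflow_framework.so.2.13.0",
#   ]))
```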
