FS#76990 - [arrow] updates break [python-pyarrow] and subsequently [python-pandas]

Attached to Project: Community Packages
Opened by Daniel Jewell (danieljewell) - Monday, 02 January 2023, 21:42 GMT
Last edited by Toolybird (Toolybird) - Thursday, 23 February 2023, 22:25 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Bruno Pagani (ArchangeGabriel), Konstantin Gizdov (kgizdov)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

This is partially a bug report and partially a notification/awareness message.

The [arrow] package is a dependency for [python-pyarrow] because pyarrow is built against the Apache Arrow libraries that [arrow] installs into /usr/lib (notably, /usr/lib/libarrow.so.1000 as of this writing). A concrete example is the "lib.cpython-310-x86_64-linux-gnu.so" file built as part of [python-pyarrow], which is dynamically linked against libarrow.so.1000 (see "ldd /usr/lib/python3.10/site-packages/pyarrow/lib.cpython-310-x86_64-linux-gnu.so").
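To make the linkage check repeatable, here is a minimal sketch in Python that runs ldd on a shared object and reports which libarrow* libraries it needs and whether they resolve. The function names (parse_ldd, arrow_links) are hypothetical helpers made up for this illustration, and the parsing assumes the usual "name => path (address)" and "name => not found" line shapes of ldd output.

```python
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path, or None if unresolved."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            # Skip entries without a mapping, such as the vDSO or the dynamic loader.
            continue
        name, _, target = line.partition("=>")
        target = target.strip()
        if target.startswith("not found"):
            libs[name.strip()] = None
        else:
            # Drop the trailing load address, e.g. " (0x00007f...)".
            libs[name.strip()] = target.split(" (")[0]
    return libs


def arrow_links(so_path):
    """Run ldd on so_path and keep only the libarrow* entries (hypothetical helper)."""
    result = subprocess.run(["ldd", so_path], capture_output=True, text=True)
    return {
        name: path
        for name, path in parse_ldd(result.stdout).items()
        if name.startswith("libarrow")
    }
```

Calling arrow_links() on the .so path quoted above should list libarrow.so.1000 with its resolved path on a consistent system. A value of None would mean the library that pyarrow was built against is no longer installed under that name.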

As a result, when [arrow] is updated/recompiled, this completely breaks [python-pyarrow] *AND* other packages that might auto-import pyarrow, including, and perhaps most importantly, [python-pandas]. (By "breaks" I mean that python dies with an illegal hardware instruction error whenever one tries to import pandas or pyarrow.)

There are two possible solutions to this that I can see:

1. Ensure that updating [arrow] requires/cascades an update of [python-pyarrow]
2. Change [python-pyarrow] to not depend on the system arrow libraries and instead bundle its own

Solution #2 is what the "pyarrow" binary package from PyPI does - it bundles the required libraries in a self-contained way, and they are then installed within the python package directory itself. Perhaps not the most elegant solution, but it does ensure that a major part of the Python ecosystem doesn't break when a single package is updated.

Interestingly, Debian currently doesn't even package arrow (or pyarrow) at all - there is experimental support for it though.

Closed by  Toolybird (Toolybird)
Thursday, 23 February 2023, 22:25 GMT
Reason for closing:  Works for me
Additional comments about closing:  Also "no response"
Comment by Toolybird (Toolybird) - Monday, 02 January 2023, 22:02 GMT
> completely breaks [python-pyarrow]

Your report is missing the most vital piece of info -> "Steps to reproduce:"

This works in a quick test:

$ python -c 'import pyarrow'
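
A slightly more thorough variant of this quick test, as a sketch: since the reported failure kills the interpreter itself, the import is run in a child process and the exit status is translated into a readable message. The helper names (describe_exit, check_import) are made up for this example; the only assumption about behaviour is the POSIX convention that subprocess reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"failed with exit code {returncode}"


def check_import(module, python=sys.executable):
    """Try to import `module` in a fresh interpreter so a crash cannot take us down."""
    proc = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    for module in ("pyarrow", "pandas"):
        print(f"{module}: {check_import(module)}")
```

An "illegal hardware instruction" crash would be expected to show up here as "killed by SIGILL", while a plain ImportError shows up as a non-zero exit code.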
Comment by Bruno Pagani (ArchangeGabriel) - Monday, 02 January 2023, 22:15 GMT
soname changes are normally respected, so python-pyarrow is rebuilt upon each major arrow release. In fact, python-pyarrow is always released at the same time as arrow itself (they actually share the same upstream repository), so it is updated and rebuilt at the same time. And as Toolybird mentioned, we cannot reproduce.

I guess this is more likely https://bugs.archlinux.org/task/75747.
