FS#73335 - [ceph-mgr] Module 'devicehealth' has failed: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Attached to Project: Community Packages
Opened by André Miranda (ardemiranda) - Wednesday, 12 January 2022, 02:19 GMT
Last edited by Thore Bödecker (foxxx0) - Monday, 26 September 2022, 18:17 GMT
Task Type: Bug Report
Category: Packages
Status: Closed
Assigned To: Thore Bödecker (foxxx0)
Architecture: All
Severity: Critical
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 5
Private: No

Details

Description:
The command ceph -s returns the message:
Module 'devicehealth' has failed: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Additional info:

* package version(s)
ceph-mgr: 15.2.14-5
python: 3.10.1-2

* link to upstream bug report, if any
Possible resolution https://github.com/ceph/ceph/pull/44112

Closed by Thore Bödecker (foxxx0)
Monday, 26 September 2022, 18:17 GMT
Reason for closing: Won't fix
Additional comments about closing: The ceph pkgbase has been dropped to AUR. Please comment/notify bugs there.
Comment by Mal Haak (insanemal) - Monday, 07 February 2022, 04:54 GMT
It also breaks MGR's balancer module

Module 'balancer' has failed: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
Comment by Paul Stemmet (pbazaah) - Saturday, 19 February 2022, 22:31 GMT
I have noted this error appearing in ceph CSI controller logs, in code paths involving DeleteVolume, CreateSnapshot and DeleteSnapshot calls. I also strongly suspect that this issue can lead to strange outcomes in calls to RADOS object storage. Given that these components are part of the package's core functionality, I would recommend fixing this quickly, as it is worse than I had originally thought[0].

To that end, I have invested some time and compute into finding a fix; the patch set is attached to this post. It is anchored against the GitHub package mirror, so it may require some tweaking for SVN.

The TL;DR is an additional patch file porting the one-line fix that was merged to the mainline upstream branch, listed here[1], along with the requisite changes to the PKGBUILD. If you are building for the current Arch release window, you'll also need to bump zstd to 1.5.2 (and update the shasum).

If I can be of any further assistance in getting this merged, please let me know.

[0]: https://bugs.archlinux.org/task/73314
[1]: https://github.com/ceph/ceph/pull/44112
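
For anyone checking other sources for the same problem: the error text indicates that an extension using '#' format codes was compiled without PY_SSIZE_T_CLEAN, which Python 3.10 requires to be defined before Python.h is included. The following is a rough sketch only; the helper name is hypothetical and is not part of ceph or of the attached patch set. It is a plain text check, so it will not notice a macro supplied through compiler flags or another header.

```python
import re

# Hypothetical helper for illustration; not part of ceph or the patch set.
_DEFINE = re.compile(r'^\s*#\s*define\s+PY_SSIZE_T_CLEAN\b', re.MULTILINE)
_INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]Python\.h[>"]', re.MULTILINE)


def defines_macro_before_python_h(source: str) -> bool:
    """Return True if the source never includes Python.h, or if
    PY_SSIZE_T_CLEAN is defined before the first Python.h include."""
    include = _INCLUDE.search(source)
    if include is None:
        return True  # no Python.h include, nothing to check
    define = _DEFINE.search(source)
    # The define must exist and must appear earlier in the text.
    return define is not None and define.start() < include.start()
```

Reading a C or C++ file as text and passing it to this function gives a quick first pass over a source tree; anything it flags still needs to be checked by hand.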
Comment by Paul Stemmet (pbazaah) - Sunday, 27 March 2022, 16:39 GMT
Any updates on this issue?

This bug renders the package inoperable on any recent build of Arch Linux, and it has been open for over a month. Is there any action I can take to help it along?
Comment by Daniel (feedc0de) - Wednesday, 04 May 2022, 14:26 GMT
In addition to the mon error message in ceph status, I can also trigger this error with the following commands:

sudo ceph dashboard ac-user-create feedc0de -i password.txt administrator
Error EINVAL: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

sudo ceph fs volume create komposthaufen
Error EINVAL: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

I guess ceph as a whole is unusable in its current state on Arch; I cannot configure anything.
Comment by Paul Stemmet (pbazaah) - Wednesday, 04 May 2022, 14:43 GMT
Correct.

I have unfortunately had to switch to using a private in-house package for the moment. If you have the compute resources lying around, you can take the .patch file listed above, apply it to the GitHub mirror[0] and build it yourself. If you do, make sure you copy the files out of the git repo first, as the ceph CMake build will incorrectly version ceph otherwise.

To make matters worse, ceph 15 is reaching EOL in June of this year, so I've had to go muck around with updating the PKGBUILD for v16.

[0]: https://github.com/archlinux/svntogit-community/tree/packages/ceph
Comment by Alex Sowitzki (eqrx) - Wednesday, 04 May 2022, 16:04 GMT
I can also confirm that the patch from Paul works, thank you for that :) I had some other issues that caused me to run my ceph daemons on CentOS within a systemd-nspawn machine (OSDs bind mounted into it... sounds ugly, is ugly, but works surprisingly well). Luckily the client functionality of the ceph package seems to be unaffected, so I can still mount my cephfs from Arch.

Could someone shed a bit of light on what the state of this package is? Is it "orphaned"? I completely understand that package maintenance is a lot of work and I am hugely thankful for that... but the package has been broken for more than 3 months. Can I do something to help?
Comment by Miles Simpson (heliochronix) - Thursday, 05 May 2022, 18:00 GMT
My cluster has been stuck in a bad state for months due to this issue. Paul, have you considered publishing your patch and PKGBUILD on AUR? If it means I can fix up my cluster I'd happily switch over to a working build until the official package gets fixed and updated to 16.
Comment by Paul Stemmet (pbazaah) - Thursday, 05 May 2022, 18:36 GMT
Unfortunately Miles, doing so would violate the guidelines for AUR[0].

> The submitted PKGBUILDs must not build applications already in any of the official binary repositories under any circumstances.

I am considering trying to get in touch with one of the Arch Linux maintainers about this issue, but I want to finish up my work on building a ceph 16 package first, as ceph 15 is quite old already. I have made significant progress, and will likely finish this weekend.

[0]: https://wiki.archlinux.org/title/AUR_submission_guidelines
Comment by Daniel (feedc0de) - Friday, 06 May 2022, 13:00 GMT
I've been compiling for days now; my ARM devices aren't that fast. What is missing to get this upstreamed and built by the nice archlinuxarm guys? As far as I understand, they use identical PKGBUILDs.
Comment by Paul Stemmet (pbazaah) - Friday, 06 May 2022, 19:14 GMT
I think it's mostly a lack of attention. My guess is Thore got busy with other responsibilities, and just hasn't looked at this package in a while.

The original patch above does work (I'm effectively running it myself), so it should (?) just be a case of an Arch maintainer being bugged about it and running it through whatever process packages take to be published.

I can't speak to arm builds, though, unfortunately.
Comment by Jamin Collins (jamincollins) - Wednesday, 11 May 2022, 01:29 GMT
I've made a build with the above patch for anyone that would rather not go through the build process.

https://drive.google.com/drive/folders/13TWsFeM7B9Wht_2vrJrnHmyCJTRFndpR?usp=sharing

These builds will remain available until an official updated build is released, at which point I'll remove them.

The sha256sums are included in the above link but I'll include them here as well:


359b8e6180d1752023a5075e503ecd190dddb3a85f8482bbfdaeea12448177af ceph-15.2.14-6-x86_64.pkg.tar.zst
138ee46a5b60d47b4b207d04f510313a7487d366e04e227640da76b6253bed21 ceph-libs-15.2.14-6-x86_64.pkg.tar.zst
c2798d5e2791254ea25b651c6036541d0e4ed1b221078333e8031e0e5f8b8214 ceph-mgr-15.2.14-6-x86_64.pkg.tar.zst
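
For anyone who wants to check the downloads against these sums without sha256sum, here is a minimal sketch in Python using only the standard library. The function names are made up for illustration, and it assumes the three files sit together in one directory under the names listed above.

```python
import hashlib
from pathlib import Path

# Checksums copied from the comment above, one "<digest> <filename>" per line.
SUMS = """\
359b8e6180d1752023a5075e503ecd190dddb3a85f8482bbfdaeea12448177af ceph-15.2.14-6-x86_64.pkg.tar.zst
138ee46a5b60d47b4b207d04f510313a7487d366e04e227640da76b6253bed21 ceph-libs-15.2.14-6-x86_64.pkg.tar.zst
c2798d5e2791254ea25b651c6036541d0e4ed1b221078333e8031e0e5f8b8214 ceph-mgr-15.2.14-6-x86_64.pkg.tar.zst
"""


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packages are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse '<hex digest> <filename>' lines into a {filename: digest} dict."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            # sha256sum marks binary mode with a leading '*' on the name.
            sums[parts[1].lstrip("*")] = parts[0].lower()
    return sums


def verify(directory, sums):
    """Return {filename: True/False}; a missing file counts as a failure."""
    results = {}
    for name, expected in sums.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == expected
    return results
```

Calling verify(".", parse_sums(SUMS)) from the download directory returns a dict mapping each filename to whether it matched. Any False entry means the file is missing or differs from the sums above and should not be installed.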
Comment by loqs (loqs) - Wednesday, 11 May 2022, 18:52 GMT
This is similar to jamincollins' build but is updated to ceph 17.2, which depends on the arrow package in the folder, which in turn depends on the orc package.
ceph-17.2.0-1.src.tar.gz is in the tarball to see the changes I made. Packages that depend on ceph will need to be rebuilt locally for the updated version.

https://drive.google.com/drive/folders/1qSQGame-OkwqK6SYW60Bj5FGpfMrqkjq?usp=sharing
Comment by Daniel (feedc0de) - Monday, 23 May 2022, 15:41 GMT
I just can't successfully compile for Arch Linux ARM: I get errors during the install step at the end, and I could not set up working makepkg cross-compiling from x86 to ARM. I think it is currently just not possible to host ceph-related stuff on archlinuxarm. I will switch to Debian for now, as this is the only officially supported distro anyway and compiling it myself turned out to be such a huge pain.
Comment by Daniel (feedc0de) - Sunday, 12 June 2022, 13:53 GMT
I tried compiling on x86 and it fails way earlier than on arm. Did something change in the sources? I remember it compiling at least for an hour or so before.

src/ceph-15.2.14/src/include/buffer.h:97:52: Error: expected template-name before »<« token
97 | struct unique_leakable_ptr : public std::unique_ptr<T, ceph::nop_delete<T>> {
| ^
src/include/buffer.h:97:52: Error: expected »{« before »<« token
src/ceph-15.2.14/src/include/buffer.h:397:17: Error: »unique_ptr« in namespace »std« is not a template
397 | static std::unique_ptr<ptr_node, disposer>
| ^~~~~~~~~~
Comment by loqs (loqs) - Sunday, 12 June 2022, 18:58 GMT
@feedc0de see the two gcc12 patches referenced in the PKGBUILD in the attached src.tar.gz. The build failure you posted is fixed by ceph-17.2.0-gcc12-missing-memory-include.patch or a similar patch for 15.2.14.
Comment by Daniel (feedc0de) - Friday, 05 August 2022, 15:28 GMT
Actually, I still needed some patches to get ceph-mgr to work. It crashed with an unhandled exception because of a syntax error in a std::regex; the patch I applied was this: https://github.com/ceph/ceph/pull/47271/files
Comment by Daniel (feedc0de) - Friday, 05 August 2022, 15:28 GMT
I love how completely broken the Arch packaging for ceph is in every way :)
Comment by Paul Stemmet (pbazaah) - Monday, 08 August 2022, 08:02 GMT
As an update, I've reached out to the Arch Linux maintainers about this package, and they've responded with the following (snipped):

> ... The packager has been pinged whether it should be dropped to the AUR, so it can be properly maintained.

So hopefully we'll see some movement on this soon.
Comment by Thore Bödecker (foxxx0) - Monday, 08 August 2022, 09:16 GMT
ceph has been partially dropped to the AUR: https://aur.archlinux.org/pkgbase/ceph

This represents the current repo SVN trunk/ directory and is now orphaned, feel free to adopt it.

Cleaning up the Arch repo is going to take a bit more time due to ceph-libs dependencies.

Sorry for the delay.
Comment by Paul Stemmet (pbazaah) - Monday, 08 August 2022, 09:43 GMT
Hi Thore, I've adopted the package, thanks for helping us with this.

Would you be okay if I pinged you if I get stuck on some C++ issue? I'm not expecting fast responses, and if you don't have time that's fine, but my C++ is pretty bad, and I struggle to understand some of the errors that pop up.

Everyone else:

I think this issue has run its course. I'd direct everyone to post comments / feedback to the AUR page now. Close?
Comment by loqs (loqs) - Monday, 08 August 2022, 13:17 GMT
@pbazaah this may be of some help. This builds for me.
Comment by Paul Stemmet (pbazaah) - Tuesday, 09 August 2022, 15:45 GMT
Thanks loqs, I'll take a look.
Comment by Miles Simpson (heliochronix) - Friday, 09 September 2022, 07:01 GMT
It seems these packages are being maintained in the official repos again, with the same package names but different versions. However, this PY_SSIZE_T_CLEAN issue persists in the official packages.
Comment by Daniel (feedc0de) - Sunday, 18 September 2022, 10:29 GMT
ceph 17.2.3 does not compile with the latest fmt anymore:

/home/feedc0de/ceph/src/ceph-17.2.3/src/common/Journald.cc:142:19: required from here
/usr/include/fmt/core.h:1757:7: error: static assertion failed: Cannot format an argument. To make type T formattable provide a formatter<T> specialization: https://fmt.dev/latest/api.html#udt
1757 | formattable,
| ^~~~~~~~~~~
/usr/include/fmt/core.h:1757:7: note: 'formattable' evaluates to false
Comment by Evangelos Foutras (foutrelis) - Sunday, 18 September 2022, 12:57 GMT
FWIW I couldn't get ceph 15.2.17 to build against boost 1.80 which means it'll be more broken when the latter moves out of testing.

If anyone knows how to tackle those boost::asio errors, do tell. :P
Comment by loqs (loqs) - Sunday, 18 September 2022, 16:51 GMT
@foutrelis rebuild xrootd with the ceph dependency removed to complete the removal of ceph [1] to AUR and let it be fixed there?

@feedc0de try the attached src archive [2]. I test-built this with and without testing enabled. It works with fmt 9 and boost 1.80.

[1] https://archlinux.org/todo/removal-of-ceph-from/
[2] ceph-17.2.3-1.src.tar.gz
Comment by Daniel (feedc0de) - Monday, 26 September 2022, 18:00 GMT
Thank you so much. ceph compiled successfully on x86_64, but on my ARM machines it fails with a strange linker error. Any ideas?

[ 62%] Generating ../../../lib/cython_modules/lib.3/rgw.cpython-310-aarch64-linux-gnu.so
cd /mnt/ceph/src/ceph-17.2.3/src/pybind/rgw && env CC="/usr/bin/cc" CFLAGS="" CPPFLAGS="-iquote/mnt/ceph/src/ceph-17.2.3/src/include -w -D'void0=dead_function(void)' -D'__Pyx_check_single_interpreter(ARG)=ARG ## 0'" CXX="/usr/bin/c++" LDSHARED="/usr/bin/cc -shared" OPT="-DNDEBUG -g -fwrapv -w" LDFLAGS=-L/mnt/ceph/src/ceph-17.2.3/build/lib CYTHON_BUILD_DIR=/mnt/ceph/src/ceph-17.2.3/build/src/pybind/rgw CEPH_LIBDIR=/mnt/ceph/src/ceph-17.2.3/build/lib /usr/bin/python3.10 /mnt/ceph/src/ceph-17.2.3/src/pybind/rgw/setup.py build --build-base /mnt/ceph/src/ceph-17.2.3/build/lib/cython_modules --build-platlib /mnt/ceph/src/ceph-17.2.3/build/lib/cython_modules/lib.3
/usr/bin/ld: warning: libthrift-0.16.0.so, needed by /usr/lib/libparquet.so.800, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `apache::thrift::protocol::TProtocolFactory::~TProtocolFactory()'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `typeinfo for apache::thrift::transport::TMemoryBuffer'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `typeinfo for apache::thrift::protocol::TProtocolFactory'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `vtable for apache::thrift::protocol::TProtocol'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `typeinfo for apache::thrift::protocol::TProtocol'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `apache::thrift::protocol::TProtocol::~TProtocol()'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `vtable for apache::thrift::transport::TMemoryBuffer'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `vtable for apache::thrift::transport::TTransportException'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `apache::thrift::protocol::TProtocol::skip_virt(apache::thrift::protocol::TType)'
/usr/bin/ld: /usr/lib/libparquet.so.800: undefined reference to `typeinfo for apache::thrift::transport::TTransportException'

https://bugs.archlinux.org/task/76110 is taking care of my problem :)
