FS#73967 - glibc 2.35 new dependency resolver buggyness with dlclose()
Attached to Project:
Arch Linux
Opened by Pieter-Jan Briers (PJB) - Saturday, 26 February 2022, 20:37 GMT
Last edited by freswa (frederik) - Tuesday, 20 September 2022, 14:09 GMT
Opened by Pieter-Jan Briers (PJB) - Saturday, 26 February 2022, 20:37 GMT
Last edited by freswa (frederik) - Tuesday, 20 September 2022, 14:09 GMT
|
Details
(glibc clearly says that I should report to the distro
first. I'm 99% sure this is not Arch's fault but I'm posting
here first just to comply with that)
Description: glibc 2.35 has a new dependency resolver for dynamic library loading, or something like that. I traced some bug reports we were getting in our game to it after a lot of debugging (I have a small C repro and the bug went away with GLIBC_TUNABLES=glibc.rtld.dynamic_sort=1 which disables the new system). The core of the bug in our game appears to be that a roundabout dlclose(dlopen(NULL)) done in ALSA's libraries causes libbz2.so to unload randomly, which then breaks loading libgtk-3.so when we try to open a MIDI file dialog later. The sequence of events can be best summarized with the small C program I got to reproduce the behavior (just gcc main.c && ./a.out it. You will need to have fluidsynth installed, and probably PortAudio/ALSA which it loads to cascade into this): #include <dlfcn.h> #include <stdio.h> #include <unistd.h> void main() { void* lib; // Early on, we load freetype, which has a direct dependency on bzip2. lib = dlopen("libfreetype.so.6", RTLD_NOW); printf("#### freetype: %llX\n", lib); // Then, much later (when the user opens the instrument menu), we load fluidsynth lib = dlopen("libfluidsynth.so.3", RTLD_NOW); printf("#### fluidsynth: %llX\n", lib); // Using dlsym so we don't link directly against fluidsynth and it's dynamically loaded. void* (*new_fluid_settings_fp)() = dlsym(lib, "new_fluid_settings"); // And initialize fluidsynth by calling new_fluid_settings() printf("#### Initializing fluidsynth!\n"); void* settings = new_fluid_settings_fp(); // Fluidsynth will go through all its audio drivers, which ends up initializing ALSA (through PortAudio) // alsa-libs/libasound.so ends up doing effectively dlclose(dlopen(NULL)) at config.c line 4017 running config hooks. This dlclose call, for some reason, results in libbz2.so getting fini'd. printf("#### Initialized fluidsynth!\n"); // GTK3 will fail to load (be zero) if loaded after fluidsynth initialized, because libbz2.so was erroneously unloaded. lib = dlopen("libgtk-3.so", RTLD_NOW); printf("#### GTK3: %llX\n", lib); return; } When I run this on my system right now (glibc 2.35), GTK will fail to be loaded, and LD_DEBUG=libs output shows libbz2.so getting unloaded during FluidSynth initialization. If I then run it with GLIBC_TUNABLES=glibc.rtld.dynamic_sort=1 (see glibc 2.35 release notes), this does not happen and GTK loads fine. I understand that this C program has a *lot* of variables at play through FluidSynth, ALSA, etc... Still we got at least 3 people reproducing it (me, and the two other bug reports[1] we got that made me investigate this). I've attached LD_DEBUG=libs for my system to the bug as mine.log. So this would be the end of it, except one of our maintainers had a problem where her game would fail to load (like, wouldn't even start, no ALSA/FluidSynth/GTK involved) in most cases with an error in ld.so: "Inconsistency detected by ld.so: dl-close.c: 277: _dl_close_worker: Assertion `imap->l_type == lt_loaded && !imap->l_nodelete_active' failed!" Running in Steam Linux Runtime seemed to fix it, and we just tried with this new tunable and the problem went away completely. LD_DEBUG file for her scenario also attached as vera.log (our game is .NET though so there's quite a lot of noise) Additional info: * package version: glibc 2.35-2 * config and/or log files etc. * link to upstream bug report, if any: none yet, since they tell me to report here first Steps to reproduce: See above, couldn't really figure out a good way to structure this. [1]: https://github.com/space-wizards/RobustToolbox/issues/2563 |
This task depends upon
Closed by freswa (frederik)
Tuesday, 20 September 2022, 14:09 GMT
Reason for closing: Fixed
Additional comments about closing: Fix commited to trunk revision 456508.
Tuesday, 20 September 2022, 14:09 GMT
Reason for closing: Fixed
Additional comments about closing: Fix commited to trunk revision 456508.
I'll close this for, but please feel free to request a re-open in case you have a patch/fix we may apply.
I'm in the process of filing a bug on upstream, but I need approval to make a bugzilla account there. It'll take a bit I guess.
I think changing the default sort as suggested is a good idea.
fprintf(stderr, "#### Initialized fluidsynth!\n");
Could this not be a bug in fluidsynth? Some kind of memory corruption, perhaps?
(gdb) where
#0 0x00007f71d7bfe143 in dfs_traversal.part () at /lib64/ld-linux-x86-64.so.2
#1 0x00007f71d7bfe194 in dfs_traversal.part () at /lib64/ld-linux-x86-64.so.2
#2 0x00007f71d7bfe194 in dfs_traversal.part () at /lib64/ld-linux-x86-64.so.2
#3 0x00007f71d7bfe58d in _dl_sort_maps () at /lib64/ld-linux-x86-64.so.2
#4 0x00007f71d7bee56c in _dl_close_worker () at /lib64/ld-linux-x86-64.so.2
#5 0x00007f71d7bef05b in _dl_close () at /lib64/ld-linux-x86-64.so.2
#6 0x00007f71d7b1be18 in _dl_catch_exception () at /usr/lib/libc.so.6
#7 0x00007f71d7b1bee3 in _dl_catch_error () at /usr/lib/libc.so.6
#8 0x00007f71d7a4a24e in _dlerror_run () at /usr/lib/libc.so.6
#9 0x00007f71d7a49f68 in dlclose@GLIBC_2.2.5 () at /usr/lib/libc.so.6
#10 0x00007f71d5de851d in pw_impl_module_destroy () at /usr/lib/libpipewire-0.3.so.0
#11 0x00007f71d5dc8546 in pw_context_destroy () at /usr/lib/libpipewire-0.3.so.0
#12 0x00007f71d5e872fd in () at /usr/lib/alsa-lib/libasound_module_pcm_pipewire.so
#13 0x00007f71d5e873e3 in () at /usr/lib/alsa-lib/libasound_module_pcm_pipewire.so
#14 0x00007f71d732e475 in () at /usr/lib/libasound.so.2
#15 0x00007f71d72eb156 in snd_pcm_close () at /usr/lib/libasound.so.2
#16 0x00007f71d72701cf in () at /usr/lib/libportaudio.so.2
#17 0x00007f71d727087b in () at /usr/lib/libportaudio.so.2
#18 0x00007f71d727596c in () at /usr/lib/libportaudio.so.2
#19 0x00007f71d727625a in PaAlsa_Initialize () at /usr/lib/libportaudio.so.2
#20 0x00007f71d7269336 in Pa_Initialize () at /usr/lib/libportaudio.so.2
#21 0x00007f71d73ef272 in () at /usr/lib/libfluidsynth.so.3
#22 0x00007f71d73f0e38 in new_fluid_settings () at /usr/lib/libfluidsynth.so.3
#23 0x000055aa0cdec22c in main () at test.c:22
#0 do_lookup_x (undef_name=undef_name@entry=0x7f5e0e188a6d "BZ2_hbMakeCodeLengths", new_hash=new_hash@entry=2100737866, old_hash=old_hash@entry=0x7ffd7039a000, ref=0x7f5e0e1887f8, result=result@entry=0x7ffd7039a010, scope=<optimized out>, i=2, version=0x0, flags=9, skip=<optimized out>, type_class=<optimized out>, undef_map=0x55a13367eda0) at dl-lookup.c:403
#1 0x00007f5e12e150e1 in _dl_lookup_symbol_x (undef_name=0x7f5e0e188a6d "BZ2_hbMakeCodeLengths", undef_map=undef_map@entry=0x55a13367eda0, ref=ref@entry=0x7ffd7039a128, symbol_scope=symbol_scope@entry=0x55a13367f110, version=0x0, type_class=4, flags=9, skip_map=0x0) at dl-lookup.c:860
#2 0x00007f5e12e1a050 in elf_machine_rela (skip_ifunc=0, reloc_addr_arg=0x7f5e0e199e70, version=0x55a13365c6b8, sym=0x7f5e0e1887f8, reloc=0x7f5e0e188ff0, scope=0x55a13367f110, map=0x55a13367eda0) at ../sysdeps/generic/ldsodefs.h:80
#3 elf_dynamic_do_Rela (skip_ifunc=0, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, scope=<optimized out>, map=0x55a13367eda0) at /usr/src/debug/glibc-2.35/elf/do-rel.h:142
#4 _dl_relocate_object (l=0x55a13367eda0, scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:288
#5 0x00007f5e12e177de in dl_open_worker_begin (a=a@entry=0x7ffd7039a4d0) at dl-open.c:702
#6 0x00007f5e12d25e18 in __GI__dl_catch_exception (exception=exception@entry=0x7ffd7039a330, operate=operate@entry=0x7f5e12e17510 <dl_open_worker_begin>, args=args@entry=0x7ffd7039a4d0) at /usr/src/debug/glibc-2.35/elf/dl-error-skeleton.c:208
#7 0x00007f5e12e16d7b in dl_open_worker (a=a@entry=0x7ffd7039a4d0) at dl-open.c:782
#8 0x00007f5e12d25e18 in __GI__dl_catch_exception (exception=exception@entry=0x7ffd7039a4b0, operate=operate@entry=0x7f5e12e16d40 <dl_open_worker>, args=args@entry=0x7ffd7039a4d0) at /usr/src/debug/glibc-2.35/elf/dl-error-skeleton.c:208
#9 0x00007f5e12e1715d in _dl_open (file=0x7ffd7039a4b0 "", mode=<optimized out>, caller_dlopen=0x55a13175d232 <main+201>, nsid=-2, argc=1, argv=0x7ffd7039a8b8, env=0x7ffd7039a8c8) at dl-open.c:883
#10 0x00007f5e12c5474c in dlopen_doit (a=a@entry=0x7ffd7039a740) at dlopen.c:56
#11 0x00007f5e12d25e18 in __GI__dl_catch_exception (exception=exception@entry=0x7ffd7039a6a0, operate=0x7f5e12c546f0 <dlopen_doit>, args=0x7ffd7039a740) at /usr/src/debug/glibc-2.35/elf/dl-error-skeleton.c:208
#12 0x00007f5e12d25ee3 in __GI__dl_catch_error (objname=0x7ffd7039a6f8, errstring=0x7ffd7039a700, mallocedp=0x7ffd7039a6f7, operate=<optimized out>, args=<optimized out>) at /usr/src/debug/glibc-2.35/elf/dl-error-skeleton.c:227
#13 0x00007f5e12c5424e in _dlerror_run (operate=operate@entry=0x7f5e12c546f0 <dlopen_doit>, args=args@entry=0x7ffd7039a740) at dlerror.c:138
#14 0x00007f5e12c547d8 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at dlopen.c:71
#15 ___dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:81
#16 0x000055a13175d232 in main ()