FS#79300 - [glibc] 2.38 posix_memalign regression
            Attached to Project:
            Arch Linux
            
Opened by lopster (lopster) - Saturday, 05 August 2023, 01:30 GMT
Last edited by freswa (frederik) - Thursday, 17 August 2023, 15:52 GMT
          Opened by lopster (lopster) - Saturday, 05 August 2023, 01:30 GMT
Last edited by freswa (frederik) - Thursday, 17 August 2023, 15:52 GMT
|  | Details
                    Description: This is probably an upstream issue but glibc
                    seems to want bugs to start with the distro. If someone here
                    can at least reproduce this I guess that would be good. Beginning with the glibc 2.38 update, I started seeing up to 100% CPU utilization in mpv when streaming videos using yt-dlp. I attached gdb to the process and the mpv demux thread is stuck looping with this stack trace on any break: #0 chunk_ok_for_memalign (p=p@entry=0x7fffc4fd5c00, alignment=alignment@entry=64, nb=nb@entry=32) at malloc.c:5010 #1 0x00007ffff509cc47 in _int_memalign (av=av@entry=0x7fffc4000030, alignment=alignment@entry=64, bytes=bytes@entry=24) at malloc.c:5116 #2 0x00007ffff509d442 in _mid_memalign (alignment=alignment@entry=64, bytes=bytes@entry=24, address=<optimized out>) at malloc.c:3636 #3 0x00007ffff509ed23 in __posix_memalign (size=24, alignment=64, memptr=0x7fffcb91ac20) at malloc.c:5905 #4 __posix_memalign (memptr=0x7fffcb91ac20, alignment=64, size=24) at malloc.c:5889 #5 0x00007ffff5c52126 in av_malloc () from target:/usr/lib/libavutil.so.58 #6 0x00007ffff5c522fe in av_mallocz () from target:/usr/lib/libavutil.so.58 #7 0x00007ffff5c27073 in av_buffer_ref () from target:/usr/lib/libavutil.so.58 #8 0x00005555555d509f in new_demux_packet_from_buf (buf=0x7fffc5b4e700) at ../mpv/demux/packet.c:104 #9 0x00005555555c715a in handle_block (block_info=0x7fffcb91ad40, demuxer=0x7fffc40a0ee0) at ../mpv/demux/demux_mkv.c:2709 #10 demux_mkv_read_packet (demuxer=0x7fffc40a0ee0, pkt=<optimized out>) at ../mpv/demux/demux_mkv.c:2958 #11 0x00005555555b4d5b in read_packet (in=0x7fffc4057f80) at ../mpv/demux/demux.c:2284 #12 thread_work (in=0x7fffc4057f80) at ../mpv/demux/demux.c:2559 #13 0x00005555556d7460 in demux_read_any_packet.isra.0 (demuxer=<optimized out>) at ../mpv/demux/demux.c:2839 #14 0x00005555555cfb0a in do_read_next_packet (demuxer=0x7fffc4024070, src=0x7fffc4024bc0) at ../mpv/demux/demux_timeline.c:276 #15 0x00005555555d001b in do_read_next_packet (src=0x7fffc4024bc0, demuxer=0x7fffc4024070) at ../mpv/demux/demux_timeline.c:374 #16 d_read_packet (demuxer=0x7fffc4024070, out_pkt=0x7fffcb91b720) at ../mpv/demux/demux_timeline.c:394 #17 0x00005555555b4d5b in read_packet (in=0x7fffc408d560) at ../mpv/demux/demux.c:2284 #18 thread_work (in=in@entry=0x7fffc408d560) at ../mpv/demux/demux.c:2559 #19 0x00005555555b5160 in demux_thread (pctx=0x7fffc408d560) at ../mpv/demux/demux.c:2577 #20 0x00007ffff508c9eb in start_thread (arg=<optimized out>) at pthread_create.c:444 #21 0x00007ffff511123c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 chunk_ok_for_memalign is new in glibc 2.38. Steps to reproduce: Start a video streaming with a command like this (videos from any site supported by yt-dlp seem to cause it): mpv --demuxer-max-bytes=300000k https://www.pbs.org/video/australia-animals-with-cameras-637xr7/ Pause the video to allow the demuxer cache to fill. Watch top/htop to see CPU usage spike. It sometimes helps to pause/resume the video a few times. Eventually it can get stuck at 100% of a core. mpv starts having performance issues as well. | 
              This task depends upon
              
              
            
            
          
            Closed by  freswa (frederik)
Thursday, 17 August 2023, 15:52 GMT
Reason for closing: Fixed
Additional comments about closing: glibc-2.38-3
          
        Thursday, 17 August 2023, 15:52 GMT
Reason for closing: Fixed
Additional comments about closing: glibc-2.38-3
 
                      
[1] https://sourceware.org/pipermail/libc-alpha/2023-August/150653.html
[1] https://bbs.archlinux.org/viewtopic.php?id=287741
[2] https://github.com/mpv-player/mpv/issues/12076
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=30723
Here's an easy binary patch that seems to work around this issue for the arch glibc 2.38-2 build:
Copy /usr/lib/libc.so.6 somewhere. Load it in a hex editor.
The bytes at offset 9ca05 should be hex 74 79.
Change them to EB 79.
Prior to running mpv, execute this command:
export LD_LIBRARY_PATH=/directory_with_modified_libc:$LD_LIBRARY_PATH
This makes glibc take a code path like 2.37 did.
https://github.com/mpv-player/mpv/issues/12076#issuecomment-1666892864
Another workaround: Using a different allocator, e. g. jemalloc, via LD_PRELOAD, as described here: https://github.com/jemalloc/jemalloc/wiki/Getting-Started
$ LD_PRELOAD=`jemalloc-config --libdir`/libjemalloc.so.`jemalloc-config --revision` mpv ...
The present severity of 'low' is, in my opinion, overly optimistic about how little a potential impact _any_ bug in glibc can be. This is difficult to isolate but might be an unbounded or very bad worst-possible-performance case regression in a core system library.
I find this glibc change somewhat baffling. It's like a poorly performing garbage collector was put in the aligned allocator code path. Less than ideal since aligned memory is most likely to be used by apps seeking the best performance.
As mentioned elsewhere, severity is basically useless for Arch tickets. Upping the severity is not going to magically turn us all into Glibc hackers :) It's an upstream issue after all...so it doesn't get the bug fixed any faster.
This might be needed if upstream don't fix it in a timely manner. Feel free to prepare and submit a tested patch for consideration.
Yes
>> A simple Arch specific patch
>This might be needed if upstream don't fix it in a timely manner. Feel free to prepare and submit a tested patch for consideration.
+1
The revert of the glibc memalign changes is much larger.
I could not reproduce the issue with ffmpeg due to:
$ mpv --demuxer-max-bytes=300000k https://www.pbs.org/video/australia-animals-with-cameras-637xr7/
[ytdl_hook] ERROR: An extractor error has occurred. (caused by KeyError('title')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
[ytdl_hook] youtube-dl failed: unexpected error occurred
Failed to recognize file format
Using a synthetic test the glibc patch restored performance.
>$ mpv --demuxer-max-bytes=300000k https://www.pbs.org/video/australia-animals-with-cameras-637xr7/
>[ytdl_hook] ERROR: An extractor error has occurred. (caused by KeyError('title')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
That's still working for me. Maybe it's a region locked video, sorry about that. The issue should happen with any youtube or other yt-dlp supported site video of decent size as well.
[1] https://sourceware.org/pipermail/libc-alpha/2023-August/150771.html
I saw that discussion and still don't get the thinking here.
I don't know anything about ffmpeg internals but, yes, that would almost certainly be a better way to go. I'm not sure they should have to do that, however. The previous glibc allocation strategy was apparently working for them.
A similar argument could be made for whatever apps they have in mind that are prompting this change due to heap fragmentation. Fragmentation has been an issue forever with environments that don't have garbage collection. I would suggest those apps should employ a special purpose allocator to resolve their issues rather than mess up a general purpose allocator. Maybe the former glibc strategy was wasting memory but in a lot of cases memory is more available than CPU. mpv went from using low CPU and reasonable memory to completely unusable with this change so the tradeoff went way too far in the other direction, in my opinion. mpv is probably not the only impacted app.
I hope it gets resolved quickly, but I find this comment by the glibc devs worrying: https://sourceware.org/pipermail/libc-alpha/2023-August/150729.html
I really disagree with that mentality - blaming the API consumers after a change that broke things that previously worked. (quote: "I'm tempted to not fix this, to teach developers to not use posix_memalign() unless they REALLY need it ;-)" )
I mean people literally hotfixing this with LD_PRELOADing alternative allocators like 'jemalloc' to get it back working. I also just tried to LD_PRELOAD 'mimalloc' and mpv works fine in the use case in question here too.
I think glibc malloc allowing itself a to preform horribly in a (maybe suboptimal or even erranous) API use case that otherwise works just OK with both 'jemalloc' and 'mimalloc' would be a bad outcome.
https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavutil/mem.h#l112
https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavutil/mem.c#l96
Disabling aligned allocations in ffmpeg entirely will likely impact the performance of SIMD accelerated subroutines (SSE prefers 128 bit, AVX prefers 256 bit alignment). Unaligned SIMD accesses may not crash but hitting page boundaries is not good in SIMD routes that tend to be performance critical.
[1] https://github.com/Mesa3D/mesa/blob/mesa-23.0.2/src/util/os_memory_aligned.h#L71
Now every project which used that function in the past have to look for something else? It worked! Why break it? If an API is used, even in a weird way it won't be a bug it will become a feature. Even Torvalds said that at their level for the kernel development. glibc is in a similar realm and should honor that too: Never break user space!
The reason is not a historical one but a practical one. There are many many many programs that run on top of the Linux kernel; if a kernel interface breaks those programs then everybody would need to upgrade those programs.
At Manjaro side we use your patch @lopster, thx for that.
https://drive.google.com/file/d/1tAi0w_lvb_mMrhlsaYhplmqBS2Goc3Zw/view?usp=sharing glibc-2.38-2.1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/16pXFwdKLbwjowsuTjl98jYv7JRAJ0mmo/view?usp=sharing lib32-glibc-2.38-2.1-x86_64.pkg.tar.zst
I can no longer reproduce the issue with the patch applied.
Sharing on the link should now be fixed. Sorry about that.
https://drive.google.com/file/d/1tAi0w_lvb_mMrhlsaYhplmqBS2Goc3Zw/view?usp=sharing glibc-2.38-2.1-x86_64.pkg.tar.zst
[1] https://github.com/mpv-player/mpv/issues/12076
[1] https://inbox.sourceware.org/libc-alpha/877cq4yrpg.fsf%40oldenburg.str.redhat.com/T/
At me for the mpv issue this patch did not help significantly (just by looking at cpu use in htop, I did no bench).
If someone wants to see a flamegraph with mpv already barely keeping up with demuxing at 100% CPU use, I attach mine here taken with this patch [1] applied.
[1] https://inbox.sourceware.org/libc-alpha/877cq4yrpg.fsf%40oldenburg.str.redhat.com/
No...the progression of CPU usage might have been a little different just from looking at top but mpv still ended up at 100% utilization with approximately the same amount of data cached (~200MB on my machine).
$ mpv --no-config --cache=yes --demuxer-max-bytes=1G big_local_video_file.mkv
Out of the total cpu time of the mpv process here, about 75 % is in the "demuxer_thread" -> __posix_memalign -> ... -> chunk_ok_for_memalign call path
ffmpeg -f lavfi -i anullsrc -f lavfi -i color=color=black -t 3600 -f matroska test.mkv
Also works in a pipeline:
ffmpeg -f lavfi -i anullsrc -f lavfi -i color=color=black -t 3600 -f matroska - | mpv --no-config --cache=yes --demuxer-max-bytes=1G -
(on top of the previous one from [2])
seems to fix the performance issues with mpv for me.
[1] https://inbox.sourceware.org/libc-alpha/87pm3uajev.fsf%40oldenburg.str.redhat.com
[2] https://inbox.sourceware.org/libc-alpha/877cq4yrpg.fsf%40oldenburg.str.redhat.com
[1] https://inbox.sourceware.org/libc-alpha/87pm3uajev.fsf%40oldenburg.str.redhat.com/
https://drive.google.com/file/d/1a-xidrDmkF7Aa1LGbOtG1Uabp6xHPngK/view?usp=sharing glibc-2.38-2.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1RkFGJvHn4HP41N5iZG1qp5wIp_isXDRO/view?usp=sharing lib32-glibc-2.38-2.2-x86_64.pkg.tar.zst
[1] https://inbox.sourceware.org/libc-alpha/877cq4yrpg.fsf%40oldenburg.str.redhat.com
[2] https://inbox.sourceware.org/libc-alpha/87pm3uajev.fsf%40oldenburg.str.redhat.com
I tested as you said and with that now mpv is fixed for me - moreover demuxer performance looks fully back to normal (based on some basic bench I did).
I tested with the latest patch and it fixes the issue for me as well.
>And this is expected - the second upstream patch just cuts out the same 2.38-release introduced code path that we were cutting out with @lopster 's patch - isn't it?
Yes, the latest patch deletes the code that was being bypassed by the patch I put up.
This bug can break the performance of potentially every application. Just yesterday I noticed suspicious periods of 100% CPU with a node application after upgrading the system.
People expect their Arch machines to work and not break in obscure ways, nevermind that the workaround is digging up a patch from the bugtracker and rebuilding glibc.
As an additional data point there are 77 libraries and 46 executables on my system that directly import posix_memalign. [1] [2]
Nobody can guarantee that these aren't affected.
[1] cd /usr/bin; file * | grep 'ELF 64-bit.*executable' | cut -d : -f 1 | while read f; do readelf -a "$f" | grep -E 'FUNC.+\bposix_memalign@GLIBC'; done | wc -l
[2] cd /usr/lib; file * | grep 'ELF 64-bit.*shared' | cut -d : -f 1 | while read f; do readelf -a "$f" | grep -E 'FUNC.+\bposix_memalign@GLIBC'; done | wc -l
@Toolybird commented on the glibc bug [1] saying that Arch will ship after the patches are committed to the glibc repo. The 2nd patch has been revised to resolve test issues and is in pre-commit review.
https://sourceware.org/bugzilla/show_bug.cgi?id=30723
https://drive.google.com/file/d/1Pg5WXVzVL5P2Ws-z_ubVQ3npekNerBT7/view?usp=sharing glibc-2.38-2.3-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1lZGwjDqRSwNzFORcf_NFmMxQFu3riZHK/view?usp=sharing lib32-glibc-2.38-2.3-x86_64.pkg.tar.zst
[1] https://inbox.sourceware.org/libc-alpha/877cq4yrpg.fsf%40oldenburg.str.redhat.com
[2] https://inbox.sourceware.org/libc-alpha/87a5uxtwaq.fsf%40oldenburg.str.redhat.com/
Another (unexpected) victim seems to have been cpulimit…
Same conditions (but just one hour of testing) with glibc-2.38-2.3-x86_64.pkg.tar.zst and lib32-glibc-2.38-2.3-x86_64.pkg.tar.zst: Also fine.
https://wiki.archlinux.org/title/Patching_packages
https://sourceware.org/git/?p=glibc.git;a=shortlog
https://sourceware.org/git/?p=glibc.git;a=blobdiff_plain;f=malloc/malloc.c;h=948f9759afa16f74b7bf3848589c0766e9f42cfb;hp=e2f1a615a4fc7b036e188a28de9cfb132b2351df;hb=542b1105852568c3ebc712225ae78b8c8ba31a78;hpb=039ff51ac7e02db1cfc0c23e38ac7bfbb00221d1
https://sourceware.org/git/?p=glibc.git;a=blobdiff_plain;f=malloc/malloc.c;h=d0bbbf371048ee8aa8a30c03b189cb268b8ad9e4;hp=948f9759afa16f74b7bf3848589c0766e9f42cfb;hb=0dc7fc1cf094406a138e4d1bcf9553e59edcf89d;hpb=648b832600e94c09e3cee2fa6d3c3684687505e4
< https://drive.google.com/file/d/1Pg5WXVzVL5P2Ws-z_ubVQ3npekNerBT7/view?usp=sharing >
&
< https://drive.google.com/file/d/1lZGwjDqRSwNzFORcf_NFmMxQFu3riZHK/view?usp=sharing >
with pacman -U lib32-glibc-2.38-2.3-x86_64.pkg.tar.zst glibc-2.38-2.3-x86_64.pkg.tar.zst
in arch 6.4.10-arch1-1 and using
mpv https://mcdn.daserste.de/daserste/de/master.m3u8
causes the same high cpu load as before/without patches
so the patches will not help me prevent unusal high cpu load for mpv streaming
screenshot htop with mpv https://mcdn.daserste.de/daserste/de/master.m3u8
=> Screenshot_mpv_20230815_205731__x.jpg
in comparison htop watching with vlc the same stream
=> Screenshot_vlc_20230815_205940__x.jpg
___________
$ inxi -CG
CPU:
Info: quad core model: Intel Core i7-8665U bits: 64 type: MT MCP cache:
L2: 1024 KiB
Speed (MHz): avg: 1060 min/max: 400/4800 cores: 1: 900 2: 900 3: 900
4: 2100 5: 900 6: 981 7: 900 8: 900
Graphics:
Device-1: Intel WhiskeyLake-U GT2 [UHD Graphics 620] driver: i915 v: kernel
Device-2: Chicony ThinkPad T490 Webcam driver: uvcvideo type: USB
Display: x11 server: X.Org v: 21.1.8 driver: X: loaded: modesetting
dri: iris gpu: i915 resolution: 1920x1080~60Hz
API: OpenGL v: 4.6 Mesa 23.1.5 renderer: Mesa Intel UHD Graphics 620 (WHL
GT2)
__
ps
observed since about 4 weeks that streaming the <oeffentlich rechtliche> begin stuttering after about 10-30sec resulting in stopping => <Audio device underrun detected> while increasing cpu loads
stuttering slows down with --cache=no in mpv.conf
Well, clearly you have an entirely different problem then. Please don't hijack tickets with unrelated spam...take it to the proper support channels (Forum/IRC/Mailing Lists/Reddit/etc) to see if anyone can help troubleshoot. Any more off-topic comments will be deleted.
We've only two anecdotal datapoints in favor of the patch, one against that can perfectly falsify the theory.
However, there're oc. multiple factors ignored:
@pabello, 1st off all reboot to ensure a clean slate after the package update to rule out FS cache issues.
Then does --cache=no also completely drop the CPU load?
Do the hack-a-rounds in https://bugs.archlinux.org/task/79300#comment220065 & https://bugs.archlinux.org/task/79300#comment220078 work for you and can you reproduce the issue w/ and w/o the patched libs on a local file following https://bugs.archlinux.org/task/79300#comment220881
If so
$ md5sum /usr/lib/libc.so.6
Otherwise I'd look into whether the stream gets maybe decoded in SW (mpv output) and/or
$ type mpv
$ ldd /usr/bin/mpv | grep libc
@Toolybird, feel free to clean up after us ;)
@pabello, if the bug mods don't like that kind of investigation here, move it to the bbs and dm seth (yes, I'm allowed to suggest that ;)
@Toolybird : last intrusion .. watched the mpv/ffmpeg prob since some weeks and found here the seemingly right track .. may further switch to bbs
@luebking :
< 1st off all reboot to ensure a clean slate after the package update to rule out FS cache issues > => done
< Then does --cache=no also completely drop the CPU load? > => no
< Do the hack-a-rounds in https://bugs.archlinux.org/task/79300#comment220065 & https://bugs.archlinux.org/task/79300#comment220078 work for you and can you reproduce the issue w/ and w/o the patched libs on a local file following https://bugs.archlinux.org/task/79300#comment220881 >
=> one machine thinkpad t490s = patched => unusual high CPU as mentioned in pic <Screenshot_mpv_20230815_205731__x.jpg> .. w/ and w/o variants of mpv.conf .. see below
=> other machine thinkpad yogi 3rd, almost same CPU = unpatched => same unususal high CPU .. w/ and w/o variants of mpv.conf .. see below
stuttering seems to disappear since last <pacman/yay -Syu> in both machines
upgrades done almost daily
both machines working only via wifi
high CPU load while mpvstreaming remains
as usual i remember mpv used below 3% CPU in thinkpads
stuttering and <Audio device underrun detected> never appears on manjaro/desktop via LAN
mpv.conf w/ and w/o <--cache=no> does not drop down/change CPU load
also tried <--demuxer-max-bytes=50MiB>, <--demuxer-max-back-bytes=25MiB> and/or <--ytdl-raw-options=cache-dir=/dev/null>
=> no change .. CPU load still unusual high as seen in pic
patched machine:
$ md5sum /usr/lib/libc.so.6
30c41839a22a17fcfa6001ca98bc3751 /usr/lib/libc.so.6
$ type mpv
mpv ist /usr/bin/mpv
$ ldd /usr/bin/mpv | grep libc
libcdio_paranoia.so.2 => /usr/lib/libcdio_paranoia.so.2 (0x00007f62820ff000)
libcdio_cdda.so.2 => /usr/lib/libcdio_cdda.so.2 (0x00007f62820f6000)
libcdio.so.19 => /usr/lib/libcdio.so.19 (0x00007f62820cb000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f627f000000)
libcairo.so.2 => /usr/lib/libcairo.so.2 (0x00007f627e0ce000)
libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x00007f6272200000)
libcairo-gobject.so.2 => /usr/lib/libcairo-gobject.so.2 (0x00007f62713f2000)
libcap.so.2 => /usr/lib/libcap.so.2 (0x00007f626f6ce000)
$ mpv /mnt/Daten/Videos/stream/ipTV.m3u
Playing: https://mcdn.daserste.de/daserste/de/master.m3u8
[ffmpeg/demuxer] hls: Can't support the subtitle(uri: master_subs_webvtt.m3u8)
Video --vid=1 (h264 640x360 50.000fps) (2147 kbps)
(+) Video --vid=2 (h264 1920x1080 50.000fps) (8501 kbps)
Video --vid=3 (h264 1280x720 50.000fps) (5491 kbps)
Video --vid=4 (h264 960x540 50.000fps) (2816 kbps)
Video --vid=5 (h264 480x270 50.000fps) (1311 kbps)
(+) Audio --aid=1 --alang=de (*) (aac 2ch 48000Hz) (1311 kbps)
Audio --aid=2 --alang=klare sprache (aac 2ch 48000Hz) (1311 kbps)
AO: [pulse] 48000Hz stereo 2ch float
VO: [gpu] 1920x1080 yuv420p
AV: 00:06:27 / 00:06:28 (100%) A-V: 0.000
Cannot seek in this stream.
You can force it with '--force-seekable=yes'.
AV: 00:06:27 / 00:06:28 (100%) A-V: 0.000
. . .
¿ please give me a hint > what you mean with < ... and dm seth ... > ?
i hope not to cause further confusion . . .
__________
@Toolybird : thanx for instruction + clearing
pardon me for breaching etiquette
Everything seems to be running fine. The cpu usage in mpv is back to normal and the read speed (both local and network) back to saturating the connection.
Open a thread on the BBS, find the user seth (should not be hard) and send a direct message (mail) linking your thread.
@Toolybird, I've a local copy of this subthread so feel free to just delete all related messages.