FS#80061 - [gnome-shell] Intermittent segmentation faults on boot
Attached to Project:
Arch Linux
Opened by Frantisek Sumsal (mrc0mmand) - Monday, 23 October 2023, 09:45 GMT
Last edited by Toolybird (Toolybird) - Thursday, 26 October 2023, 20:54 GMT
Details
Description:
For the past couple of weeks/months I've been seeing occasional gnome-shell segfaults in our upstream systemd CI. We don't do anything special with it, the image just automatically boots into the graphical.target for some extra systemd-logind coverage. After some tweaking, the latest crash also saved a full (and symbolized) stack trace:

#0 __GI_getenv (name=name@entry=0x7f213f472dda "EXPAT_ACCOUNTING_DEBUG")
#1 0x00007f213f4715e0 in getDebugLevel.constprop.0
#2 0x00007f213f457da8 in parserInit
#3 0x00007f213f4615b0 in parserCreate
#4 0x00007f213f46183b in XML_ParserCreate_MM
#5 0x00007f213f46184e in XML_ParserCreate
#6 0x00007f214277c843 in FcConfigParseAndLoadFromMemoryInternal
#7 0x00007f214277d297 in _FcConfigParse
#8 0x00007f214277d47a in FcConfigParseAndLoadDir
#9 _FcConfigParse
#10 0x00007f2142780476 in FcParseInclude (parse=0x7f2002ff53e0)
#11 FcEndElement (userData=0x7f2002ff53e0, name=<optimized out>)
#12 0x00007f213f45f63f in doContent
#13 0x00007f213f45cc14 in contentProcessor
#14 doProlog
#15 0x00007f213f45e7ed in prologProcessor
#16 0x00007f213f4628ea in XML_ParseBuffer
#17 0x00007f214277c945 in FcConfigParseAndLoadFromMemoryInternal
#18 0x00007f214277d297 in _FcConfigParse
#19 0x00007f2142765191 in IA__FcConfigParseAndLoad
#20 FcInitLoadOwnConfig (config=0x7f1fd8000b70)
#21 0x00007f214276015d in FcInitLoadOwnConfigAndFonts (config=0x0)
#22 IA__FcInitLoadConfigAndFonts () at ../fontconfig/src/fcinit.c:184
#23 FcConfigEnsure () at ../fontconfig/src/fccfg.c:96
#24 0x00007f214276548d in FcConfigInit () at ../fontconfig/src/fccfg.c:122
#25 IA__FcInit () at ../fontconfig/src/fcinit.c:193
#26 0x00007f21427a9412 in init_in_thread (task_data=<optimized out>)
#27 0x00007f2143b669a5 in g_thread_proxy (data=0x559b6f5d7e60)
#28 0x00007f21432aa9eb in start_thread (arg=<optimized out>)
#29 0x00007f214332e7cc in clone3 ()

See the attachment (or [0]) for the whole thing, as it's quite big.
In case it's needed, there's also the full journal [1] from the machine, as well as the list of all installed packages [2].

[0] https://jenkins-systemd.apps.ocp.cloud.ci.centos.org/job/upstream-vagrant-archlinux-sanitizers/7059/artifact//systemd-centos-ci/artifacts_all/artifacts_dxv80ir8/vagrant-logs.uwh/vagrant-arch-sanitizers-clang-testsuite.9H0/coredumpctl_collect_boot_FAIL.log
[1] https://jenkins-systemd.apps.ocp.cloud.ci.centos.org/job/upstream-vagrant-archlinux-sanitizers/7059/artifact//systemd-centos-ci/artifacts_all/artifacts_dxv80ir8/vagrant-logs.uwh/vagrant-arch-sanitizers-clang-testsuite.9H0/journalctl-testsuite_PASS.log
[2] https://jenkins-systemd.apps.ocp.cloud.ci.centos.org/job/upstream-vagrant-archlinux-sanitizers/7059/artifact//systemd-centos-ci/artifacts_all/artifacts_dxv80ir8/vagrant-logs.uwh/vagrant-arch-sanitizers-clang-installed-pkgs.txt

Additional info:
* package version(s)
gnome-shell 1:45.0+r17+gebf2f8036-1
Closed by Toolybird (Toolybird)
Thursday, 26 October 2023, 20:54 GMT
Reason for closing: Upstream
Additional comments about closing: Please see comments
coredumpctl_collect_boot_FAIL...
Edit:
Does rebuilding with the address sanitizer '-fsanitize=address' detect any issues?
[1]: https://bugs.launchpad.net/ubuntu/+bug/1979118/comments/3
[2]: https://github.com/libexpat/libexpat/blob/R_2_5_0/expat/lib/xmlparse.c#L8389
[ 4.162456] archlinux dbus-daemon[439]: [session uid=120 pid=439] Successfully activated service 'org.freedesktop.systemd1'
[ 4.295723] archlinux /usr/lib/gdm-wayland-session[442]: dbus-daemon[442]: [session uid=120 pid=442] Activating service name='org.freedesktop.systemd1' requested by ':1.2' (uid=120 pid=443 comm="/usr/lib/gnome-session-binary --autostart /usr/sha")
[ 4.299184] archlinux /usr/lib/gdm-wayland-session[442]: dbus-daemon[442]: [session uid=120 pid=442] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
[ 4.299562] archlinux gnome-session[443]: gnome-session-binary[443]: WARNING: Could not check if unit gnome-session-wayland@gnome-login.target is active: Error calling StartServiceByName for org.freedesktop.systemd1: Process org.freedesktop.systemd1 exited with status 1
[ 4.299989] archlinux gnome-session-binary[443]: WARNING: Could not check if unit gnome-session-wayland@gnome-login.target is active: Error calling StartServiceByName for org.freedesktop.systemd1: Process org.freedesktop.systemd1 exited with status 1
[ 4.312398] archlinux gnome-session[443]: gnome-session-binary[443]: WARNING: Desktop file /usr/share/gdm/greeter/autostart/orca-autostart.desktop for application orca-autostart.desktop could not be parsed or references a missing TryExec binary
[ 4.312555] archlinux gnome-session-binary[443]: WARNING: Desktop file /usr/share/gdm/greeter/autostart/orca-autostart.desktop for application orca-autostart.desktop could not be parsed or references a missing TryExec binary
[ 4.562029] archlinux gnome-shell[455]: Running GNOME Shell (using mutter 45.0) as a Wayland display server
[ 4.621103] archlinux gnome-shell[455]: Failed to make thread 'KMS thread' realtime scheduled: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Name "org.freedesktop.RealtimeKit1" does not exist
[ 4.625626] archlinux org.gnome.Shell.desktop[455]: pci id for fd 12: 1013:00b8, driver (null)
[ 4.625962] archlinux org.gnome.Shell.desktop[455]: MESA-LOADER: failed to open cirrus: /usr/lib/dri/cirrus_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/dri, suffix _dri)
[ 4.906052] archlinux org.gnome.Shell.desktop[455]: pci id for fd 13: 1013:00b8, driver (null)
[ 4.906052] archlinux org.gnome.Shell.desktop[455]: kmsro: driver missing
[ 4.961439] archlinux gnome-shell[455]: Added device '/dev/dri/card0' (cirrus) using atomic mode setting.
[ 4.962980] archlinux gnome-shell[455]: Failed to initialize accelerated iGPU/dGPU framebuffer sharing: Not hardware accelerated
[ 4.963068] archlinux gnome-shell[455]: Created gbm renderer for '/dev/dri/card0'
[ 4.963184] archlinux gnome-shell[455]: Boot VGA GPU /dev/dri/card0 selected as primary
[ 5.183502] archlinux gnome-shell[455]: Disabling DMA buffer screen sharing (not hardware accelerated)
[ 5.193503] archlinux /usr/lib/gdm-wayland-session[442]: dbus-daemon[442]: [session uid=120 pid=442] Activating service name='org.a11y.Bus' requested by ':1.4' (uid=120 pid=455 comm="/usr/bin/gnome-shell")
[ 5.202405] archlinux /usr/lib/gdm-wayland-session[442]: dbus-daemon[442]: [session uid=120 pid=442] Successfully activated service 'org.a11y.Bus'
[ 5.214010] archlinux kernel: gnome-shell[455]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 5.216770] archlinux gnome-shell[455]: Using public X11 display :1024, (using :1025 for managed services)
[ 5.216932] archlinux gnome-shell[455]: Using Wayland display name 'wayland-0'
[ 5.219519] archlinux org.gnome.Shell.desktop[455]: AddressSanitizer:DEADLYSIGNAL
[ 5.219519] archlinux org.gnome.Shell.desktop[455]: =================================================================
[ 5.219710] archlinux org.gnome.Shell.desktop[455]: ==455==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000007c (pc 0x7fd6bc05f93d bp 0x612000027940 sp 0x7fd6864e8820 T28)
[ 5.219774] archlinux org.gnome.Shell.desktop[455]: ==455==The signal is caused by a READ memory access.
[ 5.219833] archlinux org.gnome.Shell.desktop[455]: ==455==Hint: address points to the zero page.
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #0 0x7fd6bc05f93d in getenv (/usr/lib/libc.so.6+0x4193d) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #1 0x7fd6b73945df (/usr/lib/libexpat.so.1+0x1f5df) (BuildId: a98bfab551dfa3df6889c33d5fd2ccfa6d505608)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #2 0x7fd6b737add9 (/usr/lib/libexpat.so.1+0x5dd9) (BuildId: a98bfab551dfa3df6889c33d5fd2ccfa6d505608)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #3 0x7fd6b73845af (/usr/lib/libexpat.so.1+0xf5af) (BuildId: a98bfab551dfa3df6889c33d5fd2ccfa6d505608)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #4 0x7fd6b83eb842 (/usr/lib/libfontconfig.so.1+0x2d842) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #5 0x7fd6b83ec296 (/usr/lib/libfontconfig.so.1+0x2e296) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #6 0x7fd6b83ec479 (/usr/lib/libfontconfig.so.1+0x2e479) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #7 0x7fd6b83ef475 (/usr/lib/libfontconfig.so.1+0x31475) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #8 0x7fd6b738263e (/usr/lib/libexpat.so.1+0xd63e) (BuildId: a98bfab551dfa3df6889c33d5fd2ccfa6d505608)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #9 0x7fd6b737fc13 (/usr/lib/libexpat.so.1+0xac13) (BuildId: a98bfab551dfa3df6889c33d5fd2ccfa6d505608)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #10 0x7fd6b73817ec (/usr/lib/libexpat.so.1+0xc7ec) (BuildId: a98bfab551dfa3df6889c33d5fd2ccfa6d505608)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #11 0x7fd6b73858e9 in XML_ParseBuffer (/usr/lib/libexpat.so.1+0x108e9) (BuildId: a98bfab551dfa3df6889c33d5fd2ccfa6d505608)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #12 0x7fd6b83eb944 (/usr/lib/libfontconfig.so.1+0x2d944) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #13 0x7fd6b83ec296 (/usr/lib/libfontconfig.so.1+0x2e296) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #14 0x7fd6b83d4190 (/usr/lib/libfontconfig.so.1+0x16190) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #15 0x7fd6b83cf15c (/usr/lib/libfontconfig.so.1+0x1115c) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #16 0x7fd6b83d448c in FcInit (/usr/lib/libfontconfig.so.1+0x1648c) (BuildId: 2f7305d108d26daad426b3855fe9225ddfef356b)
[ 5.352442] archlinux org.gnome.Shell.desktop[455]: #17 0x7fd6b8416411 (/usr/lib/libpangoft2-1.0.so.0+0x9411) (BuildId: c4942d7c23fe50db42934220b31981cfbf464e48)
[ 5.353672] archlinux org.gnome.Shell.desktop[455]: #18 0x7fd6bcf3f9a4 (/usr/lib/libglib-2.0.so.0+0x8b9a4) (BuildId: 1916d89bc0f8f0932e584f87427c2fedfc8a293b)
[ 5.353672] archlinux org.gnome.Shell.desktop[455]: #19 0x7fd6bc0aa9ea (/usr/lib/libc.so.6+0x8c9ea) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)
[ 5.353672] archlinux org.gnome.Shell.desktop[455]: #20 0x7fd6bc12e7cb (/usr/lib/libc.so.6+0x1107cb) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)
[ 5.353672] archlinux org.gnome.Shell.desktop[455]: AddressSanitizer can not provide additional info.
[ 5.353672] archlinux org.gnome.Shell.desktop[455]: SUMMARY: AddressSanitizer: SEGV (/usr/lib/libc.so.6+0x4193d) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658) in getenv
[ 5.353672] archlinux org.gnome.Shell.desktop[455]: Thread T28 created by T0 here:
[ 5.394804] archlinux /usr/lib/gdm-wayland-session[494]: dbus-daemon[494]: Activating service name='org.a11y.atspi.Registry' requested by ':1.0' (uid=120 pid=455 comm="/usr/bin/gnome-shell")
[ 5.399525] archlinux dbus-daemon[370]: [system] Activating via systemd: service name='org.freedesktop.ColorManager' unit='colord.service' requested by ':1.16' (uid=120 pid=455 comm="/usr/bin/gnome-shell")
[ 5.404267] archlinux org.gnome.Shell.desktop[496]: Failed to initialize glamor, falling back to sw
[ 5.430379] archlinux org.gnome.Shell.desktop[455]: #0 0x7fd6bd04a497 in __interceptor_pthread_create /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_interceptors.cpp:208
[ 5.430379] archlinux org.gnome.Shell.desktop[455]: #1 0x7fd6bcf40f53 (/usr/lib/libglib-2.0.so.0+0x8cf53) (BuildId: 1916d89bc0f8f0932e584f87427c2fedfc8a293b)
[ 5.430379] archlinux org.gnome.Shell.desktop[455]: ==455==ABORTING
[ 5.430518] archlinux /usr/lib/gdm-wayland-session[498]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
[ 5.430758] archlinux /usr/lib/gdm-wayland-session[494]: dbus-daemon[494]: Successfully activated service 'org.a11y.atspi.Registry'
[ 5.430825] archlinux systemd[1]: Starting colord.service...
[ 5.431582] archlinux gnome-session[443]: gnome-session-binary[443]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
But I'm currently debugging a different issue that also involves rebooting a lot, so I'll keep an eye on the logs if something else (and possibly more helpful) pops up.
$ mkdir arch-debug
$ cd arch-debug
$ cat >Vagrantfile <<EOF
Vagrant.configure("2") do |config|
config.vm.box = "archlinux/archlinux"
config.vm.synced_folder ".", "/vagrant", disabled: true
end
EOF
$ vagrant up --provider=libvirt
$ vagrant ssh -c 'sudo bash -c "systemctl disable systemd-time-wait-sync; pacman --noconfirm -Sy gdm; systemctl set-default graphical.target; systemctl enable gdm"'
$ vagrant reload
$ while ! vagrant ssh -c 'systemctl --wait is-system-running; sleep 10; sudo journalctl -b --grep "[k]illed by signal"'; do vagrant reload; done
...
default: Warning: Connection refused. Retrying...
==> default: Machine booted and ready!
==> default: Creating shared folders metadata...
==> default: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> default: flag to force provisioning. Provisioners marked to run always will still run.
running
Oct 23 17:16:49 archlinux gnome-session-binary[356]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Oct 23 17:16:49 archlinux gnome-session[356]: gnome-session-binary[356]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
"Relatively" meaning that it can take a couple dozen tries before gnome-shell crashes.
[0] https://gitlab.archlinux.org/archlinux/arch-boxes
Can repro using your instructions. Backtrace with debug symbols attached. It definitely seems fontconfig related, but it still doesn't seem like an Arch packaging issue.
Hm, Mutter is calling setenv at various points during startup so I wouldn't be surprised if this is racing with Pango's threaded initialization of FontConfig.
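To illustrate the kind of race suspected here: POSIX does not require setenv() or getenv() to be thread-safe, so a setenv() on the main thread can modify the environment while a worker thread is inside getenv(), which is where both backtraces in this report end. The sketch below is not code from Mutter, Pango, fontconfig or expat. The names init_in_worker, run_with_safe_ordering, struct init_result and the DEMO_DEBUG_LEVEL variable are hypothetical. It only shows the ordering that avoids the hazard, namely finishing every environment change before any thread that reads the environment is started.

```c
/* Sketch only: all names below are hypothetical and not taken from
 * Mutter, Pango, fontconfig or expat. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct init_result {
    char level[32];
};

/* Stand-in for a library init routine that reads the environment from a
 * worker thread, like the getenv() call under XML_ParserCreate in the
 * backtrace above. */
static void *init_in_worker(void *arg)
{
    struct init_result *res = arg;
    const char *value = getenv("DEMO_DEBUG_LEVEL");

    /* Copy the value out: the pointer returned by getenv() may be
     * invalidated by a later setenv() call. */
    strncpy(res->level, value ? value : "", sizeof res->level - 1);
    res->level[sizeof res->level - 1] = '\0';
    return NULL;
}

/* Safe ordering: every setenv() happens before the worker is created. */
static int run_with_safe_ordering(struct init_result *res)
{
    pthread_t worker;

    assert(res != NULL);

    if (setenv("DEMO_DEBUG_LEVEL", "2", 1) != 0)
        return -1;

    if (pthread_create(&worker, NULL, init_in_worker, res) != 0)
        return -1;

    /* Unsafe variant (deliberately not done here): calling setenv() at
     * this point, while the worker may still be inside getenv(), is a
     * data race on the process environment. */

    return pthread_join(worker, NULL) == 0 ? 0 : -1;
}
```

Built with something like `cc -pthread`, calling run_with_safe_ordering() on a caller-provided struct init_result leaves "2" in its level field. Whether reordering of this kind is practical for Mutter's startup is a question for upstream. The sketch only shows why concurrent setenv() and getenv() calls are a problem.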
[1]: https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/6974
Thanks @loqs. It does indeed look very similar. I've linked this report there. After the analysis from @heftig, there is not much doubt about this being an upstream issue.