FS#80292 - [libdrm] Very slow application start on X11 under VMware (due to udev polling)

Attached to Project: Arch Linux
Opened by Stefan Hoffmeister (shoffmeister) - Sunday, 19 November 2023, 16:32 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:21 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Andreas Radke (AndyRTR)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

With the bugfix from https://bugs.archlinux.org/task/80284, processes using the Xorg X11 server (via libdrm) start with massive latency on VMware Workstation; this is not unexpected given the information present in task #80284.

The root cause is an interaction between a crude busy-poll loop in libdrm, virtualization slowing down each poll iteration, and a recent libdrm change that amplified both problems.

The net effect is that, as-is, every process connecting to Xorg incurs a startup penalty of roughly 1.8 seconds (measured locally); I have also seen delays of up to 2.8 seconds (most likely caused by usleep or stat timing/performance under varying virtualization configurations). This presents itself as

* very long time to desktop environment shell (under X11)
* applications such as KDE Konsole starting with a delay of 1.8+ seconds

https://discussion.fedoraproject.org/t/x11-huge-delay-for-each-process-vmware-workstation-regression/95708 discusses this at length; https://github.com/shoffmeister/drm/commit/db85c9dc0553414ca77d5442fb7c203f550f267a has a hack to make things work better again.
This task depends upon

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:21 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/packaging/packages/libdrm/issues/1
Comment by Stefan Hoffmeister (shoffmeister) - Sunday, 19 November 2023, 16:33 GMT
FWIW, I believe that applying the fix in task #80284 was correct. It's just quite inconvenient that this comes back to bite me so quickly ...
Comment by Stefan Hoffmeister (shoffmeister) - Sunday, 19 November 2023, 20:27 GMT
I added a convenience script for Archlinux patching to https://github.com/shoffmeister/drm/commits/hack/assume-single-static-gpu-device

FWIW, that hack works fine, but it is not generally applicable - see the commit message. I mention the hack only so that those interested in local patching can discover it.
Comment by Andreas Radke (AndyRTR) - Monday, 20 November 2023, 06:25 GMT
@Stefan: can you please bring this to the upstream tracker (https://gitlab.freedesktop.org/mesa/drm/-/issues) so they are aware and can work toward a proper solution?
Comment by Stefan Hoffmeister (shoffmeister) - Monday, 20 November 2023, 06:33 GMT
@AndyRTR - I'll try making (some) things happen upstream.

The actionable part for upstream is to address the usleep loop (in the "let's wait for udevd" case), which is not bounded by monotonic time.

Archlinux itself doesn't have much of an option to act; build option "udev==true" is The Right Approach.
Comment by Stefan Hoffmeister (shoffmeister) - Monday, 20 November 2023, 07:51 GMT
FWIW, there seems to be some additional interaction causing the issue to appear / disappear:

A _plain_ Archlinux installation ("archinstall") with the KDE desktop environment apparently will create a system which has LightDM as the login greeter. In that setup, everything works perfectly fine, no delays to be seen.

An EndeavourOS installation (which is essentially Archlinux with an opinionated installer, AFAICT) will create an environment where the greeter is sddm. In that setup, libdrm (2.4.117-2) apparently struggles to open a DRM device with the default udev acquisition strategy on VMware Workstation (again, this is due to the timing in that polling loop under virtualization).
Comment by Stefan Hoffmeister (shoffmeister) - Tuesday, 21 November 2023, 06:58 GMT
FWIW, https://gitlab.freedesktop.org/mesa/drm/-/issues/85 existed already, I added a comment explaining that this hits hard(er) in more use cases.
