FS#64805 - [nvidia-utils] [nvidia-390xx-utils] PrimaryGPU option breaks autodetection

Attached to Project: Arch Linux
Opened by Giancarlo Razzolini (grazzolini) - Wednesday, 11 December 2019, 00:44 GMT
Last edited by Giancarlo Razzolini (grazzolini) - Wednesday, 22 January 2020, 17:31 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Felix Yan (felixonmars)
Giancarlo Razzolini (grazzolini)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description:

The PrimaryGPU option shipped in 10-nvidia-drm-outputclass.conf forces the NVIDIA GPU to be the primary one, breaking X's autodetection. It also interferes with PRIME Render Offload setups, either preventing X from starting at all or making the setup use reverse PRIME, so that everything is rendered on the NVIDIA card by default.

Additional info:
* nvidia-utils-440.36-1
* nvidia-390xx-utils-390.132-1

Steps to reproduce:

Install nvidia-utils or nvidia-390xx-utils

Workaround:

Comment out the PrimaryGPU option in the /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf file.
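For illustration, the nvidia section of that file with the option commented out might look roughly like the sketch below. Only the PrimaryGPU line is confirmed by this report; the surrounding lines are assumed from the minimal files discussed in the comments, and "yes" as the original value is also an assumption.

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    # Workaround: PrimaryGPU disabled so X can autodetect the primary GPU again.
    # The original value "yes" is assumed here.
    # Option "PrimaryGPU" "yes"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection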

Closed by  Giancarlo Razzolini (grazzolini)
Wednesday, 22 January 2020, 17:31 GMT
Reason for closing:  Implemented
Comment by Giancarlo Razzolini (grazzolini) - Friday, 13 December 2019, 02:32 GMT
Comment by Sven-Hendrik Haase (Svenstaro) - Friday, 13 December 2019, 05:45 GMT
If I remember correctly, there was a reason to put that in way, way back. I don't remember exactly what it was; it's been many years. Quite likely the relevant parts of the driver have changed enough by now to re-evaluate that.
Comment by Sven-Hendrik Haase (Svenstaro) - Friday, 13 December 2019, 05:45 GMT
Perhaps we should even just ask upstream directly on what we should ship here exactly.
Comment by Daniel Apolinario (dapolinario) - Friday, 13 December 2019, 11:44 GMT
I suggest that the 10-nvidia-drm-outputclass.conf file simply be:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
EndSection

I based this on what other distributions ship. The nvidia-prime package works correctly with it. On machines where the nvidia card is not the main one, you must set 'Option "PrimaryGPU" "no"' explicitly.
Comment by Joaquín Ignacio Aramendía (Samsagax) - Friday, 13 December 2019, 12:21 GMT
To add to @dapolinario, from my experience, the ModulePath bits are important too. I managed to make it all work without conflict with this minimal file:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection

But I think those two lines could go in a separate "Files" section as well.
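A sketch of that alternative, assuming the two ModulePath lines move out of the OutputClass section into their own Files section (in the same file or in another snippet). One caveat worth weighing: a ModulePath in the Files section applies to every X session, while the OutputClass entries only take effect when a device driven by nvidia-drm is matched, so the two forms are not strictly equivalent.

Section "Files"
    # Search the NVIDIA module directory first, then the stock Xorg modules.
    # A ModulePath set here replaces the default module path, so the stock
    # directory has to be listed explicitly as well.
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection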
Comment by Daniel Apolinario (dapolinario) - Friday, 13 December 2019, 12:39 GMT
I agree with @samsagax. I checked Xorg.0.log with the suggested settings and it showed no errors, unlike the simpler settings I had suggested.
Comment by Giancarlo Razzolini (grazzolini) - Friday, 13 December 2019, 12:46 GMT
@Sven,

I have conducted a few tests over this last week. I believe that was added because on desktops that have both intel and nvidia cards, but not an optimus setup, i.e. both cards have their own separate outputs, it's usually preferable for the nvidia card to be the primary one. That's not the case on optimus setups. Also, just having the nvidia package installed makes using xf86-video-intel impossible, because, as I also found out, the rules in /usr/share/X11/xorg.conf.d are applied regardless of what you have in /etc/X11/xorg.conf or /etc/X11/xorg.conf.d. And it seems they are also applied *last*. There is no way to override them, as far as I know, without recompiling Xorg. Not that it really matters in this case, since nvidia recommends using the modesetting driver for prime render offload.

But after discussing it with Robin Broda, I've tested and confirmed that you can also do prime render offload with xf86-video-intel; you only lose the ability to switch providers with xrandr. You can still offload things to the nvidia card.

So, this file breaks autodetection in various ways, not just in relation to prime render offload. I did a non-exhaustive search of other distros, but didn't find any deploying a file like this. I think the minimal *ideal* file would be:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection

No section for the intel driver is required, because a) modesetting *is* the default if you don't have xf86-video-intel installed, and b) such a section breaks autodetection just by having the nvidia package *installed*. By the way, this bug also affects nvidia-390xx.
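For users who do want a particular driver on the intel device, a sketch of making that choice themselves under /etc/X11/xorg.conf.d (for example in a file named 20-intel.conf, a hypothetical name) rather than in the package's /usr/share file might look like this. The Identifier string is hypothetical, and Driver "intel" assumes xf86-video-intel is installed; this is untested here.

Section "Device"
    # Hypothetical identifier; any unique string works.
    Identifier "Intel Graphics"
    # "intel" requires xf86-video-intel; "modesetting" selects the built-in driver.
    Driver "intel"
EndSection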
Comment by David Roth (V1del) - Sunday, 12 January 2020, 15:14 GMT
  • Field changed: Percent Complete (100% → 0%)
While the general reasoning does sound sound, this seems to break the systems of people who use the reverse prime setup: https://bbs.archlinux.org/viewtopic.php?pid=1881067#p1881067

Should we just tell them to manually set this up?
Comment by Giancarlo Razzolini (grazzolini) - Monday, 13 January 2020, 13:23 GMT
@David

First of all, this seems to affect setups where the nvidia card has outputs and/or is a completely independent card. Setting the PrimaryGPU option is not the best solution for this. The right solution is to use reverse prime and configure it with xrandr --setprovideroutputsource.

Given that Xorg merges all configurations into a "meta" configuration, we should have the bare minimum in /usr/share/X11/xorg.conf.d, because it gets added to whatever you configure in /etc/X11/xorg.conf and/or /etc/X11/xorg.conf.d. As far as I can tell, there is no way to deactivate this. The man page for xorg.conf explicitly states that whatever is in /usr/share/X11 is looked up *last*.

I have looked up what Debian, Fedora, Ubuntu and SUSE do, and none of these distros deploys a file that a) forces intel to always use modesetting (which made xf86-video-intel not work) or b) sets PrimaryGPU.

So, to sum up, yes: people who have a separate independent nvidia card (or an optimus setup with nvidia-wired outputs) should use reverse prime or set PrimaryGPU themselves.
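A minimal sketch of setting PrimaryGPU yourself, following the copy-and-re-add approach suggested later in this thread: a copy of the packaged nvidia section placed under /etc/X11/xorg.conf.d/ with the option added back. The exact contents and the "yes" value are assumptions based on the minimal files discussed above, and this has not been tested here.

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    # Treat the NVIDIA GPU as primary instead of the one selected by the firmware.
    Option "PrimaryGPU" "yes"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection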
Comment by Andri Möll (moll) - Monday, 13 January 2020, 13:42 GMT
I can add that on my regular desktop computer (no built-in GPUs) with two PCIe Nvidia GPUs (a GTX 960 and a GTX 1080 Ti), the removal of PrimaryGPU broke Xorg entirely: Xorg failed with "GPU screens are not yet supported by the NVIDIA driver". This happened even though the GTX 960 was disabled by handing it over to `vfio-pci` during boot for later use in a virtual machine. It seems as if the 1080 Ti triggers functionality that I'd associate with laptops or systems with built-in GPUs, and without PrimaryGPU, X bails out, leaving a black screen and a puzzled user.
Comment by Giancarlo Razzolini (grazzolini) - Monday, 13 January 2020, 14:02 GMT
@Andri

If you have two GPUs using the same driver, then in this particular case you can't use reverse prime, as far as I know. You have to set which one is the PrimaryGPU, and you may even need to use the BusID to make sure you select the right card. Otherwise, it is left to X itself to determine which one it will use as primary, which may or may not be deterministic.
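A sketch of pinning one card by its bus address, for the two-cards-on-one-driver case. An OutputClass section matching on MatchDriver "nvidia-drm" would match both cards, so here the card is selected by BusID in a Device section instead. The identifier and the address PCI:1:0:0 are hypothetical; lspci prints addresses in hexadecimal while BusID expects decimal, so e.g. 0a:00.0 becomes PCI:10:0:0. Untested sketch.

Section "Device"
    # Hypothetical identifier for the card that should drive the X screen.
    Identifier "nvidia-primary"
    Driver "nvidia"
    # Hypothetical address; use the one reported by lspci, converted to decimal.
    BusID "PCI:1:0:0"
EndSection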

Now, for your other case, where you disable one of the cards, you need to paste your X log, because if only one card is detected on the host system, it should be the primary regardless of what you set.
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 13 January 2020, 15:52 GMT
I trust Giancarlo's judgement on this as far as the packaging goes. In the past we discussed adding some specific packages containing pre-made config files for common setups. We could also document this in the wiki, or do both and refer to those packages in the wiki. Obviously the current situation isn't optimal, but it's now technically more correct than our previous default, which also only worked sometimes.
Comment by David Roth (V1del) - Tuesday, 21 January 2020, 13:00 GMT
Sorry for the lateish reply and thanks for opening this up for discourse again.

I generally agree that we might opt for keeping the default as a plain baseline to build on. As far as I know, the original file with PrimaryGPU added came from Aaron Plattner in response to a bug report (I can't for the life of me dig up said bug again, but I know it was here somewhere), back when we didn't have the native offloading method. It generally made sense to enable this under the assumption that people who installed the nvidia driver wanted to use their nvidia card. It also made reverse prime easier to set up, because you didn't need an explicit config of your own and could just fire off the xrandr options to change the provider.

But as we do have native prime now, it might indeed make sense to re-evaluate this. I don't think it's inherently problematic that we now have to point users to the config in the wiki, but I know that I've advised quite a few people to simply adjust their .xinitrcs or display manager configs under the assumption that the /usr/share config would stay what it used to be.

In general, it would've been nice to have had a news item about this; there have often been some "experiments"/changes with the nvidia packages that were only discovered by being bitten by them or by being interested enough to follow the changelogs.



Comment by Giancarlo Razzolini (grazzolini) - Tuesday, 21 January 2020, 14:13 GMT
@V1del

Arch is the only distro I know of (I've looked) that had this option and that also forced the intel driver to use modesetting. Keep in mind that the file distributed with the nvidia driver contains only this:

# This xorg.conf.d configuration snippet configures the X server to
# automatically load the nvidia X driver when it detects a device driven by the
# nvidia-drm.ko kernel module. Please note that this only works on Linux kernels
# version 3.9 or higher with CONFIG_DRM enabled, and only if the nvidia-drm.ko
# kernel module is loaded before the X server is started.

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
EndSection

I'm not entirely sure this deserves a news item, and I also don't consider this an "experiment". I get that the PrimaryGPU option might have been a good option to have in the default file, but now that we have a native prime solution, I think it should go. Also, this option is not required all the time; it's only needed when you really want to make the nvidia card the primary one. And the wiki documentation already mentions this.
Comment by David Roth (V1del) - Tuesday, 21 January 2020, 14:53 GMT
I know that it isn't required but it was there for a few years and people have - whether knowingly or not - started to rely on it being present.

I do agree that in the long run it's better to have the baseline as is (especially now that we have native prime and the options directly conflict), and thus I agree that the change in and of itself shouldn't be reverted. But as it broke a lot of people's Xorg setups, and IMO predictably so, it would have been nice to have a heads-up along the lines of "We decided to slim down a package-shipped config file; if you used an Optimus system with an external screen attached, copy /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf to /etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf and re-add the PrimaryGPU option".

Comment by Giancarlo Razzolini (grazzolini) - Tuesday, 21 January 2020, 15:44 GMT
Let me say this again:

The PrimaryGPU option affects:

1) Desktops that have more than one card (not optimus setups)
2) Optimus setups where the nvidia card has outputs wired to it

In both cases, reverse prime is the way to go. PrimaryGPU is only meant for when you actually want to override the way X autodetects things and force it to use a certain card.
It's not an nvidia option at all, and it should not even have been there in the first place. I understand there might have been a situation where it was necessary, but it should have been used only as a temporary solution.

If you can come up with a case where reverse prime does not work (I know there could be some) and/or there are more scenarios, then we can have a news entry.
Comment by David Roth (V1del) - Wednesday, 22 January 2020, 17:09 GMT
Yes, and these two use cases broke without warning; it would have been nice to have one.

I fully agree with you that the file as shipped right now makes more sense in the long run, and for writing custom snippets that don't have to take predefined decisions from system files into account (FWIW, I proposed that as well in https://bugs.archlinux.org/task/60102#comment173490).

As far as this incident goes, it's probably too late to have one; hopefully most people have seen the forum thread or the wiki by now.



Comment by Giancarlo Razzolini (grazzolini) - Wednesday, 22 January 2020, 17:31 GMT
Good to know of another case where PrimaryGPU *by default* was causing issues. I'm not surprised. I'm going to close this for now, and yes, I don't see a point in making a news entry now.
