FS#51503 - [nvidia] gpu installed in this system is not supported due runtime pm

Attached to Project: Arch Linux
Opened by Bjoern Bidar (Thaodan) - Saturday, 22 October 2016, 21:08 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Friday, 16 December 2016, 15:55 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Felix Yan (felixonmars)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
If runtime PM puts the GPU into suspend (D3cold) before the nvidia driver is loaded,
the GPU is offline and the driver cannot access it, resulting in the message in the title.
Either the Bumblebee project implements a workaround until nvidia handles this, or it won't work until nvidia changes their driver.

There are several ways to fix this:
1. Add pcie_port_pm=off to the kernel command line.
2. Do the opposite of this: https://wiki.archlinux.org/index.php/Power_management#PCI_Runtime_Power_Management
3. Mark the GPU as hotpluggable (as Optimus is a kind of hotplug), so that the GPU is not put into D3cold.
4. Wake the GPU up before accessing it (either in bumblebee/bbswitch or by nvidia).
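
As a rough illustration of the first two options, the Python sketch below checks whether pcie_port_pm=off is present on a kernel command line and builds the sysfs path of a device's runtime PM control file. The helper names are hypothetical and the PCI address 0000:01:00.0 is only an assumed example (check lspci for the real one). Writing "on" to that file is the opposite of the "auto" setting the wiki describes, and it needs root.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token on the command line."""
    return param in cmdline.split()


def runtime_pm_control_path(pci_addr: str) -> Path:
    """Path of the runtime PM control file for a PCI device in sysfs."""
    return Path("/sys/bus/pci/devices") / pci_addr / "power" / "control"


def disable_runtime_pm(pci_addr: str) -> None:
    """Write 'on' so the kernel keeps the device powered (needs root)."""
    runtime_pm_control_path(pci_addr).write_text("on\n")
```

On a live system, has_kernel_param(Path("/proc/cmdline").read_text(), "pcie_port_pm=off") reports whether option 1 is already active.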

The first two are the easiest to do and should be done first, as using the GPU won't work without a fix.
The bug affects users with and without bumblebee.
It's clearly an upstream bug, but there should at least be a workaround for the issue.

Additional info:
* pkgs: linux >= 4.8.x, bumblebee (not specific to it especially), nvidia
* log files etc. see upstream bug:
* upstream url: https://github.com/Bumblebee-Project/Bumblebee/issues/810
* post in the nvidia forum: https://devtalk.nvidia.com/default/topic/971733/linux/nvidia-gtx-960m-not-supported-anymore-by-370-28-/?offset=5#4999717

Steps to reproduce:
1. Boot the system with a kernel >= 4.8 without any of the fixes mentioned.
2. Try to use the GPU.
3. It won't work.
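
A minimal Python sketch for checking whether a machine matches the first step. It only looks at the kernel release and at the pcie_port_pm=off workaround, not at the other fixes listed above, and the function names are hypothetical.

```python
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Compare the leading 'major.minor' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def matches_report(release: str, cmdline: str) -> bool:
    """True if the kernel is >= 4.8 and pcie_port_pm=off is not set."""
    workaround_set = "pcie_port_pm=off" in cmdline.split()
    return kernel_at_least(release, 4, 8) and not workaround_set
```

On a live system the inputs could come from platform.release() and the contents of /proc/cmdline.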

Closed by  Sven-Hendrik Haase (Svenstaro)
Friday, 16 December 2016, 15:55 GMT
Reason for closing:  Fixed
Comment by Doug Newgard (Scimmia) - Sunday, 23 October 2016, 04:33 GMT
The first two options are documentation/wiki issues. Sounds like this is only an issue if you use something separate to set up power management.
Comment by Bjoern Bidar (Thaodan) - Sunday, 23 October 2016, 08:19 GMT
No, the issue affects anyone who disables the card via bbswitch or a similar module and has a system that supports D3cold suspend.
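
A hedged Python sketch of how those two conditions could be checked. It assumes /proc/acpi/bbswitch reports a single line such as "0000:01:00.0 OFF", and it uses the sysfs d3cold_allowed attribute as a rough proxy for D3cold support, which is an assumption rather than a definitive test. The function names are hypothetical.

```python
from typing import Tuple


def parse_bbswitch_status(text: str) -> Tuple[str, bool]:
    """Split a bbswitch status line into (PCI address, card is on)."""
    pci_addr, state = text.split()
    if state not in ("ON", "OFF"):
        raise ValueError(f"unexpected bbswitch state: {state!r}")
    return pci_addr, state == "ON"


def d3cold_allowed_path(pci_addr: str) -> str:
    """sysfs attribute telling whether the device may enter D3cold."""
    return f"/sys/bus/pci/devices/{pci_addr}/d3cold_allowed"


def may_be_affected(bbswitch_text: str, d3cold_text: str) -> bool:
    """Card switched off via bbswitch while D3cold is allowed."""
    _, card_on = parse_bbswitch_status(bbswitch_text)
    return (not card_on) and d3cold_text.strip() == "1"
```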
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 23 October 2016, 17:06 GMT
Is this the same as FS#51268?
Comment by Bjoern Bidar (Thaodan) - Sunday, 30 October 2016, 19:05 GMT
Don't know really, but it doesn't look like it is.
Please look at the last commits on the upstream bug report.
Using the pm-rework branch, but without the commit 5c7b3f53f229c70bc49c710295967605ac5846e4, fixes the issue, as it implements support for runtime PM suspend.

The only thing that needs to be done is to enable/disable the card before/after suspend (a solution is to stop/start bumblebee after/before suspend).
Comment by Bjoern Bidar (Thaodan) - Sunday, 30 October 2016, 19:10 GMT
The commit I posted was wrong; the right ID is: e0c68599bed6c11e37d5228a3c014b9575bf9edb
Comment by Sven-Hendrik Haase (Svenstaro) - Saturday, 05 November 2016, 15:27 GMT
What can we do here from a packaging standpoint that won't potentially break things for other users?
Comment by Bjoern Bidar (Thaodan) - Saturday, 05 November 2016, 15:34 GMT
Use the newer version, but test it for users with older notebooks first (<2015).
I already use it in bbswitch-pf and it causes no issues.
The only requirement is that the GPU is on before suspend.
Comment by Julio (The_Loko) - Friday, 16 December 2016, 13:42 GMT
Fixed on nvidia 375.26, can't reproduce anymore.
