FS#65077 : Hang resuming from sleep / "ucsi_ccg failed to reset PPM!" (timeout)


Attached to Project: Arch Linux
Opened by Nahuel Pastorale (XRovertoX) - Wednesday, 08 January 2020, 01:15 GMT
Last edited by freswa (frederik) - Thursday, 20 February 2020, 22:05 GMT

Task Type	Bug Report
Category	Kernel
Status	Closed
Assigned To	No-one
Architecture	x86_64
Severity	Low
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	3 Kevin Alberts (Kurocon) (2020-02-02) Ashwin Vishnu (jadelord) (2020-01-11) Nahuel Pastorale (XRovertoX) (2020-01-08)
Private	No

Details

Short description:
System hangs when resuming from sleep because of a USB Type-C driver timeout.

Long description:
Whenever I resume from sleep (triggered by closing the lid on my notebook; kernel: PM: suspend entry (deep)), the system takes almost a minute to wake up.
The same error message also appears at boot, but there it neither stops nor delays the boot process.

```
kernel: ucsi_ccg 0-0008: failed to reset PPM!
kernel: ucsi_ccg 0-0008: PPM init failed (-110)
kernel: ucsi_ccg 0-0008: PPM NOT RESPONDING
kernel: PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
kernel: PM: Device 0-0008 failed to resume: error -110
```
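For anyone checking whether they hit the same problem: error -110 is the kernel's -ETIMEDOUT, so these messages mean the controller did not answer in time. Below is a minimal sketch of a filter for pulling the relevant lines out of a kernel log. The function name `ucsi_timeouts` is made up for illustration, and the patterns are simply taken from the messages above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print only those
# that mention ucsi_ccg or a device resume failure with error -110.
ucsi_timeouts() {
    grep -E 'ucsi_ccg|failed to resume: error -110'
}
```

It could be used as `journalctl -k -b 0 | ucsi_timeouts` to inspect the kernel messages of the current boot.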

From what I've been able to find out, this may be caused by the USB Type-C controller on the NVIDIA GPU (RTX 2060 Mobile), which this driver appears to be responsible for: https://github.com/torvalds/linux/blob/master/drivers/usb/typec/ucsi/ucsi.c

A workaround is to add `blacklist ucsi_ccg` to /etc/modprobe.d/blacklist.conf
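
The workaround can be sketched as a small shell helper. This is only an illustration: the function name `blacklist_ucsi_ccg` and its optional directory argument are assumptions added here so the entry is not written twice and the helper can be tried against a scratch directory first.

```shell
# Hypothetical helper (not part of any package): append the blacklist entry
# to blacklist.conf in the given modprobe.d directory unless it is already there.
blacklist_ucsi_ccg() {
    conf="${1:-/etc/modprobe.d}/blacklist.conf"
    # -x matches the whole line, -F treats the pattern as a fixed string
    grep -qxF 'blacklist ucsi_ccg' "$conf" 2>/dev/null ||
        printf 'blacklist ucsi_ccg\n' >> "$conf"
}
```

Run as root with no argument, it targets /etc/modprobe.d. After a reboot, `lsmod | grep ucsi_ccg` should print nothing if the module stayed unloaded.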

Additional info:
Kernel 5.4.8-arch1-1
Lenovo Y540-15IRH 81SX

Limited output of lspci -vvv:
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller (rev a1)
Subsystem: Lenovo TU106 USB Type-C UCSI Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin D routed to IRQ 141
Region 0: Memory at b4004000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee004d8 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b4] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Kernel driver in use: nvidia-gpu
Kernel modules: i2c_nvidia_gpu

Same bug reported by Nickolay Ponomarev: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1850238

Steps to reproduce:
0. I believe an NVIDIA Turing card is required
1. Suspend the computer (in my case, by closing the lid)
2. Wake the computer from sleep

Closed by freswa (frederik)
Thursday, 20 February 2020, 22:05 GMT
Reason for closing: None
Additional comments about closing: This seems pretty stalled to me. If it's still an issue, please file a re-open request. Thank you :)

Comment by Ashwin Vishnu (jadelord) - Saturday, 11 January 2020, 23:07 GMT

I have an NVIDIA Turing card [Quadro RTX 3000 Mobile / Max-Q] and I observe the same delay while resuming.

Additional details:

$ lspci -k | grep -A 2 -E "(VGA|3D)"

00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
DeviceName: Onboard - Video
Subsystem: Dell UHD Graphics 630 (Mobile)
--
01:00.0 VGA compatible controller: NVIDIA Corporation TU106GLM [Quadro RTX 3000 Mobile / Max-Q] (rev a1)
Subsystem: Dell TU106GLM [Quadro RTX 3000 Mobile / Max-Q]
Kernel driver in use: nvidia

$ journalctl -p err..alert -b 0 | tail -n 10
Jan 11 23:31:17 hostname kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
Jan 11 23:31:17 hostname kernel: PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
Jan 11 23:31:17 hostname kernel: PM: Device 0-0008 failed to resume: error -110
Jan 11 23:31:18 hostname kernel: iwlwifi 0000:6f:00.0: BIOS contains WGDS but no WRDS
Jan 11 23:31:18 hostname kernel: iwlwifi 0000:6f:00.0: BIOS contains WGDS but no WRDS
Jan 11 23:31:22 hostname kernel: iwlwifi 0000:6f:00.0: BIOS contains WGDS but no WRDS
Jan 11 23:48:27 hostname kernel: nvidia-gpu 0000:01:00.3: i2c timeout error e0000040
Jan 11 23:48:28 hostname kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
Jan 11 23:55:07 hostname kernel: nvidia-gpu 0000:01:00.3: i2c timeout error e0000040
Jan 11 23:55:08 hostname kernel: ucsi_ccg 0-0008: i2c_transfer failed -110

I am going to check whether the workaround helps.
