FS#65077 - Hang resuming from sleep / "ucsi_ccg failed to reset PPM!" (timeout)

Attached to Project: Arch Linux
Opened by Nahuel Pastorale (XRovertoX) - Wednesday, 08 January 2020, 01:15 GMT
Last edited by freswa (frederik) - Thursday, 20 February 2020, 22:05 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Short description:
System hangs when resuming from sleep because of a USB Type-C driver timeout.

Long description:
Whenever I resume from sleep (triggered by closing the lid on my notebook; the kernel logs `PM: suspend entry (deep)`), wake-up is delayed by almost a minute.
The same error message also shows up when booting, but there it neither stops nor delays the boot process.

```
kernel: ucsi_ccg 0-0008: failed to reset PPM!
kernel: ucsi_ccg 0-0008: PPM init failed (-110)
kernel: ucsi_ccg 0-0008: PPM NOT RESPONDING
kernel: PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
kernel: PM: Device 0-0008 failed to resume: error -110
```
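
For reference, error code -110 in these messages is ETIMEDOUT on Linux, which fits the "timeout" in the title. Below is a rough sketch for pulling the relevant lines out of a kernel log and tagging the timeouts. The function name `annotate_ucsi_errors` is made up for illustration, and the patterns only cover the messages quoted in this report.

```shell
# Read kernel log lines on stdin and print only the ucsi_ccg / resume-failure
# ones, appending [ETIMEDOUT] to lines that end in -110 or (-110).
# Assumption: 110 is ETIMEDOUT, as on Linux; other platforms number it differently.
annotate_ucsi_errors() {
    grep -E 'ucsi_ccg|failed to resume' |
        sed -E 's/(-110\)?)$/\1 [ETIMEDOUT]/'
}
```

With a systemd journal this could be fed with `journalctl -k -b 0 | annotate_ucsi_errors`.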

From what I've been able to find out, this may be caused by the USB Type-C controller on the NVIDIA GPU (RTX 2060 Mobile), for which this driver appears to be responsible: https://github.com/torvalds/linux/blob/master/drivers/usb/typec/ucsi/ucsi.c

A workaround is to add `blacklist ucsi_ccg` to /etc/modprobe.d/blacklist.conf
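
A minimal sketch of applying that workaround from a shell, assuming the blacklist file path given above. The helper name `blacklist_module` and its optional second argument (an alternative file, handy for a dry run without root) are hypothetical.

```shell
# Append "blacklist <module>" to a modprobe.d file unless the exact line
# is already there. Defaults to the file named in this report.
blacklist_module() {
    module="$1"
    conf_file="${2:-/etc/modprobe.d/blacklist.conf}"
    line="blacklist $module"
    # -x: match the whole line, -F: fixed string, -q: quiet
    grep -qxF "$line" "$conf_file" 2>/dev/null || printf '%s\n' "$line" >> "$conf_file"
}
```

Run `blacklist_module ucsi_ccg` as root, then reboot. If the workaround took effect, `lsmod | grep ucsi_ccg` should print nothing.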



Additional info:
Kernel 5.4.8-arch1-1
Lenovo Y540-15IRH 81SX

Limited output of lspci -vvv:
```
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller (rev a1)
    Subsystem: Lenovo TU106 USB Type-C UCSI Controller
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin D routed to IRQ 141
    Region 0: Memory at b4004000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee004d8 Data: 0000
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            AtomicOpsCtl: ReqEn-
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b4] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Kernel driver in use: nvidia-gpu
    Kernel modules: i2c_nvidia_gpu
```


Same bug reported by Nickolay Ponomarev: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1850238



Steps to reproduce:
0. I believe an NVIDIA Turing card is required
1. Suspend the computer (in my case, by closing the lid)
2. Wake the computer from sleep

Closed by freswa (frederik)
Thursday, 20 February 2020, 22:05 GMT
Reason for closing: None
Additional comments about closing: This seems pretty stalled to me. If it's still an issue, please file a re-open request. Thank you :)
Comment by Ashwin Vishnu (jadelord) - Saturday, 11 January 2020, 23:07 GMT
I have an NVIDIA Turing card [Quadro RTX 3000 Mobile / Max-Q] and I observe the same delay while resuming.

Additional details:

```
$ lspci -k | grep -A 2 -E "(VGA|3D)"
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
    DeviceName: Onboard - Video
    Subsystem: Dell UHD Graphics 630 (Mobile)
--
01:00.0 VGA compatible controller: NVIDIA Corporation TU106GLM [Quadro RTX 3000 Mobile / Max-Q] (rev a1)
    Subsystem: Dell TU106GLM [Quadro RTX 3000 Mobile / Max-Q]
    Kernel driver in use: nvidia
```


```
$ journalctl -p err..alert -b 0 | tail -n 10
Jan 11 23:31:17 hostname kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
Jan 11 23:31:17 hostname kernel: PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
Jan 11 23:31:17 hostname kernel: PM: Device 0-0008 failed to resume: error -110
Jan 11 23:31:18 hostname kernel: iwlwifi 0000:6f:00.0: BIOS contains WGDS but no WRDS
Jan 11 23:31:18 hostname kernel: iwlwifi 0000:6f:00.0: BIOS contains WGDS but no WRDS
Jan 11 23:31:22 hostname kernel: iwlwifi 0000:6f:00.0: BIOS contains WGDS but no WRDS
Jan 11 23:48:27 hostname kernel: nvidia-gpu 0000:01:00.3: i2c timeout error e0000040
Jan 11 23:48:28 hostname kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
Jan 11 23:55:07 hostname kernel: nvidia-gpu 0000:01:00.3: i2c timeout error e0000040
Jan 11 23:55:08 hostname kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
```

I am going to check whether the workaround helps.
