Arch Linux

FS#72333 - NVMe SSD WD SN730 disappearing after suspend/resume

Attached to Project: Arch Linux
Opened by Elias Projahn (johrpan) - Sunday, 03 October 2021, 12:40 GMT
Last edited by Andreas Radke (AndyRTR) - Sunday, 10 October 2021, 18:14 GMT
Task Type: Bug Report
Category: Kernel
Status: Assigned
Assigned To: Jan Alexander Steffens (heftig)
Architecture: x86_64
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 0
Private: No

Details

Description:

I have a ThinkPad L13 Yoga Gen. 2 at hand, which includes an NVMe SSD of the following model:

WDC PC SN730 SDBQNTY-512G-1001

The firmware version is 11170101 according to nvme-cli, which is the most recent firmware advertised by Lenovo on LVFS [1]. The NVMe SSD does not seem to wake up properly after the system is suspended. I can reproduce this both with Arch Linux installed on the SSD (which results in a system crash due to numerous file system errors) and with the most recent Arch Linux installation ISO booted as a live system. The exact error message I get is:

```
nvme 000:04:00.0: can't change power state from D3cold to D0 (config space inaccessible)
nvme nvme0 removing after probe failure status: -19
nvme0n1: detected capacity change from 1000215216 to 0
```
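To pick these messages out of the kernel log after a resume, the log can be filtered for the three failure patterns quoted above. This is only a minimal sketch: the function name `filter_nvme_errors` is hypothetical, and matching is case-insensitive on the assumption that capitalization may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): reads kernel log text on
# stdin and keeps only lines matching the failure messages quoted above.
filter_nvme_errors() {
    grep -iE "can't change power state|removing after probe failure|detected capacity change"
}

# Usage after resuming (not run here):
#   journalctl -k -b | filter_nvme_errors
```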

I found many similar cases involving other models where disabling APST served as a workaround, but that did not work for me. I also tried setting `acpi_osi` to several of the values advertised by the firmware ("Windows10", "Windows 2015", "Linux"), which did not help either.

Steps to reproduce the original issue:

1. Boot the live system.
2. Run `lsblk` to see the working NVMe controller.
3. Mount a partition on the NVMe SSD.
4. Suspend the system using `systemctl suspend`.
5. Resume the system.
6. Run `lsblk` again to see that the NVMe controller has disappeared.
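
The before/after comparison in these steps can be sketched as a pair of shell helpers. The function names `list_nvme_disks` and `missing_after_resume` are hypothetical, and the sketch assumes NVMe block devices are named with an `nvme` prefix in `lsblk` output. The suspend itself is left as a manual step, since a script may keep running briefly before the system actually suspends.

```shell
#!/bin/sh
# Hypothetical helper: print NVMe block devices, one per line (e.g. nvme0n1).
list_nvme_disks() {
    lsblk -d -n -o NAME | grep '^nvme' || true
}

# Hypothetical helper: given the device lists from before and after
# resume, print every device that was present before but is gone now.
missing_after_resume() {
    before=$1
    after=$2
    for dev in $before; do
        # Unquoted $after collapses newlines into single spaces.
        case " $(echo $after) " in
            *" $dev "*) ;;          # still present
            *) echo "$dev" ;;       # disappeared
        esac
    done
}

# Usage (not run here):
#   before=$(list_nvme_disks)
#   systemctl suspend        # then resume the machine manually
#   after=$(list_nvme_disks)
#   missing_after_resume "$before" "$after"
```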

Steps to reproduce the failed workaround:

1. Boot the system with `nvme_core.default_ps_max_latency_us=0` to disable APST.
2. Run `nvme get-feature /dev/nvme0 -f 0x0c -H` to verify that APST is disabled.
3. Mount a partition and suspend/resume as described above.
4. Run `lsblk` to see that the NVMe controller has disappeared anyway.
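
As a complement to the `nvme get-feature` check, the value the kernel actually picked up for the parameter can be read back from sysfs. This is a hedged sketch: the function name `apst_disabled` is hypothetical, and the default path assumes the `nvme_core` module exposes its parameters under `/sys/module` on the running kernel.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the parameter file contains 0, i.e. the
# kernel was told to disable APST via default_ps_max_latency_us=0.
# The default path is an assumption about where nvme_core exposes it.
apst_disabled() {
    param_file=${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}
    [ -r "$param_file" ] && [ "$(cat "$param_file")" = "0" ]
}

# Usage (not run here):
#   apst_disabled && echo "APST disabled by kernel parameter"
```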

[1] https://fwupd.org/lvfs/devices/com.lenovo.PCSN730.firmware

Comment by Mark McBride (markmcb) - Thursday, 07 October 2021, 16:59 GMT
I can confirm this behavior on a different laptop and NVMe setup:

* System76 Lemur Pro (lemp10, Dec 2020) https://tech-docs.system76.com/models/lemp10/README.html
* Samsung SSD 980 PRO 500GB (OS)
* Samsung SSD 970 EVO Plus 1TB (Bulk storage)

As in the original report, the 1TB drive sometimes drops out after suspend, but not always; perhaps a third of the time. I've also tried the kernel parameter tweaks with a variety of settings, including 0, with no benefit. The OS drive always resumes without issue; only the second drive is affected. A reboot returns everything to normal.
Comment by Michel Koss (MichelKoss1) - Thursday, 07 October 2021, 18:51 GMT
Do you use power management tools like TLP?
Comment by Mark McBride (markmcb) - Thursday, 07 October 2021, 19:50 GMT
No TLP. Are you suggesting this might help? Or cause problems?
Comment by Michel Koss (MichelKoss1) - Friday, 08 October 2021, 14:18 GMT
I thought it might cause problems, so thanks for ruling it out.
