FS#57331 - [linux] System unable to enumerate NVME disks after kernel upgrade to 4.15

Attached to Project: Arch Linux
Opened by George (Vash63) - Thursday, 01 February 2018, 23:38 GMT
Last edited by Jan Alexander Steffens (heftig) - Sunday, 04 February 2018, 13:19 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description: After upgrading to 4.15 from [testing], I was unable to mount my root disk, which is located on a Samsung 960 Pro NVMe SSD. My /boot is located on a separate, older SATA disk, so the system was able to boot but dropped me into a Busybox recovery shell with no / disk. When I tried to locate or mount it, the disk was completely missing from /dev: there was no entry for /dev/nvme0n1 or any partitions under it. To recover my system I had to boot an Arch install drive, arch-chroot in, and remove myself from the testing repos. Once the system was downgraded, everything worked again.


Additional info:
* package version(s)
Linux & Linux headers 4.15-1
* config and/or log files etc.
The only error I saw in dmesg was:
"nvme failed to set apst feature"
(paraphrased as I couldn't get the system to boot to copy logs)

I did find a user on LKML reporting what appears to be the same issue with 4.15: https://lkml.org/lkml/2017/12/14/907

However, there is no solution in this LKML thread. Reporting here as this could be a critical issue for anyone running this SSD.


Steps to reproduce:
1. Install a Samsung 960 Pro
2. Boot to Linux 4.15
3. Attempt to locate or mount the SSD

Closed by  Jan Alexander Steffens (heftig)
Sunday, 04 February 2018, 13:19 GMT
Reason for closing:  Fixed
Additional comments about closing:  4.15.1-1
Comment by George (Vash63) - Friday, 02 February 2018, 00:23 GMT
I was able to boot my system by adding the kernel flag 'pcie_aspm.policy=powersave'. Based on the LKML thread, it now defaults devices to 'powersupersave', which is probably what's breaking my system.
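To confirm which ASPM policy is actually in effect after booting with the flag, something like the following could help. It is only a minimal sketch: it assumes the policy is exposed at /sys/module/pcie_aspm/parameters/policy as a space-separated list with the active entry in square brackets, and the helper name active_aspm_policy is made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the pcie_aspm policy parameter.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    if POLICY_PATH.exists():
        print(active_aspm_policy(POLICY_PATH.read_text()))
    else:
        print("pcie_aspm policy parameter not found")
```

If the workaround took effect, the reported policy should be powersave rather than powersupersave.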
Comment by loqs (loqs) - Friday, 02 February 2018, 00:35 GMT
Please install the linux-lts kernel so you have a fallback kernel while testing the 4.15 one.
With the 4.15 kernel does the boot option nvme_core.default_ps_max_latency_us=250 make any difference?
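When testing boot options like these, it can be worth checking that the running kernel actually received them. A rough sketch, assuming the command line is read from /proc/cmdline and contains no quoted values with spaces (parse_cmdline is a hypothetical helper, not an existing tool):

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so values like root=UUID=... stay intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        params = parse_cmdline(proc_cmdline.read_text())
        for key in ("pcie_aspm.policy", "nvme_core.default_ps_max_latency_us"):
            print(key, "=", params.get(key))
```

A value of None for either key means the option was not passed to the running kernel.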
You could try adding a quirk at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v4.15#n2453
	} else if (pdev->vendor == 0x144d) {
		return NVME_QUIRK_NO_APST;
	}
This should stop all NVMe devices by Samsung from using APST.
Edit:
Never mind the above. So the cause is the config change from CONFIG_PCIEASPM_DEFAULT=y to CONFIG_PCIEASPM_POWER_SUPERSAVE=y, plus a device bug.
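To see which ASPM default a given kernel was built with, the config can be inspected directly. The following is a sketch under the assumption that the running kernel exposes its config at /proc/config.gz; the function name aspm_config_options is hypothetical.

```python
import gzip
import os


def aspm_config_options(config_text):
    """Return the CONFIG_PCIEASPM* options that are set, as {name: value}."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Unset options appear as comments ("# CONFIG_X is not set") and are skipped.
        if line.startswith("CONFIG_PCIEASPM") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


if __name__ == "__main__":
    config_path = "/proc/config.gz"  # assumed location; needs CONFIG_IKCONFIG_PROC
    if os.path.exists(config_path):
        with gzip.open(config_path, "rt") as fh:
            print(aspm_config_options(fh.read()))
```

On an affected build this would be expected to list CONFIG_PCIEASPM_POWER_SUPERSAVE instead of CONFIG_PCIEASPM_DEFAULT.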
