FS#74916 - NVME device not found with linux-5.18.1.arch1-1

Attached to Project: Arch Linux
Opened by Karsten Elfenbein (Elfe) - Tuesday, 31 May 2022, 08:55 GMT
Last edited by Toolybird (Toolybird) - Saturday, 30 July 2022, 22:13 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
David Runge (dvzrv)
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No


After upgrading to linux-5.18.1.arch1-1, the second of two identical NVMe devices is no longer detected. Only one of the two devices appears in the /dev tree. This degrades RAID-1 configurations.

Downgrading to 5.17.* works around the issue for now.

The 5.18.* kernel logs the error:
"globally duplicate IDs for nsid 1"

The device list under 5.17:
nvme list
Node          Generic     SN                    Model        Namespace  Usage              Format       FW Rev
------------  ----------  --------------------  -----------  ---------  -----------------  -----------  --------
/dev/nvme1n1  /dev/ng1n1  20288229000XXXXXX8D2  Force MP600  1          1,00 TB / 1,00 TB  512 B + 0 B  EGFM11.3
/dev/nvme0n1  /dev/ng0n1  20288229000XXXXXX8F3  Force MP600  1          1,00 TB / 1,00 TB  512 B + 0 B  EGFM11.3

Closed by  Toolybird (Toolybird)
Saturday, 30 July 2022, 22:13 GMT
Reason for closing:  Fixed
Additional comments about closing:  linux 5.18.13.arch1-1
Comment by loqs (loqs) - Tuesday, 31 May 2022, 09:17 GMT
I think you may need to add a quirk disabling namespace identifiers for those devices [1], similar to [2]. The duplicate-checking logic was broken before 5.18 [3].
Assuming the devices are 1987:5016, which already has an entry, you could adjust that entry as in the attached diff.
[4] confirms that a quirk needs to be added for devices which break the standard, and that this is done by reporting the issue upstream.

[1] https://github.com/torvalds/linux/commit/00ff400e6deee00f7b15e200205b2708b63b8cf6
[2] https://github.com/torvalds/linux/commit/a98a945b80f8684121d477ae68ebc01da953da1f
[3] https://github.com/torvalds/linux/commit/e2724cb9f0c406b8fb66efd3aa9e8b3edfd8d5c8
[4] https://lore.kernel.org/linux-nvme/20220603114303.GA14056@lst.de/
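
To illustrate why a quirk that disables namespace identifiers lets both drives show up again, here is a minimal Python sketch of the idea. It is a simplified model, not the kernel code: the names `Namespace`, `scan`, `effective_ids` and `QUIRK_BOGUS_NID`, as well as the identifier value "same-id", are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical flag for this sketch: "ignore the identifiers this device reports".
QUIRK_BOGUS_NID = 1 << 0


@dataclass(frozen=True)
class Namespace:
    controller: str   # e.g. "nvme0"
    nsid: int         # namespace ID on that controller
    ids: tuple        # identifiers reported by the device (placeholder strings here)
    quirks: int = 0   # bitmask of quirk flags applied to the device


def effective_ids(ns):
    """Return the identifiers to compare; a quirked device contributes none."""
    if ns.quirks & QUIRK_BOGUS_NID:
        return ()
    return tuple(i for i in ns.ids if i)


def scan(namespaces):
    """Accept namespaces in order and reject any whose identifier was already seen."""
    seen = set()
    accepted, rejected = [], []
    for ns in namespaces:
        ids = effective_ids(ns)
        if any(i in seen for i in ids):
            rejected.append(ns)   # modelled "globally duplicate IDs" case
            continue
        seen.update(ids)
        accepted.append(ns)
    return accepted, rejected
```

With two namespaces that report the same identifier, `scan` accepts the first and rejects the second, which matches the symptom of only one device being visible. Setting `QUIRK_BOGUS_NID` on both makes the check skip their identifiers, so both are accepted.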
Comment by redshoe (redshoe) - Friday, 24 June 2022, 05:53 GMT
Updating the kernel version from 5.18.5 to 5.18.6 did not solve the problem. However, adding the quirk [1] worked for me.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme/host/pci.c?id=c4f01a776b28378f4f61b53f8cb0e358f4fa3721

Comment by Toolybird (Toolybird) - Saturday, 30 July 2022, 03:25 GMT
Candidate for a cherrypick?
Comment by Karsten Elfenbein (Elfe) - Saturday, 30 July 2022, 09:49 GMT
Upstream fix/quirk is included in core/linux-5.18.14.arch1-1
Comment by Toolybird (Toolybird) - Saturday, 30 July 2022, 22:13 GMT
Ahh, thanks. I was looking for the quirk mentioned by @redshoe. That one will end up in a future kernel anyway so I will close this ticket.