Arch Linux

FS#80163 - [linux] NVMe controller crash WD SN770

Attached to Project: Arch Linux
Opened by ldare373 (ldare373) - Saturday, 04 November 2023, 20:23 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 22 November 2023, 01:06 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
The NVMe controller crashes during I/O-intensive tasks (compilation, large copies), and the entire drive then stops working until the system is rebooted.
The drive's temperature seems fine right before it happens, and nothing shows up in "nvme smart-log" or "nvme error-log".
The issue has been present in kernels going back at least 6 months, but I don't remember which kernel version first had it.
I copied everything over to a Samsung 980 500GB and did not have any problems with it.

I've tried the following kernel parameters without success:
1. pcie_aspm=off nvme_core.default_ps_max_latency=0
2. #1 + iommu=soft
3. #1 + amd_iommu=off
4. #1 + amd_iommu=fullflush
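One way to confirm that each of these options actually reached the kernel is to check the running kernel's command line. The sketch below is a hypothetical helper (`has_param` is not an existing tool) that tests whether a parameter appears in a command-line string, either as a bare flag or as `name=value`. Note that the kernel documents the NVMe power-state option as `nvme_core.default_ps_max_latency_us`; with the NVMe driver active, its effective value can be read from `/sys/module/nvme_core/parameters/default_ps_max_latency_us`.

```sh
#!/bin/sh
# has_param CMDLINE NAME
# Hypothetical helper: exit status 0 if NAME appears in CMDLINE either as a
# bare flag (e.g. "quiet") or as NAME=value (e.g. "pcie_aspm=off"), else 1.
# The body runs in a subshell so that "set -f" (no globbing while splitting
# CMDLINE into words) does not leak into the caller.
has_param() (
    set -f
    for word in $1; do
        case $word in
            "$2" | "$2"=*) exit 0 ;;
        esac
    done
    exit 1
)
```

For example, `has_param "$(cat /proc/cmdline)" pcie_aspm && echo present` checks the running kernel, and `cat /sys/module/nvme_core/parameters/default_ps_max_latency_us` should print 0 if the NVMe setting took effect.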

Hardware:
AMD Ryzen 5 7600x
AMD B650 Chipset
WD_BLACK SN770 500GB

Packages:
linux 6.5.9-arch2-1
linux-firmware 20231030.2b304bfe-1

Attached (generated with the following commands):
dmesg -Tw > dmesg.txt
nvme id-ctrl -H /dev/nvme0 > nvme.txt
nvme error-log /dev/nvme0 > nvme-error.txt
nvme smart-log -H /dev/nvme0 > nvme-smart.txt
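Because `dmesg -Tw` keeps following the kernel log until it is interrupted, the resulting file can be long. A minimal sketch for pulling out only the lines that mention an NVMe device, assuming the relevant messages contain the string `nvme` (the function name `nvme_lines` is hypothetical):

```sh
#!/bin/sh
# nvme_lines [FILE...]
# Hypothetical helper: print only the log lines that mention "nvme"
# (case-insensitive), prefixed with their line numbers, reading the
# given FILEs or standard input when no FILE is given.
nvme_lines() {
    grep -i -n -- 'nvme' "$@"
}
```

Usage: `nvme_lines dmesg.txt`. Keeping the line numbers makes it easy to jump back to the surrounding context in the full log.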

Closed by Toolybird (Toolybird)
Wednesday, 22 November 2023, 01:06 GMT
Reason for closing: Upstream
Additional comments about closing: If still an issue, please contact upstream as per the comments.
Comment by Toolybird (Toolybird) - Saturday, 04 November 2023, 22:02 GMT
This could easily be a hardware issue. Either way, it's clearly not an Arch packaging issue, which means your best bet is reporting upstream to the relevant kernel folks, probably [1]. You might have seen this [2] already. And by leaving it for 6 months, you might have lost the chance to more easily identify a kernel regression via git-bisect [3]. Anyway, please let us know what you find out.

[1] https://lore.kernel.org/linux-nvme/
[2] https://lore.kernel.org/all/0SKHTR.QZGFTLD3Z8E01%40lyndeno.ca/
[3] https://wiki.archlinux.org/title/Kernel#Debugging_regressions
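The idea behind git-bisect [3] is a binary search over an ordered history: test the midpoint, keep the half that still contains the transition from good to bad, and repeat. The sketch below illustrates that search over a plain list of versions rather than Git commits. `first_bad` and the predicate are hypothetical, and it assumes the first listed version is good, the last is bad, and the problem persists in every version after it first appears.

```sh
#!/bin/sh
# first_bad PRED V1 V2 ... VN
# Hypothetical helper: binary search for the first version in V1..VN for
# which the command "PRED version" fails (nonzero exit status).
# Assumes V1 is good, VN is bad, and once a version is bad every later
# one is bad too (the same assumption git bisect makes about commits).
first_bad() {
    pred=$1
    shift
    lo=1     # index of a version known to be good
    hi=$#    # index of a version known to be bad
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"
        if "$pred" "$candidate"; then
            lo=$mid    # still good: the first bad version is later
        else
            hi=$mid    # bad: the first bad version is here or earlier
        fi
    done
    eval "printf '%s\n' \"\${$hi}\""
}
```

In a real kernel bisection each test is a build and a boot, so needing only roughly log2 of the number of candidates in tests is what makes the approach practical.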
Comment by ldare373 (ldare373) - Saturday, 04 November 2023, 22:51 GMT
Alright, I'll report it upstream and link here once I can narrow down where the issue was introduced. Thanks!
Comment by agapito fernandez (agapito) - Saturday, 04 November 2023, 23:57 GMT
Is the WD_BLACK SN770 your main drive connected to the first slot of your MB?
Comment by ldare373 (ldare373) - Sunday, 05 November 2023, 00:03 GMT
>Is the WD_BLACK SN770 your main drive connected to the first slot of your MB?
It used to have root on it, but I moved it to the Samsung drive and the WD one now has some other partitions on it. The WD drive is still in the first slot.
Comment by agapito fernandez (agapito) - Sunday, 05 November 2023, 09:06 GMT
I'm sure if you swap hard drives, you'll have the same problem on the other one because your CPU needs more voltage at some point. The first slot is controlled by the CPU and not the chipset.
Comment by ldare373 (ldare373) - Friday, 10 November 2023, 08:50 GMT
I tried swapping the drives, and I continued to have issues with the same drive.
I have also tried kernels going back to 2020 with no luck, so I'm going to try downgrading linux-firmware.
Comment by agapito fernandez (agapito) - Friday, 10 November 2023, 16:35 GMT
Was your Samsung drive fine in the first slot?

Have you tried the WD drive on Windows?
Comment by ldare373 (ldare373) - Sunday, 12 November 2023, 06:18 GMT
I haven't had any problems with the Samsung drive in either slot, and I have not tried the WD drive with Windows.
