FS#79439 - [linux] 6.4.11 rtsx driver bug prevents booting in some cases
Attached to Project:
Arch Linux
Opened by Gene (GeneC) - Tuesday, 22 August 2023, 12:03 GMT
Last edited by Jan Alexander Steffens (heftig) - Thursday, 21 September 2023, 19:52 GMT
Details
linux kernel 6.4.11
There is a bug in the rtsx driver in 6.4.11 that can cause boot to fail on machines with hardware that needs the rtsx driver. In my case it presented as an NVMe failure and thus prevented the machine from booting. 6.4.10 is fine.
The hardware that triggers this is:
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
Workaround: blacklist the driver (it's only a card reader), i.e. add
blacklist rtsx_pci
blacklist rtsx_pci_sdmmc
to /etc/modprobe.d/blacklist_rtsx.conf and rebuild the initramfs.
More details, including the git bisect, are available on LKML: https://lkml.org/lkml/2023/8/16/1183
As of now there is no upstream fix or revert that I am aware of. Should we revert commit 69304c8d285b77c9a56d68f5ddb2558f27abf406 until this is fixed upstream?
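The workaround above can be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function names write_rtsx_blacklist and apply_rtsx_workaround are hypothetical, it assumes a mkinitcpio-based system with the modconf hook enabled (as in the default HOOKS) so the blacklist is copied into the initramfs, and it must be run as root to write under /etc.

```shell
#!/bin/sh
# Sketch of the blacklist workaround from the report. Function names are
# hypothetical; the file path and module names are the ones given above.

# Write the two blacklist lines to the given file
# (default: /etc/modprobe.d/blacklist_rtsx.conf).
write_rtsx_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist_rtsx.conf}"
    printf '%s\n' 'blacklist rtsx_pci' 'blacklist rtsx_pci_sdmmc' > "$conf"
}

# Apply the workaround (run as root): write the config, then regenerate
# every mkinitcpio preset so the initramfs picks up the blacklist
# (assumes the modconf hook is present in HOOKS).
apply_rtsx_workaround() {
    write_rtsx_blacklist || return 1
    mkinitcpio -P
}
```

After rebooting, `modprobe -c | grep rtsx` should list both blacklist lines. To undo the workaround once a fixed kernel is installed, delete the file and run `mkinitcpio -P` again. Users of other initramfs generators (e.g. booster) need that tool's own rebuild step instead.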
Closed by Jan Alexander Steffens (heftig)
Thursday, 21 September 2023, 19:52 GMT
Reason for closing: Fixed
Additional comments about closing: linux 6.5.4.arch2-1
Downgrading both kernels clears it for now for me.
[1] https://bbs.archlinux.org/viewtopic.php?id=288095
[2] https://linux-regtracking.leemhuis.info/regzbot/mainline/
$ uname -r
6.4.11-arch2-1
$ lspci -kd::ff00
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
Subsystem: Dell RTS525A PCI Express Card Reader
Kernel driver in use: rtsx_pci
Kernel modules: rtsx_pci
$ journalctl -b --no-hostname -g rtsx
Aug 22 14:07:51 kernel: rtsx_pci 0000:03:00.0: enabling device (0000 -> 0002)
Does the machine use nvme for root?
$ findmnt -rvno source /
/dev/nvme0n1p5
$ lspci -kd::108
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
Subsystem: Samsung Electronics Co Ltd SSD 970 EVO
Kernel driver in use: nvme
Kernel modules: nvme
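The "is root on NVMe?" question answered above with findmnt can be wrapped in a small check. This is a sketch under stated assumptions: root_on_nvme is a hypothetical helper, it relies on findmnt from util-linux, and it only inspects the direct source device, so a root on LVM or dm-crypt (/dev/mapper/...) will report "not NVMe" even if the underlying disk is an NVMe drive.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the root filesystem's source device is an
# NVMe device node (/dev/nvme*). Pass a device path to check it directly;
# otherwise it asks findmnt, using the same command as above.
# Note: stacked devices such as /dev/mapper/* are not resolved.
root_on_nvme() {
    src="${1:-$(findmnt -rvno source /)}"
    case "$src" in
        /dev/nvme*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `root_on_nvme && echo "root is on NVMe"` prints the message on both machines whose findmnt output is shown in this thread (/dev/nvme0n1p5 and /dev/nvme0n1p3).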
Glad you're not affected :)
$ grep model.name /proc/cpuinfo| uniq
model name : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
It might have to do with the nvme model.
:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963
Lenovo ThinkPad T14 Gen 3, model 21CF000KUS
---
$ findmnt -rvno source /
/dev/nvme0n1p3
$ lspci -kd::108
03:00.0 Non-Volatile memory controller: SK hynix Platinum P41/PC801 NVMe Solid State Drive
Subsystem: SK hynix Platinum P41/PC801 NVMe Solid State Drive
Kernel driver in use: nvme
Kernel modules: nvme
https://bbs.archlinux.org/viewtopic.php?id=288092
https://bbs.archlinux.org/viewtopic.php?id=288140
https://bbs.archlinux.org/viewtopic.php?id=288152
https://bbs.archlinux.org/viewtopic.php?id=288177
https://bbs.archlinux.org/viewtopic.php?id=288182
Appreciate all the work on this.
$ grep -H $ /sys/devices/virtual/dmi/id/{product_name,board_{name,version},bios_{date,version}}
/sys/devices/virtual/dmi/id/product_name:XPS 15 9560
/sys/devices/virtual/dmi/id/board_name:05FFDN
/sys/devices/virtual/dmi/id/board_version:A00
/sys/devices/virtual/dmi/id/bios_date:11/10/2022
/sys/devices/virtual/dmi/id/bios_version:1.31.0
$ grep -H $ /sys/class/block/nvme0n1/device/{model,firmware_rev}
/sys/class/block/nvme0n1/device/model:Samsung SSD 970 EVO 1TB
/sys/class/block/nvme0n1/device/firmware_rev:2B2QEXE7
Model PC401 NVMe SK hynix 512GB
512.11 GB
FW Rev - 80002E00
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
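When testing kernel parameters like these, it helps to confirm they actually reached the running kernel. A minimal sketch, assuming a hypothetical helper has_kernel_param that only does simple whitespace splitting of the command line (quoted values containing spaces are not handled):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel command line contains the
# parameter named in $1, either bare ("quiet") or with a value
# ("pcie_aspm=off"). The command line is read from /proc/cmdline unless
# it is passed as $2.
has_kernel_param() {
    cmdline="${2:-$(cat /proc/cmdline)}"
    # Intentional unquoted expansion: split the command line on whitespace.
    for word in $cmdline; do
        case "$word" in
            "$1" | "$1"=*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `has_kernel_param pcie_aspm && echo "pcie_aspm is set"`. The value the nvme_core module is actually using can be read from /sys/module/nvme_core/parameters/default_ps_max_latency_us.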
[1] https://lore.kernel.org/all/fa82d9dcbe83403abc644c20922b47f9%40realtek.com/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=217802#c4
Hangs at running hooks [udev] and then does not recognize storage devices. After a few seconds it drops me to an emergency shell.
thanks
- stock Toshiba (Dell OEM) XG4 1TB
- Crucial P3 1TB
- Sabrent Rocket 4 (SB-ROCKET-NVMe4) 1TB
pcie_aspm=off doesn't seem to do anything ftr.
Also, fwiw, my (functional) xps 9560 is using the systemd hooks instead of busybox:
$ grep ^HOOKS /etc/mkinitcpio.conf
HOOKS=(base systemd autodetect modconf kms keyboard block filesystems resume fsck)
[1] https://lore.kernel.org/lkml/fa82d9dcbe83403abc644c20922b47f9%40realtek.com/
[2] https://fwupd.org/lvfs/devices/com.dell.uefi34578c72.firmware
I initially experienced the issue with bios 1.29.0, but I upgraded to 1.31.0 and the issue is still there.
For reference I tested 6.4.12.zen1-1 which should be the latest linux-zen version; the known good version I use is 6.4.10.zen2-1.
Regarding systemd hooks: I'm probably also using those, but I'm using booster instead of mkinitcpio.
booster provides an alternative init for the initrd stage, so it doesn't run systemd. However, I really don't expect the initramfs to make any difference here, so I wouldn't worry too much about it. I just thought I'd record that info just in case, since my 9560 is apparently the only working one.
I only noticed this bug while checking the tracker because I was affected by https://bugs.archlinux.org/task/79366 also in 6.4.11, but not on this laptop.
FS#79427. The issue persists with linux 6.5.2.arch1-1. Consider increasing the severity and/or priority.
If you want to hasten the process, please pursue it upstream. There has been no response to this request for a status update [1]. You could also submit the revert upstream yourself, as although the commit may be technically correct, it breaks existing behavior.
In regard to FS#79427, the boot failure was caused by this issue. There was a separate TPM issue visible on the console, which has been resolved.
[1] https://lore.kernel.org/lkml/5d38cf11-114a-4997-a0fc-4627402468f8%40sapience.com/
https://bugzilla.kernel.org/show_bug.cgi?id=217802
or to the LKML mailing list as per [1] of @loqs's previous comment.
Thanks.
[1] https://wiki.archlinux.org/title/Unofficial_user_repositories#miffe
Lines 1329-1333
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/cardreader/rtsx_pcr.c?h=v6.6-rc1
https://drive.google.com/file/d/1vNU6AIfflnsjuUtF5zOM-v15eN_5JCbC/view?usp=sharing linux-6.5.3.arch1-1.4-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1YbcANIaA0GpGLspn4jYCUWWMujP9N5CM/view?usp=sharing linux-headers-6.5.3.arch1-1.4-x86_64.pkg.tar.zst
[1] https://lore.kernel.org/regressions/995632624f0e4d26b73fb934a8eeaebc%40realtek.com/