FS#72645 - [linux][linux-zen] 5.15 kernel enables `CONFIG_FB_SIMPLEFB` which breaks (some) nvidia platform.

Attached to Project: Arch Linux
Opened by huyizheng (huyizheng) - Saturday, 06 November 2021, 04:27 GMT
Last edited by Jan Alexander Steffens (heftig) - Friday, 12 November 2021, 20:59 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 13
Private No

Details

Description:

Arch's official `linux` and `linux-zen` 5.15 kernels enable the `CONFIG_FB_SIMPLEFB` option and disable the legacy framebuffer drivers. However, this breaks some NVIDIA platforms. On machines with only an NVIDIA graphics card, the TTY shows a black screen and Xorg can't start.

Journal:
```
Nov 04 12:14:44 myarch kernel: BUG: unable to handle page fault for address: ffffbde841925000
Nov 04 12:14:44 myarch kernel: #PF: supervisor read access in kernel mode
Nov 04 12:14:44 myarch kernel: #PF: error_code(0x0000) - not-present page
Nov 04 12:14:44 myarch kernel: PGD 100000067 P4D 100000067 PUD 1001be067 PMD 10af01067 PTE 0
Nov 04 12:14:44 myarch kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Nov 04 12:14:44 myarch kernel: CPU: 3 PID: 164 Comm: kworker/3:2 Tainted: P OE 5.15.0-zen1-1-zen #1 7a3d2b2579c7e36cd6f739acebc5bc24ef1ef2ba
Nov 04 12:14:44 myarch kernel: Hardware name: HASEE Computer PB50_70RF,RD,RC /PB50_70RF,RD,RC , BIOS 1.07.04RHZX1 02/01/2019
Nov 04 12:14:44 myarch kernel: Workqueue: events drm_fb_helper_damage_work
Nov 04 12:14:44 myarch kernel: RIP: 0010:memcpy_toio+0x23/0x50
Nov 04 12:14:44 myarch kernel: Code: c6 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 d2 74 28 40 f6 c7 01 75 33 48 83 fa 01 76 06 40 f6 c7 02 75 1f 48 89 d1 48 c1 e9 02 <f3> a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 31 d2 89 d1 89 d6 89
Nov 04 12:14:44 myarch kernel: RSP: 0018:ffffbde8409e7c68 EFLAGS: 00010206
Nov 04 12:14:44 myarch kernel: RAX: 0000000000001400 RBX: ffffbde8413862c0 RCX: 0000000000000068
Nov 04 12:14:44 myarch kernel: RDX: 00000000000002e0 RSI: ffffbde841925000 RDI: ffffbde841386400
Nov 04 12:14:44 myarch kernel: RBP: 0000000000000004 R08: 0000000000380000 R09: 0000000000000004
Nov 04 12:14:44 myarch kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffbde841924ec0
Nov 04 12:14:44 myarch kernel: R13: 00000000000002e0 R14: ffff9f2c433eaf00 R15: 0000000000000010
Nov 04 12:14:44 myarch kernel: FS: 0000000000000000(0000) GS:ffff9f33acac0000(0000) knlGS:0000000000000000
Nov 04 12:14:44 myarch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 12:14:44 myarch kernel: CR2: ffffbde841925000 CR3: 000000010145a002 CR4: 00000000003706e0
Nov 04 12:14:44 myarch kernel: Call Trace:
Nov 04 12:14:44 myarch kernel: drm_fb_blit_rect_dstclip+0x11a/0x140
Nov 04 12:14:44 myarch kernel: simpledrm_simple_display_pipe_update+0xc5/0xe0
Nov 04 12:14:44 myarch kernel: drm_atomic_helper_commit_planes+0xc8/0x320
Nov 04 12:14:44 myarch kernel: commit_tail+0x10f/0x2a0
Nov 04 12:14:44 myarch kernel: drm_atomic_helper_commit+0x1e0/0x210
Nov 04 12:14:44 myarch kernel: drm_atomic_helper_dirtyfb+0x1a5/0x280
Nov 04 12:14:44 myarch kernel: drm_fb_helper_damage_work+0x25e/0x330
Nov 04 12:14:44 myarch kernel: process_one_work+0x263/0x460
Nov 04 12:14:44 myarch kernel: ? process_one_work+0x460/0x460
Nov 04 12:14:44 myarch kernel: worker_thread+0x54/0x4e0
Nov 04 12:14:44 myarch kernel: ? process_one_work+0x460/0x460
Nov 04 12:14:44 myarch kernel: kthread+0x1b0/0x1e0
Nov 04 12:14:44 myarch kernel: ? __kthread_init_worker+0x60/0x60
Nov 04 12:14:44 myarch kernel: ret_from_fork+0x22/0x30
Nov 04 12:14:44 myarch kernel: Modules linked in: bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 xhci_pci rtsx_pci crc32c_intel xhci_pci_renesas i8042 serio nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE)
Nov 04 12:14:44 myarch kernel: CR2: ffffbde841925000
Nov 04 12:14:44 myarch kernel: ---[ end trace 3e73755a330ba568 ]---
Nov 04 12:14:44 myarch kernel: RIP: 0010:memcpy_toio+0x23/0x50
Nov 04 12:14:44 myarch kernel: Code: c6 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 d2 74 28 40 f6 c7 01 75 33 48 83 fa 01 76 06 40 f6 c7 02 75 1f 48 89 d1 48 c1 e9 02 <f3> a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 31 d2 89 d1 89 d6 89
Nov 04 12:14:44 myarch kernel: RSP: 0018:ffffbde8409e7c68 EFLAGS: 00010206
Nov 04 12:14:44 myarch kernel: RAX: 0000000000001400 RBX: ffffbde8413862c0 RCX: 0000000000000068
Nov 04 12:14:44 myarch kernel: RDX: 00000000000002e0 RSI: ffffbde841925000 RDI: ffffbde841386400
Nov 04 12:14:44 myarch kernel: RBP: 0000000000000004 R08: 0000000000380000 R09: 0000000000000004
Nov 04 12:14:44 myarch kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffbde841924ec0
Nov 04 12:14:44 myarch kernel: R13: 00000000000002e0 R14: ffff9f2c433eaf00 R15: 0000000000000010
Nov 04 12:14:44 myarch kernel: FS: 0000000000000000(0000) GS:ffff9f33acac0000(0000) knlGS:0000000000000000
Nov 04 12:14:44 myarch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 12:14:44 myarch kernel: CR2: ffffbde841925000 CR3: 000000010145a002 CR4: 00000000003706e0
```
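To compare another machine's log against the journal above, a minimal sketch like the following may help. It assumes `journalctl -k`-style lines like those shown; the function name `summarize_oops`, the `SAMPLE` excerpt, and the result keys are illustrative only.

```python
import re

# Short excerpt in the same format as the journal above (illustrative).
SAMPLE = """\
Nov 04 12:14:44 myarch kernel: BUG: unable to handle page fault for address: ffffbde841925000
Nov 04 12:14:44 myarch kernel: RIP: 0010:memcpy_toio+0x23/0x50
Nov 04 12:14:44 myarch kernel: Call Trace:
Nov 04 12:14:44 myarch kernel: drm_fb_blit_rect_dstclip+0x11a/0x140
Nov 04 12:14:44 myarch kernel: simpledrm_simple_display_pipe_update+0xc5/0xe0
Nov 04 12:14:44 myarch kernel: ? process_one_work+0x460/0x460
"""

def summarize_oops(journal_text):
    """Extract the faulting address, the RIP symbol, and the call-trace symbols."""
    fault = re.search(r"unable to handle page fault for address: ([0-9a-f]+)", journal_text)
    rip = re.search(r"RIP: [0-9a-f]+:(\w+)\+0x", journal_text)
    # Frames look like "kernel: name+0xOFF/0xSIZE"; frames prefixed with "?" are skipped.
    frames = re.findall(r"kernel: (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+$",
                        journal_text, flags=re.MULTILINE)
    return {
        "fault_address": fault.group(1) if fault else None,
        "rip_symbol": rip.group(1) if rip else None,
        "frames": frames,
        # True when any frame belongs to the simpledrm driver.
        "involves_simpledrm": any(f.startswith("simpledrm_") for f in frames),
    }

print(summarize_oops(SAMPLE))
```

If `involves_simpledrm` is true and the RIP symbol is `memcpy_toio`, the log probably shows the same failure path as this report.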

After I change the config like this and rebuild the kernel:
```
diff --git a/trunk/config b/trunk/config
index 1c723d3..ec10482 100644
--- a/trunk/config
+++ b/trunk/config
@@ -2311,7 +2311,7 @@ CONFIG_ISCSI_IBFT=m
CONFIG_FW_CFG_SYSFS=m
# CONFIG_FW_CFG_SYSFS_CMDLINE is not set
CONFIG_SYSFB=y
-CONFIG_SYSFB_SIMPLEFB=y
+# CONFIG_SYSFB_SIMPLEFB is not set
CONFIG_GOOGLE_FIRMWARE=y
# CONFIG_GOOGLE_SMI is not set
CONFIG_GOOGLE_COREBOOT_TABLE=m
@@ -6438,8 +6438,8 @@ CONFIG_FB_SYS_IMAGEBLIT=y
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=y
CONFIG_FB_DEFERRED_IO=y
-# CONFIG_FB_MODE_HELPERS is not set
-# CONFIG_FB_TILEBLITTING is not set
+CONFIG_FB_MODE_HELPERS=y
+CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
@@ -6451,9 +6451,9 @@ CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
-# CONFIG_FB_UVESA is not set
-# CONFIG_FB_VESA is not set
-# CONFIG_FB_EFI is not set
+CONFIG_FB_UVESA=m
+CONFIG_FB_VESA=y
+CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
```
Then this bug disappears. The TTY displays correctly and Xorg can start.
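To check which of the options touched by the diff a given kernel config sets, a small parser along these lines could be used. This is a sketch: `parse_kconfig` and `OPTIONS_OF_INTEREST` are hypothetical names, and it only handles the two line forms that appear in the diff (`CONFIG_X=value` and `# CONFIG_X is not set`).

```python
def parse_kconfig(text):
    """Map option name -> value string, or None for '# CONFIG_X is not set'."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

# Options changed by the diff above.
OPTIONS_OF_INTEREST = [
    "CONFIG_SYSFB_SIMPLEFB",
    "CONFIG_FB_UVESA",
    "CONFIG_FB_VESA",
    "CONFIG_FB_EFI",
]

patched = parse_kconfig(
    "CONFIG_SYSFB=y\n"
    "# CONFIG_SYSFB_SIMPLEFB is not set\n"
    "CONFIG_FB_UVESA=m\n"
    "CONFIG_FB_VESA=y\n"
    "CONFIG_FB_EFI=y\n"
)
for name in OPTIONS_OF_INTEREST:
    print(name, "=", patched.get(name))
```

On a running system the text could be read with `gzip.open("/proc/config.gz", "rt").read()`, assuming the kernel exposes its config there.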

Additional info:
* package version(s)
linux 5.15.arch1-1
linux-zen 5.15.zen1-1
* config and/or log files etc.
* link to upstream bug report, if any

Steps to reproduce:

- Boot linux or linux-zen 5.15 from the official repository on an NVIDIA-only platform.
- Change the config as shown above, rebuild the kernel, then boot it on the same platform.

Closed by  Jan Alexander Steffens (heftig)
Friday, 12 November 2021, 20:59 GMT
Reason for closing:  Fixed
Additional comments about closing:  linux 5.15.2.arch1-1
Comment by Akatsuki Rui (akiirui) - Saturday, 06 November 2021, 13:40 GMT
I have the same issue.
Log: https://fars.ee/1SMO

And the option name is "CONFIG_SYSFB_SIMPLEFB"

---

Update 1: After applying the patch and rebuilding the kernel, it works fine.

Update 2: The linux 5.15.1.arch1-2 still has the issue.
(https://github.com/archlinux/svntogit-packages/commit/8478ac78542b1cbe5c22d80d1e5d2136c230366a#diff-3e341d2d9c67be01819b25b25d5e53ea3cdf3a38d28846cda85a195eb9b7203a)
Comment by Andreas Flanitzer (the_shiver) - Saturday, 06 November 2021, 19:42 GMT
one more log with the same issue.

Despite the lack of video, the system otherwise seems to work: (blind) local login is possible, and remoting into the machine even lets you see what you are doing.

with nouveau everything works as expected.




EDIT: gpu is a 1080TI
Comment by zgrim (zgrim) - Saturday, 06 November 2021, 21:58 GMT
I have the same issue as well. No kernel panic/oops/bug though, not using nvidia modesetting, nor a login manager.
The system has just a GeForce RTX 2060 SUPER video card, no other video.
The nvidia driver version does not seem to matter, tried both 470xx and the current (as of this note) 495.44.
Console text is fine. But xorg shows just a black screen (only the mouse pointer is visible).
Applying the patch in here and building a custom kernel fixed the issue for me.
Please revert this config change in the kernel.
Comment by Ronan (ronjouch) - Tuesday, 09 November 2021, 03:29 GMT
I confirm that the framebuffer kernel flags suggested here fix my own issue https://bugs.archlinux.org/task/72658 , "Display of LUKS prompt over external monitor, which was already graphically corrupted in 5.14, regressed to invisible in 5.15, causing boot to appear stuck at 'Loading initial ramdisk ...'".

My machine is a Thinkpad T560 with *no* nvidia hardware: Intel Skylake GT2 / HD Graphics 520 using the i915 driver.

Happy to help trying new builds / compile flags if necessary. Thanks everybody.
Comment by Marius (Martchus) - Tuesday, 09 November 2021, 12:34 GMT
This also affects my Thinkpad with a Haswell CPU and Intel HD graphics (which also uses LUKS). In an [Alpine issue](https://gitlab.alpinelinux.org/alpine/aports/-/issues/13165#note_190795), someone mentioned that adding `kernel/drivers/gpu/drm/tiny/simpledrm.ko*` to the initramfs would also help. However, I'm not sure how that translates to Arch Linux (though e.g. `/usr/lib/modules/5.14.16-arch1-1/kernel/drivers/gpu/drm/tiny/simpledrm.ko.zst` exists, so possibly one could add a mkinitcpio hook for it).
Comment by Jan Alexander Steffens (heftig) - Tuesday, 09 November 2021, 17:19 GMT
Any improvement in linux 5.15.1.arch1-2 ?
Comment by Jan Alexander Steffens (heftig) - Tuesday, 09 November 2021, 17:45 GMT
@Martchus You should try putting i915 into the initramfs ( MODULES=(i915) in mkinitcpio.conf )
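The `MODULES=(i915)` suggestion amounts to a one-line change in `mkinitcpio.conf`. A minimal sketch of that edit, assuming a single-line `MODULES=(...)` array (the helper `add_initramfs_module` is hypothetical and works on the file's text rather than the file itself):

```python
import re

def add_initramfs_module(conf_text, module):
    """Return conf_text with `module` listed in the MODULES=(...) array."""
    pattern = re.compile(r"^MODULES=\((.*?)\)", flags=re.MULTILINE)
    match = pattern.search(conf_text)
    if match is None:
        # No uncommented MODULES line found: append one.
        return conf_text.rstrip("\n") + f"\nMODULES=({module})\n"
    modules = match.group(1).split()
    if module in modules:
        return conf_text  # already present, nothing to change
    modules.append(module)
    new_line = "MODULES=(" + " ".join(modules) + ")"
    return conf_text[:match.start()] + new_line + conf_text[match.end():]

print(add_initramfs_module("MODULES=()\nHOOKS=(base udev)\n", "i915"))
```

After changing the real file, the initramfs still has to be regenerated (for example with `mkinitcpio -P`) before the module is included at boot.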
Comment by Marius (Martchus) - Tuesday, 09 November 2021, 17:48 GMT
I'll try.
Comment by Marius (Martchus) - Tuesday, 09 November 2021, 18:02 GMT
That helped, thanks.
Comment by Jan Alexander Steffens (heftig) - Tuesday, 09 November 2021, 18:12 GMT
@huyizheng Does the issue disappear when you avoid loading the nvidia driver?
Comment by Akatsuki Rui (akiirui) - Tuesday, 09 November 2021, 18:15 GMT
@heftig 5.15.1.arch1-2 brings no improvement for NVIDIA-only devices.

I tried adding "nvidia nvidia-drm nvidia-modeset nvidia-uvm" to mkinitcpio.conf, but it still doesn't work.

Nov 09 17:59:42 akii kernel: BUG: unable to handle page fault for address: ffffac1d51fa5000
Nov 09 17:59:42 akii kernel: #PF: supervisor read access in kernel mode
Nov 09 17:59:42 akii kernel: #PF: error_code(0x0000) - not-present page
Nov 09 17:59:42 akii kernel: PGD 100000067 P4D 100000067 PUD 1001b7067 PMD 1149d2067 PTE 0
Nov 09 17:59:42 akii kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov 09 17:59:42 akii kernel: CPU: 5 PID: 147 Comm: kworker/5:1 Tainted: P OE 5.15.1-zen1-2-zen #1 acdbcbea2d566f8d6b435a0de3f8fd7eb5b26dde
Nov 09 17:59:42 akii kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C94/MAG B550M MORTAR WIFI (MS-7C94), BIOS 1.94 09/23/2021
Nov 09 17:59:42 akii kernel: Workqueue: events drm_fb_helper_damage_work
Nov 09 17:59:42 akii kernel: RIP: 0010:memcpy_toio+0x23/0x50
Nov 09 17:59:42 akii kernel: Code: c6 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 d2 74 28 40 f6 c7 01 75 33 48 83 fa 01 76 06 40 f6 c7 02 75 1f 48 89 d1 48 c1 e9 02 <f3> a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 31 d2 89 d1 89 d6 89
Nov 09 17:59:42 akii kernel: RSP: 0018:ffffac1d406a3c68 EFLAGS: 00010216
Nov 09 17:59:42 akii kernel: RAX: 0000000000003c00 RBX: ffffac1d46004000 RCX: 0000000000000e00
Nov 09 17:59:42 akii kernel: RDX: 0000000000003c00 RSI: ffffac1d51fa5000 RDI: ffffac1d46004400
Nov 09 17:59:42 akii kernel: RBP: 0000000000000182 R08: 0000000001a00000 R09: 0000000000000004
Nov 09 17:59:42 akii kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffac1d51fa4c00
Nov 09 17:59:42 akii kernel: R13: 0000000000003c00 R14: ffff8df084b95400 R15: 00000000000001c0
Nov 09 17:59:42 akii kernel: FS: 0000000000000000(0000) GS:ffff8df78eb40000(0000) knlGS:0000000000000000
Nov 09 17:59:42 akii kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 09 17:59:42 akii kernel: CR2: ffffac1d51fa5000 CR3: 000000019ce10000 CR4: 0000000000750ee0
Nov 09 17:59:42 akii kernel: PKRU: 55555554
Nov 09 17:59:42 akii kernel: Call Trace:
Nov 09 17:59:42 akii kernel: drm_fb_blit_rect_dstclip+0x11a/0x140
Nov 09 17:59:42 akii kernel: simpledrm_simple_display_pipe_update+0xc5/0xe0
Nov 09 17:59:42 akii kernel: drm_atomic_helper_commit_planes+0xc8/0x320
Nov 09 17:59:42 akii kernel: commit_tail+0x10f/0x2a0
Nov 09 17:59:42 akii kernel: drm_atomic_helper_commit+0x1e0/0x210
Nov 09 17:59:42 akii kernel: drm_atomic_helper_dirtyfb+0x1a5/0x280
Nov 09 17:59:42 akii kernel: drm_fb_helper_damage_work+0x25e/0x330
Nov 09 17:59:42 akii kernel: process_one_work+0x263/0x460
Nov 09 17:59:42 akii kernel: worker_thread+0x54/0x4e0
Nov 09 17:59:42 akii kernel: ? process_one_work+0x460/0x460
Nov 09 17:59:42 akii kernel: kthread+0x1b0/0x1e0
Nov 09 17:59:42 akii kernel: ? __kthread_init_worker+0x60/0x60
Nov 09 17:59:42 akii kernel: ret_from_fork+0x22/0x30
Nov 09 17:59:42 akii kernel: Modules linked in: cmac algif_hash algif_skcipher af_alg bnep nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common iwlmvm snd_hda_intel snd_intel_dspcfg edac_mce_amd snd_intel_sdw_acpi snd_hda_co>
Nov 09 17:59:42 akii kernel: xhci_pci tpm xhci_pci_renesas rng_core
Nov 09 17:59:42 akii kernel: CR2: ffffac1d51fa5000
Nov 09 17:59:42 akii kernel: ---[ end trace c38d0d7d7e5606ea ]---
Nov 09 17:59:42 akii kernel: RIP: 0010:memcpy_toio+0x23/0x50
Nov 09 17:59:42 akii kernel: Code: c6 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 d2 74 28 40 f6 c7 01 75 33 48 83 fa 01 76 06 40 f6 c7 02 75 1f 48 89 d1 48 c1 e9 02 <f3> a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 31 d2 89 d1 89 d6 89
Nov 09 17:59:42 akii kernel: RSP: 0018:ffffac1d406a3c68 EFLAGS: 00010216
Nov 09 17:59:42 akii kernel: RAX: 0000000000003c00 RBX: ffffac1d46004000 RCX: 0000000000000e00
Nov 09 17:59:42 akii kernel: RDX: 0000000000003c00 RSI: ffffac1d51fa5000 RDI: ffffac1d46004400
Nov 09 17:59:42 akii kernel: RBP: 0000000000000182 R08: 0000000001a00000 R09: 0000000000000004
Nov 09 17:59:42 akii kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffac1d51fa4c00
Nov 09 17:59:42 akii kernel: R13: 0000000000003c00 R14: ffff8df084b95400 R15: 00000000000001c0
Nov 09 17:59:42 akii kernel: FS: 0000000000000000(0000) GS:ffff8df78eb40000(0000) knlGS:0000000000000000
Nov 09 17:59:42 akii kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 09 17:59:42 akii kernel: CR2: ffffac1d51fa5000 CR3: 000000019ce10000 CR4: 0000000000750ee0
Nov 09 17:59:42 akii kernel: PKRU: 55555554



Comment by Marius (Martchus) - Tuesday, 09 November 2021, 18:15 GMT
Unfortunately it didn't actually help. It only appeared to work because an SD card (which contains the key) was inserted. I've been using that as a workaround and accidentally forgot to remove it for the test (see https://bugs.archlinux.org/task/72658).
Comment by Akatsuki Rui (akiirui) - Tuesday, 09 November 2021, 18:24 GMT
@heftig
> Does the issue disappear when you avoid loading the nvidia driver?

Yep, it works with nouveau.

---

Update: With nouveau, Xorg and the TTY work, but at the LUKS prompt the cursor isn't displayed. (I'm using sd-encrypt)
Comment by huyizheng (huyizheng) - Wednesday, 10 November 2021, 02:37 GMT
linux 5.15.1.arch1-2 doesn't help.

Before this bug report, I actually compiled several kernels with different configs. What I found is that both "unset CONFIG_SYSFB_SIMPLEFB" and "enable FB driver" are required to make my nvidia platform work. Only "unset CONFIG_SYSFB_SIMPLEFB" or only "enable FB driver" (what 5.15.1.arch1-2 does) doesn't help.

Update: I compiled some more configs and found that, starting from 5.15.1.arch1-2 (which enables the FB drivers), either "unset CONFIG_SYSFB_SIMPLEFB" or "change CONFIG_SYSFB_SIMPLEFB from y to m" solves this issue.
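The combinations reported in this comment can be written down as a small table. This is only a summary sketch of the results stated above for this one NVIDIA-only machine; the names `REPORTED` and `display_works` are illustrative.

```python
# (SYSFB_SIMPLEFB enabled, legacy FB drivers enabled) -> did the display work?
REPORTED = {
    (True, False): False,   # stock 5.15 config
    (True, True): False,    # 5.15.1.arch1-2: FB drivers enabled, SIMPLEFB still set
    (False, False): False,  # only CONFIG_SYSFB_SIMPLEFB unset
    (False, True): True,    # SIMPLEFB unset and FB drivers enabled
}

def display_works(simplefb_enabled, fb_drivers_enabled):
    """Look up the reported outcome for one config combination."""
    return REPORTED[(simplefb_enabled, fb_drivers_enabled)]

# The four reported rows collapse to a single condition.
for (simplefb, fb_drivers), worked in REPORTED.items():
    assert worked == (fb_drivers and not simplefb)

print(display_works(False, True))
```

In other words, on this machine the display worked only with the FB drivers enabled and `CONFIG_SYSFB_SIMPLEFB` not set.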
Comment by João O. Santos (Joao-O-Santos) - Wednesday, 10 November 2021, 09:47 GMT
Thank you so much @huyizheng for reporting the issue and for figuring out how to fix it!

I didn't find your issue when I was looking for similar problems so I may have unknowingly opened a duplicate:  FS#72678 
Still, I believe some of you could login blindly but I don't believe my system does that.

Also, has the Arch Linux Team communicated whether they are willing to consider making that change to the official config (i.e., moving that setting to a module, CONFIG_SYSFB_SIMPLEFB=m)?
Comment by Jan Alexander Steffens (heftig) - Wednesday, 10 November 2021, 09:58 GMT
SYSFB_SIMPLEFB is not a tristate, only a bool, so it cannot be a module.

It will be disabled with the next release.
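The bool/tristate distinction can be sketched as a tiny validity check. This is a simplified model of Kconfig value rules, not the kernel's own code; `SYMBOL_TYPES` lists only the type stated in this comment for SYSFB_SIMPLEFB and, as an assumption drawn from the `=m` in the diff, tristate for FB_UVESA.

```python
# Values each Kconfig symbol type accepts in this simplified model.
ALLOWED_VALUES = {
    "bool": {"y", "n"},
    "tristate": {"y", "m", "n"},
}

# Illustrative symbol table, see the assumptions in the text above.
SYMBOL_TYPES = {
    "SYSFB_SIMPLEFB": "bool",
    "FB_UVESA": "tristate",
}

def is_valid_value(symbol, value):
    """Return True if `value` is allowed for the symbol's Kconfig type."""
    return value in ALLOWED_VALUES[SYMBOL_TYPES[symbol]]

print(is_valid_value("SYSFB_SIMPLEFB", "m"))  # False: a bool cannot be built as a module
print(is_valid_value("FB_UVESA", "m"))        # True
```

This would presumably also explain why changing the option from y to m behaved like unsetting it in the earlier test, since m is not a valid value for a bool symbol.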
Comment by Marius (Martchus) - Wednesday, 10 November 2021, 10:01 GMT
> Still, I believe some of you could login blindly but I don't believe my system does that.

Same here (although storing the key on an SD card to avoid the need for the prompt helps).

I assume you're also using the Intel graphics card of the ASUS VivoBook S530U laptop? In my case it is definitely the Intel graphics card which is affected, because my laptop has no additional graphics card.

> It will be disabled with the next release.

Thanks, I suppose that's the only fix which will work for now.
Comment by João O. Santos (Joao-O-Santos) - Wednesday, 10 November 2021, 11:27 GMT
Thank you so much for the quick reply, and for your willingness to make the change @heftig

If there's anything I can do to test related changes in upcoming kernel versions, just let me know.
