FS#74886 - [nvidia] traps: Missing ENDBR with Linux 5.18.0-arch1

Attached to Project: Arch Linux
Opened by sven (commonuser) - Saturday, 28 May 2022, 20:29 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Monday, 06 June 2022, 15:37 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 10
Private No

Details

Description:

Module loading fails with kernel error "Missing ENDBR".


Additional info:

> May 28 22:17:38 kernel: nvidia: loading out-of-tree module taints kernel.
> May 28 22:17:38 kernel: nvidia: module license 'NVIDIA' taints kernel.
> May 28 22:17:38 kernel: Disabling lock debugging due to kernel taint
> May 28 22:17:38 kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
> May 28 22:17:38 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 507
> May 28 22:17:38 kernel:
> May 28 22:17:38 kernel: traps: Missing ENDBR: _nv011430rm+0x0/0x10 [nvidia]
> May 28 22:17:38 kernel: ------------[ cut here ]------------
> May 28 22:17:38 kernel: kernel BUG at arch/x86/kernel/traps.c:252!
> May 28 22:17:38 kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> May 28 22:17:38 kernel: CPU: 13 PID: 528 Comm: systemd-modules Tainted: P OE 5.18.0-arch1-1 #1 b71a70fe104889aac2f32556bc52f649da2881d2
> May 28 22:17:38 kernel: Hardware name: Dell Inc. XPS 15 9510/01V4T3, BIOS 1.9.0 03/17/2022
> May 28 22:17:38 kernel: RIP: 0010:exc_control_protection+0xc2/0xd0
> May 28 22:17:38 kernel: Code: 8b 93 80 00 00 00 be f9 00 00 00 48 c7 c7 d3 ab e6 9f e8 d1 01 50 ff e9 72 ff ff ff 48 c7 c7 ba ab e6 9f e8 c7 31 fb ff 0f 0b <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 66 0f 1f 00 55 53 48 89
> May 28 22:17:38 kernel: RSP: 0018:ffffb7f280ef7b48 EFLAGS: 00010002
> May 28 22:17:38 kernel: RAX: 0000000000000033 RBX: ffffb7f280ef7b68 RCX: 0000000000000027
> May 28 22:17:38 kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff93176f7616a0
> May 28 22:17:38 kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: ffffb7f280ef7968
> May 28 22:17:38 kernel: R10: 0000000000000003 R11: ffffffffa06caa08 R12: 0000000000000000
> May 28 22:17:38 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> May 28 22:17:38 kernel: FS: 00007faac0cf8380(0000) GS:ffff93176f740000(0000) knlGS:0000000000000000
> May 28 22:17:38 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> May 28 22:17:38 kernel: CR2: 0000564ced0d4000 CR3: 0000000109ef0006 CR4: 0000000000f70ee0
> May 28 22:17:38 kernel: PKRU: 55555554
> May 28 22:17:38 kernel: Call Trace:
> May 28 22:17:38 kernel: <TASK>
> May 28 22:17:38 kernel: asm_exc_control_protection+0x22/0x30
> May 28 22:17:38 kernel: RIP: 0010:_nv011430rm+0x0/0x10 [nvidia]
> May 28 22:17:38 kernel: Code: 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 e8 07 0f 1e 00 48 83 c4 08 48 89 c7 e9 bb ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 <48> 89 f7 e9 18 08 00 00 0f 1f 84 00 00 00 00 00 48 89 f7 e9 18 08
> May 28 22:17:38 kernel: RSP: 0018:ffffb7f280ef7c10 EFLAGS: 00010202
> May 28 22:17:38 kernel: RAX: ffffffffc25e90e0 RBX: ffffffffc46e2b10 RCX: 0000000000000000
> May 28 22:17:38 kernel: RDX: 0000000000043187 RSI: 0000000000000010 RDI: ffffffffc46e2b10
> May 28 22:17:38 kernel: RBP: ffff931042b6dfe0 R08: 0000000000000020 R09: ffffffffc46e2b50
> May 28 22:17:38 kernel: R10: 0000000000039688 R11: ffff93178f7fa000 R12: 0000000000000010
> May 28 22:17:38 kernel: R13: ffff931042b6b000 R14: 00007faac158332c R15: ffffb7f280ef7d80
> May 28 22:17:38 kernel: ? _nv034888rm+0x20/0x20 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: _nv011428rm+0x24/0xe0 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: _nv034889rm+0xe/0xa0 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: _nv034892rm+0x1d/0x30 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: _nv034894rm+0x2f/0x40 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: _nv015562rm+0x15/0x70 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: _nv000644rm+0x9/0x20 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: ? cdev_add+0x4d/0x60
> May 28 22:17:38 kernel: rm_init_rm+0x17/0x60 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: nvidia_init_module+0x22e/0x5b0 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: ? nvidia_init_module+0x5b0/0x5b0 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: nvidia_frontend_init_module+0x50/0x91 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: ? nvidia_init_module+0x5b0/0x5b0 [nvidia 41a8e80d4727066c67f87d1723f6a7740a16e698]
> May 28 22:17:38 kernel: do_one_initcall+0x5a/0x220
> May 28 22:17:38 kernel: do_init_module+0x4a/0x240
> May 28 22:17:38 kernel: __do_sys_init_module+0x138/0x1b0
> May 28 22:17:38 kernel: do_syscall_64+0x5c/0x90
> May 28 22:17:38 kernel: ? syscall_exit_to_user_mode+0x26/0x50
> May 28 22:17:38 kernel: ? do_syscall_64+0x6b/0x90
> May 28 22:17:38 kernel: ? handle_mm_fault+0xb2/0x280
> May 28 22:17:38 kernel: ? do_user_addr_fault+0x1db/0x680
> May 28 22:17:38 kernel: ? do_syscall_64+0x6b/0x90
> May 28 22:17:38 kernel: ? exc_page_fault+0x74/0x170
> May 28 22:17:38 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> May 28 22:17:38 kernel: iwlwifi 0000:00:14.3 wlp0s20f3: renamed from wlan0
> May 28 22:17:38 kernel: RIP: 0033:0x7faac0f12c3e
> May 28 22:17:38 kernel: Code: 48 8b 0d 5d b1 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2a b1 0e 00 f7 d8 64 89 01 48
> May 28 22:17:38 kernel: RSP: 002b:00007fff1f730d08 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> May 28 22:17:38 kernel: RAX: ffffffffffffffda RBX: 000055cd28f8eb10 RCX: 00007faac0f12c3e
> May 28 22:17:38 kernel: RDX: 00007faac158332c RSI: 0000000003bb4e40 RDI: 00007faaba54c010
> May 28 22:17:38 kernel: RBP: 00007faaba54c010 R08: 000055cd28f8ea10 R09: 0000000000000000
> May 28 22:17:38 kernel: R10: 0000000000000005 R11: 0000000000000246 R12: 00007faac158332c
> May 28 22:17:38 kernel: R13: 000055cd28f8ecc0 R14: 000055cd28f8e7c0 R15: 000055cd28f938c0
> May 28 22:17:38 kernel: </TASK>
> May 28 22:17:38 kernel: Modules linked in: mousedev snd_sof acpi_cpufreq(-) kfifo_buf snd_sof_utils hid_sensor_iio_common snd_soc_hdac_hda industrialio snd_hda_ext_core snd_ctl_led snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek(+) hid_sensor_hub soundwire_bus nvidia(POE+) intel_ishtp_hid hid_mul>
> May 28 22:17:38 kernel: intel_lpss_pci processor_thermal_device psmouse videobuf2_common intel_lpss btbcm pcspkr snd i2c_i801 spi_intel_pci processor_thermal_rfim btmtk btintel spi_intel i2c_smbus soundcore i915 mei_me idma64 cfg80211 videodev processor_thermal_mbox drm_buddy ucsi_acpi mc bluetooth processo>
> May 28 22:17:38 kernel: vivaldi_fmap crc32_pclmul crc32c_intel ghash_clmulni_intel tpm_tis nvme aesni_intel tpm_tis_core crypto_simd tpm xhci_pci cryptd nvme_core rng_core rtsx_pci xhci_pci_renesas i8042 serio
> May 28 22:17:38 kernel: ---[ end trace 0000000000000000 ]---
> May 28 22:17:38 kernel: RIP: 0010:exc_control_protection+0xc2/0xd0
> May 28 22:17:38 kernel: Code: 8b 93 80 00 00 00 be f9 00 00 00 48 c7 c7 d3 ab e6 9f e8 d1 01 50 ff e9 72 ff ff ff 48 c7 c7 ba ab e6 9f e8 c7 31 fb ff 0f 0b <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 66 0f 1f 00 55 53 48 89
> May 28 22:17:38 kernel: RSP: 0018:ffffb7f280ef7b48 EFLAGS: 00010002
> May 28 22:17:38 kernel: RAX: 0000000000000033 RBX: ffffb7f280ef7b68 RCX: 0000000000000027
> May 28 22:17:38 kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff93176f7616a0
> May 28 22:17:38 kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: ffffb7f280ef7968
> May 28 22:17:38 kernel: R10: 0000000000000003 R11: ffffffffa06caa08 R12: 0000000000000000
> May 28 22:17:38 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> May 28 22:17:38 kernel: FS: 00007faac0cf8380(0000) GS:ffff93176f740000(0000) knlGS:0000000000000000
> May 28 22:17:38 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> May 28 22:17:38 kernel: CR2: 0000564ced0d4000 CR3: 0000000109ef0006 CR4: 0000000000f70ee0
> May 28 22:17:38 kernel: PKRU: 55555554
> May 28 22:17:38 systemd[1]: systemd-modules-load.service: Main process exited, code=killed, status=11/SEGV
> May 28 22:17:38 systemd[1]: systemd-modules-load.service: Failed with result 'signal'.
> May 28 22:17:38 systemd[1]: Failed to start Load Kernel Modules.
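
For triage, the offending symbol and module can be pulled out of a log like the one above. The following is only a minimal sketch: the helper name `find_missing_endbr` and the regular expression are illustrative assumptions, based solely on the "traps: Missing ENDBR: ..." line format seen in this report.

```python
import re

# Matches lines such as:
#   kernel: traps: Missing ENDBR: _nv011430rm+0x0/0x10 [nvidia]
# The pattern is an assumption derived from the log lines in this report.
MISSING_ENDBR = re.compile(
    r"traps: Missing ENDBR: (?P<symbol>\S+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>[^\]\s]+)[^\]]*\])?"
)


def find_missing_endbr(log_text):
    """Return a (symbol, offset, module) tuple for each 'Missing ENDBR' line."""
    hits = []
    for line in log_text.splitlines():
        match = MISSING_ENDBR.search(line)
        if match:
            hits.append(
                (match.group("symbol"), int(match.group("offset"), 16), match.group("module"))
            )
    return hits


sample = "May 28 22:17:38 kernel: traps: Missing ENDBR: _nv011430rm+0x0/0x10 [nvidia]"
print(find_missing_endbr(sample))
```

An offset of 0 means the indirect branch landed on the first instruction of the function, which is where the kernel expects an ENDBR instruction when IBT is enforced.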



Steps to reproduce:
Reboot with the current linux and nvidia packages.

Closed by  Sven-Hendrik Haase (Svenstaro)
Monday, 06 June 2022, 15:37 GMT
Reason for closing:  Fixed
Comment by sven (commonuser) - Saturday, 28 May 2022, 20:30 GMT
Works fine with Linux 5.17.9-arch1-1.
Comment by Christian (Darius) - Saturday, 28 May 2022, 21:45 GMT
I have this as well: it does not boot into a graphical terminal and no modules are loaded. I don't quite understand why the severity is only Low.

Mai 28 23:07:00 arch-precission kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 509
Mai 28 23:07:00 arch-precission kernel:
Mai 28 23:07:00 arch-precission kernel: traps: Missing ENDBR: _nv011430rm+0x0/0x10 [nvidia]
Mai 28 23:07:00 arch-precission kernel: ------------[ cut here ]------------
Mai 28 23:07:00 arch-precission kernel: kernel BUG at arch/x86/kernel/traps.c:252!
Mai 28 23:07:00 arch-precission kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Mai 28 23:07:00 arch-precission kernel: CPU: 0 PID: 345 Comm: systemd-modules Tainted: P OE 5.18.0-arch1-1 #1 b71a70fe1048>
Mai 28 23:07:00 arch-precission kernel: usb 5-2.3.3: New USB device found, idVendor=2109, idProduct=2813, bcdDevice=90.01
Mai 28 23:07:00 arch-precission kernel: Hardware name: Dell Inc. Precision 5760/04NVXT, BIOS 1.6.0 12/10/2021
Mai 28 23:07:00 arch-precission kernel: RIP: 0010:exc_control_protection+0xc2/0xd0
Mai 28 23:07:00 arch-precission kernel: Code: 8b 93 80 00 00 00 be f9 00 00 00 48 c7 c7 d3 ab 66 b0 e8 d1 01 50 ff e9 72 ff ff ff 48 c7 >
Mai 28 23:07:00 arch-precission kernel: usb 5-2.3.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Mai 28 23:07:00 arch-precission kernel: RSP: 0018:ffffbb0e4110fc28 EFLAGS: 00010002
Mai 28 23:07:00 arch-precission kernel: RAX: 0000000000000033 RBX: ffffbb0e4110fc48 RCX: 0000000000000027
Mai 28 23:07:00 arch-precission kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9eb30f4216a0
Mai 28 23:07:00 arch-precission kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: ffffbb0e4110fa48
Mai 28 23:07:00 arch-precission kernel: R10: 0000000000000003 R11: ffffffffb0ecaa08 R12: 0000000000000000
Mai 28 23:07:00 arch-precission kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mai 28 23:07:00 arch-precission kernel: FS: 00007f8895148380(0000) GS:ffff9eb30f400000(0000) knlGS:0000000000000000
Mai 28 23:07:00 arch-precission kernel: usb 5-2.3.3: Product: USB2.0 Hub
Mai 28 23:07:00 arch-precission kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mai 28 23:07:00 arch-precission kernel: CR2: 000055a27fec80c0 CR3: 0000000105518003 CR4: 0000000000f70ef0
Mai 28 23:07:00 arch-precission kernel: PKRU: 55555554
Mai 28 23:07:00 arch-precission kernel: Call Trace:
Mai 28 23:07:00 arch-precission kernel: <TASK>
Mai 28 23:07:00 arch-precission kernel: usb 5-2.3.3: Manufacturer: VIA Labs, Inc.
Mai 28 23:07:00 arch-precission kernel: asm_exc_control_protection+0x22/0x30
Mai 28 23:07:00 arch-precission kernel: RIP: 0010:_nv011430rm+0x0/0x10 [nvidia]



Comment by loqs (loqs) - Saturday, 28 May 2022, 21:50 GMT
See https://bbs.archlinux.org/viewtopic.php?pid=2037802#p2037802 for options to work around the module's lack of IBT support.
Edit:
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/256
Edit2:
Based on open-gpu-kernel-modules, hopefully it is as simple as this: -fcf-protection=none is set in two Makefiles, and switching it to -fcf-protection=branch plus adding -mharden-sls=all for straight-line speculation at least makes objtool happy.
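
The flag change described above could be sketched as a plain text substitution. This is not the attached patch. The function name `harden_flags` and the sample Makefile line are hypothetical, and the sketch assumes the flag appears literally as `-fcf-protection=none` in the Makefile text.

```python
def harden_flags(makefile_text):
    """Swap -fcf-protection=none for =branch and add -mharden-sls=all next to it.

    Returns the new text and the number of substitutions made.
    """
    old = "-fcf-protection=none"
    new = "-fcf-protection=branch -mharden-sls=all"
    return makefile_text.replace(old, new), makefile_text.count(old)


# Hypothetical Makefile line, for illustration only.
sample = "CFLAGS += -fcf-protection=none\n"
patched, count = harden_flags(sample)
print(count, patched, end="")
```

Running it a second time on the patched text changes nothing, because the `none` variant is no longer present. The compiler in use has to support both options for the build to succeed.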
Comment by loqs (loqs) - Monday, 30 May 2022, 14:09 GMT
Please test the attached source bundle.
Comment by loqs (loqs) - Monday, 30 May 2022, 14:21 GMT
Added a missing -mindirect-branch-register to nvidia-modeset/Makefile
Comment by Christian (Darius) - Monday, 30 May 2022, 19:32 GMT
Thanks a lot, loqs, for your great efforts. I built it and it seems to work now with 5.18.0-arch1-1 (I removed ibt=off again).
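
When testing a patched build, it helps to confirm that the ibt=off workaround is really gone from the kernel command line. A minimal sketch follows. The helper name `ibt_disabled` is hypothetical, and it assumes that the last `ibt=` occurrence on the command line is the one that takes effect.

```python
def ibt_disabled(cmdline):
    """Return True if the last ibt= parameter on the command line is 'off'."""
    value = None
    for token in cmdline.split():
        if token.startswith("ibt="):
            value = token[len("ibt="):]
    return value == "off"


# Sample command lines for illustration; the root= value is made up.
print(ibt_disabled("root=/dev/nvme0n1p2 rw ibt=off quiet"))
print(ibt_disabled("root=/dev/nvme0n1p2 rw quiet"))
```

On a running system the string to pass in would be the contents of /proc/cmdline.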
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 06 June 2022, 00:12 GMT
I can patch the package with this for the time being but I'd be really happy if someone went ahead and upstreamed the patch so we don't have to keep it around in the package.
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 06 June 2022, 00:39 GMT
I pushed some patched packages to [testing], gimme some feedback on those.
Comment by Christian (Darius) - Monday, 06 June 2022, 05:37 GMT
5.18.1-arch1-1
nvidia-open-515.48.07-2
no ibt=off

-> All Fine, really thank you all
Comment by loqs (loqs) - Monday, 06 June 2022, 10:00 GMT
Comment by lycheegrape (lycheegrape) - Monday, 06 June 2022, 10:24 GMT
Why no patch for nvidia and nvidia-dkms? They're affected by this issue, too
Comment by loqs (loqs) - Monday, 06 June 2022, 14:02 GMT
@lycheegrape Because the code lacking IBT and SLS support in nvidia/nvidia-dkms is supplied by Nvidia precompiled, without the source needed to rebuild it with the added hardening options.
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 06 June 2022, 15:37 GMT
I moved nvidia-open to [extra]. For the time being, I will consider this fixed but I hope that upstream will fix this in their code in the future.