FS#65952 - [linux-hardened] 5.5.11.a-1 boot panic from Xorg with nvidia-dkms

Attached to Project: Arch Linux
Opened by James Hogan (jhogan) - Monday, 23 March 2020, 19:11 GMT
Last edited by Levente Polyak (anthraxx) - Thursday, 13 August 2020, 09:37 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Levente Polyak (anthraxx)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Updated linux-hardened from 5.4.25.a-2 (which works) to 5.5.11.a-1 (which panics during userland boot).

Reproduced a second time, so reached for the recovery stick to downgrade it.

Then tried linux-5.5.10-arch1-1, which works fine.

See attached photograph of panic message.

I have nvidia-dkms-440.64-5 installed for the following graphics card, which is what causes the taint:
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)

Happy to prod further or test kernel patches if it helps.

Here's a snippet I've transcribed:


Starting version 245.2-2-arch

A password is required to access the cryptlvm volume:
Enter passphrase for /dev/sda2:
/dev/mapper/VolGroup0-root: recovering journal
/dev/mapper/VolGroup0-root: clean, xx/xx files xx/xx blocks
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 637 Comm: Xorg Tainted: P W OE 5.5.11.a-1-hardened #1
Hardware name: Hewlett-Packard HP ProDesk 490 G2 MT/21F5, BIOS 02.11 01/14/2015
RIP: 0010: __kmalloc+0x2ec/0x2f0
...
Call Trace:
? sk_prot_alloc+0xd8/0x120
sk_prot_alloc+0xd8/0x120
sk_alloc+0x2e/0x280
__netlink_create+0x40/0xc0
netlink_create+0x106/0x260
__sock_create+0x105/0x1b0
__sys_socket+0x66/0x100
__x64_sys_socket+0x1a/0x20
do_syscall_64+0x51/0x140
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
Kernel panic - not syncing: Fatal exception
...
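For readers unfamiliar with the `Tainted:` field in the header line above, here is a minimal Python sketch that decodes the letters. The `TAINT_FLAGS` table and the `decode_taint` helper are hypothetical names made up for illustration. The table covers only a subset of flags, with meanings paraphrased from the kernel's tainted-kernels documentation.

```python
# Subset of kernel taint letters (not exhaustive); meanings paraphrased
# from the kernel's tainted-kernels documentation.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "D": "kernel died recently (oops or BUG)",
    "W": "kernel issued a warning earlier",
    "O": "out-of-tree module loaded",
    "E": "unsigned module loaded",
}


def decode_taint(line):
    """Return 'letter: meaning' strings for a header line containing 'Tainted:'."""
    _, sep, rest = line.partition("Tainted:")
    if not sep:
        return []
    letters = []
    for token in rest.split():
        # Taint letters are upper-case; stop at the kernel version string.
        if not (token.isalpha() and token.isupper()):
            break
        letters.extend(token)
    return [f"{c}: {TAINT_FLAGS.get(c, 'unknown flag')}" for c in letters]


header = "CPU: 0 PID: 637 Comm: Xorg Tainted: P W OE 5.5.11.a-1-hardened #1"
for entry in decode_taint(header):
    print(entry)
```

On this reading, `P`, `O` and `E` are consistent with a proprietary DKMS-built module such as nvidia, while `W` would suggest that a warning was logged before the fatal exception, although the transcribed snippet does not show one.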

Closed by  Levente Polyak (anthraxx)
Thursday, 13 August 2020, 09:37 GMT
Reason for closing:  Fixed
Comment by Diisocyanate (Diisocyanate) - Saturday, 28 March 2020, 07:35 GMT
I think I have the same problem: I use linux-hardened with nvidia-dkms, and it has crashed since 5.5.11.a-1. I have nvidia-dkms version 440.64-8. Here is the full panic message:

Mar 28 07:49:28 dom0 kernel: ------------[ cut here ]------------
Mar 28 07:49:28 dom0 kernel: refcount_t: underflow; use-after-free.
Mar 28 07:49:28 dom0 kernel: WARNING: CPU: 5 PID: 3431 at lib/refcount.c:28 refcount_warn_saturate+0xb3/0x100
Mar 28 07:49:28 dom0 kernel: Modules linked in: btrfs blake2b_generic xor raid6_pq snd_hda_codec_hdmi evdev snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c cfg80211 nf_tables_set rfkill nf_tables 8021q garp mrp stp llc nfnetlink coretemp ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt hid_generic usbhid hid dm_mod uas usb_storage sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci libahci aesni_intel crypto_simd cryptd glue_helper libata tg3 libphy alx scsi_mod mdio xhci_pci xhci_hcd tpm_tis tpm_tis_core tpm rng_core nvidia_drm(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart nvidia_uvm(OE) nvidia_modeset(POE) nvidia(POE) ipmi_devintf ipmi_msghandler [last unloaded: snd_intel_dspcfg]
Mar 28 07:49:28 dom0 kernel: CPU: 5 PID: 3431 Comm: Xorg.wrap Tainted: P OE 5.5.13.a-1-hardened #1
Mar 28 07:49:28 dom0 kernel: Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 3/Z170X-Gaming 3, BIOS F22j 03/09/2018
Mar 28 07:49:28 dom0 kernel: RIP: 0010:refcount_warn_saturate+0xb3/0x100
Mar 28 07:49:28 dom0 kernel: Code: 99 d4 08 01 01 e8 ff 80 c0 ff 0f 0b eb 99 80 3d 86 d4 08 01 00 75 90 48 c7 c7 50 e5 12 8f c6 05 76 d4 08 01 01 e8 df 80 c0 ff <0f> 0b e9 76 ff ff ff 80 3d 61 d4 08 01 00 0f 85 69 ff ff ff 48 c7
Mar 28 07:49:28 dom0 kernel: RSP: 0018:ffff9c7b82537d70 EFLAGS: 00010282
Mar 28 07:49:28 dom0 kernel: RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
Mar 28 07:49:28 dom0 kernel: RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000001
Mar 28 07:49:28 dom0 kernel: RBP: ffff90f8ba67a4e8 R08: 000000000000048d R09: 0000000000000004
Mar 28 07:49:28 dom0 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff90f81dca02e8
Mar 28 07:49:28 dom0 kernel: R13: ffff90f81dca0000 R14: 0000000000000008 R15: 0000000000000000
Mar 28 07:49:28 dom0 kernel: FS: 000064e3e2b94540(0000) GS:ffff90fe1ed40000(0000) knlGS:0000000000000000
Mar 28 07:49:28 dom0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 28 07:49:28 dom0 kernel: CR2: 000064e3e2a1e4f0 CR3: 00000004252cc001 CR4: 00000000003606e0
Mar 28 07:49:28 dom0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 28 07:49:28 dom0 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 28 07:49:28 dom0 kernel: Call Trace:
Mar 28 07:49:28 dom0 kernel: nv_drm_atomic_helper_disable_all+0xef/0x290 [nvidia_drm]
Mar 28 07:49:28 dom0 kernel: nv_drm_master_drop+0x25/0x60 [nvidia_drm]
Mar 28 07:49:28 dom0 kernel: drm_drop_master+0x28/0x40 [drm]
Mar 28 07:49:28 dom0 kernel: drm_master_release+0x9f/0xb0 [drm]
Mar 28 07:49:28 dom0 kernel: drm_file_free.part.0+0x223/0x280 [drm]
Mar 28 07:49:28 dom0 kernel: drm_release+0xa7/0xe0 [drm]
Mar 28 07:49:28 dom0 kernel: __fput+0xae/0x230
Mar 28 07:49:28 dom0 kernel: task_work_run+0x93/0xb0
Mar 28 07:49:28 dom0 kernel: exit_to_usermode_loop+0xda/0x100
Mar 28 07:49:28 dom0 kernel: do_syscall_64+0x122/0x140
Mar 28 07:49:28 dom0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 28 07:49:28 dom0 kernel: RIP: 0033:0x64e3e2abdd07
Mar 28 07:49:28 dom0 kernel: Code: ff ff e8 3c e3 01 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 93 4c f9 ff
Mar 28 07:49:28 dom0 kernel: RSP: 002b:00007c6b6d340c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
Mar 28 07:49:28 dom0 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000064e3e2abdd07
Mar 28 07:49:28 dom0 kernel: RDX: 00007c6b6d340c80 RSI: 00000000c04064a0 RDI: 0000000000000003
Mar 28 07:49:28 dom0 kernel: RBP: 00007c6b6d340cd0 R08: 0000000000000000 R09: 00007c6b6d340af0
Mar 28 07:49:28 dom0 kernel: R10: 00000e0ffc52c64b R11: 0000000000000246 R12: 0000000000000003
Mar 28 07:49:28 dom0 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 00007c6b6d340c80
Mar 28 07:49:28 dom0 kernel: ---[ end trace 2c44ced639eceb6b ]---
Mar 28 07:49:28 dom0 kernel: ------------[ cut here ]------------
Mar 28 07:49:28 dom0 kernel: kernel BUG at mm/slub.c:2831!
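When comparing traces like the two in this report, it can help to split each call-trace frame into symbol, offset, function size and owning module. This is a rough Python sketch under the assumption that frames follow the `symbol+0xOFF/0xSIZE [module]` layout seen above. `FRAME_RE`, `parse_frame` and `frames_by_module` are hypothetical helper names, not part of any existing tool.

```python
import re

# Matches e.g. "nv_drm_master_drop+0x25/0x60 [nvidia_drm]" at the end of a line.
# A leading "? " marks a frame the unwinder could not confirm as reliable.
FRAME_RE = re.compile(
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frame(line):
    """Parse one call-trace line; return a dict, or None if it is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None for code built into the kernel image
        "reliable": m.group("unreliable") is None,
    }


def frames_by_module(lines):
    """Count parsed frames per module; built-in frames are counted as 'vmlinux'."""
    counts = {}
    for line in lines:
        frame = parse_frame(line)
        if frame is not None:
            key = frame["module"] or "vmlinux"
            counts[key] = counts.get(key, 0) + 1
    return counts


trace = [
    "Mar 28 07:49:28 dom0 kernel: nv_drm_master_drop+0x25/0x60 [nvidia_drm]",
    "Mar 28 07:49:28 dom0 kernel: drm_file_free.part.0+0x223/0x280 [drm]",
    "Mar 28 07:49:28 dom0 kernel: __fput+0xae/0x230",
]
print(frames_by_module(trace))
```

Applied to the trace above, this shows the warning path passing through `nvidia_drm` and `drm`, whereas the frames in the original report's `__kmalloc` trace carry no module tag.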
Comment by Levente Polyak (anthraxx) - Saturday, 28 March 2020, 15:08 GMT
Can one of you bisect the vanilla kernel between the two hardened versions to find the patch that introduced this?
The checks that fail are there to verify slab sanitization; they merely expose an underlying problem.

However, the second trace looks like it has a different root cause, so please bisect each one independently. Opening a second issue would make sense so that both traces can be tracked individually.
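For anyone unfamiliar with the bisection requested here, the idea is a binary search between a known-good and a known-bad revision. The following Python sketch illustrates the principle only and is not how `git bisect` is implemented. `first_bad` and the `boots_ok` callback are hypothetical names, and the sketch assumes a linear history in which every revision after the breaking one is also broken. Real kernel history is a graph with merges, which `git bisect` handles itself.

```python
def first_bad(revisions, boots_ok):
    """Return the first revision for which boots_ok(revision) is False.

    Assumes revisions are ordered oldest to newest, revisions[0] is known
    good, revisions[-1] is known bad, and breakage persists once introduced.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(revisions[mid]):
            lo = mid  # breakage was introduced after mid
        else:
            hi = mid  # breakage was introduced at or before mid
    return revisions[hi]


# Toy example: 1000 numbered revisions, broken from revision 637 onwards.
tested = []

def boots_ok(rev):
    tested.append(rev)
    return rev < 637

print(first_bad(list(range(1000)), boots_ok), "found after", len(tested), "tests")
```

Each call to `boots_ok` stands in for building and booting one kernel, so the number of builds grows with the logarithm of the number of candidate commits rather than linearly.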
Comment by James Hogan (jhogan) - Tuesday, 05 May 2020, 13:52 GMT
Sorry I didn't get around to properly bisecting. I got stuck trying to get my custom kernel build recognised by DKMS so the nvidia module would build against it.

In any case the issue seems to be fixed in linux-hardened now, so feel free to close.
