FS#69916 - [linux] kernel NULL pointer dereference, black screen when using two graphics cards

Attached to Project: Arch Linux
Opened by Dennis Foster (dennisfoster) - Monday, 08 March 2021, 17:30 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 30 March 2021, 12:11 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: No-one
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

Description:

On linux-5.11 and above (up to 5.11.4) I can no longer boot the system (GNOME/Wayland) when using two graphics cards. It gets stuck on a black screen and does not respond to keyboard or mouse input.

The systemd journal contains messages about a kernel bug:

Mar 08 11:54:05 homeserver kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Mar 08 11:54:05 homeserver kernel: #PF: supervisor read access in kernel mode
Mar 08 11:54:05 homeserver kernel: #PF: error_code(0x0000) - not-present page
Mar 08 11:54:05 homeserver kernel: PGD 0 P4D 0
Mar 08 11:54:05 homeserver kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mar 08 11:54:05 homeserver kernel: CPU: 6 PID: 608 Comm: gnome-shell Tainted: G OE 5.11.4-arch1-1 #1
Mar 08 11:54:05 homeserver kernel: Hardware name: Gigabyte Technology Co., Ltd. A320M-S2H/A320M-S2H-CF, BIOS F2 11/03/2020
Mar 08 11:54:05 homeserver kernel: RIP: 0010:drm_gem_handle_create_tail+0xcb/0x190 [drm]
Mar 08 11:54:05 homeserver kernel: Code: 00 48 89 df e8 c6 20 59 f4 45 85 e4 78 77 48 8d 5d 18 4c 89 ee 48 89 df e8 42 fe 00 00 89 c2 85 c0 75 3e 48 8b 85 40 01 00 00 <48> 8b 40 08 48 85 c0 74 0f 4c 89 ee 48 89 ef e8 81 8b 91 f4 85 c0
Mar 08 11:54:05 homeserver kernel: RSP: 0018:ffffb7a7c16bfd30 EFLAGS: 00010246
Mar 08 11:54:05 homeserver kernel: RAX: 0000000000000000 RBX: ffffa0eabe065090 RCX: 0000000000000001


Everything works perfectly fine with the linux-lts package (5.10.21).
I am using a Radeon RX 470 as the primary card and an older ATI FirePro 2270 as the secondary one to provide two extra monitor outputs.

I've attached my lspci output as well as the full systemd log.
Closed by Andreas Radke (AndyRTR)
Tuesday, 30 March 2021, 12:11 GMT
Reason for closing: Fixed
Comment by loqs (loqs) - Tuesday, 09 March 2021, 01:50 GMT
Possibly related: https://www.spinics.net/lists/amd-gfx/msg59532.html
Please try the referenced patch series.
Comment by rc0r (rc0r) - Tuesday, 09 March 2021, 10:50 GMT
I followed the instructions in [1] and [2] to build a custom kernel with the patches.

Unfortunately the problem persists:


-- Journal begins at Mon 2020-12-14 03:51:37 CET, ends at Tue 2021-03-09 11:38:55 CET. --
Mär 09 11:30:08 pcidev kernel: microcode: microcode updated early to revision 0xde, date = 2020-05-26
Mär 09 11:30:08 pcidev kernel: Linux version 5.11.4-arch1-1-amdpatches (linux-amdpatches@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Tue, 09 Mar 2021 09:43:09 +0000
Mär 09 11:30:08 pcidev kernel: Command line: BOOT_IMAGE=/vmlinuz-linux-amdpatches root=UUID=9aae8306-98a1-4722-a0a0-d2f1ee18afbb rw loglevel=3 quiet
[...]
Mär 09 11:30:36 pcidev kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Mär 09 11:30:36 pcidev kernel: #PF: supervisor read access in kernel mode
Mär 09 11:30:36 pcidev kernel: #PF: error_code(0x0000) - not-present page
Mär 09 11:30:36 pcidev kernel: PGD 0 P4D 0
Mär 09 11:30:36 pcidev kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Mär 09 11:30:36 pcidev kernel: CPU: 7 PID: 524 Comm: Xorg Not tainted 5.11.4-arch1-1-amdpatches #1
Mär 09 11:30:36 pcidev kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270 Gaming K6, BIOS P2.10 05/05/2017
Mär 09 11:30:36 pcidev kernel: RIP: 0010:drm_gem_handle_create_tail+0xcb/0x190 [drm]
Mär 09 11:30:36 pcidev kernel: Code: 00 48 89 df e8 c6 20 6d d1 45 85 e4 78 77 48 8d 5d 18 4c 89 ee 48 89 df e8 42 fe 00 00 89 c2 85 c0 75 3e 48 8b 85 40 01 00 00 <48> 8b 40 08 48 85 c0 74 0f 4c 89 ee 48 89 ef e8 81 8b a5 d1 85 c0
Mär 09 11:30:36 pcidev kernel: RSP: 0018:ffffbc6540d17d30 EFLAGS: 00010246
Mär 09 11:30:36 pcidev kernel: RAX: 0000000000000000 RBX: ffff99977187c090 RCX: 0000000000000001
Mär 09 11:30:36 pcidev kernel: RDX: 0000000000000000 RSI: ffffffffc05b9311 RDI: 0000000000000000
Mär 09 11:30:36 pcidev kernel: RBP: ffff99977187c078 R08: ffff99977187c140 R09: ffff99974efc5300
Mär 09 11:30:36 pcidev kernel: R10: ffff99975cdbab80 R11: ffffe38bc522cc08 R12: 0000000000000007
Mär 09 11:30:36 pcidev kernel: R13: ffff99975e8dd200 R14: ffff99975e8dd250 R15: ffff99975e8dd238
Mär 09 11:30:36 pcidev kernel: FS: 00007efceb033940(0000) GS:ffff99a67f1c0000(0000) knlGS:0000000000000000
Mär 09 11:30:36 pcidev kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 09 11:30:36 pcidev kernel: CR2: 0000000000000008 CR3: 000000010e100004 CR4: 00000000003706e0
Mär 09 11:30:36 pcidev kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mär 09 11:30:36 pcidev kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mär 09 11:30:36 pcidev kernel: Call Trace:
Mär 09 11:30:36 pcidev kernel: drm_gem_prime_fd_to_handle+0xfb/0x1d0 [drm]
Mär 09 11:30:36 pcidev kernel: ? drm_prime_destroy_file_private+0x20/0x20 [drm]
Mär 09 11:30:36 pcidev kernel: drm_ioctl_kernel+0xb2/0x100 [drm]
Mär 09 11:30:36 pcidev kernel: drm_ioctl+0x215/0x390 [drm]
Mär 09 11:30:36 pcidev kernel: ? drm_prime_destroy_file_private+0x20/0x20 [drm]
Mär 09 11:30:36 pcidev kernel: radeon_drm_ioctl+0x49/0x80 [radeon]
Mär 09 11:30:36 pcidev kernel: __x64_sys_ioctl+0x83/0xb0
Mär 09 11:30:36 pcidev kernel: do_syscall_64+0x33/0x40
Mär 09 11:30:36 pcidev kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mär 09 11:30:36 pcidev kernel: RIP: 0033:0x7efceba1be6b
Mär 09 11:30:36 pcidev kernel: Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 af 0c 00 f7 d8 64 89 01 48
Mär 09 11:30:36 pcidev kernel: RSP: 002b:00007ffe51f032a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mär 09 11:30:36 pcidev kernel: RAX: ffffffffffffffda RBX: 00007ffe51f032ec RCX: 00007efceba1be6b
Mär 09 11:30:36 pcidev kernel: RDX: 00007ffe51f032ec RSI: 00000000c00c642e RDI: 0000000000000017
Mär 09 11:30:36 pcidev kernel: RBP: 00000000c00c642e R08: 00007ffe51f03390 R09: 0000000000000003
Mär 09 11:30:36 pcidev kernel: R10: 00007efce9a2fab0 R11: 0000000000000246 R12: 00005635987223c0
Mär 09 11:30:36 pcidev kernel: R13: 0000000000000017 R14: 0000000000100000 R15: 00007ffe51f03a90
Mär 09 11:30:36 pcidev kernel: Modules linked in: joydev mousedev snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio uas usb_storage ftdi_sio usbhid wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha libblake2s_generic intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_core kvm_intel snd_hwdep nls_iso8859_1 soundwire_bus vfat radeon fat kvm mei_hdcp irqbypass iTCO_wdt i915 crct10dif_pclmul intel_pmc_bxt crc32_pclmul ee1004 iTCO_vendor_support drm_ttm_helper intel_wmi_thunderbolt mxm_wmi ttm ghash_clmulni_intel snd_soc_core drm_kms_helper aesni_intel snd_compress ac97_bus crypto_simd snd_pcm_dmaengine cryptd glue_helper snd_pcm igb cec rapl e1000e intel_cstate intel_gtt snd_timer syscopyarea sysfillrect snd i2c_i801 mei_me
Mär 09 11:30:36 pcidev kernel: i2c_algo_bit sysimgblt intel_uncore pcspkr i2c_smbus mei dca soundcore fb_sys_fops wmi video mac_hid acpi_pad drm sg fuse crypto_user agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci crc32c_intel xhci_pci_renesas
Mär 09 11:30:36 pcidev kernel: CR2: 0000000000000008
Mär 09 11:30:36 pcidev kernel: ---[ end trace 5655bed8e65f3c63 ]---
Mär 09 11:30:36 pcidev kernel: RIP: 0010:drm_gem_handle_create_tail+0xcb/0x190 [drm]
Mär 09 11:30:36 pcidev kernel: Code: 00 48 89 df e8 c6 20 6d d1 45 85 e4 78 77 48 8d 5d 18 4c 89 ee 48 89 df e8 42 fe 00 00 89 c2 85 c0 75 3e 48 8b 85 40 01 00 00 <48> 8b 40 08 48 85 c0 74 0f 4c 89 ee 48 89 ef e8 81 8b a5 d1 85 c0
Mär 09 11:30:36 pcidev kernel: RSP: 0018:ffffbc6540d17d30 EFLAGS: 00010246
Mär 09 11:30:36 pcidev kernel: RAX: 0000000000000000 RBX: ffff99977187c090 RCX: 0000000000000001
Mär 09 11:30:36 pcidev kernel: RDX: 0000000000000000 RSI: ffffffffc05b9311 RDI: 0000000000000000
Mär 09 11:30:36 pcidev kernel: RBP: ffff99977187c078 R08: ffff99977187c140 R09: ffff99974efc5300
Mär 09 11:30:36 pcidev kernel: R10: ffff99975cdbab80 R11: ffffe38bc522cc08 R12: 0000000000000007
Mär 09 11:30:36 pcidev kernel: R13: ffff99975e8dd200 R14: ffff99975e8dd250 R15: ffff99975e8dd238
Mär 09 11:30:36 pcidev kernel: FS: 00007efceb033940(0000) GS:ffff99a67f1c0000(0000) knlGS:0000000000000000
Mär 09 11:30:36 pcidev kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 09 11:30:36 pcidev kernel: CR2: 0000000000000008 CR3: 000000010e100004 CR4: 00000000003706e0
Mär 09 11:30:36 pcidev kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mär 09 11:30:36 pcidev kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400



Attached you'll find my PKGBUILD file used for building the kernel.

[1] https://wiki.archlinux.org/index.php/Kernel/Arch_Build_System
[2] https://wiki.archlinux.org/index.php/Patching_packages#Applying_patches
   PKGBUILD (6.6 KiB)
Comment by loqs (loqs) - Tuesday, 09 March 2021, 14:27 GMT
The local files need to use .patch as the extension to be auto-applied. Looking at your PKGBUILD, I think the local files are called drmradeon_1, drmradeon_2 and drmradeon_3; they should be called drmradeon_1.patch, drmradeon_2.patch and drmradeon_3.patch.
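
For readers following along, here is a minimal sketch of the kind of filename filter that makes the extension matter. It assumes the PKGBUILD's prepare() step walks the source entries, strips any "::URL" rename suffix and leading path, and only applies files ending in .patch. The function name list_patches and the example.invalid URLs are hypothetical and used only for illustration; the real prepare() would pipe each selected file to patch rather than print its name.

```shell
# Sketch only: prints the source entries a ".patch"-filtering prepare()
# step would pick up. list_patches is a hypothetical helper name.
list_patches() {
  for src in "$@"; do
    src="${src%%::*}"   # keep the local name from a "name::URL" entry
    src="${src##*/}"    # drop any leading path or URL components
    case "$src" in
      *.patch) printf '%s\n' "$src" ;;  # would be applied
      *) ;;                             # silently skipped
    esac
  done
}

# drmradeon_1 has no .patch extension, so it is skipped without any warning.
list_patches \
  "drmradeon_1" \
  "drmradeon_2.patch" \
  "drmradeon_3.patch::https://example.invalid/some/download" \
  "https://example.invalid/patches/fix.patch"
```

Because a non-matching file is skipped silently, the build succeeds either way; checking the build output for the names of the patches being applied is the quickest way to confirm they were picked up.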
Comment by rc0r (rc0r) - Tuesday, 09 March 2021, 15:24 GMT
Silly me! I changed the patch file names and verified that the patches get applied. Now I can log into my display manager without issues.

Can we expect these patches to be merged into the official Arch Linux kernels? And if so, is it possible to give a timeframe estimate?

Thanks for your help @loqs, much appreciated!
Comment by Dennis Foster (dennisfoster) - Thursday, 18 March 2021, 14:33 GMT
I can now confirm that the issue is fixed in the recent 5.11.7 kernel.
Comment by rc0r (rc0r) - Monday, 22 March 2021, 08:59 GMT
Yes, just tested with 5.11.8 and the issue is gone.
