FS#80207 - [linux] 6.6.x i915 driver crash

Attached to Project: Arch Linux
Opened by D (Nebulosa) - Thursday, 09 November 2023, 11:21 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:13 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
After upgrading from v6.5.9 to v6.6.1 (I also tried downgrading to v6.6), KDE starts, but audio is not detected at first (it is finally detected after a couple of minutes), and switching to another TTY (CTRL+ALT+F1, F3, F4, etc.) does not work. Also, I can't reboot or even power off my PC: it just keeps running with a blank screen. From the GRUB shell, or on Linux kernel v6.5.9, rebooting and powering off work as expected.


Steps to reproduce:
Upgrade to the latest Linux kernel and reboot.

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:13 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/packaging/packages/linux/issues/5
Comment by ray (chromer030) - Thursday, 09 November 2023, 13:35 GMT
Same here with blank screen.
Comment by loqs (loqs) - Thursday, 09 November 2023, 13:45 GMT
Please work through the support channels to determine the exact cause of the bug. See also [1] for advice on debugging kernel regressions.

[1]: https://wiki.archlinux.org/title/Kernel#Debugging_regressions
Comment by Toolybird (Toolybird) - Thursday, 09 November 2023, 20:23 GMT
As pointed out by @loqs, definitely seems like an upstream kernel regression. Some noteworthy lines from your log below. i915 driver kernel issues can be reported upstream at [1]. But as indicated, probably best if you first of all visit the Arch support channels for troubleshooting assistance. Please let us know what you find out.

[1] https://gitlab.freedesktop.org/drm/intel

ACPI: video: [Firmware Bug]: Duplicate ACPI video bus devices for the same VGA controller, please try module parameter "video.allow_duplicates=1" if the current driver doesn't work.
Hardware name: Sony Corporation VPCSB2X9R/VAIO, BIOS R2087H4 06/15/2012
RIP: 0010:video_get_max_state+0x17/0x30 [video]
Code: Unable to access opcode bytes at 0xffffffffc04a5ffd.

Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? exc_page_fault+0x7f/0x180
? asm_exc_page_fault+0x26/0x30
? video_get_max_state+0x17/0x30 [video 7b201efe94c3bedceddd3a2b0b46fc87b7fd7ce0]
__thermal_cooling_device_register.part.0+0xf2/0x2f0
acpi_video_bus_register_backlight.part.0.isra.0+0x414/0x570 [video 7b201efe94c3bedceddd3a2b0b46fc87b7fd7ce0]
acpi_video_register_backlight+0x57/0x80 [video 7b201efe94c3bedceddd3a2b0b46fc87b7fd7ce0]
intel_acpi_video_register+0x68/0x90 [i915 91011d57f4e61d512e7a501bb95f35e3223bce0f]
intel_display_driver_register+0x28/0x50 [i915 91011d57f4e61d512e7a501bb95f35e3223bce0f]
i915_driver_probe+0x790/0xb90 [i915 91011d57f4e61d512e7a501bb95f35e3223bce0f]
Comment by ray (chromer030) - Friday, 10 November 2023, 09:32 GMT
Comment by D (Nebulosa) - Monday, 13 November 2023, 06:24 GMT
Thanks for clarifying the information. On bugzilla, the status is now RESOLVED CODE_FIX, so I tried installing the linux-mainline 6.7rc1 package and the regression was resolved.
Comment by loqs (loqs) - Tuesday, 14 November 2023, 14:31 GMT
Comment by D (Nebulosa) - Wednesday, 15 November 2023, 05:48 GMT
I rechecked the system on 6.7.0-rc1-1-mainline and the regression is still there. I apologize for the misleading report.

Downgraded kernel again on 6.5.9.
Comment by loqs (loqs) - Wednesday, 15 November 2023, 13:12 GMT
@Nebulosa if the issue is building a patched kernel to test, the kernel linked below is 6.6.1-arch1 with [1] applied:

https://drive.google.com/file/d/1cSs8kTxkHY1qEzqM99ChvFADurTUnU3d/view?usp=sharing linux-6.6.1.arch1-1.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1mf8lQvhIYLNpzP0zOVT3uH4U4gpO_AQW/view?usp=sharing linux-headers-6.6.1.arch1-1.2-x86_64.pkg.tar.zst

[1] https://gitlab.freedesktop.org/drm/intel/uploads/fcccbfe2833d6f4679dff3608c735ee2/0001-drm-i915-Also-check-for-VGA-converter-in-eDP-probe.patch
Comment by D (Nebulosa) - Wednesday, 15 November 2023, 19:30 GMT
Thank you for participating! Building is not an issue, it just takes a long time (about 12 hours on my laptop).

So, I took the PKGBUILD from https://archlinux.org/packages/core/x86_64/linux/, added the patch 0001-drm-i915-Also-check-for-VGA-converter-in-eDP-probe.patch, made sure that the changes were applied, and installed the linux and linux-headers packages (version 6.6.1-arch1-2 in my case). The regression is still there. Dmesg: https://0x0.st/Hvjp.txt

I also tried installing your '6.6.1.arch1-1.2' packages, but nothing changed. Dmesg: https://0x0.st/Hvjx.txt

6.5.9 is still the latest kernel that works.
Comment by loqs (loqs) - Wednesday, 15 November 2023, 19:40 GMT
@Nebulosa were you able to bisect the kernel or confirm whether 49d4648b65d03752904ac945aefa60044329a9a3 was the first bad commit, as in 9636 [1]? If it is not 9636, then the issue still needs reporting upstream.

[1] https://gitlab.freedesktop.org/drm/intel/-/issues/9636
Comment by D (Nebulosa) - Thursday, 16 November 2023, 16:43 GMT
I don't understand how I can confirm whether it is the bad commit without going through the full bisecting procedure?

Meanwhile, I read the wiki, the forum, etc., installed ccache and modprobed-db, removed the docs build from the PKGBUILD, and started the bisecting procedure.
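
With builds taking about 12 hours each, it helps to estimate how many builds a bisect needs before starting. A bisect halves the candidate range at every step, so the count is roughly the base-2 logarithm of the number of commits between the good and bad versions. The following is a minimal sketch; the function name bisect_steps and the commit count of 15000 are illustrative assumptions, not figures from this report, and git's own estimate can differ by a step.

```shell
# bisect_steps N: print the smallest k such that 2^k >= N,
# i.e. roughly how many builds a bisect over N commits needs.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# In a kernel checkout the real commit count could be obtained with:
#   git rev-list --count v6.5..v6.6
# Here a made-up count is used, with 12 hours per build as reported above.
commits=15000
hours_per_build=12
echo "builds: $(bisect_steps "$commits")"
echo "hours:  $(( $(bisect_steps "$commits") * hours_per_build ))"
```

This is why ccache and a reduced module set (modprobed-db) matter: they cut the cost of each step, while the number of steps stays about the same.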
Comment by loqs (loqs) - Thursday, 16 November 2023, 17:07 GMT
> I don't understand how I can confirm whether it is the bad commit without going through the full bisecting procedure?
If you are checking out the commits and using makepkg -e:
git checkout 49d4648b65d03752904ac945aefa60044329a9a3 # test if the commit is bad
git show 49d4648b65d03752904ac945aefa60044329a9a3 # confirm the commit is not a merge
git rev-parse 49d4648b65d03752904ac945aefa60044329a9a3~ # get the commit's first parent
git checkout 9856308c94ca821fdc6f3440e4d4de069b09677c # test if parent of the commit above is good
Comment by D (Nebulosa) - Thursday, 16 November 2023, 17:40 GMT
Yes, I'm using this article https://wiki.archlinux.org/title/Bisecting_bugs_with_Git and https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git for the sources.

After git checkout, I should build the kernel again, right?

Comment by loqs (loqs) - Thursday, 16 November 2023, 18:31 GMT
Yes:
$ cd ../..
$ makepkg -efsi
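
The checkout, rebuild, test cycle described here is what git bisect drives. As a rough illustration of the mechanics only, the sketch below runs a bisect on a throwaway repository instead of the kernel. Everything in it is hypothetical: the temporary repository, the file name state, and the grep check stand in for the kernel tree, the makepkg build and the reboot test, none of which can be automated this simply.

```shell
# Toy repository; nothing here touches a real kernel tree.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }
g init -q

# Eight commits; the pretend regression appears in commit 5 and stays.
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -ge 5 ]; then
        echo "broken $i" > "$repo/state"
    else
        echo "ok $i" > "$repo/state"
    fi
    g add state
    g commit -q -m "commit $i"
    if [ "$i" -eq 5 ]; then
        expected=$(g rev-parse HEAD)
    fi
    i=$((i + 1))
done

good=$(g rev-list --max-parents=0 HEAD)   # first commit, known good
bad=$(g rev-parse HEAD)                   # latest commit, known bad

# Mark the endpoints, then let git pick each midpoint and run the check.
# The check exits 0 for a good commit and 1 for a bad one.
g bisect start "$bad" "$good" >/dev/null
g bisect run sh -c '! grep -q broken state' >/dev/null 2>&1

# refs/bisect/bad points at the first bad commit once the bisect finishes.
first_bad=$(g rev-parse refs/bisect/bad)
g bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

For a kernel bisect the automated check is replaced by a manual step: build and install the package, reboot, then mark the result with git bisect good or git bisect bad.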
Comment by D (Nebulosa) - Saturday, 18 November 2023, 16:53 GMT
I finally have the answer: both commits result in either a kernel panic or a blank screen with a blinking cursor.

Continuing the bisection process...
Comment by loqs (loqs) - Wednesday, 22 November 2023, 18:29 GMT
@chromer030 do you want to open a separate bug report for the issue you bisected asking for the inclusion of [1]?

[1]: https://cgit.freedesktop.org/drm-intel/commit/?h=for-linux-next&id=fcd479a79120bf0cd507d85f898297a3b868dda6
Comment by D (Nebulosa) - Thursday, 23 November 2023, 10:23 GMT
I've finished the bisect.

https://github.com/torvalds/linux/commit/0d16710146a10cf62b3efddee8ffd006432d5d7e is the first bad commit.

What should I do next?
Comment by loqs (loqs) - Thursday, 23 November 2023, 14:26 GMT
As a cross-check, try 6.6, 6.6.2, 6.6.2.arch1-1 or 6.7-rc2 with 0d16710146a10cf62b3efddee8ffd006432d5d7e reverted.
Assuming you can no longer reproduce the issue with the revert, you need to decide whether to report the issue against the i915 driver [1] or against ACPI [2]. For ACPI, the options are possibly https://bugzilla.kernel.org (product ACPI, component ACPICA-Core), adding Michal Wilczynski <michal.wilczynski@intel.com> and Rafael J. Wysocki <rafael.j.wysocki@intel.com> to the CC list, or the linux-acpi mailing list, possibly by replying to https://lore.kernel.org/linux-acpi/20230703080252.2899090-3-michal.wilczynski%40intel.com/

$ perl scripts/get_maintainer.pl drivers/acpi/bus.c
"Rafael J. Wysocki" <rafael@kernel.org> (supporter:ACPI)
Len Brown <lenb@kernel.org> (reviewer:ACPI)
linux-acpi@vger.kernel.org (open list:ACPI)
linux-kernel@vger.kernel.org (open list)

[1]: https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html
[2]: https://docs.kernel.org/admin-guide/reporting-issues.html
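
The revert cross-check suggested above can be sketched on a throwaway repository. This is an illustration under stated assumptions: the temporary repository and the files a and b are made up, and in a real kernel tree the argument to git revert would be the first bad commit found by the bisect, followed by a rebuild with makepkg -efsi.

```shell
# Toy repository; the file names and commit messages are hypothetical.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }
g init -q

echo base > "$repo/a"
g add a
g commit -q -m "base"

echo regression > "$repo/b"
g add b
g commit -q -m "suspect change"
suspect=$(g rev-parse HEAD)

echo later >> "$repo/a"
g add a
g commit -q -m "later work"

# Undo only the suspect commit, keeping everything that came after it.
g revert --no-edit "$suspect" >/dev/null

# Later work survives, while the suspect change is gone.
g log --oneline
```

If the issue disappears with only that one commit reverted on top of an otherwise unchanged tree, the bisect result is confirmed.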
Comment by D (Nebulosa) - Friday, 24 November 2023, 07:53 GMT
Yep, reverting the commit helps; 6.7rc2 works fine.

Trivial patch attached.

I think it's an i915-specific bug: on my other machines (one with an Nvidia card and several VPSes with virtual cards) there is no regression at all, so the right decision is to report the issue on gitlab.freedesktop.org first.
Comment by D (Nebulosa) - Friday, 24 November 2023, 09:04 GMT
