FS#42353 - [linux] 3.17.x Lockups
Attached to Project:
Arch Linux
Opened by jason ryan (jasonwryan) - Monday, 13 October 2014, 06:13 GMT
Last edited by Tobias Powalowski (tpowa) - Monday, 17 November 2014, 07:35 GMT
Details
Description: After installing a 3.17 kernel (either 3.17-1
or 3.17-2 or compiling 3.17 myself) my machine will
completely lock up anywhere between 3-15 minutes after
booting. The screen will freeze (with no degradation) and
the machine accepts no input from keyboard or touchpad.
Trying to SSH fails with "no route to host". The only remedy
is a hard shutdown.
The journal ends with (there are no other error messages leading up to the final line):
Oct 13 18:53:03 Shiv kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
This happens with (3.17-2) and without (3.17-1) the microcode update. 3.16-4 continues to work without issue.
Additional info:
* Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz
Steps to reproduce:
Boot into 3.17.x and start working...
Just tried with 3.17.1-ARCH and the same lockup as initially described: nothing printed to the journal.
What else can I do to try and debug this?
http://git-scm.com/book/en/Git-Tools-Debugging-with-Git
Also check whether it's a hardware problem (bad memory modules; you can use memtest).
I'll start investigating the bisect approach.
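For anyone following along, the idea behind bisecting can be sketched in a few lines. This is only a toy model of the binary search, assuming a linear history with one known-good end (3.16 works here) and one known-bad end (3.17 locks up); it is not how git itself is implemented, and the names `first_bad`, `is_bad` and the `c0`..`c999` commit labels are made up for illustration.

```python
def first_bad(revisions, is_bad):
    """Binary search for the first bad revision.

    Assumes revisions[0] is known good and revisions[-1] is known bad,
    and that every revision after the culprit is also bad.
    Returns (culprit, number_of_revisions_tested).
    """
    good, bad = 0, len(revisions) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(revisions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return revisions[bad], steps


# Hypothetical history of 1000 commits where commit 637 introduced the bug.
history = [f"c{i}" for i in range(1000)]
culprit, steps = first_bad(history, lambda rev: int(rev[1:]) >= 637)
print(culprit, steps)
```

In practice each "test" means building and booting a kernel and waiting to see whether it locks up, which is why halving the range at every step matters: about ten builds are enough for a thousand commits.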
might be related - 3.17.1-1 crashes while booting, 3.16.4-* works.
I've attached some printout from journalctl leading up to the crash, since I had some strange ata2 stuff going on leading up to the "BUG: unable to handle kernel NULL pointer dereference" line. Hope any of this information is helpful.
Oct 24 15:28:38 Veles kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
Oct 24 15:28:38 Veles kernel: IP: [<ffffffff811a3339>] mmu_notifier_unregister+0x19/0xe0
Oct 24 15:28:38 Veles kernel: PGD afe20067 PUD b1716067 PMD 0
Oct 24 15:28:38 Veles kernel: Oops: 0000 [#1] PREEMPT SMP
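If it helps with comparing reports, the "IP:" line can be split into the faulting address, the symbol, and the offset into that symbol. Below is a minimal sketch that assumes the line format shown in the log above; the helper name `parse_oops_ip` and the regular expression are my own, not part of any kernel tooling.

```python
import re

# Matches "IP: [<address>] symbol+0xoffset/0xsize" as printed in the oops above.
IP_LINE = re.compile(
    r"IP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_oops_ip(line):
    """Return address, symbol, offset and size from an oops IP line, or None."""
    m = IP_LINE.search(line)
    if m is None:
        return None
    return {
        "address": int(m.group("addr"), 16),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
    }


line = ("Oct 24 15:28:38 Veles kernel: IP: [<ffffffff811a3339>] "
        "mmu_notifier_unregister+0x19/0xe0")
info = parse_oops_ip(line)
print(info["symbol"], hex(info["offset"]), hex(info["size"]))
```

Two reports that show the same symbol and offset are very likely the same crash, even if the absolute addresses differ between kernel builds.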
Got a little more info in my journal; hope it is useful.
I *hope* that I can find the time to do some bisecting this weekend ... can't promise anything, though.
https://freedesktop.org/patch/34166/
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e9681366ea9e76ab8f75e84351f2f3ca63ee542c
This patch seems simple, touching just one file: /drivers/gpu/drm/i915/i915_gem_userptr.c
I can't see this patch in Kroah's queue as of now.
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/gpu/drm/i915/i915_gem_userptr.c?id=refs/tags/v3.17.2
I'm suffering random hard freezes myself (at least of STDIN/STDOUT) on my laptop, since 3.17-rc4. My logs look different, maybe
because of my custom kernel configuration. I currently don't know my PREEMPT setting and don't have access to my laptop, and
reproducing a random bug isn't that easy (I can work for hours until the freeze happens, or just half an hour).
// edit
Greg's mailbot slapped me (of course) and told me to send it to the usual mailing lists for this. Done.
https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.17/drm-i915-do-not-store-the-error-pointer-for-a-failed-userptr-registration.patch
We should see "3.17.3" soon; shall we just wait instead of patching your PKGBUILD?
I've already patched the kernel on my machine myself and it seems to be stable now.
I think it makes sense to wait for 3.17.3...
The patch is included! Waiting for Arch packaging.