FS#22714 - [kernel: 2.6.37-5] Freeze/Crash with Kernel 2.6.37

Attached to Project: Arch Linux
Opened by Ben Mehne (ben0mega) - Wednesday, 02 February 2011, 04:15 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 15 February 2012, 08:07 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description: Crash (no response to any key combination, nor to Caps Lock).
I believe it was caused by the following bug (warning) in the kernel (could be unrelated, but it seems most likely)

The attached log is the last of many such traces.

Additional info:
* latest

Steps to reproduce:
Run computer for a period of time - crash (no known immediate precipitating agent)
Unloading btusb and bluetooth increases the time before a crash but does not stop it. Xorg becomes the top process before a crash/freeze.

(any logs available on request)
(22701 is a duplicate of this under the wrong severity level)

Closed by  Tobias Powalowski (tpowa)
Wednesday, 15 February 2012, 08:07 GMT
Reason for closing:  Upstream
Comment by Ben Mehne (ben0mega) - Wednesday, 02 February 2011, 04:17 GMT
My system actually froze while I was commenting on the other bug report. All my heat sensors were running cooler than they are now (the ones I am monitoring, at least). No CPU spikes appeared on screen (system monitor applet). The only program running was the latest chromium-browser-bin. This is happening too often.
Comment by tuple (tuple) - Saturday, 05 February 2011, 01:17 GMT
I can report that I have the same issue. I've disabled bluetooth in the BIOS to no avail. Additionally, I reinstalled in the hope that the issue would be overcome. I did this because I can find nothing in any logs preceding the freeze, or anything that seems remotely related, aside from the attached bug report. I've run local hardware diagnostics, which gave no result.
Comment by tuple (tuple) - Saturday, 05 February 2011, 01:20 GMT
Apologies for the double comment :(
The freezes began with the update to the 2.6.37-5 kernel and occur at unpredictable intervals, but as often as 5 times a day.
Comment by Philip Rebohle (ThunderGod) - Sunday, 06 February 2011, 12:11 GMT
Have the problem as well, today and a few days ago. Computer runs about 6 to 15 hours a day. Can't see anything in the logs, but at least SysRq doesn't work, so it seems to be the same issue.
Comment by Philipp Kohlbecher (xt28) - Sunday, 06 February 2011, 16:23 GMT
I haven't tried the arch kernel yet, but building a vanilla kernel for x86 with binutils 2.21 produces an unstable kernel (see http://sourceware.org/bugzilla/show_bug.cgi?id=12327). The problem is that ld from binutils 2.21 marks "jiffies" as an absolute symbol. This causes a relocatable kernel to mess up its data whenever it writes to "jiffies". (There might be another problem with binutils 2.21 as well. I believe grub loads the kernel at its preferred address, so no relocation should be necessary. I'm not sure, though.)

Downgrading to binutils 2.20 for the kernel build fixed this for me.

If this isn't an option, you might try and temporarily adjust arch/x86/boot/relocs.c to include "jiffies" in rel_sym_regex, i.e. make it "(^_end$|^jiffies$)".
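To see what that pattern would and would not exempt, here is a minimal sketch. It assumes relocs.c compiles rel_sym_regex as a POSIX extended regular expression, which is what grep -E uses; filter_rel_syms is a hypothetical helper name, not something from the kernel tree.

```shell
#!/bin/sh
# Sketch only: print the symbol names (one per line on stdin) that the
# suggested pattern would match. filter_rel_syms is a hypothetical helper.
filter_rel_syms() {
    # -E: POSIX extended regex; the anchors make each alternative match
    # a whole symbol name, so "jiffies_64" or "__end" are not matched.
    # "|| true" keeps the exit status clean when nothing matches.
    grep -E '(^_end$|^jiffies$)' || true
}
```

For example, feeding it the names jiffies, jiffies_64, _end and __end should print only jiffies and _end.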

Hope this helps.
Comment by Philip Rebohle (ThunderGod) - Sunday, 06 February 2011, 17:48 GMT
Any chance that the kernel in the repos will be rebuilt against binutils 2.20? Massive kernel rebuilding can't be an option for all Arch users.
Comment by Tobias Powalowski (tpowa) - Sunday, 06 February 2011, 21:21 GMT
Thomas, Allan, any idea?
Comment by Thomas Bächler (brain0) - Sunday, 06 February 2011, 21:41 GMT
I have no idea if your crashes are related to this, but this is what both the kernel logs from above indicate:

'Tainted: P WC'

First, 'W' means that you had a warning before. You should try to find the first one that occurs (the one without 'Tainted: W') so the problem can be tracked down.

Second, P means 'proprietary'. You have a non-GPL kernel module (nvidia in both cases) loaded. While this hasn't been causing problems as long as I can remember, nobody in the kernel developer land will help you unless you can reproduce the problem without a 'P' taint.

Third, C means 'crap' - a staging driver is loaded, in both cases brcm80211. Staging drivers are not well-tested and likely unstable, so they are the first suspects when considering potential bugs. Your warnings occurred somewhere in mac80211, so they are with high probability connected to brcm80211. Whether your crashes are also related to that is unknown to me, but it is the best suspect.
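For reference, a minimal sketch of decoding these three flags from the numeric value a running kernel exposes in /proc/sys/kernel/tainted. The bit positions (0, 9 and 10) are taken from the kernel's tainted-kernels documentation; decode_taint is a hypothetical helper name, and all other taint bits are ignored here.

```shell
#!/bin/sh
# Sketch only: decode the P, W and C taint flags from a numeric taint value.
#   bit 0  (1)    -> P  proprietary module loaded
#   bit 9  (512)  -> W  a warning was issued earlier
#   bit 10 (1024) -> C  staging driver loaded
# decode_taint is a hypothetical helper; other taint bits are not reported.
decode_taint() {
    value=$1
    flags=""
    if [ $(( value & 1 )) -ne 0 ]; then flags="$flags P"; fi
    if [ $(( value & 512 )) -ne 0 ]; then flags="$flags W"; fi
    if [ $(( value & 1024 )) -ne 0 ]; then flags="$flags C"; fi
    # Drop the leading space; print "none" when none of the three is set.
    flags=${flags# }
    printf '%s\n' "${flags:-none}"
}
```

It could be called as decode_taint "$(cat /proc/sys/kernel/tainted)" on the affected machine.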
Comment by manos (gkmanos21) - Sunday, 06 February 2011, 22:14 GMT
Same problem here. After the upgrade to 2.6.37-5 my system crashes 4-5 times a day, at unpredictable intervals.
Comment by Allan McRae (Allan) - Sunday, 06 February 2011, 22:14 GMT
This is not related to the binutils bug as we do not build a relocatable kernel:

config:# CONFIG_RELOCATABLE is not set
config.x86_64:# CONFIG_RELOCATABLE is not set

But if wanted I could update the binutils package to include the fix for that issue just to be double sure...
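Anyone building their own kernel can check the same setting with something like the sketch below. is_relocatable is a hypothetical helper name, and the config file path is whatever the caller passes in.

```shell
#!/bin/sh
# Sketch only: report whether a kernel config file enables CONFIG_RELOCATABLE.
# is_relocatable is a hypothetical helper; $1 is the path to a config file.
is_relocatable() {
    # An enabled option appears as "CONFIG_RELOCATABLE=y"; a disabled one
    # appears as the comment "# CONFIG_RELOCATABLE is not set".
    if grep -q '^CONFIG_RELOCATABLE=y' "$1"; then
        echo "relocatable"
    else
        echo "not relocatable"
    fi
}
```

If the running kernel exposes /proc/config.gz (this requires CONFIG_IKCONFIG_PROC), the file can be unpacked first with zcat /proc/config.gz > running.config and then passed to the function.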
Comment by Thomas Bächler (brain0) - Sunday, 06 February 2011, 22:36 GMT
manos, could you please read my previous comment and post whether the same situation applies to your computer?
Comment by manos (gkmanos21) - Sunday, 06 February 2011, 23:43 GMT
Thomas, 'Tainted: P WC' is not listed anywhere in my /var/log/kernel.log file
   kernel_log (241.5 KiB)
Comment by Ben Mehne (ben0mega) - Monday, 07 February 2011, 01:41 GMT
I am seeing a taint (nvidia again) in the above kernel logs but I am not seeing the same ("cut here") kernel warnings
Comment by Thomas Bächler (brain0) - Monday, 07 February 2011, 08:09 GMT
Might be that the crashes are unrelated to the above oopses. At least manos doesn't use brcm80211 and still gets the crashes. Can anyone reproduce this who is not using the binary nvidia driver?
Comment by Nicolai Waniek (rochus) - Monday, 07 February 2011, 19:39 GMT
One of my systems has also been crashing frequently since the last update, but I'm unable to reproduce it with certainty, nor was there anything in the logs until just a few minutes ago: my cpudynd seems to crash, but as I was not around (the computer did a system update while I was outside) I don't know if the times fit and if the cpudynd crash is related to the system freeze. There might be some bug in the memory management, because one of the last times the system crashed I was logged in by ssh when suddenly every program started to segfault. memtest didn't show any errors after 2 full days of running.

I've attached my everything.log excerpt for the cpudynd segfault

pc: amd athlon 64 x2 dual core 4400+, amd 5770 graphics card running with the latest catalyst
Comment by Alphazo (alphazo) - Thursday, 10 February 2011, 08:49 GMT
I'm experiencing similar instabilities under 2.6.37-5. I got a complete system freeze (twice) by just starting a VirtualBox VM. Another time I zoomed into a document under LibreOffice and X died. Another time I could not enter my password on the screensaver page and I was forced to hard reboot. BTW I use the autogroup bash script that is supposed to provide a better desktop experience under load (https://bbs.archlinux.org/viewtopic.php?pid=855231#p855231). I don't know if it hurts or not.

I'm attaching my kernel.log file in case it provides any relevant information.

PS: Toward the end there is an issue with iwlagn. This is an open issue I have with compat-wireless that doesn't load the module properly. This has been reported and does not cause any kernel panic.
Comment by Nicola (drakkan) - Friday, 11 February 2011, 19:57 GMT
Similar crash here; the system often hangs on shutdown.
Comment by Nicolai Waniek (rochus) - Friday, 11 February 2011, 20:00 GMT
I have downgraded to the latest 2.6.36 package and have been running stable for about 3 days now.
Comment by manos (gkmanos21) - Friday, 11 February 2011, 20:49 GMT
Same here, I downgraded the kernel to 2.6.36.3-2 and have been running stable for 6 days now.
Comment by Alphazo (alphazo) - Friday, 11 February 2011, 21:37 GMT
I'm also going back to 2.6.36.3-2 because, besides the stability issues, I'm also unable to use bleeding-edge compat-wireless on 2.6.37 and the built-in one is very unstable. https://bugs.archlinux.org/task/22633
Comment by Allan McRae (Allan) - Tuesday, 15 February 2011, 20:46 GMT
Are these issues fixed in 2.6.37-6?
Comment by Ben Mehne (ben0mega) - Wednesday, 16 February 2011, 05:37 GMT
I have not yet had a crash (crossing fingers) but the bug/warning still exists (still using tainted kernel - see log).
Comment by Ben Mehne (ben0mega) - Wednesday, 16 February 2011, 17:27 GMT
Same damn problem. Freezes. There was not a warning on the kernel log this time. In other news, I am physically sick and cannot spend the time trying to get non-proprietary modules to work at the moment.
Comment by tuple (tuple) - Wednesday, 16 February 2011, 22:10 GMT
I have found that I do not have freezes if I disable wireless network devices, which I've been doing through network manager. Additionally, I'm much more likely to get freezes or network breakage if I transfer lots of data over wireless. Sometimes it's just that the wireless seems connected but doesn't allow for any network activity (pinging the default route fails) though I'm connected to the AP. I've been running a little fixWifi.sh script that often works:


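#!/bin/sh
# Stop NetworkManager so it releases the wireless device.
/etc/init.d/networkmanager stop
# Unload the wireless stack, dependent modules first.
rmmod brcm80211
rmmod mac80211
rmmod cfg80211
rmmod rfkill
sleep 2
# Reload the modules in dependency order.
modprobe rfkill
modprobe cfg80211
modprobe mac80211
modprobe brcm80211
sleep 2
# Bring NetworkManager back up.
/etc/init.d/networkmanager start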

though not always and often the problem comes back soon after. Still getting the same logged errors though. I would switch to nouveau to help but it plays merry havoc with my screen resolution and, as this is my work machine, I'm a little hesitant about losing the graphical desktop for a few days.

Don't know if this will help anyone or narrow down the issue, but there it is :)
Comment by Ben Mehne (ben0mega) - Saturday, 19 February 2011, 23:30 GMT
The following is from the brcm80211 module - I have a feeling it is the root cause of the problem.

------------[ cut here ]------------
Feb 19 18:27:29 SilverLeaf kernel: WARNING: at net/mac80211/rx.c:2860 ieee80211_rx+0x32e/0x8b0 [mac80211]()
Feb 19 18:27:29 SilverLeaf kernel: Hardware name: MacBookPro6,2
Feb 19 18:27:29 SilverLeaf kernel: Modules linked in: arc4 ecb brcm80211(C) mac80211 cfg80211 hidp ipv6 rfcomm sco bnep l2cap snd_hda_codec_hdmi btusb bluetooth rfkill hid_apple uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 uas usbhid bcm5974 hid usb_storage snd_seq_dummy snd_hda_codec_cirrus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device joydev nouveau ttm drm_kms_helper snd_hda_intel snd_pcm_oss drm firewire_ohci tg3 snd_hda_codec snd_hwdep applesmc snd_pcm firewire_core i2c_i801 i2c_algo_bit snd_mixer_oss video uhci_hcd libphy ehci_hcd crc_itu_t usbcore input_polldev snd_timer i2c_core shpchp pci_hotplug output evdev sg snd ac battery button soundcore snd_page_alloc processor pcspkr intel_agp intel_gtt iTCO_wdt iTCO_vendor_support intel_ips tpm_tis tpm tpm_bios fuse ext4 mbcache jbd2 crc16 sr_mod cdrom sd_mod ata_piix pata_acpi libata scsi_mod
Feb 19 18:27:29 SilverLeaf kernel: Pid: 0, comm: kworker/0:0 Tainted: G WC 2.6.37-ARCH #1
Feb 19 18:27:29 SilverLeaf kernel: Call Trace:
Feb 19 18:27:29 SilverLeaf kernel: <IRQ> [<ffffffff8105683a>] warn_slowpath_common+0x7a/0xb0
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff81056885>] warn_slowpath_null+0x15/0x20
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffffa04b855e>] ieee80211_rx+0x32e/0x8b0 [mac80211]
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffffa0532518>] ? wlc_dpc+0x178/0x7e0 [brcm80211]
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffffa04a0b51>] ieee80211_tasklet_handler+0xc1/0xd0 [mac80211]
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8105d2f2>] tasklet_action+0xa2/0x180
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8105dbf9>] __do_softirq+0xc9/0x250
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8102bfdc>] ? ack_apic_level+0x6c/0x1f0
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8100cddc>] call_softirq+0x1c/0x30
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8100f0b5>] do_softirq+0x65/0xa0
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8105de7d>] irq_exit+0x8d/0x90
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8100ecbc>] do_IRQ+0x6c/0xe0
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff813a7353>] ret_from_intr+0x0/0x11
Feb 19 18:27:29 SilverLeaf kernel: <EOI> [<ffffffff812426db>] ? intel_idle+0xdb/0x1b0
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff812426ba>] ? intel_idle+0xba/0x1b0
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff812d781c>] cpuidle_idle_call+0x8c/0x170
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8100a23a>] cpu_idle+0xaa/0x160
Feb 19 18:27:29 SilverLeaf kernel: [<ffffffff8139dad1>] start_secondary+0x20d/0x214
Feb 19 18:27:29 SilverLeaf kernel: ---[ end trace 9bb3b3e194073094 ]---
Comment by Dave Reisner (falconindy) - Saturday, 19 February 2011, 23:35 GMT
This is an upstream bug fixed in 2.6.38-rc5. The warning is unrelated, but also does not seem to be present on 2.6.38-rc5.

https://patchwork.kernel.org/patch/504981/

Looks like it was backported and landed in 2.6.37.1. Ben, could you try the kernel in testing and see if this fixes the problem?
Comment by Philip Rebohle (ThunderGod) - Saturday, 19 February 2011, 23:39 GMT
Wondering if anyone experienced freezes with 2.6.38.
Comment by Dave Reisner (falconindy) - Saturday, 19 February 2011, 23:42 GMT
I've been compiling -rc's for my MacBookAir (uses the same brcm80211 driver) -- I have yet to see a kernel panic since 2.6.38-rc5.
Comment by Ben Mehne (ben0mega) - Tuesday, 22 February 2011, 16:43 GMT
No crash or warning [yet] on 37.1-1. I have used some pretty internet-heavy applications, so it seems to work for me.
Comment by tuple (tuple) - Tuesday, 22 February 2011, 16:51 GMT
Same here. Upgraded kernel26 (2.6.37-6 -> 2.6.37.1-1) and no wireless issues or kernel panics. Wireless is much faster to authenticate to an AP again and I've had zero lockups. I would have expected at least 1-3 kernel panics in the timeframe I've been running on wireless.
Comment by brad mrumlinski (edigi) - Thursday, 17 March 2011, 16:05 GMT
This is definitely much more stable now... But I can still reproduce the problem when I thread a bunch of http requests at once.

I would like to find out what firmware other people are using along with 2.6.38, and whether I am using the best version, which is located here: git clone git://git.kernel.org/pub/scm/linux/kernel/git/dwmw2/linux-firmware.git

Thanks!
Comment by Alphazo (alphazo) - Thursday, 17 March 2011, 16:41 GMT
It all started when I undocked my X101 laptop using the appropriate button. Then about 10 minutes later I got kicked out of my X session, back to SLiM. Then later I got a complete system crash. I couldn't even use Alt+SysRq REISUB. There is a trace in my kernel.log related to the i915 driver (see attached).
   kernel.log (170.8 KiB)
Comment by Alphazo (alphazo) - Thursday, 17 March 2011, 16:51 GMT
Forgot to mention that I use kernel 2.6.37.3-1, intel-dri 7.10.1-1, xf86-video-intel 2.14.0-3.
Comment by Thomas Bächler (brain0) - Thursday, 17 March 2011, 17:02 GMT
And now we have at least 2 (maybe 3) different issues in one bug report again. Just because your kernel also freezes doesn't mean you have the same (or any related) bug.
Comment by Leonid Isaev (lisaev) - Thursday, 14 July 2011, 20:03 GMT
Has anything changed since .37 days?
Comment by brad mrumlinski (edigi) - Thursday, 14 July 2011, 20:12 GMT
Nope... Still experiencing the same kernel lockups... Just not as often. I know they are related to brcm80211 because when I switch to wl I NEVER have these issues. My MacBook Air can go in and out of sleep and between different wireless networks for days with no kernel panics at all. (Yes, wl seems slower at connecting and is less stable at staying connected without interruption, but at least it doesn't take down the whole system with it...)
