FS#26847 : [linux] lots of stalls and finally freeze

Attached to Project: Arch Linux
Opened by Mariusz Libera (mar04) - Friday, 11 November 2011, 12:35 GMT
Last edited by Tobias Powalowski (tpowa) - Saturday, 07 April 2012, 06:51 GMT

Task Type	Bug Report
Category	Kernel
Status	Closed
Assigned To	Tobias Powalowski (tpowa) Thomas Bächler (brain0)
Architecture	x86_64
Severity	Critical
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	28 KaiSVK (KaiSVK) (2012-04-06) Andrew (Shiz0) (2012-03-29) goran vucelic (simke) (2012-03-29) Julien (julroy67) (2012-03-28) Paul Ezvan (paulez) (2012-03-22) Lukas Jirkovsky (6xx) (2012-02-29) Philip Crump (craag) (2012-02-09) Mihai Coman (z0id) (2012-01-26) Anderson Medeiros Gomes (amg1127) (2012-01-16) Luis Manuel Ramos Da Costa (aliasbody) (2012-01-13) Mantas Mikulėnas (grawity) (2012-01-11) reaper (reaper) (2012-01-09) Wang Guan (jokester) (2011-12-23) Catalin David (cdavid) (2011-12-14) Vasyl Demin (zersaa) (2011-12-14) Zhtlancer (zhtlancer) (2011-12-13) Pierre Willaime (ppr) (2011-12-12) Carl Reinke (Mindless) (2011-12-07) Matthew William Cox (mwc) (2011-12-02) cha5on (cha5on) (2011-12-01) Björn Seifert (bseifert) (2011-11-29) Ondřej Konečný (andrew9888) (2011-11-25) Matus Komora (arabak) (2011-11-25) Francisco Pina (Stunts) (2011-11-22) Jan Alexander Steffens (heftig) (2011-11-16) Michael (madmike_) (2011-11-14) Mariusz Libera (mar04) (2011-11-11) Federico (fgm) (2011-11-11)
Private	No

Details

Description:
Since updating to 3.1 my laptop has frozen 3 times already. I don't know what is causing it, but this time I checked the kernel log and there are lots of "INFO: rcu_preempt_state detected stalls on CPUs/tasks:" messages. They start to appear an hour before the system freezes. In the meantime the only thing I noticed was that mp3 playback was a little choppy.

Additional info:
* up to date stock Arch packages

Steps to reproduce:

kernel.log (617.2 KiB)

lspci (1.4 KiB)

Closed by Tobias Powalowski (tpowa)
Saturday, 07 April 2012, 06:51 GMT
Reason for closing: Fixed

Comment by Federico (fgm) - Friday, 11 November 2011, 13:47 GMT

I have the same problem on a Latitude E6510 notebook. The problem seems to be the wireless; you can try disabling the wireless or installing the LTS kernel.

Comment by Liao Haohui (liaohaohui) - Friday, 11 November 2011, 14:11 GMT

I can't confirm this with my Acer Aspire 4740 laptop since I have never been running for more than 5 minutes. Within those 5 minutes my wireless connection will just hang/stop and I will not be able to connect to my wifi and the Internet. I am not sure if this is closely related to the following problem, on which I posted a comment:

https://bugs.archlinux.org/task/26674

Comment by Liao Haohui (liaohaohui) - Friday, 11 November 2011, 14:12 GMT

So I am currently still using linux-3.0.7-1. Linux-3.1-4 refused to work properly for my wifi connection.

Comment by Mariusz Libera (mar04) - Friday, 11 November 2011, 14:28 GMT

My wifi works ok with 3.0.7 and 3.1.0. In fact it was disabled the last time the freeze happened.

Comment by Jelle van der Waa (jelly) - Wednesday, 16 November 2011, 08:30 GMT

These bugs are mostly upstream, and Arch Linux developers can't really do much to help you here. I advise you to contact the upstream kernel devs, who work on the code and can also fix it. Searching the LKML archives might give you some answers.

Comment by Jan Alexander Steffens (heftig) - Wednesday, 16 November 2011, 11:04 GMT

Seeing the same RCU stalls here.

Collected some dmesgs: https://gist.github.com/1363432
All from ZEN kernel, but hopefully still useful. The Arch kernel hung, as well.

Also running Sandy Bridge.

Comment by Mariusz Libera (mar04) - Wednesday, 16 November 2011, 11:45 GMT

Another freeze today with 3.1.1.
Don't really know how to report it upstream, bugzilla.kernel.org is down.

Comment by Michael (madmike_) - Wednesday, 16 November 2011, 16:41 GMT

I can trigger this with transmission-gtk after a couple of seconds.

Comment by Jan Alexander Steffens (heftig) - Thursday, 17 November 2011, 13:40 GMT

What hardware is everyone who suffers from this running? Sandy Bridge?

Comment by Liao Haohui (liaohaohui) - Thursday, 17 November 2011, 14:27 GMT

Acer Aspire 4740: lspci |grep -i bridge

I don't find any Sandy Bridge:

00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 12)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller (rev 05)
ff:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers (rev 02)
ff:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 02)
ff:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 02)
ff:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0 (rev 02)
ff:02.2 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
ff:02.3 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
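
Sandy Bridge is a CPU generation rather than a PCI bridge, so it will not show up in `lspci | grep -i bridge`; /proc/cpuinfo is the place to look. A minimal sketch, assuming the usual x86 /proc/cpuinfo layout and assuming that Intel family 6, models 42 and 45, cover Sandy Bridge (the function names are made up for illustration):

```python
# Assumption: Intel family 6, model 42 (client) and 45 (server/enthusiast)
# are the Sandy Bridge model numbers.
SANDY_BRIDGE_MODELS = {42, 45}


def parse_cpuinfo(text):
    """Return (vendor_id, cpu_family, model, model_name) of the first CPU entry."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # entries are separated by blank lines; the first is enough
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return (
        fields.get("vendor_id", ""),
        int(fields.get("cpu family", -1)),
        int(fields.get("model", -1)),
        fields.get("model name", ""),
    )


def is_sandy_bridge(text):
    """True if the first CPU entry looks like an Intel Sandy Bridge part."""
    vendor, family, model, _ = parse_cpuinfo(text)
    return vendor == "GenuineIntel" and family == 6 and model in SANDY_BRIDGE_MODELS
```

On an affected machine this would be called as `is_sandy_bridge(open("/proc/cpuinfo").read())`.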

Comment by Liao Haohui (liaohaohui) - Thursday, 17 November 2011, 14:37 GMT

The following is my Acer Aspire 4740's hardware:

01:00.0 Ethernet controller: Broadcom Corporation NetLink BCM57780 Gigabit Ethernet PCIe (rev 01)

02:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)

With Linux kernel 3.0.7, wlan0 will sometimes disconnect (I am using wpa_supplicant 0.7.3-4), but it resumes after 2 or 3 seconds. However, with linux-3.1-4, it stalls after a few minutes and I have to kill wpa_supplicant and reload it to reconnect the wifi. So there must be some miscommunication between wpa_supplicant and the Linux kernel. My wpa configuration is as follows:

network={
ssid=something
proto=WPA2
key_mgmt=WPA-PSK
pairwise=TKIP
group=TKIP
psk=something
wpa_ptk_rekey=600
}

But I don't think it's a problem since previous versions of Linux worked! Even Linux-2.6.3x. Someone said it's a problem with acer_wmi, but I find it is of no harm to the wifi connection for Linux-3.0.7 (just the minor problem of disconnection, it resumes very fast though by itself).

Comment by Liao Haohui (liaohaohui) - Thursday, 17 November 2011, 14:48 GMT

I think the major problem for non-kernel developers is that we don't know how to submit bugs upstream. Even http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html is not restored (is this due to kernel.org being hacked a few months ago?). For the present, I think archlinux may be the best place to collect related incidents so that we can get a clue about what happened in the changes from linux-3.0.x to linux-3.1.

Comment by Mariusz Libera (mar04) - Thursday, 17 November 2011, 17:32 GMT

@Liao Haohui: do you actually get "INFO: rcu_preempt_state detected stalls on CPUs/tasks:"
messages in your dmesg output? Because you're probably talking about some different issue.

@Jan Steffens: Sandy Bridge here

Comment by Michael (madmike_) - Thursday, 17 November 2011, 18:09 GMT

@jan no, i have got yonah/napa over here

Comment by ggarlic (ggarlic) - Saturday, 19 November 2011, 08:39 GMT

After upgrading to kernel 3.1.1, I got the same errors, just like mar04.

My laptop CPU is a T2050 (Yonah), not Sandy Bridge.

Update:
Every time I go back home and use a network cable, the stalls and freezes are gone. In other words, the problem only appears when I use wireless. My wireless card is an Intel® PRO/Wireless 3945ABG.

Comment by Liao Haohui (liaohaohui) - Saturday, 19 November 2011, 14:03 GMT

@Mariusz Libera: No "INFO: rcu_preempt_state detected stalls on CPUs/tasks:" from dmesg. So you are right. Sorry for my mistake.

Comment by Liao Haohui (liaohaohui) - Saturday, 19 November 2011, 14:33 GMT

I think the frequent disconnection of my wifi may be due to the rekeying with the following configuration:

wpa_ptk_rekey=600

However, I think Linux kernel 3.1 does have a problem, since it will hang the wifi connection.

Comment by Mariusz Libera (mar04) - Sunday, 20 November 2011, 22:57 GMT

Today I tried out kernel 3.2rc2 from AUR and it's mostly the same story.
It seems to be somehow network related but I'm not sure how.
Most of the time I'm using ethernet connection and my wireless is switched off,
so it's unlikely that a specific driver is broken.

One thing I can easily reproduce every time is that once stall messages start
to appear, switching from X to a tty always results in a lockup; from there only sysrq+reisub works.
Other issues:
NetworkManager gets broken (at least the gnome applet) - on/off doesn't work
and it always says I'm connected. Also, I cannot quit firefox properly - the next time
I run it, it says it's already running. Suspend only locks the screen.

I'm attaching one more log from 3.1.1 without vboxdrv and mei modules so it's not tainted.

Also found this:
https://lkml.org/lkml/2011/4/19/588
https://lkml.org/lkml/2011/8/2/415

last.log (76.6 KiB)

Comment by Francisco Pina (Stunts) - Tuesday, 22 November 2011, 13:12 GMT

I can confirm Mariusz Libera's comment. I have the same problem and the same symptoms.
I also get:
iwl4965 0000:0c:00.0: Error sending REPLY_LEDS_CMD: enqueue_hcmd failed: -5
spammed in /var/log/errors.log when the crash occurs.
Also, when the stalls happen I can't even do a simple "cat" on the logfiles. I have to restart to be able to even read the logs.
I am not on sandy bridge hardware. I have an old T7500 CPU.

Comment by Florian (wespe) - Tuesday, 22 November 2011, 15:25 GMT

Confirming Mariusz' and Francisco's comment on a Lenovo Thinkpad T410

* The behaviour is somewhat strange. Firefox refuses to load new pages, while ping on the command line is still working.
* Cannot logout/reboot from Gnome; System freezes on logout
* Switching to console using CTRL+ALT+F1 freezes the system.

Also confirming this statement:
"One thing I can easily reproduce every time is that once stall messages start
to appear switching from X to tty always result in a lockup,"

errors.log:
Nov 22 15:55:24 localhost kernel: [ 600.592028] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 3, t=126066 jiffies)
Nov 22 15:58:24 localhost kernel: [ 780.248711] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=180098 jiffies)
Nov 22 16:01:24 localhost kernel: [ 959.905340] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=234130 jiffies)
Nov 22 16:04:24 localhost kernel: [ 1139.561976] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 3, t=288162 jiffies)

Is there a workaround? Something like installing an old kernel version? I am definitely unproductive since this bug started to occur about one week ago.

lspci (2.6 KiB)

Comment by Francisco Pina (Stunts) - Tuesday, 22 November 2011, 17:09 GMT

I have found this:
http://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
Seems relevant, especially to include in an upstream bug report.

Comment by Florian (wespe) - Thursday, 24 November 2011, 07:47 GMT

I didn't have a problem in the last 24 hours while WiFi was deactivated.

Comment by Francisco Pina (Stunts) - Thursday, 24 November 2011, 11:48 GMT

Upon further testing, I realized I can only reproduce this issue when both my wifi and wired connections are active.
This means that in order to avoid this, I need to use the hardware switch to turn off my wifi when using an ethernet cable.
Can anyone else confirm this?

Comment by Federico (fgm) - Thursday, 24 November 2011, 12:52 GMT

Same behaviour for me. I don't have any problem when my wired connection is in use (the wifi is alive, but not connected).

Comment by Matus Komora (arabak) - Friday, 25 November 2011, 08:09 GMT

This happens to me, too. Every time I connect to WiFi at school (Eduroam - WPA2 Enterprise, PEAP, MSCHAP2), all apps communicating over the network freeze, one after another. For me, downgrading the kernel to 3.1.1-1 and acpid to 2.0.12-1 worked around the problem, but this is not a permanent solution.
My notebook is Thinkpad Edge E320 1298-5VG. I've solved this problem also on another notebook (Thinkpad R61) the same way.

//EDIT

No, downgrading wasn't the solution; details are in the attached file.

kill_hang.log (19.5 KiB)

Comment by Florian (wespe) - Friday, 25 November 2011, 16:36 GMT

Upgraded to Kernel 3.1.2-1 and the problem seems to be gone. At least no crash/stall today.

Comment by Ondřej Konečný (andrew9888) - Friday, 25 November 2011, 17:32 GMT

I've noticed that the problem occurs only on WiFi with WPA2 enterprise (the configuration arabak mentioned). There's no problem on either my home WiFi (WPA2 PSK) or the cable network (even with 802.1x set up exactly the same way as the WPA2 enterprise WiFi).

Comment by Mariusz Libera (mar04) - Friday, 25 November 2011, 17:48 GMT

Seems to be network related but not wifi specific, because as I said earlier
it happens with ethernet connection also. Are you guys using ipv6?
Random idea...

Comment by Ondřej Konečný (andrew9888) - Friday, 25 November 2011, 18:19 GMT

WiFi at school (the WPA2 enterprise): IPv6 only (native) - problems with stalls
WiFi at home: IPv4. - no problems
Cable network at school: IPv4 only (realized that I haven't setup IPv6) - no problems

Seems like it has something to do with IPv6.

Comment by Francisco Pina (Stunts) - Monday, 28 November 2011, 20:23 GMT

After further testing I have realized:
I can reproduce this bug on my netbook too (EeePC 901), which has a ralink wireless card, so it's definitely not Intel related.
I can only reproduce this issue when using my university's wireless network - eduroam.
The issues happen regardless of the connection being on IPv4 or IPv6.
Current settings of eduroam:
Security: WPA2 enterprise
Authentication: Tunnelled TLS
Inner authentication: MSCHAPv2
No CA certificate
I am using networkmanager to connect.
Is anyone experiencing this issue with different settings?

Comment by Ondřej Konečný (andrew9888) - Wednesday, 30 November 2011, 16:12 GMT

My configuration:
NetworkManager
Security: WPA2 enterprise
Authentication: PEAP (PEAP version: auto)
Inner authentication: MSCHAPv2
CA Certificate

Comment by cha5on (cha5on) - Thursday, 01 December 2011, 02:45 GMT

I am also experiencing the same error. I can replicate it inconsistently by disconnecting and reconnecting to the wireless network (the connection attempt is not always unsuccessful).

My hardware: Lenovo X120e. Processor is AMD E350 (Zacate) and the wifi card uses the rtl8192ce driver.

I've attached the output of dmesg from the few seconds before and after the error occurred (this is after resuming from suspend and trying to reconnect to the wifi).

dmesg.dump (14.2 KiB)

Comment by Ondřej Konečný (andrew9888) - Thursday, 01 December 2011, 07:23 GMT

Okay, today my computer hung again. Part of error.log is attached. Plus I have tons of the following lines in everything.log (hundreds per second):
Dec 1 07:10:18 ondra-laptop NetworkManager[1237]: <info> Activation (wlan0) Stage 4 of 5 (IP6 Configure Get) scheduled...
Dec 1 07:10:18 ondra-laptop NetworkManager[1237]: <info> Activation (wlan0) Stage 4 of 5 (IP6 Configure Get) started...
Dec 1 07:10:18 ondra-laptop NetworkManager[1237]: <info> Activation (wlan0) Stage 5 of 5 (IP Configure Commit) scheduled...
Dec 1 07:10:18 ondra-laptop NetworkManager[1237]: <info> Activation (wlan0) Stage 4 of 5 (IP6 Configure Get) complete.
Dec 1 07:10:18 ondra-laptop NetworkManager[1237]: <info> Activation (wlan0) Stage 5 of 5 (IP Configure Commit) started...

But I have blacklisted the ipv6 kernel module and so far so good, even on eduroam (my school's wifi). So I am running IPv4 only now.

//EDIT
I apologise for the misinformation: eduroam is running dual-stack IPv4 and IPv6 (not only IPv6 as I stated before).

crash.log (19.8 KiB)
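
To confirm that blacklisting the ipv6 module actually took effect, one option is to look at /proc/net/if_inet6, which the kernel only provides while IPv6 support is loaded and which lists the configured IPv6 addresses. A minimal sketch; the function name is hypothetical and the path is a parameter only so the check can be tried against a test file:

```python
def ipv6_active(if_inet6_path="/proc/net/if_inet6"):
    """Return True if the kernel currently lists at least one IPv6 address."""
    try:
        with open(if_inet6_path) as f:
            # One line per configured IPv6 address; empty means none configured.
            return any(line.strip() for line in f)
    except FileNotFoundError:
        # File is absent when the ipv6 module is not loaded.
        return False
```

If this returns False after a reboot, the machine is running IPv4 only, which matches the setup described above.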

Comment by Jan Alexander Steffens (heftig) - Thursday, 01 December 2011, 08:27 GMT

Has anyone tried 3.1.3 or 3.1.4 yet?

Comment by Florian (wespe) - Thursday, 01 December 2011, 08:31 GMT

Yes. Since I upgraded to 3.1.2 and then to 3.1.3 the problem disappeared for me.

-> No crash since Nov. 22nd

Configuration:
NetworkManager
WPA2 Enterprise
Tunneled TLS
PAP
Certificate

Comment by Federico (fgm) - Thursday, 01 December 2011, 13:07 GMT

With kernel 3.1.3 the problem is still present. I will try kernel 3.1.4 today.

Comment by Ondřej Konečný (andrew9888) - Thursday, 01 December 2011, 20:34 GMT

My previous post was with kernel 3.1.3. I will try 3.1.4 tomorrow.

Comment by Ondřej Konečný (andrew9888) - Friday, 02 December 2011, 08:28 GMT

So I tried 3.1.4 and problems are still present.

Comment by Matthew William Cox (mwc) - Friday, 02 December 2011, 10:50 GMT

I can confirm this issue with 3.1.4. I have this issue on a desktop machine without wireless. (The wired network does not use any form of port authentication/encryption like 802.1x, although it does have working ipv6 which is still somewhat unusual).

Asus Sabertooth 990FX, UEFI boot, Firmware 813
Phenom II 1100T

This issue also appears to be network related, as I can generally trigger a cascade of them by starting Deluge (a bittorrent client). The issue seems to be more frequent when using the in-kernel r8169 module; blacklisting it and using r8168 from testing seems to be marginally more stable (maybe 40% less crashy).

This issue does not appear to be related to the GPU despite the X issues. The same issue appears with either a GTX 560 Ti and Nvidia's blob or an AMD Radeon 5770 with either Catalyst 11.11 or the radeon driver.

dmesg.log (495 KiB)

Comment by Andrey (seld) - Monday, 05 December 2011, 19:32 GMT

Hi guys!

I am on openSUSE, but I'm pretty sure this
is an upstream kernel bug, so I think this
information might be relevant.

On my system, as on many others, the bug is triggered
by a particular WiFi network. I *never* get
"rcu_preempt_state detected stalls on CPUs/tasks"
on my home WiFi network, but I *always*
get it on my university's WiFi network.

This problematic WiFi network does *not* have any encryption enabled,
but as far as I know some MAC filtering and some roaming
features are enabled. An interesting thing is that
I see "rcu_preempt_state detected stalls on CPUs/tasks"
in /var/log/messages exactly every three minutes
when my notebook is on the university's WiFi network.

So grep'ing for "rcu_preempt" gives this curious picture
(notice that the message appears exactly on the 42nd second):

12/05/11 04:36:42 PM Starnote kernel [ 311.988158] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=60002 jiffies)
12/05/11 04:39:42 PM Starnote kernel [ 492.020162] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=240034 jiffies)
12/05/11 04:42:42 PM Starnote kernel [ 672.052094] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=420066 jiffies)
12/05/11 04:45:42 PM Starnote kernel [ 852.084161] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=600098 jiffies)
12/05/11 04:48:42 PM Starnote kernel [ 1032.116160] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=780130 jiffies)
12/05/11 04:51:42 PM Starnote kernel [ 1212.148111] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 0, t=960162 jiffies)
12/05/11 04:54:42 PM Starnote kernel [ 1392.180148] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=1140194 jiffies)

I have no idea what it means, since most of the time between
these messages there is just silence in /var/log/messages.
I should probably look at some other logs, but I'm not sure which.

Just a wild guess: could it somehow be connected with
some DHCP lease time or some other network timeout?
There has to be a reason why it happens
exactly every three minutes.

Not sure what information I should provide in this case,
so here is some information about my system:
- Kernel version: 3.1.4-1
- I'm using NetworkManager (within KDE)
- WiFi card is Intel 4965AGN
- IPv4 and IPv6 enabled
- When I try to switch to a different virtual terminal (e.g. Alt+Ctrl+F1)
after getting the error and the subsequent kernel taint, the system hangs
and in the best case responds only to "magic sysrq" combinations.

I have not yet tried a wired Ethernet connection
with the new kernel, but will try that later too.
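
The three-minute spacing can be checked directly from the log instead of by eye. A minimal sketch that pulls the kernel timestamp, detecting CPU and jiffies counter out of each stall line and prints the gaps between consecutive stalls; the regular expression is an assumption based only on the message format quoted in this report, and the function names are made up for illustration:

```python
import re

# Matches e.g. "[ 311.988158] INFO: rcu_preempt_state detected stalls on
# CPUs/tasks: {} (detected by 1, t=60002 jiffies)"
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: rcu_preempt_state detected stalls"
    r".*\(detected by (?P<cpu>\d+), t=(?P<jiffies>\d+) jiffies\)"
)


def stall_events(lines):
    """Return a list of (kernel_timestamp, detecting_cpu, jiffies) tuples."""
    events = []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            events.append((float(m["ts"]), int(m["cpu"]), int(m["jiffies"])))
    return events


def stall_gaps(events):
    """Return (seconds, jiffies) differences between consecutive stall events."""
    return [
        (round(b[0] - a[0], 3), b[2] - a[2])
        for a, b in zip(events, events[1:])
    ]


if __name__ == "__main__":
    import sys

    # Usage: python stall_gaps.py < /var/log/messages
    for seconds, jiffies in stall_gaps(stall_events(sys.stdin)):
        print(f"{seconds:10.3f} s  {jiffies:8d} jiffies")
```

A constant gap across all entries would support the idea of a fixed timer rather than a traffic-dependent trigger.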

Comment by Mariusz Libera (mar04) - Wednesday, 07 December 2011, 16:42 GMT

A few days ago I installed Fedora on another partition and I've been running it since then.
It's using kernel 3.1.2 and so far no problems. Same hardware, same network connection,
same desktop environment and apps. Any chance this is Arch specific?

Comment by Thomas Bächler (brain0) - Wednesday, 07 December 2011, 16:50 GMT

Didn't someone mention above that this also happened on openSUSE? Can you run 'zcat /proc/config.gz | grep -v ^# | sort -u' on Arch and Fedora, then attach a diff of the results? (I hope that Fedora has /proc/config.gz enabled.)
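
For anyone who prefers a single step, the same comparison can be sketched in Python. This is only an illustration of the pipeline above, not a required procedure: it drops comment lines, sorts and de-duplicates the rest (blank lines are dropped too, which the shell pipeline does not do), and emits a unified diff. The function names and the file names in the usage note are placeholders.

```python
import difflib
import gzip


def load_config(path):
    """Return sorted, de-duplicated, non-comment, non-blank lines of a kernel config."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        lines = {
            line.rstrip("\n")
            for line in f
            if line.strip() and not line.startswith("#")
        }
    return sorted(lines)


def diff_configs(path_a, path_b):
    """Return a unified diff (as one string) between two kernel configs."""
    a, b = load_config(path_a), load_config(path_b)
    return "\n".join(
        difflib.unified_diff(a, b, fromfile=path_a, tofile=path_b, lineterm="")
    )
```

Usage would look like `print(diff_configs("arch-config.gz", "fedora-config"))`, with both files copied to one machine first; gzip-compressed and plain-text configs are both accepted.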

Comment by Mariusz Libera (mar04) - Wednesday, 07 December 2011, 17:25 GMT

It has no /proc/config.gz, but it has a config file in the /boot directory.

f_a_diff (40.2 KiB)

Comment by Jan Alexander Steffens (heftig) - Monday, 12 December 2011, 07:22 GMT

Translated to a unified diff, just so it's easier to look at.

diff (80.8 KiB)

Comment by ruijiangli (ruijiangli) - Monday, 12 December 2011, 08:56 GMT

The same situation here. I am using a Dell Precision M4600 with Arch x64. Below is my experience.
1. In kernel 3.1.4 the bug still exists.
2. The bug is related to wireless, but not limited to a specific driver. On the same machine (M4600) I tried two adapters: the Intel 6300 (iwlagn) that comes with the machine, and an Atheros AR5212 (madwifi, ath5k). Both wireless cards trigger the bug when using Transmission.
3. The bug is not related to encryption (WPA2), since I have also experienced it on an unencrypted wireless network.
4. It's a kernel bug not specific to Arch. I used the Fedora 16 live CD (32 bit) on the M4600 with the Intel 6300 card and saw the same behaviour.
5. It seems that the wired network is not affected. Everything is OK if I am not using the wireless network.
6. I have not experienced the bug when using Fedora 16 on a Thinkpad T60 with an Intel 3945 adapter.
7. My workaround is to use KVM, passing the wireless adapter through to WinXP and using the WinXP guest in KVM as a gateway. Everything is fine with this approach.

Comment by Thomas Bächler (brain0) - Monday, 12 December 2011, 09:47 GMT

ruijiangli, these last observations are indeed very helpful and detailed. Sadly, the kernel.org bugzilla is still down, so there is nowhere to report it. Could one of the affected people post to the linux and linux-wireless mailing lists? There is really nothing we can do, except provide the information upstream.

Comment by Jan Alexander Steffens (heftig) - Monday, 12 December 2011, 13:18 GMT

Could this be tied to running virtualization software?

I'm running libvirt with QEMU/KVM myself, which sets up a virtual network (virbr0) and a NAT. Though I only sometimes run a VM.

Also, I set net.core.bpf_jit_enable=1 .

Comment by ruijiangli (ruijiangli) - Monday, 12 December 2011, 13:21 GMT

Hi Thomas, I will try to post to the linux-wireless mailing list.
I suggest you try the old kernel26-lts, which may be a good workaround for you. A few hours ago I tried the old kernel in a house where the bug (hangs, frequent disconnects, inability to associate with the AP) was easily triggered on kernel 3.x. After about an hour of testing with bittorrent and ftp transfers, I find that the system is fairly stable.
For my Atheros card, ath5k works flawlessly.
For my Intel 6300 card, there are still some error messages, but they are not fatal. What is more, I did not lose the connection even when the error message occurred.
The error message is more useful than that of kernel 3.x, as pasted below:
----------------
[ 498.520520] ------------[ cut here ]------------
[ 498.520535] WARNING: at include/net/mac80211.h:2206 rate_control_send_low+0xc1/0xe0 [mac80211]()
[ 498.520537] Hardware name: Precision M4600
[ 498.520538] Modules linked in: ipt_MASQUERADE xt_state ipt_REJECT xt_tcpudp iptable_filter nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables iwlagn iwlcore aesni_intel cryptd aes_x86_64 aes_generic ipv6 nvidia(P) fuse uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 btusb bluetooth arc4 ecb snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep mac80211 serio_raw snd_pcm i2c_i801 snd_timer psmouse snd soundcore snd_page_alloc iTCO_wdt cfg80211 iTCO_vendor_support i2c_core led_class evdev dell_laptop rfkill ppdev pcspkr parport_pc parport dcdbas wmi video button output ac thermal battery cpufreq_powersave cpufreq_ondemand acpi_cpufreq freq_table processor ext4 mbcache jbd2 crc16 sd_mod ahci libata xhci ehci_hcd scsi_mod usbcore [last unloaded: ath]
[ 498.520574] Pid: 1813, comm: phy1 Tainted: P W 2.6.32.49-1-lts #1
[ 498.520575] Call Trace:
[ 498.520581] [<ffffffff8105f728>] warn_slowpath_common+0x78/0xb0
[ 498.520583] [<ffffffff8105f774>] warn_slowpath_null+0x14/0x20
[ 498.520587] [<ffffffffa022d311>] rate_control_send_low+0xc1/0xe0 [mac80211]
[ 498.520590] [<ffffffffa0ef459b>] rs_get_rate+0x6b/0x280 [iwlagn]
[ 498.520594] [<ffffffffa022d864>] rate_control_get_rate+0xc4/0xe0 [mac80211]
[ 498.520598] [<ffffffffa0233fe6>] invoke_tx_handlers+0x636/0xea0 [mac80211]
[ 498.520601] [<ffffffffa0234a78>] ? ieee80211_tx_prepare+0x178/0x3d0 [mac80211]
[ 498.520604] [<ffffffffa0235445>] ieee80211_tx+0x75/0x230 [mac80211]
[ 498.520608] [<ffffffffa0235727>] ieee80211_xmit+0x127/0x280 [mac80211]
[ 498.520611] [<ffffffffa02364b1>] ieee80211_tx_skb+0x61/0x70 [mac80211]
[ 498.520615] [<ffffffffa0238b2a>] ieee80211_send_probe_req+0x12a/0x180 [mac80211]
[ 498.520618] [<ffffffffa0229e30>] ? ieee80211_sta_work+0x0/0x1150 [mac80211]
[ 498.520621] [<ffffffffa0228480>] ieee80211_mgd_probe_ap_send+0x40/0x70 [mac80211]
[ 498.520624] [<ffffffffa0229e30>] ? ieee80211_sta_work+0x0/0x1150 [mac80211]
[ 498.520627] [<ffffffffa022ab12>] ieee80211_sta_work+0xce2/0x1150 [mac80211]
[ 498.520630] [<ffffffffa0229e30>] ? ieee80211_sta_work+0x0/0x1150 [mac80211]
[ 498.520633] [<ffffffff8107e99d>] worker_thread+0x14d/0x2a0
[ 498.520636] [<ffffffff810848f0>] ? autoremove_wake_function+0x0/0x40
[ 498.520638] [<ffffffff8107e850>] ? worker_thread+0x0/0x2a0
[ 498.520639] [<ffffffff81084188>] kthread+0x88/0x90
[ 498.520642] [<ffffffff8104d478>] ? finish_task_switch+0x48/0xd0
[ 498.520644] [<ffffffff810130aa>] child_rip+0xa/0x20
[ 498.520646] [<ffffffff81084100>] ? kthread+0x0/0x90
[ 498.520648] [<ffffffff810130a0>] ? child_rip+0x0/0x20
[ 498.520649] ---[ end trace 1e87ce0ddc8dcfab ]---

Comment by Tim (tes) - Monday, 12 December 2011, 13:40 GMT

My machine does not have wireless, but it still has this problem. The network does use IPv6 though. I've so far disabled ipv6 this morning and it hasn't crashed yet (but it sometimes manages a whole day without problems, so that doesn't say much).

I've had some trouble with NetworkManager and the handling of IPv6 router advertisements. Now, those advertisements are sent every 5 minutes, and the stalls happen every 3 minutes, so they are probably not related. I haven't looked into that further, because Wireshark stops working when the stalls start. But I AM wondering if anybody is experiencing this without using NetworkManager?

Also, I've noticed that the stalls start suspiciously often when I'm browsing Youtube. Might be because Youtube is on IPv6, or maybe Flash is triggering some bug in the IPv6 stack? At least that last option would explain why it doesn't show up very often.

(x64 on r8169)

Comment by Tim (tes) - Monday, 12 December 2011, 13:50 GMT

ruijiangli, are you also getting stalls? You seem to be experiencing some other bug (maybe it has the same cause though).

Here is the backtrace from my computer:

kernel: [ 4052.963054] INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by 1, t=72034 jiffies)
kernel: [ 4052.963068] sending NMI to all CPUs:
kernel: [ 4052.963081] NMI backtrace for cpu 0
kernel: [ 4052.963088] CPU 0
kernel: [ 4052.963093] Modules linked in: fuse rfcomm bnep cryptd aes_x86_64 aes_generic snd_hda_codec_realtek nvidia(P) snd_hda_intel snd_hda_codec ecb btusb snd_hwdep snd_pcm firewire_ohci bluetooth joydev snd_timer snd edac_core r8169 soundcore rfkill firewire_core snd_page_alloc psmouse i2c_piix4 serio_raw sp5100_tco mii wmi crc_itu_t edac_mce_amd k10temp evdev button pcspkr it87 cn adt7475 hwmon_vid i2c_core cpufreq_ondemand powernow_k8 freq_table processor mperf ipv6 autofs4 hid_microsoft usbhid hid ext4 mbcache jbd2 crc16 sr_mod cdrom sd_mod pata_acpi ohci_hcd ahci libahci pata_atiixp pata_jmicron libata ehci_hcd xhci_hcd scsi_mod usbcore
kernel: [ 4052.963200]
kernel: [ 4052.963206] Pid: 0, comm: swapper Tainted: P 3.1.4-1-ARCH #1 Gigabyte Technology Co., Ltd. GA-770TA-UD3/GA-770TA-UD3
kernel: [ 4052.963219] RIP: 0010:[<ffffffff8103ae5b>] [<ffffffff8103ae5b>] native_safe_halt+0xb/0x10
kernel: [ 4052.963237] RSP: 0018:ffffffff81801e58 EFLAGS: 00000246
kernel: [ 4052.963243] RAX: 0000000000000000 RBX: ffffffff81801ea4 RCX: 0000000000000020
kernel: [ 4052.963249] RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000086
kernel: [ 4052.963254] RBP: ffffffff81801e58 R08: ffffffff818a5300 R09: 0000000000000000
kernel: [ 4052.963260] R10: 000000000336e391 R11: 0000000000000001 R12: ffffffff81934220
kernel: [ 4052.963265] R13: 0000000000000000 R14: ffffffffffffffff R15: 000000000008c000
kernel: [ 4052.963273] FS: 00007f97c5756880(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
kernel: [ 4052.963279] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
kernel: [ 4052.963285] CR2: 00007ff0bd3be000 CR3: 0000000114366000 CR4: 00000000000006f0
kernel: [ 4052.963290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: [ 4052.963296] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: [ 4052.963303] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8189d020)
kernel: [ 4052.963308] Stack:
kernel: [ 4052.963313] ffffffff81801e88 ffffffff8101d313 ffffffff81801ea4 ffffffff81934220
kernel: [ 4052.963324] ffffffff81800000 ffffffffffffffff ffffffff81801eb8 ffffffff8101db3a
kernel: [ 4052.963334] ffffffff81934220 0000000081800000 ffffffff81801fd8 ffffffff81934220
kernel: [ 4052.963343] Call Trace:
kernel: [ 4052.963353] [<ffffffff8101d313>] default_idle+0x53/0x2a0
kernel: [ 4052.963362] [<ffffffff8101db3a>] amd_e400_idle+0x9a/0x120
kernel: [ 4052.963370] [<ffffffff81013236>] cpu_idle+0xd6/0x120
kernel: [ 4052.963380] [<ffffffff813e6912>] rest_init+0x96/0xa4
kernel: [ 4052.963389] [<ffffffff8194fc15>] start_kernel+0x3bf/0x3cc
kernel: [ 4052.963398] [<ffffffff8194f347>] x86_64_start_reservations+0x132/0x136
kernel: [ 4052.963407] [<ffffffff8194f140>] ? early_idt_handlers+0x140/0x140
kernel: [ 4052.963415] [<ffffffff8194f44d>] x86_64_start_kernel+0x102/0x111
kernel: [ 4052.963420] Code: 55 48 89 e5 66 66 66 66 90 fa 5d c3 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 fb 5d c3 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 fb f4 <5d> c3 0f 1f 00 55 48 89 e5 66 66 66 66 90 f4 5d c3 0f 1f 40 00
kernel: [ 4052.963490] Call Trace:
kernel: [ 4052.963497] [<ffffffff8101d313>] default_idle+0x53/0x2a0
kernel: [ 4052.963504] [<ffffffff8101db3a>] amd_e400_idle+0x9a/0x120
kernel: [ 4052.963512] [<ffffffff81013236>] cpu_idle+0xd6/0x120
kernel: [ 4052.963520] [<ffffffff813e6912>] rest_init+0x96/0xa4
kernel: [ 4052.963528] [<ffffffff8194fc15>] start_kernel+0x3bf/0x3cc
kernel: [ 4052.963537] [<ffffffff8194f347>] x86_64_start_reservations+0x132/0x136
kernel: [ 4052.963545] [<ffffffff8194f140>] ? early_idt_handlers+0x140/0x140
kernel: [ 4052.963553] [<ffffffff8194f44d>] x86_64_start_kernel+0x102/0x111
... repeated for other cpus ...

It also happens with nouveau loaded.
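For anyone comparing traces across reports, the frames above can be pulled out of a log mechanically. The following is a minimal sketch, assuming the `[<address>] symbol+0xoffset/0xsize` frame layout seen in this log; the function name `parse_frames` and the returned dictionary keys are illustrative choices, not part of any kernel tool.

```python
import re

# Matches frames such as "[<ffffffff8101d313>] default_idle+0x53/0x2a0".
# An optional "?" before the symbol marks a frame the kernel is unsure about.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return one dict per stack frame found in the given kernel log text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "symbol": match.group("symbol"),
                "offset": int(match.group("offset"), 16),
                "reliable": match.group("unreliable") is None,
            })
    return frames


# Three lines copied from the trace above.
sample = """\
kernel: [ 4052.963353] [<ffffffff8101d313>] default_idle+0x53/0x2a0
kernel: [ 4052.963362] [<ffffffff8101db3a>] amd_e400_idle+0x9a/0x120
kernel: [ 4052.963407] [<ffffffff8194f140>] ? early_idt_handlers+0x140/0x140
"""
frames = parse_frames(sample)
```

Listing only the symbols makes it easier to see that this particular trace is the idle path of the swapper process rather than anything network related.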

Comment by ruijiangli (ruijiangli) - Monday, 12 December 2011, 15:40 GMT

Hi Tim, the backtrace I pasted above is for kernel 2.6.32. As you can see, no stalls appear, and the system is still functional after the backtrace.
For kernel 3.x, though, things are different.

1. With wireless network, I am *always* getting stalls, and I don't think the backtraces that appear after the stall message are useful. After a stall, the system does not hang immediately, but I can no longer start new programs. Sometimes the applications I already have open begin to behave strangely. Switching to a console (Ctrl+Alt+Fx) or running the shutdown command hangs the system immediately.
2. I never get stalls if I am not using wireless network.
3. I have IPv6 enabled on both the wired and the wireless network, and on the wired network I never get stalls. But I still cannot rule out IPv6 as the cause, because I also get an IPv6 address when using wireless.

Comment by Jan Alexander Steffens (heftig) - Monday, 12 December 2011, 15:53 GMT

http://pkgbuild.com/~heftig/linux-zen/linux-zen-3.2.0rc5-1-x86_64.pkg.tar.xz

Here's a package of Linux 3.2.0rc5 to try (named linux-zen).

For me, it completely breaks WiFi (refuses to associate with my AP), but maybe other people have more luck.

PS:
What a bad time for our build server to vanish. :(

Here's a mirror of the package: http://paste.xinu.at/pur/

Comment by Zhtlancer (zhtlancer) - Tuesday, 13 December 2011, 07:15 GMT

I ran into this problem on the 3.1.x kernel as well and tried to fall back to the lts kernel, but the lts kernel in the repo is too old and my wireless card (Dell 1390) is not properly supported by it. So I compiled a 3.0.9 kernel with some modifications to the PKGBUILD of the package “linux”, and everything has been quiet since.
So maybe this proves that the problem was introduced in 3.1.x? Or at least on the Dell 1390 wireless card? Hope this helps.

Comment by Hi (raylz) - Wednesday, 14 December 2011, 08:20 GMT

I'm having the same problem with kernel 3.1; 3.0 works perfectly fine for me. This is my lspci:
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: ATI Technologies Inc M92 [Mobility Radeon HD 4500/5100 Series]
01:00.1 Audio device: ATI Technologies Inc RV710/730 HDMI Audio [Radeon HD 4000 series]
02:00.0 System peripheral: JMicron Technology Corp. SD/MMC Host Controller
02:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller
02:00.3 System peripheral: JMicron Technology Corp. MS Host Controller
02:00.4 System peripheral: JMicron Technology Corp. xD Host Controller
05:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)

After a while my WLAN connection gets really slow or halts completely. If I then try to change the network or run ifconfig wlan0 down, the system freezes and I have to kill the laptop with the power button.

Comment by Catalin David (cdavid) - Wednesday, 14 December 2011, 15:20 GMT

Hey, guys!

Finally someone who understands my pain... I've been having this bug for about 2 months now (since I installed Arch) and had no idea what is happening.

As mentioned before, the lockups happen usually when connecting to my university Wireless infrastructure: eduroam and jacobs wireless networks (attached config from Network-Manager).

I am running this on my laptop, a Dell N5010 with a Broadcom BCM4313 wireless card, an i7 - 740qm (not Sandy Bridge), 8GB RAM.

I am running the following kernel:

Linux archidea 3.1.5-1-pae #1 SMP PREEMPT Tue Dec 13 11:15:08 EST 2011 i686 Intel(R) Core(TM) i7 CPU Q 740 @ 1.73GHz GenuineIntel GNU/Linux

but the freeze also happens in the non-PAE kernel.

If I can help any more, please let me know. I can trigger the bug whenever I am at my university.

Just a random thought, can this be a bug in NetworkManager? I haven't tried Wicd yet, but I have a friend who claims to have had the same bug and installed Wicd.

Catalin

cpuinfo (6.7 KiB)

kernel.log (32.2 KiB)

lsmod (4 KiB)

lspci (42.7 KiB)

eduroam (0.4 KiB)

jacobs (0.2 KiB)

Comment by Catalin David (cdavid) - Wednesday, 14 December 2011, 15:51 GMT

I've just switched to Wicd and I've been connected to eduroam wireless for more than 30 minutes and no crash.

The configuration is identical to the one from NetworkManager, connected to the same AP.

So, switching from NetworkManager to Wicd seems to be a workaround.

Catalin

Comment by Ondřej Konečný (andrew9888) - Wednesday, 14 December 2011, 15:54 GMT

I have blacklisted the ipv6 module and I've had no problems since then. My school's WiFi (eduroam) is running dual stack, so for now I use only IPv4. This (blacklisting ipv6) could be a workaround for people who don't insist on IPv6 connectivity.
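To verify that the workaround actually took effect, the IPv6 state can be read back from procfs. This is a minimal sketch, assuming that `/proc/net/if_inet6` is absent when the IPv6 stack is not active and that `/proc/sys/net/ipv6/conf/all/disable_ipv6` holds `1` when IPv6 is disabled via sysctl; the function name `ipv6_state` and its return strings are made up for illustration.

```python
from pathlib import Path


def ipv6_state(proc_root="/proc"):
    """Classify the IPv6 state of the running kernel from procfs entries.

    proc_root is a parameter only so the function can be pointed at a
    fake directory tree for testing.
    """
    root = Path(proc_root)
    # Assumption: this file only exists while the IPv6 stack is active.
    if not (root / "net" / "if_inet6").exists():
        return "not loaded"
    flag = root / "sys" / "net" / "ipv6" / "conf" / "all" / "disable_ipv6"
    if flag.exists() and flag.read_text().strip() == "1":
        return "disabled by sysctl"
    return "enabled"


if __name__ == "__main__":
    print(ipv6_state())
```

With the module blacklisted as described above, the sketch would be expected to report "not loaded".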

Comment by Hi (raylz) - Wednesday, 14 December 2011, 16:04 GMT

This is a different issue. I've had that issue too: NM took ages to connect until I deactivated IPv6. The problem reported here happens in spite of that fix for NM.

Comment by Jan Alexander Steffens (heftig) - Thursday, 15 December 2011, 04:05 GMT

Running a patched linux 3.2.0rc5 right now, and I haven't run into the problem yet.

Attached the patch I used to fix the association. It's a combination of two patches from the linux-wireless list.

test.patch (1.5 KiB)

Comment by Andrey (seld) - Thursday, 15 December 2011, 13:00 GMT

Confirming that on Kernel 3.1.4 turning off IPv6 works on my system (I'm using WiFi):

* no stalls
* no kernel taints
* Internet connection works

I tried with NetworkManager and without it (using ifup).
Either variant works fine if IPv6 is turned off.

Comment by Jan Alexander Steffens (heftig) - Thursday, 15 December 2011, 23:10 GMT

Alright. I rebuilt it again with iwlwifi fixed. For i686, too. Could anyone confirm whether this kernel works fine?

http://pkgbuild.com/~heftig/linux-zen/linux-zen-3.2.0rc5-2-x86_64.pkg.tar.xz
http://pkgbuild.com/~heftig/linux-zen/linux-zen-3.2.0rc5-2-i686.pkg.tar.xz

Comment by Jonathan Hudson (stronnag) - Saturday, 17 December 2011, 08:45 GMT

Not for me (eeepc 901). With this kernel I get rcu_preempt detected stalls within 1 minute of boot (a record). Fortunately, the Ubuntu 11.10 kernel (3.0.0-15-generic) does not exhibit this behaviour, which is what I use with Arch userland.

Comment by Ondřej Konečný (andrew9888) - Sunday, 18 December 2011, 14:55 GMT

I've tried the kernel but no success. I haven't managed to compile the nvidia module. No nvidia -> no X -> no networkmanager -> couldn't try IPv6 connection at school.

Comment by Jan Alexander Steffens (heftig) - Tuesday, 20 December 2011, 10:21 GMT

Finally ran into the problem again using 3.2rc6. :(

Attached dmesg.

hang-3.2rc6 (455.8 KiB)

Comment by ruijiangli (ruijiangli) - Wednesday, 21 December 2011, 03:30 GMT

Recently I tried the following setups, and here are the results:
1. Newest Arch 3.1.5 kernel with IPv6 disabled -> no trouble.
2. Newest Arch 3.1.5 kernel with IPv6 enabled -> stalls as always.
3. Newest Fedora 16 kernel (in the updates repo, version 3.1.5-6.fc16.x86_64) with IPv6 enabled -> no trouble.
I remember that the initial Fedora 16 release (livecd) produced stalls when wireless was enabled. So perhaps Fedora has fixed the bug in the meantime, but the patches have not been merged into the mainline kernel.

Comment by Tim (tes) - Wednesday, 21 December 2011, 14:14 GMT

Just had a few stalls on 3.1.5 and after that Firefox refused to open. As a test, I killed NetworkManager, and surprise, everything immediately started working again and the stalls stopped.

Comment by Mariusz Libera (mar04) - Thursday, 05 January 2012, 12:15 GMT

Just wanted to confirm ruijiangli's observations with kernel 3.1.6.
Disabling IPv6 fixes this issue; on Fedora, IPv6 doesn't cause problems.

Comment by Tom Gundersen (tomegun) - Tuesday, 17 January 2012, 23:47 GMT

Another data point: http://paste.xinu.at/BiNzt/

@ruijiangli: do you happen to know what version of the fedora kernel was the broken one? The only commit that I found which might be related seems to be to make ipv6 built-in rather than a module: http://lists.fedoraproject.org/pipermail/kernel/2011-June/003105.html, which was done in 3.0.

Comment by ruijiangli (ruijiangli) - Wednesday, 18 January 2012, 01:53 GMT

@tomegun: I checked the fedora 16 livecd, the kernel version is 3.1.0-7.fc16.

However, I am not sure the bug is directly associated with IPv6. Perhaps the bug still exists in the upstream kernel. The reason we get stalls on Arch but not on Fedora may be that Fedora's specific kernel configuration happens to mask the bug, whereas Arch's configuration does not.

Comment by Mihai Coman (z0id) - Tuesday, 24 January 2012, 17:26 GMT

Hello. I also have this issue when using IPv6 (dual stack). I can't recall my laptop locking up when I'm not using IPv6. I'm using kernel 3.2.1-2.

Comment by Matthew William Cox (mwc) - Monday, 30 January 2012, 21:25 GMT

I previously had this issue on a desktop (no wireless) running dual-stack ipv4 and ipv6 under kernel 3.1. Since upgrading to 3.2 the issue hasn't appeared.

Comment by Carl Reinke (Mindless) - Tuesday, 31 January 2012, 15:16 GMT

Just tested linux 3.2.2-1, and it does exhibit this issue on my machine.

Comment by Philip Crump (craag) - Thursday, 09 February 2012, 09:49 GMT

Running linux 3.2.5, NetworkManager, and when connected to dual-stack eduroam (University 802.1x) I am getting these problems.

It seems to be aggravated by bulk downloads from IPv6 addresses, causing task stalls followed by complete kernel freeze requiring hard reboot, within 20 minutes of power-on.

File: relevant dmesg output just before complete freeze, boot messages have been taken off.

dmesg.txt (21 KiB)

Comment by Jan Alexander Steffens (heftig) - Thursday, 09 February 2012, 12:31 GMT

Does it still happen when you set NM to "Ignore" IPv6 instead of "Automatic"?

Comment by Philip Crump (craag) - Thursday, 09 February 2012, 13:13 GMT

> Does it still happen when you set NM to "Ignore" IPv6 instead of "Automatic"?

No, with IPv6 set to Ignore it was stable for an hour (then I had to go), with no relevant kernel messages. So the problem is only when IPv6 is set to 'Automatic' and the wireless network gives it a v4 and a v6 address.

Comment by Miha Verlic (fluke571) - Thursday, 09 February 2012, 14:26 GMT

I've encountered the same problem. Unless I set IPv6 in NetworkManager to Ignore, I'm getting random crashes and some of the network-related processes stall (unmounting a remote filesystem, for example). Killing NetworkManager helps, though, and it even unlocks these processes.

The funny thing is that even if I set IPv6 to Ignore in NM, the interface still receives an IPv6 address (via radvd) and the system works without problems. Lockups occur only when the IPv6 setting in NM is set to Automatic.

Comment by Philip Crump (craag) - Thursday, 09 February 2012, 14:32 GMT

Ok, well, I have IPv6 set to 'Ignore' in NM, and yet ifconfig gives:

> wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 metric 1
> inet 152.78.163.209 netmask 255.255.254.0 broadcast 152.78.163.255
> inet6 fec0::b:762f:68ff:fe2c:5d3 prefixlen 64 scopeid 0x40<site>
> inet6 2002:984e:cc34:b:762f:68ff:fe2c:5d3 prefixlen 64 scopeid 0x0<global>

However, there is no IPv6 route. Accessing IPv6 sites does not work for me like this.
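The addresses in that output can be extracted and classified with the standard library. A minimal sketch, assuming the `inet6 <address> prefixlen <n>` layout shown above; the helper name `inet6_addresses` is hypothetical.

```python
import ipaddress
import re

# Matches lines like "inet6 fec0::b:762f:68ff:fe2c:5d3 prefixlen 64 ...".
INET6_RE = re.compile(
    r"inet6\s+(?P<addr>[0-9a-fA-F:]+)\s+prefixlen\s+(?P<plen>\d+)"
)


def inet6_addresses(ifconfig_text):
    """Return (IPv6Address, prefix length) pairs found in ifconfig output."""
    return [
        (ipaddress.IPv6Address(m.group("addr")), int(m.group("plen")))
        for m in INET6_RE.finditer(ifconfig_text)
    ]


# The ifconfig lines quoted above, without the "> " quote markers.
sample = """\
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 metric 1
inet 152.78.163.209 netmask 255.255.254.0 broadcast 152.78.163.255
inet6 fec0::b:762f:68ff:fe2c:5d3 prefixlen 64 scopeid 0x40<site>
inet6 2002:984e:cc34:b:762f:68ff:fe2c:5d3 prefixlen 64 scopeid 0x0<global>
"""

for addr, plen in inet6_addresses(sample):
    scope = "site-local" if addr.is_site_local else "other"
    print(f"{addr}/{plen} ({scope})")
```

Having addresses assigned says nothing about routing, so a check like this only confirms that autoconfiguration ran, not that IPv6 traffic can leave the machine.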

Comment by Jan Alexander Steffens (heftig) - Thursday, 09 February 2012, 16:05 GMT

Setting NM to "Ignore" will leave IPv6 to the kernel, which does understand autoconfiguration and should set up a default route.

When NM is set to "Automatic", IPv6 will be handled by NM instead. Having NM handle IPv6 is probably the proper way. Together with dhclient, it also understands DHCPv6 and takes other networks (e.g. VPNs) into account when setting up the route table.

NM talks to the kernel via netlink. Maybe the issue is here? Also check syslog/journal for messages from NM.

TODO: Check if NetworkManager with libnl3 is any better.

Comment by Ondřej Konečný (andrew9888) - Wednesday, 15 February 2012, 12:14 GMT

So today I've tried to connect at school again (as the holidays came to an end). The problem is still there. Even killing NetworkManager did not help (although somebody stated before that it helped). After I had rebooted and disabled IPv6 in NetworkManager the computer ran without any problem.

Versions of the packages (64 bit distribution):
linux-3.2.5-1
networkmanager 0.9.2.0-1
kdeplasma-applets-networkmanagement 1:0.9.0rc4-1

Comment by Zhtlancer (zhtlancer) - Thursday, 16 February 2012, 04:56 GMT

Today the linux-lts package was upgraded to 3.0.21-1, and I've run into this problem while using it. Has anyone else seen this?
Maybe the problem was brought in by the changes between 3.0.20 and 3.0.21, which could be a clue for those who are working on it.

Comment by Anderson Medeiros Gomes (amg1127) - Thursday, 16 February 2012, 07:53 GMT

I think the Kernel Bug Tracker is working again now. I opened a bug report there that follows up on this Arch Linux task.

https://bugzilla.kernel.org/show_bug.cgi?id=42780

Comment by Jan Alexander Steffens (heftig) - Tuesday, 27 March 2012, 19:56 GMT

I've built my current zen kernel with voluntary preemption, please test if this makes the problem go away.

Packages at http://pkgbuild.com/~heftig/linux-zen/

Comment by Frederic Bezies (fredbezies) - Thursday, 29 March 2012, 09:51 GMT

Could this fix be the answer? http://patchwork.ozlabs.org/patch/149020/

Comment by Carl Lei (XeCycle) - Tuesday, 03 April 2012, 10:07 GMT

@fredbezies I backported the patch, and my system has now been up for 24 minutes without problems. Before, I would have reproduced the problem within 10 minutes or so, so this looks like a fix.

Comment by Frederic Bezies (fredbezies) - Tuesday, 03 April 2012, 12:15 GMT

Well, linux 3.3.1 is out: https://lkml.org/lkml/2012/4/2/342

Eric Dumazet (3):
net: bpf_jit: fix BPF_S_LDX_B_MSH compilation
net: fix napi_reuse_skb() skb reserve
net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()

The last of Eric Dumazet's commits listed above is the fix for this bug. Can't wait to test linux 3.3.1.
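The commit title refers to a read-side lock that is taken without a matching unlock on some code path. The toy model below is not the kernel code: the counter class, the function names, and the failure condition are all hypothetical and only illustrate the general pattern of such an imbalance and one way to avoid it.

```python
class ReadLockCounter:
    """Toy stand-in for a read-side lock nesting depth (not kernel code)."""

    def __init__(self):
        self.depth = 0

    def lock(self):
        self.depth += 1

    def unlock(self):
        self.depth -= 1


def fill_node_buggy(counter, neighbour, put_ok):
    """Imbalanced version: the early return skips unlock()."""
    counter.lock()
    if neighbour is not None and not put_ok:
        return False  # error path leaves the lock held
    counter.unlock()
    return True


def fill_node_fixed(counter, neighbour, put_ok):
    """Balanced version: unlock() runs on every exit path."""
    counter.lock()
    try:
        if neighbour is not None and not put_ok:
            return False
        return True
    finally:
        counter.unlock()


buggy = ReadLockCounter()
fill_node_buggy(buggy, neighbour="n", put_ok=False)
fixed = ReadLockCounter()
fill_node_fixed(fixed, neighbour="n", put_ok=False)
print(buggy.depth, fixed.depth)  # prints "1 0"
```

A read-side section that never ends would keep a grace period from completing, which would be consistent with the stall messages reported throughout this task.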

Comment by Carl Lei (XeCycle) - Tuesday, 03 April 2012, 13:21 GMT

I can confirm that the attached patch, which is modified from http://patchwork.ozlabs.org/patch/149020/ to match 3.2.13, works for me. Since I can reproduce the bug on linux-lts, I recommend applying the patch to linux-lts as well.

ipv6-lock.patch (0.3 KiB)

Comment by Lukas Jirkovsky (6xx) - Friday, 06 April 2012, 17:34 GMT

I can confirm it has been fixed for me by linux 3.3.1 and 3.2.14.
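For scripts that need to flag affected machines, the versions reported in this task can be encoded as a small lookup. This is a sketch only: it assumes 3.2.14 and 3.3.1 are the first fixed releases of their series, as reported above, and makes no claim about other series; the names `FIXED_IN`, `parse_version` and `has_reported_fix` are illustrative.

```python
import re

# First release per series reported as fixed in this task (assumption:
# earlier releases of the same series are affected).
FIXED_IN = {(3, 2): (3, 2, 14), (3, 3): (3, 3, 1)}


def parse_version(text):
    """Extract the leading numeric components, e.g. '3.2.5-1' -> (3, 2, 5)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"not a kernel version: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def has_reported_fix(version_text):
    """Return True/False for the 3.2 and 3.3 series, None if unknown."""
    version = parse_version(version_text)
    first_fixed = FIXED_IN.get(version[:2])
    if first_fixed is None:
        return None
    return version >= first_fixed


for v in ("3.2.5-1", "3.2.14", "3.3.1", "3.0.21-1"):
    print(v, has_reported_fix(v))
```

Returning `None` for other series keeps the sketch from guessing about kernels, such as linux-lts, whose fixed version is not stated here.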

	Tasks related to this task (1)
	~~FS#27524 - [kernel3] Softlockup using WPA2 Enterprise~~

Duplicate tasks of this task (3)
~~FS#27457 - System Hangs with Wifi (BCM4313) after upgrade (probably upower)~~
~~FS#27524 - [kernel3] Softlockup using WPA2 Enterprise~~
~~FS#29014 - Kernel crash when disconnected from university wifi~~
