FS#28477 - [dhcpcd] connection issues with v5.5.4
Attached to Project:
Arch Linux
Opened by John Stephens (uj-x52) - Friday, 17 February 2012, 03:24 GMT
Last edited by Allan McRae (Allan) - Saturday, 28 April 2012, 11:54 GMT
Details
Description: Immediately following the upgrade to
dhcpcd-5.5.4, I was unable to connect through my router to
the internet using either a wired or wireless connection.
Downgrading to dhcpcd-5.2.12 solved the issue.
Additional info:
* package version(s)
* config and/or log files etc.
Steps to reproduce:
Closed by Allan McRae (Allan)
Saturday, 28 April 2012, 11:54 GMT
Reason for closing: Fixed
Additional comments about closing: Comments indicate all is fixed.
Please paste the "ip a" output when using dhcpcd-5.5.4.
But at home I also have IPv6 and dhcpcd crashes when the router advertisement is noticed.
I executed dhcpcd from the command line to see what would happen and I got a memory corruption detected error. See the attached text file.
To ensure it's not my machine tossing bits, I ran memtest86+ for a couple of hours. It found no errors after 3 passes.
I also removed dhcpcd-5.5.4-1-x86_64.pkg.tar.xz from my pacman cache so it had to redownload the package, and I reinstalled it.
I haven't tested my desktop yet, so I will see if that needs upgrading dhcpcd and check out 5.5.4-1 on that machine. Wonder if it messes up as well.
If I revert to IPv4 only, it works fine.
dhcpcd[1725]: eth0: Router Advertisement from fe80::76ea:3aff:febe:2a1c
*** glibc detected *** dhcpcd: malloc(): memory corruption: 0x0000000001423130 ***
Downgrading back to dhcpcd 5.2.12 "solves" the issue. I can use IPv6 with that version.
Any suggestions?
I was just checking out the dhcpcd git repository. There are some IPv6 related commits in there after the 5.5.4 version tag.
So I will check that out and try to communicate with upstream if I have any issues.
Short result: dhcpcd from git works for me.
My eth0 gets a valid IPv4 and IPv6 address and I can reach IPv6 websites
See the attached "dhcpcd-git_works" file for a short log of my command line output during testing
If it is possible to provide a package based on dhcpcd git, that would be splendid.
Got this in my dmesg | tail log :
[ 1754.223614] dhcpcd[9160]: segfault at 7e80 ip 00007f32ecbe80cf sp 00007fff86cdcc30 error 4 in libc-2.15.so[7f32ecb6e000+197000]
FS#29010@others: please test if THIS bug is fixed in 5.5.4-2
I don't recall an explicit fix for it. If this happens, I'll need a decent backtrace to fix it as it works perfectly for me on Linux i386 laptop against my IPv6 NetBSD router.
[ 1117.336076] dhcpcd[722]: segfault at 100cca0 ip 00007f0e725e60cf sp 00007fff03778660 error 4 in libc-2.15.so[7f0e7256c000+197000]
I just waited and it happened. Just tell me what you need to get a backtrace. Oh, and it could be an x86_64-only bug, as it works perfectly for you.
A log from another crash, don't know if it will be useful : http://pastebin.com/xsgQ02wD
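The two kernel segfault lines quoted above can be compared by subtracting the libc load address from the instruction pointer. The following is only a rough sketch: the regular expression and the helper name parse_segfault are made up for illustration, and the error-code decoding assumes the usual x86 page-fault bits (bit 0 = page present, bit 1 = write, bit 2 = user mode).

```python
import re

# Hypothetical pattern for the kernel's user-space segfault report,
# modelled on the two dmesg lines quoted in this task.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the faulting offset inside the mapped object, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "object": m["obj"],
        # ip minus load address gives an offset that is stable across runs
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        # assumed x86 page-fault error bits
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


first = parse_segfault(
    "[ 1754.223614] dhcpcd[9160]: segfault at 7e80 ip 00007f32ecbe80cf "
    "sp 00007fff86cdcc30 error 4 in libc-2.15.so[7f32ecb6e000+197000]"
)
second = parse_segfault(
    "[ 1117.336076] dhcpcd[722]: segfault at 100cca0 ip 00007f0e725e60cf "
    "sp 00007fff03778660 error 4 in libc-2.15.so[7f0e7256c000+197000]"
)
print(hex(first["offset"]), hex(second["offset"]))
```

Both reports resolve to the same offset inside libc-2.15.so, which suggests the two crashes hit the same libc code path. "error 4" under the assumed decoding is a user-mode read of an unmapped page.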
To get a backtrace, you need the core file the crash produced. Normally it's /dhcpcd.core
sudo gdb /sbin/dhcpcd
core /dhcpcd.core
bt
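The interactive gdb steps above can also be run in one go, which makes it easier to attach the result to this task. This is a minimal sketch, assuming gdb is installed, core dumps are enabled, and the core file really is at /dhcpcd.core; the helper names are made up for illustration, and --batch and -ex are standard gdb options.

```python
import shlex
import subprocess
from pathlib import Path


def gdb_backtrace_argv(binary="/sbin/dhcpcd", core="/dhcpcd.core"):
    """Build a non-interactive gdb command that prints a backtrace and exits."""
    return ["gdb", "--batch", "-ex", "bt", binary, core]


def collect_backtrace(binary="/sbin/dhcpcd", core="/dhcpcd.core"):
    """Run gdb on the core file and return its output, or None if no core exists."""
    if not Path(core).exists():
        return None
    result = subprocess.run(
        gdb_backtrace_argv(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Show the equivalent shell command without running gdb.
print(shlex.join(gdb_backtrace_argv()))
```

Reading /dhcpcd.core will most likely need root, just like the sudo gdb invocation above.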
I recently did some memory analysis on dhcpcd's new IPv6 RA handling and found two problems which are resolved here:
http://roy.marples.name/projects/dhcpcd/changeset/a6b8c6b39ce648d01382ed058b40d820fb847f7a
I strongly suspect it will fix that crash here.
Mar 22 23:59:05 localhost kernel: [13056.052039] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=1747026 jiffies)
Mar 22 23:59:05 localhost kernel: [13056.052045] INFO: Stall ended before state dump start
Mar 23 00:02:05 localhost kernel: [13236.158676] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=1801058 jiffies)
Mar 23 00:02:05 localhost kernel: [13236.158682] INFO: Stall ended before state dump start
I have to deactivate NetworkManager and use the network daemon from Arch Linux.
I've released dhcpcd-5.5.5 now. Hopefully that will put this issue to rest.
After NM connections are enabled, I got this in /var/log/errors.log:
Mar 23 18:18:24 localhost NetworkManager[2863]: <error> [1332523104.686380] [nm-system.c:1061] nm_system_replace_default_ip6_route(): (eth0): failed to set IPv6 default route: -1
I will report any "INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=1801058 jiffies)" messages if they occur again.
Mar 23 18:29:57 localhost kernel: [11537.358864] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=18002 jiffies)
Mar 23 18:29:57 localhost kernel: [11537.358879] INFO: Stall ended before state dump start
Mar 23 18:32:57 localhost kernel: [11717.468856] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=72035 jiffies)
Mar 23 18:32:57 localhost kernel: [11717.468870] INFO: Stall ended before state dump start
This happens when used with NetworkManager 0.9.2.0-3.
Killing NM and replacing it with the Arch Linux network tool fixes the bug.
This appeared when I launched a Firefox development build:
Mar 29 10:53:45 localhost kernel: [ 9359.196253] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1044611 jiffies)
Mar 29 10:56:45 localhost kernel: [ 9539.303049] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 0, t=1098643 jiffies)
Mar 29 10:59:43 localhost kernel: [ 9719.410952] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1152675 jiffies)
Mar 29 11:02:43 localhost kernel: [ 9899.519298] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1206708 jiffies)
Mar 29 11:05:44 localhost kernel: [10079.625664] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1260740 jiffies)
Those INFO: rcu_preempt errors you have don't look dhcpcd related though; they look more like kernel ones.
I would check which process owns the PID reported and file a ticket against that.
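Checking which process owns the reported PID can be scripted. This is only a sketch: it assumes that tasks in the stall message are printed as P followed by the PID (as in "{ P1327}" above), that /proc is mounted, and that the process is still running. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Matches the brace-enclosed CPU/task list of an rcu_preempt stall message.
STALL_RE = re.compile(r"rcu_preempt detected stalls on CPUs/tasks: \{([^}]*)\}")


def stalled_pids(log_text):
    """Collect the PIDs (tokens of the form P<digits>) named in stall messages."""
    pids = set()
    for match in STALL_RE.finditer(log_text):
        for token in match.group(1).split():
            if token.startswith("P") and token[1:].isdigit():
                pids.add(int(token[1:]))
    return sorted(pids)


def process_name(pid):
    """Return the command name for a PID from /proc, or None if it is gone."""
    try:
        return Path(f"/proc/{pid}/comm").read_text().strip()
    except OSError:
        return None


sample = (
    "Mar 29 10:53:45 localhost kernel: [ 9359.196253] INFO: rcu_preempt "
    "detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1044611 jiffies)"
)
for pid in stalled_pids(sample):
    print(pid, process_name(pid))
```

An empty list "{}" yields no PIDs, which matches the earlier stall messages where no task was named.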
Will wait and report ;)
Got this :
Mar 29 11:15:44 localhost kernel: [10680.319022] INFO: task khubd:82 blocked for more than 120 seconds.
Mar 29 11:15:44 localhost kernel: [10680.319031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I'm going to assume this bug is fixed. I'll close it in a few days if I don't hear otherwise.
https://bugzilla.kernel.org/show_bug.cgi?id=42780 ; the patch was applied around March 28 to the 3.3.x stable git.
I have dumped NetworkManager until 3.3.1, so could you please wait until Linux 3.3.1 is released before closing this bug?
Thanks a lot.
Apr 4 07:34:29 localhost kernel: [ 2070.730261] ICMPv6 RA: ndisc_router_discovery() failed to add default route.
Bad NetworkManager !