Arch Linux

FS#28477 - [dhcpcd] connection issues with v5.5.4

Attached to Project: Arch Linux
Opened by John Stephens (uj-x52) - Friday, 17 February 2012, 03:24 GMT
Last edited by Allan McRae (Allan) - Saturday, 28 April 2012, 11:54 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Ronald van Haren (pressh)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

Description: Immediately following the upgrade to dhcpcd-5.5.4, I was unable to connect through my router to the internet using either a wired or wireless connection. Downgrading to dhcpcd-5.2.12 solved the issue.


Additional info:
* package version(s)
* config and/or log files etc.


Steps to reproduce:

Closed by  Allan McRae (Allan)
Saturday, 28 April 2012, 11:54 GMT
Reason for closing:  Fixed
Additional comments about closing:  Comments indicate all is fixed.
Comment by Gaetan Bisson (vesath) - Friday, 17 February 2012, 05:35 GMT
That is very strange because it works for me; don't you think it would help to know what makes your setup specific?
Comment by Ionut Biru (wonder) - Friday, 17 February 2012, 06:39 GMT
can you check MTU?

Paste the "ip a" output when using dhcpcd-5.5.4.
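
For anyone wanting to check the MTU programmatically, here is a minimal Python sketch that pulls the per-interface MTU out of "ip a" output. The function name `mtu_by_interface` and the sample text are illustrative assumptions (the sample is not from the reporter's attachment); only the general shape of the "ip a" header line is relied upon.

```python
import re

# Matches the header line that "ip a" prints for each interface, e.g.
# "2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ..."
_IFACE_RE = re.compile(
    r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<[^>]*>.*?\bmtu\s+(\d+)"
)


def mtu_by_interface(ip_a_output):
    """Return {interface name: MTU} parsed from the text of `ip a`."""
    result = {}
    for line in ip_a_output.splitlines():
        match = _IFACE_RE.match(line)
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


# Illustrative sample only; NOT taken from the attached logs.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
"""

if __name__ == "__main__":
    for name, mtu in mtu_by_interface(SAMPLE).items():
        print(name, mtu)
```

To use it on a real system, feed it the captured output of "ip a" instead of the sample string.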
Comment by John Stephens (uj-x52) - Friday, 17 February 2012, 16:53 GMT
Attached are the results of "ip a" as requested, along with logs from trying to connect using 5.5.4. Now that I look at them more closely, it seems like it might be an issue with IPv6. The router I'm using is a Linksys E3000. Any suggestions?
Comment by Médéric Boquien (mboquien) - Sunday, 19 February 2012, 18:29 GMT
I have the exact same problem. My modem/router also has IPv6 activated. Downgrading to the previous version fixes the problem.
Comment by Stefan Joosten (Ultraman) - Tuesday, 13 March 2012, 18:09 GMT
I seem to have the same problem. When I am on an IPv4-only router, dhcpcd works fine.

But at home I also have IPv6 and dhcpcd crashes when the router advertisement is noticed.
I executed dhcpcd from the command line to see what would happen and I got a memory corruption detected error. See the attached text file.

To ensure it's not my machine tossing bits, I ran memtest86+ for a couple of hours. It found no errors after 3 passes.
I also removed dhcpcd-5.5.4-1-x86_64.pkg.tar.xz from my pacman cache so it had to redownload the package, and I reinstalled it.

I haven't tested my desktop yet, so I will see if that needs upgrading dhcpcd and check out 5.5.4-1 on that machine. Wonder if it messes up as well.
Comment by Stefan Joosten (Ultraman) - Friday, 16 March 2012, 10:06 GMT
I have now tested my desktop as well. It also fails when IPv6 is enabled.
If I revert to IPv4 only, it works fine.

dhcpcd[1725]: eth0: Router Advertisement from fe80::76ea:3aff:febe:2a1c
*** glibc detected *** dhcpcd: malloc(): memory corruption: 0x0000000001423130 ***

Downgrading back to dhcpcd 5.2.12 "solves" the issue. I can use IPv6 with that version.

Any suggestions?
Comment by Ronald van Haren (pressh) - Friday, 16 March 2012, 10:09 GMT
Please file a bug report upstream
Comment by Stefan Joosten (Ultraman) - Friday, 16 March 2012, 10:14 GMT
Thanks.
I was just checking out the dhcpcd git repository. There are some IPv6 related commits in there after the 5.5.4 version tag.
So I will check that out and try to communicate with upstream if I have any issues.
Comment by Ronald van Haren (pressh) - Friday, 16 March 2012, 10:16 GMT
Okay, let me know how it turns out. If needed I could provide a git package.
Comment by Stefan Joosten (Ultraman) - Friday, 16 March 2012, 11:17 GMT
I compiled dhcpcd from git, currently at commit 8316b244cd9ea57008e1385dbd8d40732d380465, using just the configuration defaults and tested it.
Short result: dhcpcd from git works for me.
My eth0 gets a valid IPv4 and IPv6 address and I can reach IPv6 websites.

See the attached "dhcpcd-git_works" file for a short log of my command line output during testing

If it is possible to provide a package based on dhcpcd git, that would be splendid.
Comment by Ronald van Haren (pressh) - Sunday, 18 March 2012, 17:20 GMT
please test 5.5.4-2 in [testing]
Comment by Frederic Bezies (fredbezies) - Monday, 19 March 2012, 09:54 GMT
Segfaults every 10 to 15 minutes :(

Got this in my dmesg | tail log :

[ 1754.223614] dhcpcd[9160]: segfault at 7e80 ip 00007f32ecbe80cf sp 00007fff86cdcc30 error 4 in libc-2.15.so[7f32ecb6e000+197000]
Comment by Ronald van Haren (pressh) - Tuesday, 20 March 2012, 08:15 GMT
@Frederic: Don't think it is related. See my comment on  FS#29010 

@others: please test if THIS bug is fixed in 5.5.4-2
Comment by Roy Marples (rsmarples) - Tuesday, 20 March 2012, 09:58 GMT
There is one report over at Gentoo where dhcpcd can crash on IPv6 RA.
I don't recall an explicit fix for it. If this happens, I'll need a decent backtrace to fix it, as it works perfectly for me on a Linux i386 laptop against my IPv6 NetBSD router.
Comment by Frederic Bezies (fredbezies) - Tuesday, 20 March 2012, 12:19 GMT
Got a crash on a freshly reinstalled archlinux.

[ 1117.336076] dhcpcd[722]: segfault at 100cca0 ip 00007f0e725e60cf sp 00007fff03778660 error 4 in libc-2.15.so[7f0e7256c000+197000]

I just waited and it happened. Just tell me what you need to get a backtrace. Oh, and it could be an x86_64-only bug, as it works perfectly for you.

A log from another crash, don't know if it will be useful : http://pastebin.com/xsgQ02wD
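
The two kernel segfault lines quoted in this thread can be compared mechanically. The following is a minimal Python sketch (the names `SEGFAULT_RE` and `parse_segfault` are made up for illustration) that parses such a line, subtracts the mapping base from the instruction pointer, and decodes the low bits of the x86 page-fault error code. It assumes the bracketed mapping is the start of the library's text mapping, so the result is only an offset into that mapping, not a resolved symbol.

```python
import re

# Format of the kernel's user-space segfault message as quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Parse one segfault line; return a dict, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "object": m.group("obj"),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        # x86 page-fault error code bits: 1 = page present,
        # 2 = write access, 4 = fault raised in user mode.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


# The two lines reported in this task.
LINE_A = ("[ 1754.223614] dhcpcd[9160]: segfault at 7e80 ip 00007f32ecbe80cf "
          "sp 00007fff86cdcc30 error 4 in libc-2.15.so[7f32ecb6e000+197000]")
LINE_B = ("[ 1117.336076] dhcpcd[722]: segfault at 100cca0 ip 00007f0e725e60cf "
          "sp 00007fff03778660 error 4 in libc-2.15.so[7f0e7256c000+197000]")

if __name__ == "__main__":
    for line in (LINE_A, LINE_B):
        info = parse_segfault(line)
        print(info["comm"], info["pid"], info["object"], hex(info["offset"]))
```

Both lines yield the same offset into libc-2.15.so, which suggests the two crashes hit the same code path. Mapping the offset to a function name would still require symbol information for that exact libc build.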
Comment by Roy Marples (rsmarples) - Tuesday, 20 March 2012, 13:52 GMT
That's useless to me.
To get a backtrace, you need the core file the crash produced. Normally it's /dhcpcd.core
sudo gdb /sbin/dhcpcd
core /dhcpcd.core
bt

I recently did some memory analysis on dhcpcd's new IPv6 RA handling and found two problems which are resolved here:
http://roy.marples.name/projects/dhcpcd/changeset/a6b8c6b39ce648d01382ed058b40d820fb847f7a

I strongly suspect it will fix that crash here.
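
The gdb steps above can also be run non-interactively. This is a minimal Python sketch, assuming gdb is installed and that the binary and core file are at the paths mentioned in this comment. The helper names are hypothetical, and `-batch` and `-ex` are standard gdb command-line options.

```python
import subprocess


def gdb_backtrace_command(binary="/sbin/dhcpcd", core="/dhcpcd.core"):
    """Build a gdb invocation that loads the core, prints `bt`, and exits."""
    return ["gdb", "-batch", "-ex", "bt", binary, core]


def collect_backtrace(binary="/sbin/dhcpcd", core="/dhcpcd.core"):
    """Run gdb and return its standard output (the backtrace text).

    Reading a core file owned by root may require running this as root.
    """
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero if the core file is missing
    )
    return result.stdout


if __name__ == "__main__":
    print(collect_backtrace())
```

If no core file appears after a crash, one common cause is a core size limit of zero for the process. That is a general possibility, not a confirmed diagnosis for this report.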
Comment by Frederic Bezies (fredbezies) - Tuesday, 20 March 2012, 15:15 GMT
I know. Had a crash, but no core. So I have to wait until your new code is provided in a new package.
Comment by Ronald van Haren (pressh) - Tuesday, 20 March 2012, 15:59 GMT
I'll do a new git checkout and update the package tonight. That's all I can do until the weekend as I have to travel for work.
Comment by Ronald van Haren (pressh) - Thursday, 22 March 2012, 22:48 GMT
Maybe you care to tell us if your problem is solved now...?
Comment by Frederic Bezies (fredbezies) - Thursday, 22 March 2012, 23:05 GMT
Sorry for the late answer... Still spamming !

Mar 22 23:59:05 localhost kernel: [13056.052039] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=1747026 jiffies)
Mar 22 23:59:05 localhost kernel: [13056.052045] INFO: Stall ended before state dump start
Mar 23 00:02:05 localhost kernel: [13236.158676] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=1801058 jiffies)
Mar 23 00:02:05 localhost kernel: [13236.158682] INFO: Stall ended before state dump start

I have to deactivate NetworkManager and use the network daemon from Arch Linux.
Comment by Roy Marples (rsmarples) - Friday, 23 March 2012, 09:44 GMT
Sidney Amani found another issue since the last commit.
I've released dhcpcd-5.5.5 now. Hopefully that will put this issue to rest.
Comment by Frederic Bezies (fredbezies) - Friday, 23 March 2012, 16:17 GMT
Thanks for the info. Just waiting now for a rebuild :)
Comment by Ronald van Haren (pressh) - Friday, 23 March 2012, 16:30 GMT
5.5.5 is up in [testing] now. Please report back.
Comment by Frederic Bezies (fredbezies) - Friday, 23 March 2012, 17:22 GMT
Well...

After NM connections are enabled, I got this in /var/log/errors.log:

Mar 23 18:18:24 localhost NetworkManager[2863]: <error> [1332523104.686380] [nm-system.c:1061] nm_system_replace_default_ip6_route(): (eth0): failed to set IPv6 default route: -1

I will report any "INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=1801058 jiffies)" messages if there are any.
Comment by Frederic Bezies (fredbezies) - Friday, 23 March 2012, 17:36 GMT
Still happening. I just started a torrent download...

Mar 23 18:29:57 localhost kernel: [11537.358864] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=18002 jiffies)
Mar 23 18:29:57 localhost kernel: [11537.358879] INFO: Stall ended before state dump start
Mar 23 18:32:57 localhost kernel: [11717.468856] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=72035 jiffies)
Mar 23 18:32:57 localhost kernel: [11717.468870] INFO: Stall ended before state dump start

It happens when used with NetworkManager 0.9.2.0-3.

Killing NM and replacing it with the Arch Linux network daemon fixes the bug.
Comment by Frederic Bezies (fredbezies) - Saturday, 24 March 2012, 19:14 GMT
Well, it looks like my installation was "busted". I did another installation, and this time it looks like this spamming bug is dead.
Comment by Frederic Bezies (fredbezies) - Thursday, 29 March 2012, 09:10 GMT
Well, the spamming bug is back :(

It appeared when I launched a Firefox development build:

Mar 29 10:53:45 localhost kernel: [ 9359.196253] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1044611 jiffies)
Mar 29 10:56:45 localhost kernel: [ 9539.303049] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 0, t=1098643 jiffies)
Mar 29 10:59:43 localhost kernel: [ 9719.410952] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1152675 jiffies)
Mar 29 11:02:43 localhost kernel: [ 9899.519298] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1206708 jiffies)
Mar 29 11:05:44 localhost kernel: [10079.625664] INFO: rcu_preempt detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1260740 jiffies)
Comment by Roy Marples (rsmarples) - Thursday, 29 March 2012, 09:14 GMT
dhcpcd-5.5.6 was released yesterday with a few more fixes.

Those INFO: rcu_preempt errors you have don't look dhcpcd-related though; they look more like kernel ones.
I would check which process owns the PID reported and file a ticket against that.
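
As a rough illustration of that check, here is a minimal Python sketch. It assumes the "P1327" token inside the braces of the stall message is the PID of the blocking task and that the process is still alive. The function names are hypothetical, and the lookup relies on the Linux /proc filesystem.

```python
import re
from pathlib import Path


def stalled_pids(line):
    """Return the PIDs listed as P<pid> inside the braces of an RCU stall line."""
    m = re.search(r"detected stalls on CPUs/tasks: \{([^}]*)\}", line)
    if m is None:
        return []
    return [
        int(token[1:])
        for token in m.group(1).split()
        if re.fullmatch(r"P\d+", token)
    ]


def process_name(pid):
    """Return the command name of a running PID, or None if it is gone."""
    try:
        return Path(f"/proc/{pid}/comm").read_text().strip()
    except OSError:
        return None


# One of the lines quoted earlier in this task.
LINE = ("Mar 29 10:53:45 localhost kernel: [ 9359.196253] INFO: rcu_preempt "
        "detected stalls on CPUs/tasks: { P1327} (detected by 1, t=1044611 jiffies)")

if __name__ == "__main__":
    for pid in stalled_pids(LINE):
        print(pid, process_name(pid))
```

The lookup only helps while the stalled process is still running; after a reboot the PID will belong to something else or to nothing.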
Comment by Frederic Bezies (fredbezies) - Thursday, 29 March 2012, 09:17 GMT
Thanks for the info. I just have to wait now. But I want to add that dhcpcd 5.5.5 still segfaults with NM 0.9.4.0 :[

Will wait and report ;)

Got this :
Mar 29 11:15:44 localhost kernel: [10680.319022] INFO: task khubd:82 blocked for more than 120 seconds.
Mar 29 11:15:44 localhost kernel: [10680.319031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Comment by Ronald van Haren (pressh) - Thursday, 29 March 2012, 18:08 GMT
Moved 5.5.5 to core. 5.5.6 is now in [testing]. Happy testing!
Comment by Frederic Bezies (fredbezies) - Friday, 30 March 2012, 05:41 GMT
Thanks. So far so good, but with NM 0.9.2.0. I'm waiting for gnome-unstable to be fully upgraded to see whether dhcpcd still crashes or not.
Comment by Frederic Bezies (fredbezies) - Friday, 30 March 2012, 20:52 GMT
Guess what ? Just upgraded to gnome-unstable and NetworkManager 0.9.4.0 : crash ! Looks like NetworkManager is guilty here.
Comment by Ronald van Haren (pressh) - Monday, 02 April 2012, 06:16 GMT
Please report it upstream to the networkmanager devs.

I'm going to assume this bug is fixed. I'll close it in a few days if I don't hear otherwise.
Comment by Frederic Bezies (fredbezies) - Monday, 02 April 2012, 06:27 GMT
Well, it is still happening with dhcpcd 5.5.6. But I think it could be related to a bug in the Linux kernel's IPv6 code, which will be fixed in 3.3.1.

https://bugzilla.kernel.org/show_bug.cgi?id=42780 ; the patch was applied around March 28 to the 3.3.x stable git tree.

So until 3.3.1, I dumped NetworkManager for now. So, could you please wait until linux 3.3.1 is released to close this bug ?

Thanks a lot.
Comment by Frederic Bezies (fredbezies) - Wednesday, 04 April 2012, 05:45 GMT
Well, dhcpcd works now... But every single time it crashes now, NetworkManager is guilty.

Apr 4 07:34:29 localhost kernel: [ 2070.730261] ICMPv6 RA: ndisc_router_discovery() failed to add default route.

Bad NetworkManager !
Comment by Frederic Bezies (fredbezies) - Saturday, 28 April 2012, 11:20 GMT
Sorry for being late (holidays!). Since the new version of NetworkManager (which backports a fix for IPv6 support), dhcpcd no longer segfaults.
