FS#28887 - [netcfg] unable to configure IPv6 addresses due to duplicate address detection

Attached to Project: Arch Linux
Opened by Steve Caligo (scaligo) - Tuesday, 13 March 2012, 07:33 GMT
Last edited by Jouke Witteveen (jouke) - Thursday, 17 May 2012, 23:18 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Jouke Witteveen (jouke)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No



In some IPv6 configurations, netcfg isn't able to bring up an interface:
RTNETLINK answers: Invalid argument

This behaviour is caused by the script not waiting for the IPv6 addresses to become ready before proceeding. Adding a POST_UP='sleep <n>' doesn't help in this case.

Additional info:
* package version(s)

core/netcfg 2.6.8-1

* config and/or log files etc.

'addr add 2001:0db8::1/64'
'addr add 2001:0db8::2/64'
'addr add 2001:0db8::3/64'
'2001:0db8:1000::/64 via 2001:0db8::ffff src 2001:0db8::1'

Because duplicate address detection (DAD) hasn't finished yet, the addresses are still in the "tentative" state when netcfg adds the routes, which raises the RTNETLINK error above:

inet6 2001:0db8::1/64 scope global tentative
valid_lft forever preferred_lft forever
inet6 2001:0db8::2/64 scope global tentative
valid_lft forever preferred_lft forever
inet6 2001:0db8::3/64 scope global tentative
valid_lft forever preferred_lft forever
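The "tentative" flag in the output above is what has to be watched for. As a minimal sketch, the hypothetical helper `count_tentative` (not part of netcfg) counts the `inet6` lines still flagged `tentative` in `ip` output read from standard input. On a real system iproute2 can also filter directly: `ip -6 addr show dev eth0 tentative` prints nothing once no address on `eth0` is tentative.

```bash
#!/usr/bin/env bash
# count_tentative (hypothetical helper): read `ip -6 addr show` output on
# stdin and print how many IPv6 addresses are still flagged "tentative",
# i.e. have not finished duplicate address detection yet.
count_tentative() {
  local n=0 line
  while IFS= read -r line; do
    # Only "inet6" lines carry the address flags; "valid_lft" lines are skipped.
    if [[ $line == *inet6* && $line == *" tentative"* ]]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Fed the three `inet6` lines shown above, it prints 3; on a live system the input would come from `ip -6 addr show dev eth0`.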

The attached patch reuses the TIMEOUT variable to wait for the addresses to leave the tentative (DAD) state before continuing with the routing setup. Note that this also fixes some daemons, such as sshd or named, that "forget" to bind to an IPv6 address on startup.
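The ordering the patch enforces can be sketched as follows. This is not the attached patch: the function name `setup_ipv6_static` is hypothetical, `ADDR6`, `ROUTES6` and `TIMEOUT` mirror the netcfg profile variables mentioned in this report, and the 10-second fallback for `TIMEOUT` is an assumption. Addresses are added first, then `ip -6 addr show dev IFACE tentative` is polled once per second until it prints nothing, and only then are routes added.

```bash
#!/usr/bin/env bash
# setup_ipv6_static IFACE (hypothetical): add every ADDR6 entry, wait for
# duplicate address detection to finish, then add every ROUTES6 entry.
setup_ipv6_static() {
  local iface=$1 addr route waited=0
  local timeout=${TIMEOUT:-10}   # assumed fallback when TIMEOUT is unset

  for addr in "${ADDR6[@]}"; do
    ip -6 addr add "$addr" dev "$iface" || return 1
  done

  # `ip ... tentative` lists only addresses still undergoing DAD.
  while [[ -n $(ip -6 addr show dev "$iface" tentative) ]]; do
    if (( waited >= timeout )); then
      echo "DAD did not finish on $iface within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done

  # Routes go last, so a "src" pointing at one of our addresses is no longer tentative.
  for route in "${ROUTES6[@]}"; do
    # Left unquoted on purpose: "PREFIX via GW" must become separate arguments.
    ip -6 route add $route dev "$iface" || return 1
  done
}
```

The wait sits between the two loops, i.e. just before the step that raised the RTNETLINK error in this report.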

Closed by Jouke Witteveen (jouke)
Thursday, 17 May 2012, 23:18 GMT
Reason for closing: Fixed
Additional comments about closing: 2.8.3 (0d3e5)
Comment by Ryan Egesdahl (deriamis) - Monday, 09 April 2012, 07:48 GMT
I can confirm the problem, and it is actually worse than described. If netcfg gets an error during interface configuration, it brings the interface back down, which can leave remote hosts completely inaccessible. Also, the problem doesn't only occur with ROUTES6: any routing configuration in GATEWAY6, POST_UP, or PRE_DOWN causes the same issue.

The above patch applies cleanly to a just-updated system and fixed the problem for me.
Comment by Jouke Witteveen (jouke) - Tuesday, 15 May 2012, 12:01 GMT
I need more information to tackle this, as I cannot reproduce the issue.

000) As far as I can see, the OP's ADDR6 array should just list addresses and not prefix each entry with 'addr add'.
001) In its current form, the patch should use and document a DAD_TIMEOUT variable for clarity (just TIMEOUT is way too generic in this case).
011) The `ip` command in the patch should probably include a device specification (we don't want to wait on others).
010) If it is the route adding that causes problems, shouldn't we wait for DAD to finish only in case of adding routes? (a POST_UP='sleep <n>' should work in other cases).
110) Would this problem be gone when the kernel has optimistic DAD?
111) Should we fail if we end up with dadfailed addresses in `ip addr`?
101) What triggers DAD? If it is only SLAAC, we should only wait in that case and we might be better off modifying bring_interface.
100) Can we benefit from using ifcfg instead of calling ip directly?

I have numbered my concerns in gray code (personal favorite) so that there is some fun to this nasty stuff after all.
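Points 001, 011 and 111 can be combined into one helper. A minimal sketch, assuming the variable name `DAD_TIMEOUT` proposed above with an assumed default of 3 seconds; the function name `wait_for_dad` and its return codes (0 done, 1 timed out, 2 duplicate found) are hypothetical. It checks `dadfailed` on every pass because, on Linux, an address that failed DAD keeps its `tentative` flag, so waiting on `tentative` alone would only end at the time-out.

```bash
#!/usr/bin/env bash
# wait_for_dad IFACE (hypothetical helper)
#   returns 0 once no address on IFACE is tentative,
#           1 if DAD_TIMEOUT seconds pass first,
#           2 if an address failed DAD (a duplicate exists on the link).
# Nothing is removed on failure; the caller decides what to do.
wait_for_dad() {
  local iface=$1 waited=0
  local timeout=${DAD_TIMEOUT:-3}   # assumed default
  while :; do
    # Failed addresses keep the "tentative" flag, so test for them first.
    if [[ -n $(ip -6 addr show dev "$iface" dadfailed) ]]; then
      echo "duplicate address detected on $iface" >&2
      return 2
    fi
    # Device-specific query: addresses on other interfaces are ignored.
    if [[ -z $(ip -6 addr show dev "$iface" tentative) ]]; then
      return 0
    fi
    if (( waited >= timeout )); then
      echo "DAD still running on $iface after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Returning distinct codes instead of tearing the interface down leaves the policy (warn, retry or fail the profile) to the caller.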
Comment by Steve Caligo (scaligo) - Tuesday, 15 May 2012, 18:16 GMT
000) Now that you mention it... it should say IPCFG=() instead of ADDR6=(). But even for ADDR6, you have to add the netmask: otherwise you'll end up with a /128, which is hardly what you want.

Here's a new configuration I played around with on netcfg-2.8.2. Note that I'm using both IPCFG and ROUTES6 blocks in order to defer the routing part. Of course, it would work with ADDR6 as well:

'addr add 2001:0db8::1/64 dev eth0'
'addr add 2001:0db8::2/64 dev eth0'
'addr add 2001:0db8::3/64 dev eth0'
'2001:0db8:1000::/64 via 2001:0db8::ffff src 2001:0db8::1'
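Written out as a netcfg profile, the configuration above presumably looks like the fragment below. The array names `IPCFG` and `ROUTES6` are inferred from this discussion (the extracted text lost them), and `CONNECTION` and `INTERFACE` are assumed values added only to make the fragment complete. Each `IPCFG` entry carries the arguments for one `ip` call, while each `ROUTES6` entry describes one IPv6 route.

```bash
# /etc/network.d/example (hypothetical profile name)
CONNECTION='ethernet'
INTERFACE='eth0'
# Each entry is handed to `ip` as its argument list.
IPCFG=('addr add 2001:0db8::1/64 dev eth0'
       'addr add 2001:0db8::2/64 dev eth0'
       'addr add 2001:0db8::3/64 dev eth0')
# Kept separate from IPCFG so that routing is deferred.
ROUTES6=('2001:0db8:1000::/64 via 2001:0db8::ffff src 2001:0db8::1')
```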

001) Sure.
011) Added.
010) It is not only routing. Any daemon trying to bind to an IP address still in "tentative" state will fail. Popular ones are sshd and named.
110) No, I don't think so. I tried enabling that one in my very first tests and it didn't help.
111) If by "fail" you mean "remove all the addresses", that doesn't sound like a good idea to me. I'd be unable to reach my server and would have to beg for out-of-band access to get back in.
101) I'm only running "static" and "dhcp". Both show a "tentative" state at some point.
100) For my part, I've never used ifcfg directly and it doesn't look very flexible to me: I may want to restrict the "scope" of an IP address for load-balancing purposes (assuming it's still valid on IPv6) or add some other fancy flag.

The attached patch also changes the way you handle the "$route" part in "ip route add", as it's preventing me from setting up my routing: I'm (ab)using it to add the "src" parameter. It'd of course be cleaner to do so in the IFCFG block, but at that point my addresses are still in DAD as you know, so I can't.

What about waiting for DAD to complete every time an "ip route" command fails, and then retrying that same command?
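The retry idea could look roughly like this. It is a sketch under stated assumptions, not code from either patch: `route_add_retry` is a hypothetical name, the route is passed as separate words, and the wait polls `ip -6 addr show dev IFACE tentative` with an assumed `DAD_TIMEOUT` default of 3 seconds. The first failure is taken as a hint that a referenced address is still tentative; the command is retried exactly once, so a genuinely wrong route still fails.

```bash
#!/usr/bin/env bash
# route_add_retry IFACE ROUTE-WORDS... (hypothetical helper)
# Try `ip -6 route add`; if it fails, wait for DAD on IFACE and retry once.
route_add_retry() {
  local iface=$1 waited=0
  shift
  ip -6 route add "$@" dev "$iface" && return 0

  # The failure may be caused by a tentative "src" address: wait for DAD.
  while [[ -n $(ip -6 addr show dev "$iface" tentative) ]] &&
        (( waited < ${DAD_TIMEOUT:-3} )); do
    sleep 1
    waited=$((waited + 1))
  done

  # Second and last attempt; its exit status is returned to the caller.
  ip -6 route add "$@" dev "$iface"
}
```

Usage: `route_add_retry eth0 2001:0db8:1000::/64 via 2001:0db8::ffff src 2001:0db8::1`.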
Comment by Jouke Witteveen (jouke) - Tuesday, 15 May 2012, 20:56 GMT
Thanks for the answers.

010) So within netcfg only routing is problematic?
111) In your patch, we still fail at the first route that cannot be added. We should figure out what behavior we want in case of a time-out.
101) The 'dhcp' case is not covered by your patch.
100) Yeah, let's not use it :-).
1100) Do we need support for the 'nodad' flag in `ip addr`? It isn't a real solution as we would still mess up without it.

I'm no fan of unquoting "$route", but there should indeed be a better place for this than POST_UP. Is there a reason the IPCFG block isn't directly above the '# Set hostname' block? Do we need to do things before IPv6 addressing that we would rather do after IPv4 addressing?
Comment by Steve Caligo (scaligo) - Wednesday, 16 May 2012, 18:17 GMT
I moved the DAD handling into a wait_for_dad() function and call it in several places to catch errors caused by tentative addresses. It shouldn't delay the boot sequence much, especially if the IPCFG entries are ordered carefully: I added a DAD wait for when one of these lines fails, and one at the very end in case no custom routing is defined at all. Of course, it's entirely superfluous in IPv4-only setups.

The "ip route" call is now untouched again, so ROUTES6 can only be used for straightforward cases and anything else has to be done through IPCFG. For this to work in all cases, the IPCFG block has to be called after the ADDR6 one, as you suggested.

010) I don't know for sure: in theory, any command that requires a confirmed IP address should be impacted (rule? xform?).
101) Now it is. Although static routes shouldn't be used with stateful DHCPv6, it doesn't hurt to wait for the addresses to be confirmed, to avoid trouble with daemons trying to bind to an IP address that is still tentative.

As for the other ones... well:
111) Maybe tolerate and display errors when failing to add routes through ROUTES6 and nevertheless stop at first error in IPCFG?
1100) I quickly went through some RFCs and DAD seems only mandatory for autoconfiguration, so one could skip it for IP addresses that are defined statically by the sysadmin. After all, he's supposed to know what he's doing. Doing so in ADDR6 doesn't trigger my wait_for_dad() anymore.
Comment by Jouke Witteveen (jouke) - Wednesday, 16 May 2012, 18:24 GMT
I was playing with something similar this morning (attached), but haven't got things to my liking yet. I presume nodad cannot work with the current setup? That is, ADDR6=('<address> nodad') doesn't work. Is that correct? I wouldn't mind solving that along the way.
Comment by Steve Caligo (scaligo) - Wednesday, 16 May 2012, 18:42 GMT
No, that doesn't work. Both ADDR6 and ROUTES6 only support "simple" arguments, i.e. an address part and an optional netmask.
Comment by Jouke Witteveen (jouke) - Wednesday, 16 May 2012, 22:14 GMT
Yeah, ip doesn't take any quirks :-P.

I'm increasing impact a bit, so no patch this time. Could you test the attached file?
I) I moved routes around to match their intended (commit c8be1) use.
II) My fickle mind decided addresses and routes should not be quoted after all. (Yippee!)
III) IPCFG is processed after the common steps.
   ethernet (9.9 KiB)
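Point II is what makes per-entry flags possible. When an entry is expanded unquoted, the shell splits it on whitespace, so `ip` receives `nodad` as a separate argument instead of as part of the address. A minimal sketch, with `add_addresses` as a hypothetical function name (`nodad` itself is the `ip addr add` flag discussed above, which skips DAD for that address):

```bash
#!/usr/bin/env bash
# add_addresses IFACE (hypothetical): add every ADDR6 entry to IFACE.
# $addr is deliberately unquoted so '2001:0db8::1/64 nodad' becomes two
# arguments; the trade-off is that entries must not contain glob characters.
add_addresses() {
  local iface=$1 addr
  for addr in "${ADDR6[@]}"; do
    ip -6 addr add $addr dev "$iface" || return 1
  done
}
```

With `ADDR6=('2001:0db8::1/64 nodad' '2001:0db8::2/64')`, this runs `ip -6 addr add 2001:0db8::1/64 nodad dev eth0` followed by `ip -6 addr add 2001:0db8::2/64 dev eth0`.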
Comment by Steve Caligo (scaligo) - Thursday, 17 May 2012, 05:57 GMT
Thanks for the changes, it works fine for me so far, in both DHCP and static setups.

A little warning might be useful, though:
Using only IPCFG (to add the IP addresses) and ROUTES6 (for routing) won't work anymore now, because ROUTES6 comes first. So people with such a setup will have to modify their configuration and switch to ADDR6/ROUTES6 instead. These changes are of course trivial.
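For a setup like the one described, the migration amounts to moving the addresses from `IPCFG` into `ADDR6` (plain addresses with their prefix length, no `addr add`), so that they exist before `ROUTES6` is processed. A sketch of the migrated profile, with `CONNECTION`, `INTERFACE` and `IP6='static'` as assumed values; the `src` in the route entry relies on route entries being split into words as described in point II.

```bash
# Hypothetical profile after migration: addresses in ADDR6, routes in ROUTES6.
CONNECTION='ethernet'
INTERFACE='eth0'
IP6='static'
ADDR6=('2001:0db8::1/64' '2001:0db8::2/64' '2001:0db8::3/64')
ROUTES6=('2001:0db8:1000::/64 via 2001:0db8::ffff src 2001:0db8::1')
```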
Comment by Jouke Witteveen (jouke) - Thursday, 17 May 2012, 09:21 GMT
Nice catch! Thanks for all your help.