FS#41345 - [glibc] getaddrinfo() always prefers IPv4 over IPv6 on a dual-stack system, ignoring /etc/gai.conf

Attached to Project: Arch Linux
Opened by Andrej Podzimek (andrej) - Friday, 25 July 2014, 21:54 GMT
Last edited by Allan McRae (Allan) - Monday, 28 July 2014, 13:27 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Allan McRae (Allan)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

Since one of the recent updates, getaddrinfo() prefers IPv4 over IPv6 on a dual-stack system. All applications using getaddrinfo() are affected (psi-plus, Thunderbird, Chromium, OpenSSH, ...). You can inspect what getaddrinfo() returns by installing perl-socket-getaddrinfo from AUR, for example.

Additional info:
* package version(s)
The failing package version: glibc 2.19-5
The old and working package version: glibc 2.19-4

* config and/or log files etc.

As for config files, I tried the default /etc/gai.conf, no /etc/gai.conf at all, and also my tweaked /etc/gai.conf (which favors even 6to4 over IPv4; a sketch of such a tweak follows the outputs below). Nothing helps, i.e., IPv4 always wins. After installing perl-socket-getaddrinfo from AUR, you can easily have a look at this:

$ getaddrinfo www.google.com
Resolved host 'www.google.com', service '0'

socket(AF_INET , SOCK_STREAM, IPPROTO_TCP) + '74.125.232.49:0'
socket(AF_INET , SOCK_DGRAM , IPPROTO_UDP) + '74.125.232.49:0'
socket(AF_INET , SOCK_RAW , IPPROTO_IP ) + '74.125.232.49:0'
[...]
socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) + '[2a00:1450:400d:802::1012]:0'
socket(AF_INET6, SOCK_DGRAM , IPPROTO_UDP) + '[2a00:1450:400d:802::1012]:0'
socket(AF_INET6, SOCK_RAW , IPPROTO_IP ) + '[2a00:1450:400d:802::1012]:0'

The expected result would be the other way around. On an old (not updated for more than a month) Arch Linux system with glibc 2.19-4, I get the correct and expected outcome (with *all* the /etc/gai.conf options mentioned above):

$ getaddrinfo www.google.com
Resolved host 'www.google.com', service '0'

socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) + '[2a00:1450:4001:807::1013]:0'
socket(AF_INET6, SOCK_DGRAM , IPPROTO_UDP) + '[2a00:1450:4001:807::1013]:0'
socket(AF_INET6, SOCK_RAW , IPPROTO_IP ) + '[2a00:1450:4001:807::1013]:0'
[...]
socket(AF_INET , SOCK_STREAM, IPPROTO_TCP) + '173.194.112.240:0'
socket(AF_INET , SOCK_DGRAM , IPPROTO_UDP) + '173.194.112.240:0'
socket(AF_INET , SOCK_RAW , IPPROTO_IP ) + '173.194.112.240:0'
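
For reference, the kind of /etc/gai.conf tweak mentioned above (favoring even 6to4 over IPv4) looks roughly like the sketch below; the values are illustrative, not the exact file used here, and only the basic RFC 3484 entries are shown. Note that once any label or precedence line is given, glibc discards the corresponding built-in table, so the entries you want to keep have to be restated:

label       ::1/128        0
label       ::/0           1
label       2002::/16      1
label       ::/96          3
label       ::ffff:0:0/96  4
precedence  ::1/128        50
precedence  ::/0           40
precedence  2002::/16      30
precedence  ::/96          20
precedence  ::ffff:0:0/96  10

Giving 2002::/16 the same label as ::/0 keeps the RFC 3484 "matching label" rule from penalizing a 6to4 source address when connecting to native IPv6 destinations, which is what keeps 6to4 results ahead of IPv4 in the sorted output.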

Steps to reproduce:

Try to connect to a dual-stack server from a dual-stack client. IPv4 will be preferred over IPv6.
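
The ordering can also be checked without any extra packages using getent, which goes through getaddrinfo() as well; on an affected machine the IPv4 addresses come first (addresses taken from the output above, formatting approximate):

$ getent ahosts www.google.com
74.125.232.49   STREAM www.google.com
74.125.232.49   DGRAM
74.125.232.49   RAW
2a00:1450:400d:802::1012 STREAM
2a00:1450:400d:802::1012 DGRAM
2a00:1450:400d:802::1012 RAW
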
This task depends upon

Closed by  Allan McRae (Allan)
Monday, 28 July 2014, 13:27 GMT
Reason for closing:  None
Additional comments about closing:  It seems that the bug is related to the strongswan AUR package rather than to glibc and getaddrinfo(). StrongSwan started setting 'preferred_lft 0' on IPv6 IPSec addresses (but not on IPv4 ones), which confuses getaddrinfo() and causes it to prefer IPv4, because it then assumes there is no outbound IPv6 connectivity.
Comment by Andrej Podzimek (andrej) - Saturday, 26 July 2014, 00:49 GMT
Downgrading to glibc 2.19-4 doesn't help at all, which implies that something else broke in the recent updates.
IPv6 is de facto disabled on Arch Linux at the moment.
Comment by Allan McRae (Allan) - Saturday, 26 July 2014, 01:48 GMT
Yes - nothing relevant has changed in glibc.
Comment by Andrej Podzimek (andrej) - Saturday, 26 July 2014, 07:16 GMT
Further experiments led me to a weird result: address selection works, at first glance, just fine (i.e., as before) when the machine is connected to a 6to4 (or IPv6) network with an IPv6 address obtained from router advertisements. In these cases IPv6 is always preferred, and 6to4 can be preferred over IPv4 if so configured in /etc/gai.conf.

However, whenever my machine's IPv6 addresses are obtained via IPSec (StrongSwan) -- which is the most common case for my laptop, a "road warrior" connecting to various IPv4 networks behind NAT -- getaddrinfo() prefers native IPv4 addresses, though it used to prefer IPv6 in this case. Surprisingly, when I use both an IPv6 address from IPSec and a 6to4 address from a local router, connections are preferentially made from the 6to4 address rather than from the "native" IPv6 addresses configured by IPSec. (But getaddrinfo() does prefer IPv6 in this mixed case.)

This is really strange. Up to a certain point, getaddrinfo() would treat all IPv6 addresses equally, no matter how they were configured, but something must have changed recently, not necessarily in getaddrinfo().

There's one difference I have noticed: the slightly outdated machine, where getaddrinfo() still works as expected, doesn't have any "metric" on its IPv4 routes when I look at them using 'ip route show table all'. The machine with the incorrect address ordering does have a "metric" set on some of its IPv4 routes. Yet changing the metric manually seems to have no effect, and I have no firm evidence that getaddrinfo() actually uses the metrics when ordering its results.

Anyway, getaddrinfo() behaves as if it could somehow determine that the machine's IPv6 connectivity leads through an IPSec tunnel (capable of 'default' routing) and prefers IPv4. (Yet IPv6 works perfectly fine with 'ssh -6' or when connecting to IPv6-only machines.) I can't see a way to tell getaddrinfo() that using the IPSec IPv6 tunnel by default was my *intention* and that it should just avoid IPv4 whenever possible. (In general, preferring the IPSec tunnel (for both IPv4 and IPv6) has the additional benefit of preserving most TCP connections when switching between unrelated WiFi networks or between WiFi and ethernet. Automatically preferring native addresses would break this.)

Anyway, I'm no longer convinced that this has to be a glibc bug... I have no idea what went wrong and why. Perhaps I should post all this as a forum question instead.
Comment by Andrej Podzimek (andrej) - Saturday, 26 July 2014, 07:33 GMT
Grrr... Some of my assumptions above may be wrong again. As a quick test, I added some IPv4 addresses to the IPSec tunnel. The IPSec tunnel is now preferred for making connections whenever possible, as dictated by the various source routing rules configured by StrongSwan. That's OK. But IPv4 is preferred over IPv6 all the time, even though IPv6 would work (checked using ssh -6 as well as IPv6-only servers) and IPSec configures a "real" IPv6 address, not just 6to4. Well, now I'm really puzzled. getaddrinfo() simply prefers IPv4 for no obvious reason and doesn't seem to care whether a locally obtained address or an IPSec "source routing" address is available -- as long as there's IPv4 connectivity of any kind on the system, getaddrinfo() gives IPv4 results precedence. :-(
Comment by Allan McRae (Allan) - Saturday, 26 July 2014, 09:04 GMT
Was this working as expected previously? If so, can you identify the update that caused the issue?
Comment by Andrej Podzimek (andrej) - Saturday, 26 July 2014, 19:21 GMT
Yes, it worked previously.
On the other hand, I cannot identify the package that causes the problem. I have a working machine with glibc 2.19-4 where everything works exactly as expected. But downgrading to glibc 2.19-4 on my up-to-date machine doesn't help at all; the issue is still there. So it's probably not a glibc problem. I have no idea what could be causing this.

Edit: test-ipv6.com says "Your browser is avoiding IPv6.", confirming what I had already observed. But it's not the browser's fault; it's just getaddrinfo() that has gone crazy.
Comment by Andrej Podzimek (andrej) - Monday, 28 July 2014, 13:09 GMT
For the record, there's a workaround: ip -6 addr change <IPSec IPv6 address>/128 dev <outbound device> preferred_lft forever

The problem is that some version of StrongSwan (5.2 and later, I guess) started to configure 'preferred_lft 0' on the IPv6 IPSec tunnel addresses, yet IPv4 tunnel addresses still get 'preferred_lft forever' for some reason. This must be a bug.

The meaning of 'preferred_lft 0' is explained here: http://www.davidc.net/networking/ipv6-source-address-selection-linux

In my case, the IPSec tunnel's IPv6 address is usually the machine's *only* usable IPv6 address, so avoiding it is simply not an option. It can and must serve as a source address.
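
The effect is easy to see with 'ip -6 addr show': an address installed with 'preferred_lft 0' is flagged as deprecated, roughly like this (interface and address below are just placeholders):

$ ip -6 addr show dev eth0
    inet6 2001:db8:1::2/128 scope global deprecated
       valid_lft forever preferred_lft 0sec

A deprecated address is avoided during source address selection, which is how the flag ends up influencing getaddrinfo()'s ordering.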

Manually selecting 'preferred_lft forever' (as in the command above) gives getaddrinfo() a hint that there is indeed a usable IPv6 source address. Consequently, getaddrinfo() prefers IPv6, exactly as desired and as it did before. With 'preferred_lft 0', getaddrinfo() assumes there is no outbound IPv6 connectivity and orders IPv4 addresses first.

OK, it seems that the mystery has been solved. I think I should report this to StrongSwan.
Comment by Allan McRae (Allan) - Monday, 28 July 2014, 13:27 GMT
Great - I'll close this.
