Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#23052 - [glibc] getaddrinfo() support for IPv6 DNS servers is broken (huge latency)
Attached to Project:
Arch Linux
Opened by Andrej Podzimek (andrej) - Friday, 25 February 2011, 22:40 GMT
Last edited by Allan McRae (Allan) - Monday, 11 April 2011, 04:13 GMT
Details
Description:
IPv6-only DNS servers (and IPv6-only networks) are currently almost unusable with all programs that rely on the getaddrinfo() functionality. There are huge latencies of >5 seconds with an IPv6 DNS server.

Interesting facts:
* dig works *perfectly* in exactly the same environment, there is no latency
* there is no latency when observing a test application with strace, but a latency always occurs without strace (!)
* there is no latency with an IPv4 DNS server

The fact that strace removes the latency is really *surprising*, to say the least. You can use the attached gaitest.c snippet to observe this. A couple of examples can be found below. Unfortunately, this makes it impossible to diagnose the issue with strace. :-(

Additional info:
* package version(s)
glibc 2.13-4 (The issue has existed for >3 months, AFAIK, so the exact glibc version may not matter.)
* config and/or log files etc.

These are the odd latencies that disappear with strace:

    $ time ./gaitest ipv6.google.com
    2a00:1450:8004::93

    real 0m5.037s
    user 0m0.003s
    sys 0m0.000s

    $ time strace ./gaitest ipv6.google.com 2>/dev/null
    2a00:1450:8004::93

    real 0m0.011s
    user 0m0.007s
    sys 0m0.000s

Please note that both of these results are 100% reproducible, so they are not related to any caches. And as already mentioned, this only occurs when an IPv6 DNS server is configured. DNS over IPv4 does not have this issue.

My configuration files follow.

    $ cat /etc/resolv.conf
    nameserver 2002:****:****:1::1

    $ cat /etc/host.conf
    order hosts,bind
    multi on

    $ cat /etc/nsswitch.conf
    passwd: files
    group: files
    shadow: files
    publickey: files
    hosts: files dns
    networks: files
    protocols: files
    services: files
    ethers: files
    rpc: files
    netgroup: files

    $ cat /etc/hosts
    ::1 localhost charonng
    127.0.0.1 localhost charonng

    $ cat /etc/gai.conf
    # This file is empty. I experimented with the default file and with multiple modifications thereof, but both issues are still the same.
Steps to reproduce:
1) Set your /etc/resolv.conf to use an IPv6 DNS server.
2) Try to resolve an address using getaddrinfo(). (You will see huge latencies in Firefox, for instance.)
Closed by Allan McRae (Allan)
Monday, 11 April 2011, 04:13 GMT
Reason for closing: No response
Additional comments about closing: Requires someone with the correct setup to do the git bisect. Request this to be reopened once that is done.
Regarding FS#20470: in my case, the DNS server is on the same hardware switch as the machines that generate queries. There is no intermediate DNS proxy. The machine with the DNS server has not been updated for months and it had always worked fine, without observable latencies, before this issue emerged. So this is probably not a server-side issue. Clients used to work just fine a couple of weeks (months?) ago, but then users started to observe these huge latencies. The problem was not reported immediately, since everybody thought it was just a *temporary* issue on the network or the like.
There are also other major differences. In this case,
1) the latency is always almost exactly 5 seconds; there are no repeated queries.
2) it does *not* matter whether glibc requests A and/or AAAA records. The latency is still the same.
3) it does matter how the DNS communication is transported. DNS over IPv6 causes the delay, whereas DNS over IPv4 works fine.
4) using strace removes the DNS over IPv6 latency (which is probably the most surprising fact).
Anyway, given that this used to work, you can git bisect the issue and find the upstream change that causes it. My guess is that this was the glibc 2.12 to 2.13 update, so that should give you starting points. I cannot take this bug much further without that being done.
And yes, there might be some similarity to FS#20470... at least guessing by the Wireshark output. getaddrinfo() really generates an obsolete second query when used *without* strace. getaddrinfo() under strace does not do this. Opera generates quite a lot of queries, but there are no problems with delays.
BTW, how can an application find out that it is strace'd? I thought this should not be possible, at least for a non-root program. But obviously, my gaitest program does this...