FS#12215 - Toolchain in [TESTING] causes name resolving failure

Attached to Project: Arch Linux
Opened by Emmanuel (bkk_drs) - Sunday, 23 November 2008, 10:28 GMT
Last edited by Jan de Groot (JGC) - Monday, 15 December 2008, 15:01 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Andreas Radke (AndyRTR)
Architecture i686
Severity High
Priority High
Reported Version None
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Quoting Andreas's sign-off email (not yet archived):

"There is one known potential problem with the name resolving which has
been extensively changed. Under some circumstances (always involving
broken servers) requests are not completely processed. Often a second
call fixes the problem. This might be fixed in the latest code, maybe
not. Unfortunately no reporter has been able or willing to provide the
information needed to track down the problem."

That's what happens to me:

[eb@blackout mercurial]$ wget http://www.selenic.com/mercurial/release/mercurial-1.0.2.tar.gz
--2008-11-22 21:50:04-- http://www.selenic.com/mercurial/release/mercurial-1.0.2.tar.gz
Resolving www.selenic.com... failed: Name or service not known.
wget: unable to resolve host address `www.selenic.com'

Whereas:

[eb@blackout ~]$ wget http://204.152.191.37/pub/linux/kernel/v2.6/testing/patch-2.6.28-rc6.bz2
--2008-11-22 22:18:53-- http://204.152.191.37/pub/linux/kernel/v2.6/testing/patch-2.6.28-rc6.bz2
Connecting to 204.152.191.37:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9747602 (9.3M) [application/x-bzip2]
Saving to: `patch-2.6.28-rc6.bz2.2'

0% [ ] 77,783 16.4K/s eta 10m 12s

Closed by  Jan de Groot (JGC)
Monday, 15 December 2008, 15:01 GMT
Reason for closing:  Fixed
Additional comments about closing:  Works fine now.
Comment by Jan de Groot (JGC) - Sunday, 23 November 2008, 10:51 GMT
This is caused by buggy nameservers that respond with NOTIMPL instead of NOERROR when asked for AAAA records. I also have such a buggy nameserver at work, in a Hotbrick LB2 load balancer. I "fixed" this problem by installing pdns-recursor, binding it to 127.0.0.1 and pointing resolv.conf at it.
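To see which response code a particular nameserver returns for an AAAA query without relying on glibc, a minimal Python sketch along these lines could be used. It builds a raw AAAA query and reads the RCODE from the response header. The function names (build_aaaa_query, rcode_name, query_rcode) and the fixed query ID are illustrative assumptions, not part of any tool mentioned in this task. RCODE 4 is what the comment above calls NOTIMPL; dig prints it as NOTIMP.

```python
import socket
import struct

# RCODE values from RFC 1035, section 4.1.1 (dig prints 4 as NOTIMP).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

QTYPE_AAAA = 28  # AAAA record type, RFC 3596
QCLASS_IN = 1    # Internet class


def build_aaaa_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the AAAA record of name."""
    # Header: ID, flags (only RD set), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_AAAA, QCLASS_IN)


def rcode_name(response):
    """Return the RCODE mnemonic from the header of a raw DNS response."""
    if len(response) < 12:
        raise ValueError("DNS header is 12 bytes; response is too short")
    (flags,) = struct.unpack("!H", response[2:4])
    rcode = flags & 0x000F  # RCODE is the low four bits of the flags word
    return RCODES.get(rcode, "RCODE%d" % rcode)


def query_rcode(server, name, timeout=3.0):
    """Send one AAAA query over UDP to server and return the RCODE mnemonic."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_aaaa_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    return rcode_name(response)
```

Calling query_rcode with the address of the nameserver listed in resolv.conf and a hostname that fails to resolve should show whether that server answers AAAA queries with NOTIMP.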
Comment by Emmanuel (bkk_drs) - Sunday, 23 November 2008, 11:20 GMT
Thanks a lot Jan, it works now. However, I guess this will cause problems for a lot of users. I'll have a look at the upstream bug tracker as well...
Comment by Pierre Schmitz (Pierre) - Sunday, 23 November 2008, 14:45 GMT
Isn't AAAA just for ipv6? Maybe it would help to just disable the ipv6 kernel module.
Comment by Jan de Groot (JGC) - Sunday, 23 November 2008, 15:02 GMT
Applications like firefox and ssh will resolve both ipv6 and ipv4 records at the same time. With glibc 2.8, this was sent as two separate requests; with 2.9, glibc asks for both at the same time. Broken nameservers will return NOTIMPL in this case, and glibc will then discard both answers and error out. Disabling ipv6 will help in case your nameserver is broken (setting AddressFamily inet in ssh_config fixed my issue for ssh), but is hardly a solution.
It would be helpful to test what dig returns for AAAA queries. Just use "dig someaddress AAAA" and see which response code you get. A good nameserver will report NXDOMAIN in case the domain doesn't exist, or NOERROR in case the domain exists, regardless of whether it has an AAAA record.
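The difference between a combined lookup and an IPv4-only lookup can also be checked from Python, whose socket.getaddrinfo goes through the system resolver. The sketch below is only an illustration: compare_lookups is a hypothetical helper, and the assumption is that an AF_UNSPEC lookup corresponds to the combined A and AAAA request described above, while AF_INET corresponds to the "AddressFamily inet" workaround.

```python
import socket


def compare_lookups(host, resolver=socket.getaddrinfo):
    """Resolve host for any address family and for IPv4 only; report each outcome."""
    results = {}
    for label, family in (("unspec", socket.AF_UNSPEC), ("inet", socket.AF_INET)):
        try:
            infos = resolver(host, 80, family, socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # the first element of sockaddr is the address string.
            results[label] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[label] = "failed: %s" % exc
    return results


if __name__ == "__main__":
    for label, outcome in compare_lookups("www.selenic.com").items():
        print(label, outcome)
```

If the "unspec" lookup fails while the "inet" lookup returns addresses, that would be consistent with the broken-nameserver behaviour described in this comment.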
Comment by Smith Dhumbumroong (zodmaner) - Sunday, 23 November 2008, 17:13 GMT
I'm also experiencing the name resolving problem with my ISP's nameserver, using the glibc from the testing repository.

I'm currently using OpenDNS as a temporary fix, but I'm thinking of installing pdns-recursor as Jan suggested. I hope we can find a way to properly fix this problem.
Comment by Andreas Radke (AndyRTR) - Tuesday, 02 December 2008, 17:58 GMT
http://sources.redhat.com/bugzilla/show_bug.cgi?id=7060

If you think this is the upstream report for our issue, please give Ulrich & Jakub feedback on how you like this change ;)
Comment by Jan de Groot (JGC) - Wednesday, 03 December 2008, 14:06 GMT
We could apply the patch to revert to 2.8 behaviour until this is fixed. This would allow us to move the new toolchain to core.
Comment by Andreas Radke (AndyRTR) - Sunday, 14 December 2008, 19:24 GMT
All affected users, please test glibc 2.9-2 from testing.
Comment by Emmanuel (bkk_drs) - Monday, 15 December 2008, 14:55 GMT
glibc 2.9-2 solved the problem. Thanks.