FS#14678 - ssh and telnet fail to resolve hostnames with glibc 2.9-7

Attached to Project: Arch Linux
Opened by Malte Skoruppa (einheitlix) - Monday, 11 May 2009, 09:39 GMT
Last edited by Allan McRae (Allan) - Saturday, 16 May 2009, 04:09 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To No-one
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

I get the following strange problem when I try to ssh to any host when I use a hostname as argument instead of an IP:

root@bombadil $ ssh www.heim-d.de
ssh: Could not resolve hostname www.heim-d.de: Success

The same thing happens with telnet:

root@bombadil $ telnet www.heim-d.de
www.heim-d.de/telnet: lookup failure: Success

However, it works fine when I specify an IP directly (e.g. 134.96.56.96 which is the IP of www.heim-d.de).

On my system, hostname resolution otherwise works fine: I can browse (indeed, I am writing this post from the very computer and configuration where the problem occurs), and nmap, ftp, ping and the other network clients I tried all resolve hostnames without any problems from the same shell where ssh and telnet fail.
It does not even matter if a host and its IP are specified in /etc/hosts: ssh and telnet fail to resolve it anyway, with the same error as above, while all other clients succeed.
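
To check whether the failure sits in the resolver path rather than in the clients themselves, a small probe can call getaddrinfo() directly. This is only a minimal sketch, assuming that ssh and telnet both resolve names through getaddrinfo(); the helper names resolve and probe are made up for illustration. As a side note, the trailing "Success" in the messages above is the text glibc's strerror() returns for errno 0, which would be consistent with a lookup that fails without setting errno.

```python
import socket
import sys


def resolve(host, service=None):
    """Return the sorted unique addresses getaddrinfo() reports for host."""
    infos = socket.getaddrinfo(host, service, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def probe(host, service=None):
    """Describe the lookup result as one line instead of raising."""
    try:
        return "ok: " + ", ".join(resolve(host, service))
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return "failed: %r" % (exc,)


if __name__ == "__main__":
    # Hosts to test are taken from the command line.
    for name in sys.argv[1:]:
        print(name, "->", probe(name))
```

Passing both a hostname and its IP on the command line (for example www.heim-d.de and 134.96.56.96) would show whether only name lookups fail, independently of ssh and telnet.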

I found out that both packages openssh (ssh) and inetutils (telnet) depend on the package tcp_wrappers.
tcp_wrappers in turn depends on glibc.
glibc seems to be at least involved in the problem, as I am going to explain below.


Additional info:

Under the following configuration (an up-to-date system as of now, in fact), the problem occurs with both ssh and telnet:

root@bombadil $ pacman -Qi glibc tcp_wrappers openssh inetutils | grep Ver
Version : 2.9-7
Version : 7.6-8
Version : 5.2p1-1
Version : 1.6-3

Downgrading glibc 2.9-7 to glibc 2.9-4 resolves the problem for me (I also have to downgrade binutils, since the current binutils depends on glibc 2.9-7):

root@bombadil $ pacman -U glibc-2.9-4-x86_64.pkg.tar.gz binutils-2.19.1-1-x86_64.pkg.tar.gz
[snip]
root@bombadil $ pacman -Qi glibc tcp_wrappers openssh inetutils | grep Ver
Version : 2.9-4
Version : 7.6-8
Version : 5.2p1-1
Version : 1.6-3

Now both ssh and telnet work fine again, as they used to.

Additional info can also be found in the corresponding forum thread:
http://bbs.archlinux.org/viewtopic.php?pid=551286#p551286

Steps to reproduce:

To reproduce it, I only have to:
(1) use an up-to-date-system (pacman -Syu) with the above configuration (the one with glibc 2.9-7), and
(2) execute ssh or telnet with a hostname instead of an IP as argument

However, as can be seen from the forum thread, this does not seem to happen for everybody under said configuration, so not everybody can reproduce it.
Hence I guess that the problem originates somewhere else, and using glibc 2.9-4 just works around it somehow.
Unfortunately I don't have an idea how to corner this problem any further, but I would be willing to try many things and/or post logs, configurations etc., if you have any ideas.

Cheers,

Malte Skoruppa

Closed by  Allan McRae (Allan)
Saturday, 16 May 2009, 04:09 GMT
Reason for closing:  Fixed
Additional comments about closing:  glibc-2.10.1-1
Comment by Gerardo Exequiel Pozzi (djgera) - Monday, 11 May 2009, 12:20 GMT
What is the output of this command?:
strace -e trace=open,connect,send,sendto,recvfrom -s 1024 telnet www.heim-d.de
Comment by Malte Skoruppa (einheitlix) - Monday, 11 May 2009, 23:23 GMT
Under the configuration where it fails, that is with glibc 2.9-7, the output is this:

open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib/libncursesw.so.5", O_RDONLY) = 3
open("/lib/libcrypt.so.1", O_RDONLY) = 3
open("/lib/libdl.so.2", O_RDONLY) = 3
open("/lib/libresolv.so.2", O_RDONLY) = 3
open("/lib/libnsl.so.1", O_RDONLY) = 3
open("/lib/libc.so.6", O_RDONLY) = 3
sendto(3, "\24\0\0\0\26\0\1\3\265\261\10J\0\0\0\0\0\0\0\0"..., 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
open("/etc/nsswitch.conf", O_RDONLY) = 3
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib/libnss_nis.so.2", O_RDONLY) = 3
open("/lib/libnss_files.so.2", O_RDONLY) = 3
open("/var/yp/binding/ww.2", O_RDONLY) = 3
sendto(4, "WTnh\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2ww\0\0\0\0\0\26services.byservicename\0\0\0\0\0\ntelnet/tcp\0\0"..., 92, 0, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, 16) = 92
recvfrom(4, "WTnh\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\0\0\0\0"..., 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, [16]) = 32
open("/etc/default/nss", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/var/yp/binding/ww.2", O_RDONLY) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("134.96.240.71")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(816), sin_addr=inet_addr("134.96.240.71")}, 16) = 0
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
open("/etc/host.conf", O_RDONLY) = 3
open("/etc/resolv.conf", O_RDONLY) = 3
open("/var/yp/binding/ww.2", O_RDONLY) = 3
sendto(4, "\17v\226\363\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2ww\0\0\0\0\0\fhosts.byname\0\0\0\rwww.heim-d.de\0\0\0"..., 84, 0, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, 16) = 84
recvfrom(4, "\17v\226\363\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0M134.96.56.96 d096.stw.stud.uni-saarland.de www.heim-d.de www.heim-d.uni-sb.de\0\0\0"..., 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, [16]) = 112
www.heim-d.de/telnet: lookup failure: Success


Under the configuration with glibc 2.9-4 (where it works) the output is the following:


open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib/libncursesw.so.5", O_RDONLY) = 3
open("/lib/libcrypt.so.1", O_RDONLY) = 3
open("/lib/libdl.so.2", O_RDONLY) = 3
open("/lib/libresolv.so.2", O_RDONLY) = 3
open("/lib/libnsl.so.1", O_RDONLY) = 3
open("/lib/libc.so.6", O_RDONLY) = 3
sendto(3, "\24\0\0\0\26\0\1\3n\261\10J\0\0\0\0\0\0\0\0"..., 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
open("/etc/nsswitch.conf", O_RDONLY) = 3
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib/libnss_nis.so.2", O_RDONLY) = 3
open("/lib/libnss_files.so.2", O_RDONLY) = 3
open("/var/yp/binding/ww.2", O_RDONLY) = 3
sendto(4, "\16\267\366\366\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2ww\0\0\0\0\0\26services.byservicename\0\0\0\0\0\ntelnet/tcp\0\0"..., 92, 0, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, 16) = 92
recvfrom(4, "\16\267\366\366\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\0\0\0\0"..., 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, [16]) = 32
open("/etc/default/nss", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/var/yp/binding/ww.2", O_RDONLY) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("134.96.240.71")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(816), sin_addr=inet_addr("134.96.240.71")}, 16) = 0
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
open("/etc/host.conf", O_RDONLY) = 3
open("/etc/resolv.conf", O_RDONLY) = 3
open("/var/yp/binding/ww.2", O_RDONLY) = 3
sendto(4, "''Za\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2ww\0\0\0\0\0\fhosts.byname\0\0\0\rwww.heim-d.de\0\0\0"..., 84, 0, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, 16) = 84
recvfrom(4, "''Za\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0M134.96.56.96 d096.stw.stud.uni-saarland.de www.heim-d.de www.heim-d.uni-sb.de\0\0\0"..., 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(664), sin_addr=inet_addr("134.96.240.71")}, [16]) = 112
connect(3, {sa_family=AF_INET, sin_port=htons(23), sin_addr=inet_addr("134.96.56.96")}, 16) = 0
open("/root/.telnetrc", O_RDONLY) = -1 ENOENT (No such file or directory)
Trying 134.96.56.96...
Connected to www.heim-d.de.
Escape character is '^]'.
sendto(3, "\377\375\3\377\373\30\377\373\37\377\373 \377\373!\377\373\"\377\373'\377\375\5"..., 24, 0, NULL, 0) = 24
recvfrom(3, "\377\375\30\377\375 \377\375#\377\375'"..., 8192, 0, NULL, NULL) = 12
sendto(3, "\377\374#"..., 3, 0, NULL, 0) = 3
recvfrom(3, "\377\373\3\377\375\37\377\375!\377\376\"\377\373\5\377\372 \1\377\360\377\372'\1\377\360\377\372\30\1\377\360"..., 8192, 0, NULL, NULL) = 33
open("/usr/share/terminfo/s/screen", O_RDONLY) = 4
sendto(3, "\377\372\37\0\215\0008\377\360\377\372 \00038400,38400\377\360\377\372'\0\377\360\377\372\30\0SCREEN\377\360"..., 44, 0, NULL, 0) = 44
recvfrom(3, "\377\375\1"..., 8192, 0, NULL, NULL) = 3
sendto(3, "\377\374\1"..., 3, 0, NULL, 0) = 3
recvfrom(3, "\377\373\1"..., 8192, 0, NULL, NULL) = 3
sendto(3, "\377\375\1"..., 3, 0, NULL, 0) = 3
recvfrom(3, "\r\nDebian GNU/Linux d096.stw.stud.uni-saarland.de\r\n\r\nstud: Verbindung zu stud.uni-sb.de\r\nirc: IRC-Chat im #heim-d\r\nmensa: Speiseplan von heute\r\nhdz: N\344chster Bus zum Haus der Zukunft\r\n\r\n"..., 8192, 0, NULL, NULL) = 190

Debian GNU/Linux d096.stw.stud.uni-saarland.de

stud: Verbindung zu stud.uni-sb.de
irc: IRC-Chat im #heim-d
mensa: Speiseplan von heute
hdz: Nächster Bus zum Haus der Zukunft

recvfrom(3, "d096 login: "..., 8192, 0, NULL, NULL) = 12
d096 login:
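
Since the two traces are long, one quick way to locate where they part ways is to compare only the sequence of system call names, ignoring arguments such as the RPC transaction IDs that differ between runs. A minimal sketch, assuming the strace output is available as plain text with one call per line; the function names are hypothetical:

```python
import re
from itertools import zip_longest

# A traced call starts with the syscall name followed by "(".
# Program output mixed into the trace does not match and is skipped.
SYSCALL = re.compile(r"^(\w+)\(")


def syscall_names(trace):
    """Extract the ordered list of syscall names from strace text."""
    names = []
    for line in trace.splitlines():
        match = SYSCALL.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def first_divergence(bad, good):
    """Return (index, bad_call, good_call) at the first difference.

    A call is None when that trace has already ended.
    Returns None when both sequences are identical.
    """
    pairs = zip_longest(syscall_names(bad), syscall_names(good))
    for index, (bad_call, good_call) in enumerate(pairs):
        if bad_call != good_call:
            return index, bad_call, good_call
    return None
```

Applied to the two traces above, this should point at the step right after the hosts.byname reply: the working run continues with connect() to port 23, while the failing run makes no further traced call.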
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 12 May 2009, 00:43 GMT
I see that you are running ypbind and NIS, am I correct? What happens if you disable it in /etc/nsswitch.conf and set NSS back to its default values?

Strange: the hostname is resolved, but...
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 12 May 2009, 01:09 GMT
It is also quite strange, because the glibc source code did not change between versions -4 and -7 (in the CVS snapshots). The only changes are the removal of some patches that are unrelated to networking and the addition of another patch to resolve a thread gdb issue...
Comment by Malte Skoruppa (einheitlix) - Tuesday, 12 May 2009, 15:37 GMT
Hi!

You are perfectly right, I'm using ypbind and nis.

My /etc/nsswitch.conf has the following value for hosts:

hosts: nis files dns

Hence hostname resolution first attempts to use nis. I modified this line to:

hosts: files dns

Then everything works perfectly with both glibc 2.9-4 and 2.9-7. Nice!
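
The edit described above can be sketched as a small helper that reads the hosts line of nsswitch.conf and drops one source from it. This is an illustration only: the function names are invented, the sketch discards any trailing comment on the hosts line, and it does not handle action items such as [NOTFOUND=return], which belong to the source preceding them.

```python
def hosts_sources(conf):
    """Return the lookup sources listed on the hosts line, in order."""
    for line in conf.splitlines():
        body = line.split("#", 1)[0].strip()  # ignore comments
        if body.startswith("hosts:"):
            return body[len("hosts:"):].split()
    return []


def drop_source(conf, source):
    """Return conf with `source` removed from the hosts line only."""
    out = []
    for line in conf.splitlines():
        body = line.split("#", 1)[0].strip()
        if body.startswith("hosts:"):
            kept = [s for s in body[len("hosts:"):].split() if s != source]
            line = "hosts: " + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "passwd: files nis\nhosts: nis files dns\n"
    print(hosts_sources(sample))
    print(drop_source(sample, "nis"), end="")
```

Other databases (passwd, group and so on) are left untouched, so NIS can stay in use for accounts while host lookups go through files and dns only.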

So it looks like our NIS server behaves in some strange way when one tries to perform hostname resolution, such that glibc 2.9-7 somehow fails, although glibc 2.9-4 is able to cope with it. But this is strange, since you said that there were no changes of importance between the two versions. However, clearly there has to be something, since it works simply by downgrading glibc (and binutils for dependency reasons; but note that downgrading binutils only is not sufficient to make it work, so glibc is the cause and not binutils), and nothing else needs to be done so that it works.

Changing nsswitch.conf is already a much neater solution than holding back updates of glibc. However, we still haven't discovered the root cause ;-)

Two questions remain:
(1) why does it work with glibc 2.9-4, and not with glibc 2.9-7? If it was only the NIS server's fault, then neither glibc 2.9-4 nor 2.9-7 should work. But it is not so.
(2) what does the NIS server do wrong to confuse glibc? any idea how I could find this out? I do have root access to the NIS server.
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 12 May 2009, 18:04 GMT
Good!

This is beyond my knowledge; all I can offer are assumptions:

1) Maybe it is a glibc-2.9 issue with GCC 4.4.0. Just wait a few days until glibc-2.10 becomes available on Arch Linux; maybe this problem is solved in the 2.10 branch.
2) I don't have any experience with NIS. Can't help :(
Comment by Malte Skoruppa (einheitlix) - Friday, 15 May 2009, 11:14 GMT
Hi!

Well, I did some more testing and I can now precisely say that the error was introduced exactly between glibc-2.9-4 and glibc-2.9-5.

I found old Arch packages for 2.9-5 and 2.9-6 on the internet (http://mirrors.gigenet.com/archlinux/testing/os/x86_64/) and tried them all.
Now, in...
2.9-4: error does not occur
2.9-5: error occurs
2.9-6: error occurs
2.9-7: error occurs

Hence this error starts occurring with version 2.9-5, and occurs then in all subsequent versions of the 2.9 branch.
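
The manual search above tried every package release in turn. With more candidates, the same search could be done as a bisection, assuming the failure is monotone (once a release fails, all later ones fail too), which matches the results listed. A minimal sketch with a hypothetical first_bad helper; the fails callback stands in for installing the package and running ssh or telnet by hand:

```python
def first_bad(versions, fails):
    """Return the first version for which fails(version) is true.

    `versions` must be ordered oldest to newest, and the failure is
    assumed to persist in every release after the one that introduced
    it. Returns None if no version fails.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(versions[mid]):
            hi = mid        # culprit is at mid or earlier
        else:
            lo = mid + 1    # culprit is after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    # Results as reported in this comment.
    observed = {"2.9-4": False, "2.9-5": True, "2.9-6": True, "2.9-7": True}
    print(first_bad(list(observed), observed.__getitem__))
```

For four releases this saves little, but the number of installs grows only logarithmically with the number of candidate packages.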

According to the .CHANGELOG, the gcc-4.4 toolchain was built starting with 2.9-5, so you may be right and it could be a problem with GCC 4.4.0...
or, I also noticed that Aaron Griffin built the glibc-2.9-4 package, but starting with glibc 2.9-5 Allan McRae built all the glibc packages. So maybe it's packaged in some different way which somehow causes problems...

Anyway! It doesn't really matter anymore, since after trying all those packages I also got the idea to try the glibc-2.10.1-1 package from [testing], and there it works fine again. Guess we'll never know, though I should have liked to know what the problem was ;)

Cheers,

Malte
Comment by Gerardo Exequiel Pozzi (djgera) - Friday, 15 May 2009, 21:02 GMT
@Malte: Good, so if glibc-2.10 solves the problem, this task can be closed.

Yes, but it is probably related to GCC 4.4.0 optimizations and the old glibc.

Good Luck!
