FS#34832 - [glibc] Assertion in sysdeps/posix/getaddrinfo.c forces openJDK to exit with some networking apps

Attached to Project: Arch Linux
Opened by Old User New ID (u2012) - Wednesday, 17 April 2013, 23:43 GMT
Last edited by Allan McRae (Allan) - Friday, 25 October 2013, 21:17 GMT
Task Type: Bug Report
Category: Upstream Bugs
Status: Closed
Assigned To: Allan McRae (Allan)
Architecture: x86_64
Severity: Medium
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 3
Private: No

Details

Description:

Recently (starting from April 13th), the JVM running I2P started to exit unexpectedly at seemingly random times due to this assertion:

java: ../sysdeps/posix/getaddrinfo.c:1738: rfc3484_sort: Assertion `src->results[i].native == -1 || src->results[i].native == a2_native' failed.

----
Note that I run a relatively unusual setup where short-lived processes create their own namespace with a dedicated hostname and network stack, connected to the master namespace through a veth master/slave interface.
----

I initially suspected that this patch was to blame:
https://projects.archlinux.org/svntogit/packages.git/plain/trunk/glibc-2.17-getaddrinfo-stack-overflow.patch?h=packages/glibc&id=27d80958180562e033e57483ac2a58ac49dab8e5

But I found another Arch Linux user with the same issue, in a different Java app, here:
https://groups.google.com/forum/?fromgroups=#!topic/omnetpp/exI342W4_P0

As this was reported on April 6th, I'm starting to think the GCC 4.8 rebuild is to blame.

Additional info:
* package version(s)
2.17-5 (with reason to believe it existed in 2.17-4)

* config and/or log files etc.

java: ../sysdeps/posix/getaddrinfo.c:1738: rfc3484_sort: Assertion `src->results[i].native == -1 || src->results[i].native == a2_native' failed.
JVM received a signal UNKNOWN (6).
JVM process is gone.
JVM exited unexpectedly.
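
For context, signal 6 on Linux is SIGABRT, which is what a failed assert() raises through abort(); the wrapper in the log above simply did not map the number to a name. A minimal Python sketch, assuming a POSIX system, that shows the mapping and how a child process killed this way appears to its parent (the function names are illustrative only):

```python
import signal
import subprocess
import sys


def describe_signal(number: int) -> str:
    """Return the symbolic name for a signal number."""
    return signal.Signals(number).name


def run_aborting_child() -> int:
    """Start a child that calls os.abort() and return its exit status.

    On POSIX, subprocess reports death by signal N as return code -N.
    """
    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return child.returncode


if __name__ == "__main__":
    print(describe_signal(6))
    print(run_aborting_child())
```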

Closed by Allan McRae (Allan)
Friday, 25 October 2013, 21:17 GMT
Reason for closing: Fixed
Additional comments about closing: glibc-2.18-9 in [testing]
Comment by Allan McRae (Allan) - Wednesday, 17 April 2013, 23:58 GMT
Which jvm are you using? I'm pointing my finger that way...
Comment by Jan de Groot (JGC) - Thursday, 18 April 2013, 08:01 GMT
My finger points to /etc/hosts. Does it have hostnames set for 127.x.x.x addresses that are not 127.0.0.1?

Debian has a patch for this:
http://patch-tracker.debian.org/patch/series/view/eglibc/2.11.3-4/any/submitted-getaddrinfo-lo.diff
Upstream report:
http://sourceware.org/bugzilla/show_bug.cgi?id=9954

Maybe also related, though no idea if it is needed:
http://patch-tracker.debian.org/patch/series/view/eglibc/2.13-38/any/local-getaddrinfo-interface.diff
Comment by Old User New ID (u2012) - Thursday, 18 April 2013, 10:39 GMT
>My finger points to /etc/hosts. Does it have hostnames set for 127.x.x.x addresses that are not 127.0.0.1?

Yes. But I added that a long time ago.

127.0.0.1 localhost.localdomain localhost
127.0.0.2 localhost2.localdomain localhost2
::1 localhost.localdomain localhost local6host
::2 local6host2.localdomain local6host2
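
To check other machines for the kind of entry asked about here (hostnames on 127.x.x.x addresses other than 127.0.0.1), a rough Python sketch could look like the following. The function name is made up for illustration, and it only inspects the text of a hosts file; it does not say whether such an entry actually triggers the assertion.

```python
import ipaddress


def extra_loopback_entries(hosts_text: str) -> list:
    """Return (address, names) pairs for 127.x.x.x addresses other than 127.0.0.1."""
    found = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an IP address, skip the line
        if address.version == 4 and address.is_loopback and str(address) != "127.0.0.1":
            found.append((str(address), fields[1:]))
    return found


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        for address, names in extra_loopback_entries(handle.read()):
            print(address, " ".join(names))
```

Run against the four lines quoted above, this would report only the 127.0.0.2 entry.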
Comment by Allan McRae (Allan) - Wednesday, 24 April 2013, 03:39 GMT
Any chance you can test the patch here: http://sourceware.org/bugzilla/attachment.cgi?id=3822 ?
Comment by Jan de Groot (JGC) - Wednesday, 24 April 2013, 09:54 GMT
That patch is the combined versions of the two debian patches I linked to above. Debian includes this in their eglibc build.
Comment by Allan McRae (Allan) - Wednesday, 24 April 2013, 10:07 GMT
Yes - but Debian includes bunches of crap in their patches. I wanted to confirm this is an actual fix before I push upstream about it.
Comment by Old User New ID (u2012) - Friday, 26 April 2013, 15:43 GMT
As I haven't seen the bug for four days now, I'm holding off on trying the patch to make sure the bug still exists without it.
Comment by jan (mrnerd) - Sunday, 28 April 2013, 16:13 GMT
I just happened to stumble upon this bug after a fresh install of Apache HTTPD 2.2.24 (Server built: Mar 18 2013 13:57:39). I found a quick workaround: My laptop has two interfaces enabled (LAN & Wifi), both in the same subnet. After deactivating the Wifi interface, Apache started without a problem.
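
A small Python sketch of how one might spot the configuration described here, two interfaces with addresses in the same subnet. The interface names and addresses below are hypothetical placeholders; the real values would have to be collected from the system (for example from `ip addr` output).

```python
import ipaddress
from collections import defaultdict


def interfaces_sharing_subnet(addresses: dict) -> dict:
    """Group interface names by network and keep networks used by 2+ interfaces.

    `addresses` maps an interface name to an address in CIDR notation.
    """
    by_network = defaultdict(list)
    for name, cidr in addresses.items():
        network = ipaddress.ip_interface(cidr).network
        by_network[str(network)].append(name)
    return {net: names for net, names in by_network.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical values, not taken from the report.
    example = {
        "eth0": "192.168.1.10/24",
        "wlan0": "192.168.1.23/24",
        "lo": "127.0.0.1/8",
    }
    print(interfaces_sharing_subnet(example))
```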
Comment by Allan McRae (Allan) - Sunday, 28 April 2013, 22:46 GMT
And does the patch above fix the issue?
Comment by Allan McRae (Allan) - Wednesday, 01 May 2013, 12:48 GMT
While waiting on an answer, I tried the test case in the upstream BZ but could not replicate.
Comment by Old User New ID (u2012) - Thursday, 23 May 2013, 20:16 GMT
My system was hit by this bug twice in the last 6 days without me noticing.

So, I built glibc with the patch and "epoch=1" so it wouldn't be upgraded.

I'll report back in a couple of weeks.
Comment by Miro Kropacek (mikro) - Monday, 10 June 2013, 07:48 GMT
I confirm the same bug, happening during svn checkout:

svn co svn+ssh://mikro_sk@svn.code.sf.net/p/mxplay/code/trunk mxplay
svn: ../sysdeps/posix/getaddrinfo.c:1732: rfc3484_sort: Assertion `src->results[i].native == -1 || src->results[i].native == a1_native' failed.
Aborted (core dumped)

Content of /etc/hosts is:

#
# /etc/hosts: static lookup table for host names
#

#<ip-address> <hostname.domain.org> <hostname>
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost

# End of file

It must be something new (April 2013 onwards) because it used to work perfectly and I haven't changed my system configuration at all.
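
For anyone wanting to exercise the same code path outside of svn or Java, here is a minimal sketch in Python. It assumes CPython on Linux, where socket.getaddrinfo goes through glibc's getaddrinfo; the helper name and the attempt count are arbitrary. Whether it ever hits the assertion depends on the glibc build and the local network configuration, so a clean run proves nothing. If the assertion does fire, the interpreter is aborted with SIGABRT rather than raising a Python exception.

```python
import socket


def exercise_getaddrinfo(host: str, attempts: int = 100) -> dict:
    """Call getaddrinfo repeatedly and count which address is returned first."""
    first_seen = {}
    for _ in range(attempts):
        try:
            results = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
        except socket.gaierror:
            # Resolution failed normally (no abort); record it and carry on.
            first_seen["<lookup failed>"] = first_seen.get("<lookup failed>", 0) + 1
            continue
        # Each result is (family, type, proto, canonname, sockaddr).
        address = results[0][4][0]
        first_seen[address] = first_seen.get(address, 0) + 1
    return first_seen


if __name__ == "__main__":
    print(exercise_getaddrinfo("localhost"))
```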
Comment by Allan McRae (Allan) - Saturday, 15 June 2013, 13:37 GMT
Confirming the issue does not help... confirming the issue AND the fix would.
Comment by Old User New ID (u2012) - Tuesday, 16 July 2013, 14:23 GMT
  • Field changed: Percent Complete (100% → 0%)
Feedback.
Didn't see the assertion since June 30. I lost my earlier logs.
So, the patch seems to fix the issue, although that's hard to prove.
Comment by Allan McRae (Allan) - Tuesday, 16 July 2013, 14:24 GMT
Thanks. I am endeavoring to get this included upstream.
Comment by FakeName (LucetLux) - Thursday, 29 August 2013, 15:06 GMT
My system was affected by this bug. A workaround found at https://bugzilla.redhat.com/show_bug.cgi?id=739743 solved it for now.

"Removing "myhostname" from /etc/nsswitch.conf is a workaround that works for me."
Comment by Guillaume Maudoux (Layus) - Friday, 20 September 2013, 15:07 GMT
Removing "myhostname" from /etc/nsswitch.conf worked for me too.
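
To see whether a system uses "myhostname" for host lookups before applying this workaround, a short Python sketch along these lines may help. The function name is invented for illustration, and it only reads the file; editing /etc/nsswitch.conf is left to the administrator.

```python
import re


def hosts_sources(nsswitch_text: str) -> list:
    """Return the lookup sources on the `hosts:` line, in order, without [actions]."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Drop bracketed actions such as [NOTFOUND=return].
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as handle:
        sources = hosts_sources(handle.read())
    print("hosts sources:", sources)
    print("uses myhostname:", "myhostname" in sources)
```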
Comment by Iru Dog (mytbk) - Friday, 04 October 2013, 11:47 GMT
Jan de Groot: After testing, I think FS#37191 is caused by aur/linux-pf. I'll do further tests later.
Comment by Iru Dog (mytbk) - Friday, 04 October 2013, 11:57 GMT
I'm sorry, but the bug only occurs on one single boot.
Comment by Claus Klingberg (cjk) - Thursday, 24 October 2013, 09:17 GMT
Same problem: OpenJDK fails, but *only* when I have an OpenVPN connection open (using the tun0 interface).
Comment by Allan McRae (Allan) - Thursday, 24 October 2013, 10:16 GMT
I have the patch accepted upstream, so I will push it there tomorrow and then apply it to our package.
