FS#40304 - [bind] named segfaults

Attached to Project: Community Packages
Opened by alex (kabolt) - Sunday, 11 May 2014, 12:14 GMT
Last edited by Sébastien Luttringer (seblu) - Monday, 02 June 2014, 21:23 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Sébastien Luttringer (seblu)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 25
Private No

Details

Description:
named segfaults (in version 9.10.0.P1-1) after a short time. The configuration file is attached.
Sadly, I have no more information.

Error message:
Mai 11 13:46:14 archal named[8786]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
Mai 11 13:46:14 archal named[8786]: #0 0x433afd in ??
Mai 11 13:46:14 archal named[8786]: #1 0x7f92f9be315a in ??
Mai 11 13:46:14 archal named[8786]: #2 0x7f92f9bf81bf in ??
Mai 11 13:46:14 archal named[8786]: #3 0x7f92fa57b0b4 in ??
Mai 11 13:46:14 archal named[8786]: #4 0x7f92fa525a0a in ??
Mai 11 13:46:14 archal named[8786]: #5 0x7f92fa50b5d0 in ??
Mai 11 13:46:14 archal named[8786]: #6 0x7f92fa50b734 in ??
Mai 11 13:46:14 archal named[8786]: #7 0x7f92fa582090 in ??
Mai 11 13:46:14 archal named[8786]: #8 0x7f92fa59170b in ??
Mai 11 13:46:14 archal named[8786]: #9 0x7f92f9c054dc in ??
Mai 11 13:46:14 archal named[8786]: #10 0x7f92f97af124 in ??
Mai 11 13:46:14 archal named[8786]: #11 0x7f92f8f714bd in ??
Mai 11 13:46:14 archal named[8786]: exiting (due to assertion failure)
Mai 11 13:46:14 archal systemd[1]: named.service: main process exited, code=killed, status=6/ABRT

Closed by  Sébastien Luttringer (seblu)
Monday, 02 June 2014, 21:23 GMT
Reason for closing:  Fixed
Comment by Roderick Hoybach (hoybach) - Sunday, 11 May 2014, 20:51 GMT
After doing a system upgrade yesterday (to bind 9.10.0.P1-1, as above), I am having the same problem. named starts fine, but four minutes later crashes when the same assertion fails.
Comment by Jef (jeagoss) - Monday, 12 May 2014, 02:13 GMT
Downgrading to 9.9.5.W1-2 does seem to work in the meantime.
Comment by AMM (amish) - Monday, 12 May 2014, 18:08 GMT
Same issue
May 12 23:32:34 amish named[1755]: parser.c:315: REQUIRE(tupleobj != ((void *)0) && tupleobj->type->rep == &cfg_rep_tuple) failed, back trace
May 12 23:32:34 amish named[1755]: #0 0x433afd in ??
May 12 23:32:34 amish named[1755]: #1 0x7f36c3ca515a in ??
May 12 23:32:34 amish named[1755]: #2 0x7f36c411b8ce in ??
May 12 23:32:34 amish named[1755]: #3 0x4184d8 in ??
May 12 23:32:34 amish named[1755]: #4 0x418e8a in ??
May 12 23:32:34 amish named[1755]: #5 0x419d9f in ??
May 12 23:32:34 amish named[1755]: #6 0x44ec21 in ??
May 12 23:32:34 amish named[1755]: #7 0x44fbf4 in ??
May 12 23:32:34 amish named[1755]: #8 0x7f36c3cc74dc in ??
May 12 23:32:34 amish named[1755]: #9 0x7f36c3877124 in ??
May 12 23:32:34 amish named[1755]: #10 0x7f36c303f4bd in ??
May 12 23:32:34 amish named[1755]: exiting (due to assertion failure)
May 12 23:32:34 amish systemd[1]: named.service: main process exited, code=killed, status=6/ABRT
May 12 23:32:34 amish systemd[1]: named.service: control process exited, code=exited status=1


But it happens immediately, not after 4-5 minutes as reported by someone above.

Downgraded back to older version.
Comment by Bret Towe (magnade) - Monday, 12 May 2014, 19:11 GMT
The time period didn't matter much for me. I did find that browsing to http://www.penny-arcade.com/comic seemed to make it die consistently, though.
Comment by Mukund Sivaraman (muks) - Monday, 12 May 2014, 20:26 GMT
For those who are seeing the mem.c REQUIRE() assertion:

Please can you do a debug build of the package as described here: https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces

so that the backtrace contains symbol names instead of "??" and either paste the backtrace here, or send it to <bind9-bugs@isc.org>? Thank you. A surefire way to reproduce (with config files) would also be helpful.
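
A minimal sketch of one way to get such a build, assuming the bind PKGBUILD has already been checked out into the current directory. The options line is the one seblu reports using further down in this thread; the helper name enable_debug_build is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: append a debug/unstripped options line to a PKGBUILD.
# makepkg sources the PKGBUILD as bash, so this later assignment replaces any
# earlier options=() array (other options set there would be lost).
enable_debug_build() {
    local pkgbuild="${1:-PKGBUILD}"
    cat >> "$pkgbuild" <<'EOF'

options=('debug' '!strip')
EOF
}

# Typical use, from the directory that holds the PKGBUILD:
#   enable_debug_build PKGBUILD
#   makepkg -s    # -s installs missing build dependencies first
```

With a build like this, the "??" frames in the journal should be replaceable by a gdb backtrace that shows function names and line numbers.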
Comment by Mukund Sivaraman (muks) - Monday, 12 May 2014, 20:27 GMT
AMM (amish): The crash you are seeing seems to be a different problem: REQUIRE() assertion in parser.c.

Please can you also do a debug build as described above and report the backtrace with symbols? You can send the mail directly to <bind9-bugs@isc.org>. Thank you.
Comment by Sébastien Luttringer (seblu) - Monday, 12 May 2014, 21:17 GMT
Version compiled with options=('debug' '!strip')

# file =named
/usr/bin/named: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=02d0879315f197274dbdadecf7c53a3ec3cc9e4c, not stripped

-- Logs begin at sam. 2013-06-08 19:30:26 CEST, end at lun. 2014-05-12 22:53:03 CEST. --
mai 12 17:11:52 achille.seblu.net named[2994]: success resolving '132.214.211.66.zen.spamhaus.org/A' (in 'zen.spamhaus.
mai 12 17:11:56 achille.seblu.net named[2994]: success resolving 'archlinux.org.dbl.spamhaus.org/A' (in 'dbl.spamhaus.o
mai 12 18:03:32 achille.seblu.net named[2994]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
mai 12 18:03:32 achille.seblu.net named[2994]: #0 0x433afd in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #1 0x7ffa1002215a in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #2 0x7ffa100371bf in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #3 0x7ffa109ac0b4 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #4 0x7ffa10956a0a in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #5 0x7ffa1093c5d0 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #6 0x7ffa1093c734 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #7 0x7ffa109b3090 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #8 0x7ffa109c270b in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #9 0x7ffa100444dc in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #10 0x7ffa0fbf4124 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #11 0x7ffa0f3bc4bd in ??
mai 12 18:03:32 achille.seblu.net named[2994]: exiting (due to assertion failure)
mai 12 18:03:32 achille.seblu.net systemd[1]: named.service: main process exited, code=killed, status=6/ABRT
mai 12 18:03:32 achille.seblu.net rndc[4667]: rndc: connect failed: 127.0.0.1#953: connection refused
mai 12 18:03:32 achille.seblu.net systemd[1]: named.service: control process exited, code=exited status=1
mai 12 18:03:32 achille.seblu.net systemd[1]: Unit named.service entered failed state.

New version with -g3 -ggdb3 -O0 is running. You can find the package here: http://celestia.archlinux.org/~seblu/bind-9.10.0.P1-1.1-x86_64.pkg.tar.xz
Comment by Steven Haigh (CRCinAU) - Tuesday, 13 May 2014, 00:13 GMT
I get the same on i686. I believe it would also be useful to add the following to the stock named service file for systemd:
[Unit]
Description=Internet domain name server
After=network.target

[Service]
ExecStart=/usr/bin/named -f -u named
ExecReload=/usr/bin/rndc reload
ExecStop=/usr/bin/rndc stop
Restart=always

[Install]
WantedBy=multi-user.target

As it stands, the stock systemd unit file won't restart named automatically after a crash or exit. If you're running bind, chances are that you want it to restart automatically. DNS is a critical service for networks these days, and as such it should recover automatically from most failures (or at least restart!).
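
An alternative sketch that leaves the packaged unit untouched: a systemd drop-in that only adds the restart policy. The drop-in file name restart.conf, the RestartSec value and the helper name are assumptions; Restart=always is taken from the unit above.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that adds a restart policy to
# named.service without editing the unit file shipped by the package.
write_restart_dropin() {
    local dir="${1:-/etc/systemd/system/named.service.d}"
    mkdir -p "$dir"
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
}

# Typical use (as root):
#   write_restart_dropin
#   systemctl daemon-reload
#   systemctl restart named.service
```

A drop-in survives package upgrades, whereas a modified copy of the packaged unit would have to be merged by hand.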

BTW: Does someone have an i686 debug build to try? I don't have a build environment for i686 yet...

EDIT: By the way, it crashes VERY often:
May 13 04:37:07 spin-gw named[19656]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 04:51:32 spin-gw named[19667]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:14:08 spin-gw named[19680]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:24:08 spin-gw named[19695]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:29:42 spin-gw named[19708]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:39:12 spin-gw named[19720]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:49:25 spin-gw named[19732]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:50:55 spin-gw named[19745]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:03:09 spin-gw named[19756]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:10:26 spin-gw named[19769]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:34:32 spin-gw named[19781]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:44:05 spin-gw named[19795]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 07:39:21 spin-gw named[19808]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:04:17 spin-gw named[19827]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:10:46 spin-gw named[19841]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:10:52 spin-gw named[19853]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:11:37 spin-gw named[19864]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:12:33 spin-gw named[19875]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:13:15 spin-gw named[19886]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:18:08 spin-gw named[19897]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:26:48 spin-gw named[19909]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:38:44 spin-gw named[19921]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:39:52 spin-gw named[19933]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:10:54 spin-gw named[19945]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:10:59 spin-gw named[19962]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:11:41 spin-gw named[19973]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:26:49 spin-gw named[19985]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:54:31 spin-gw named[19998]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:58:26 spin-gw named[20018]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:58:33 spin-gw named[20030]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:04:52 spin-gw named[20041]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:04:57 spin-gw named[20053]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:05:39 spin-gw named[20064]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:05:42 spin-gw named[20075]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:06:37 spin-gw named[20086]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:13:50 spin-gw named[20135]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
Comment by AMM (amish) - Tuesday, 13 May 2014, 03:42 GMT
OK, I found the config line which caused my issue.

Here is trace.log (I don't know how useful it is):

Starting program: /usr/bin/named -f -u named
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff40ff700 (LWP 20374)]
[New Thread 0x7ffff38fe700 (LWP 20375)]
[New Thread 0x7ffff30fd700 (LWP 20376)]
[New Thread 0x7ffff28fc700 (LWP 20377)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff40ff700 (LWP 20374)]
0x00007ffff6011d67 in raise () from /usr/lib/libc.so.6
#0 0x00007ffff6011d67 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff6013118 in abort () from /usr/lib/libc.so.6
#2 0x0000000000434808 in ?? ()
#3 0x00007ffff7192110 in isc_categories () from /usr/lib/libisc.so.142
#4 0x00007ffff7192100 in ?? () from /usr/lib/libisc.so.142
#5 0x0000000000000001 in ?? ()
#6 0x0000000b00000001 in ?? ()
#7 0x0000000000000000 in ?? ()



But I found what was causing the crash. (Strangely, the same thing worked in the previous version.)

zone "multi.uribl.com" {
type forward;
};

I have forwarders (OpenDNS) set, so the lines above tell named not to use forwarders for this domain (multi.uribl.com).

It acts as an "exception" to the forwarders.

But somehow this line is crashing named. Commenting out or removing the line resolves my problem.

Any idea what's wrong? Thanks in advance.
Comment by Steven Haigh (CRCinAU) - Tuesday, 13 May 2014, 03:58 GMT
amish: This seems to be a different bug. Can you file it in a separate report?
Comment by AMM (amish) - Tuesday, 13 May 2014, 04:08 GMT
Another update.

After the issue above was resolved (it seemed like the same bug but was actually a different one), I have now also come across the mem.c bug reported here.

i.e. named starts but crashes after some time with mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace

@Steven - ok I will.
Comment by Mukund Sivaraman (muks) - Tuesday, 13 May 2014, 06:09 GMT
AMM (amish): The crash you are seeing seems to be a different problem: REQUIRE() assertion in parser.c.

Please can you also do a debug build as described above and report the backtrace with symbols? You can send the mail directly to <bind9-bugs@isc.org>. Thank you.
Comment by AMM (amish) - Tuesday, 13 May 2014, 06:14 GMT
@muks: I have already filed a separate bug report, and the backtrace is in that bug report (and also above).

I have also identified the config line which causes that problem.

See the bug report (with workaround):
https://bugs.archlinux.org/task/40342?project=5
Comment by Mukund Sivaraman (muks) - Tuesday, 13 May 2014, 06:42 GMT
AMM: Thank you for it. I've updated the ticket.

For this ticket's issue, please see if you are able to do a debug build, make named dump core upon hitting the REQUIRE() assertion (you may have to set ulimit -c on Linux and run it manually outside systemd), open the core dump in gdb and get a usable backtrace (one without ??s).
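
A rough sketch of those steps, assuming a debug build is installed as /usr/bin/named. The function names are hypothetical, and where the kernel writes the core file depends on the kernel.core_pattern setting, so the core path has to be supplied by hand.

```shell
#!/bin/bash
# Hypothetical helper: run named by hand, outside systemd, with the core
# size limit lifted for this shell and its children.
run_named_for_core() {
    ulimit -c unlimited
    /usr/bin/named -f -u named    # foreground, same flags as the unit file
}

# Hypothetical helper: print a backtrace of every thread from a core file.
print_backtrace() {
    local binary="$1" core="$2"
    gdb -batch -ex 'thread apply all bt' "$binary" "$core"
}

# Typical use (as root), once named has aborted and left a core file:
#   run_named_for_core
#   print_backtrace /usr/bin/named /path/to/core
```

Whether a core is actually written may also depend on the working directory being writable by the named user, so it can be worth starting named from such a directory.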
Comment by Randy McCaskill (rmccask) - Tuesday, 13 May 2014, 15:38 GMT
I do not know how to cause it to happen; it just does after a few minutes. Here is my backtrace. Let me know if you need anything else. For now I have reverted to the previous build.

#0 0x00007ffff621fd67 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff6221118 in abort () from /usr/lib/libc.so.6
#2 0x0000000000433e77 in assertion_failed (
file=0xc <error: Cannot access memory at address 0xc>, line=-198136064,
type=6912928, cond=0x7ffff430af08 "") at ./main.c:222
#3 0x00007ffff6f3b28a in isc_assertion_failed (file=<optimized out>,
line=<optimized out>, type=<optimized out>, cond=<optimized out>)
at assertions.c:57
#4 0x00007ffff6f50a12 in isc___mem_put (ctx0=0x7fffec047470, ptr=0x0,
size=140737290221856, file=0x0, line=6974288) at mem.c:1298
#5 0x00007ffff78c927c in dns_rdataslab_fromrdataset (rdataset=0x7ffff430b920,
mctx=0x7fffec047470, region=0x7ffff430b460, reservelen=<optimized out>)
at rdataslab.c:334
#6 0x00007ffff7872faa in addrdataset (db=0x7ffff27b0010, node=0x167,
version=0x6, now=1399995217, rdataset=0x7ffff430b920, options=0,
addedrdataset=0x7fffeda941d0) at rbtdb.c:6445
#7 0x00007ffff78587bb in addoptout (message=0x163,
message@entry=0x7fffece3bd08, cache=0x167, node=0x6, covers=65535,
now=6974288, maxttl=0, optout=isc_boolean_false, secure=isc_boolean_false,
addedrdataset=0x7fffeda941d0) at ncache.c:270
#8 0x00007ffff7858934 in dns_ncache_add (
message=message@entry=0x7fffece3bd08, cache=<optimized out>,
node=<optimized out>, covers=<optimized out>, now=<optimized out>,
maxttl=<optimized out>, addedrdataset=0x7fffeda941d0) at ncache.c:105
#9 0x00007ffff78d0498 in ncache_adderesult (message=0x7fffece3bd08,
cache=<optimized out>, node=<optimized out>, covers=<optimized out>,
now=<optimized out>, maxttl=<optimized out>, optout=isc_boolean_false,
secure=isc_boolean_false, ardataset=0x7fffeda941d0,
eresultp=0x7ffff430ce48) at resolver.c:5283
#10 0x00007ffff78ddaa3 in validated (task=0x163, event=0x7ffff25135d0)
at resolver.c:4459
#11 0x00007ffff6f5e2a9 in dispatch (manager=0x7ffff7faa010) at task.c:1122
#12 run (uap=0x7ffff7faa010) at task.c:1294
#13 0x00007ffff6b0d124 in start_thread () from /usr/lib/libpthread.so.0
#14 0x00007ffff62d54bd in clone () from /usr/lib/libc.so.6
Comment by Mukund Sivaraman (muks) - Tuesday, 13 May 2014, 15:59 GMT
Thank you for the backtrace!
Comment by Mukund Sivaraman (muks) - Wednesday, 14 May 2014, 11:19 GMT
This issue seems to be caused by a compiler bug in GCC 4.9.1 20140507 (both Arch patched, and unpatched upstream). We are able to reproduce the issue now. Thank you very much for giving us backtraces and reports.

We'll update this ticket in a day or two.
Comment by Sébastien Luttringer (seblu) - Thursday, 15 May 2014, 22:24 GMT
version bind-9.10.0.P1-2 should fix this issue (rebuild against gcc 4.8.2).
Comment by Randy McCaskill (rmccask) - Friday, 16 May 2014, 00:20 GMT
I installed the new version and it still crashed. The crash looks the same from the backtrace without symbols. I can give another backtrace with symbols if needed.
Comment by MadCatX (MadCatX) - Friday, 16 May 2014, 13:14 GMT
It crashes for me too. Backtrace (without debugging symbols, though):

kvě 16 14:59:28 Sigyn named[842]: exiting (due to assertion failure)
kvě 16 14:59:28 Sigyn named[842]: #11 0x7fe7b8f2c4bd in ??
kvě 16 14:59:28 Sigyn named[842]: #10 0x7fe7b9764124 in ??
kvě 16 14:59:28 Sigyn named[842]: #9 0x7fe7b9bb44dc in ??
kvě 16 14:59:28 Sigyn named[842]: #8 0x7fe7ba5304c4 in ??
kvě 16 14:59:28 Sigyn named[842]: #7 0x7fe7ba523090 in ??
kvě 16 14:59:28 Sigyn named[842]: #6 0x7fe7ba4ac734 in ??
kvě 16 14:59:28 Sigyn named[842]: #5 0x7fe7ba4ac5d0 in ??
kvě 16 14:59:28 Sigyn named[842]: #4 0x7fe7ba4c6a0a in ??
kvě 16 14:59:28 Sigyn named[842]: #3 0x7fe7ba51c0b4 in ??
kvě 16 14:59:28 Sigyn named[842]: #2 0x7fe7b9ba71bf in ??
kvě 16 14:59:28 Sigyn named[842]: #1 0x7fe7b9b9215a in ??
kvě 16 14:59:28 Sigyn named[842]: #0 0x433afd in ??
kvě 16 14:59:28 Sigyn named[842]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/AAAA/IN': 64.94.123.36#53
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/A/IN': 64.94.123.36#53
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/AAAA/IN': 64.94.123.4#53
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/A/IN': 64.94.123.4#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'secure-us.imrworldwide.com/A/IN': 2600:1401:2::32#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'secure-us.imrworldwide.com/AAAA/IN': 2600:1401:2::32#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'a1-50.akam.net/AAAA/IN': 2600:1401:2::43#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'a1-50.akam.net/A/IN': 2600:1401:2::43#53
Comment by Adam C. Emerson (electric_blue) - Friday, 16 May 2014, 22:47 GMT
I am having this problem as well. It seems to be triggered by requests: if I switch to using Google's DNS servers, my local named stays up, but as soon as I have it serve requests again, it dies, often quickly.
Comment by Rafał (ert16) - Saturday, 17 May 2014, 11:43 GMT
Hello,

I had the same problem with the assertion at mem.c (see below).
It crashed in both versions, 9.10.0.P1-1 and 9.10.0.P1-2. I can reproduce it by throwing A LOT of DNS requests at it.

Interestingly, I found that the problem disappears when using the -O0 flag (which throws warnings during compilation). When I built the package with that flag, the daemon stayed up after a stress test. Without it, named crashes, as you can see below.

I hope that will help.


[1]
May 17 13:31:11 localhost named[21639]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 17 13:31:11 localhost named[21639]: #0 0x8076a3d in ??
May 17 13:31:11 localhost named[21639]: #1 0xb7419615 in ??
May 17 13:31:11 localhost named[21639]: #2 0xb742f548 in ??
May 17 13:31:11 localhost named[21639]: #3 0xb742f5f9 in ??
May 17 13:31:11 localhost named[21639]: #4 0xb75a7e43 in ??
May 17 13:31:11 localhost named[21639]: #5 0xb755bda7 in ??
May 17 13:31:11 localhost named[21639]: #6 0xb7500186 in ??
May 17 13:31:11 localhost named[21639]: #7 0xb7536cef in ??
May 17 13:31:11 localhost named[21639]: #8 0xb75aeac4 in ??
May 17 13:31:11 localhost named[21639]: #9 0xb75bde19 in ??
May 17 13:31:11 localhost named[21639]: #10 0xb743d625 in ??
May 17 13:31:11 localhost named[21639]: #11 0xb73e4096 in ??
May 17 13:31:11 localhost named[21639]: #12 0xb71a6a3e in ??
May 17 13:31:11 localhost named[21639]: exiting (due to assertion failure)
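
A rough sketch of such a stress test, assuming dig is installed and named listens on 127.0.0.1. The function names, the example.com base domain and the query count are placeholders, not what was actually used for the test above. The generated names do not exist, so they produce negative answers; the symbolized backtrace earlier in this thread passes through dns_ncache_add, but whether this load is enough to trigger the assertion is untested.

```shell
#!/bin/bash
# Hypothetical helper: print COUNT distinct query names under a base domain.
gen_names() {
    local count="$1" base="${2:-example.com}" i=1
    while [ "$i" -le "$count" ]; do
        printf 'stress-%d.%s\n' "$i" "$base"
        i=$((i + 1))
    done
}

# Hypothetical helper: send one A query per generated name to the server.
stress_resolver() {
    local server="${1:-127.0.0.1}" count="${2:-1000}" name
    gen_names "$count" | while read -r name; do
        dig @"$server" "$name" A +tries=1 +time=1 +short > /dev/null
    done
}

# Typical use:
#   stress_resolver 127.0.0.1 5000
```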
Comment by Sébastien Luttringer (seblu) - Saturday, 17 May 2014, 18:24 GMT
@muks: We have the same kind of crash with gcc 4.8.2. Will try a build with -O0.
Comment by Sébastien Luttringer (seblu) - Saturday, 17 May 2014, 18:55 GMT
OK, I screwed up with -2. It was still built with gcc 4.9.0.

According to the latest information from ISC, -O0 seems to be a workaround, but they still see some crashes. They advise using gcc 4.8.0. They have not yet opened a bug in the GCC bug tracker.
Comment by Mukund Sivaraman (muks) - Saturday, 17 May 2014, 19:46 GMT
Hi

Adding to what seblu says, running strings on the /usr/bin/named binary showed that the binary was still compiled with GCC 4.9, which would explain why the crashes are still seen.
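
The same check can be sketched in shell. This assumes the binary still carries GCC's usual ident strings of the form "GCC: (GNU) 4.9.0 ..."; the function name compiler_idents is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: list the distinct GCC ident strings embedded in a
# binary. The strings are NUL-terminated, so NULs are turned into newlines
# before grep extracts the vendor tag and version number.
compiler_idents() {
    tr '\0' '\n' < "$1" | LC_ALL=C grep -a -o 'GCC: ([^)]*) [0-9][0-9.]*' | sort -u
}

# Typical use:
#   compiler_idents /usr/bin/named
```

More than one line of output would suggest that objects built by different compiler versions were linked into the same binary.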

As explained in #bind on irc.freenode.net, -O0 with GCC 4.9 seems to make the crash go away, but we cannot be sure there are no other issues due to this bug in GCC 4.9. We are not compiler developers and we don't understand what triggers the bug in GCC's code.

A bug report will be filed to GCC's bug tracker soon. We still need to do some analysis of the generated object code before reporting it.

ISC is not advising you to use any particular version of GCC, except noting that the GCC 4.8.x releases in other distros like Fedora 20 do not seem to suffer from this compiler bug. I'll ask our support team to check if a formal announcement is necessary for this issue.

Development team members currently do not use Arch Linux. I've already asked our QA team for an Arch builder setup in our build farm, but it may be a little while before that is ready.

For Arch, I recommend referring this bug to your GCC package maintainers as it could affect other packages too.
Comment by Steven Haigh (CRCinAU) - Sunday, 18 May 2014, 01:14 GMT
For documentation's sake, the last version of bind *not* to crash and burn is 9.9.3.P2-1. Anything beyond that seems to be hit by this problem.
Comment by Asterios Dimitriou (rhayader) - Sunday, 18 May 2014, 14:24 GMT
Maybe I will complicate things, but I would like to bring to your attention that named, started outside of the systemd service (# named -u named; NOTICE: no -f argument), has been running without an assertion failure on my server for the last 48 hours. The version of bind I am running is 9.10.0.P1-1, built from abs with debug and !strip using gcc 4.9.0. Of course, there is always the chance that the server will segfault sometime later, but since the systemd service failed within minutes (around 4 in my case), this seems unlikely.
Comment by Steven Haigh (CRCinAU) - Friday, 23 May 2014, 16:11 GMT
Hi all,

Nearly two weeks on with this bug open. Just wondering if we can get a 'state of play' as to what is going on and the current progress towards resolution?
Comment by Mukund Sivaraman (muks) - Friday, 23 May 2014, 16:25 GMT
The assertion failure is due to how the aggressive optimizer in GCC 4.9, glibc and BIND interact. You can read about it here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61236

It has been resolved as INVALID by the GCC team, so we have committed fixes to our repository branches to address the crash and work around this optimization issue.

A patch release of BIND will be made by ISC in the next few days to address this problem. If the maintainer of the package urgently wants a patch, please email us at bind9-bugs@isc.org.
Comment by Sébastien Luttringer (seblu) - Friday, 23 May 2014, 18:00 GMT
Steven, this bug seems to have been resolved for Arch Linux since -4 (17 May). The current version in the repository does not crash anymore. I'm waiting to see whether anyone else complains about it.
Comment by Sergej Pupykin (sergej) - Monday, 02 June 2014, 14:04 GMT
Looks like it is working for me.
Comment by AMM (amish) - Monday, 02 June 2014, 14:36 GMT
Just upgraded. Working for me too. No crash for 25 minutes.
