FS#40304 - [bind] named segfaults
Attached to Project:
Community Packages
Opened by alex (kabolt) - Sunday, 11 May 2014, 12:14 GMT
Last edited by Sébastien Luttringer (seblu) - Monday, 02 June 2014, 21:23 GMT
Details
Description:
named crashes (version 9.10.0.P1-1) after running for a short time. The configuration file is attached. Unfortunately, I have no further information. Error message:
Mai 11 13:46:14 archal named[8786]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
Mai 11 13:46:14 archal named[8786]: #0 0x433afd in ??
Mai 11 13:46:14 archal named[8786]: #1 0x7f92f9be315a in ??
Mai 11 13:46:14 archal named[8786]: #2 0x7f92f9bf81bf in ??
Mai 11 13:46:14 archal named[8786]: #3 0x7f92fa57b0b4 in ??
Mai 11 13:46:14 archal named[8786]: #4 0x7f92fa525a0a in ??
Mai 11 13:46:14 archal named[8786]: #5 0x7f92fa50b5d0 in ??
Mai 11 13:46:14 archal named[8786]: #6 0x7f92fa50b734 in ??
Mai 11 13:46:14 archal named[8786]: #7 0x7f92fa582090 in ??
Mai 11 13:46:14 archal named[8786]: #8 0x7f92fa59170b in ??
Mai 11 13:46:14 archal named[8786]: #9 0x7f92f9c054dc in ??
Mai 11 13:46:14 archal named[8786]: #10 0x7f92f97af124 in ??
Mai 11 13:46:14 archal named[8786]: #11 0x7f92f8f714bd in ??
Mai 11 13:46:14 archal named[8786]: exiting (due to assertion failure)
Mai 11 13:46:14 archal systemd[1]: named.service: main process exited, code=killed, status=6/ABRT
May 12 23:32:34 amish named[1755]: parser.c:315: REQUIRE(tupleobj != ((void *)0) && tupleobj->type->rep == &cfg_rep_tuple) failed, back trace
May 12 23:32:34 amish named[1755]: #0 0x433afd in ??
May 12 23:32:34 amish named[1755]: #1 0x7f36c3ca515a in ??
May 12 23:32:34 amish named[1755]: #2 0x7f36c411b8ce in ??
May 12 23:32:34 amish named[1755]: #3 0x4184d8 in ??
May 12 23:32:34 amish named[1755]: #4 0x418e8a in ??
May 12 23:32:34 amish named[1755]: #5 0x419d9f in ??
May 12 23:32:34 amish named[1755]: #6 0x44ec21 in ??
May 12 23:32:34 amish named[1755]: #7 0x44fbf4 in ??
May 12 23:32:34 amish named[1755]: #8 0x7f36c3cc74dc in ??
May 12 23:32:34 amish named[1755]: #9 0x7f36c3877124 in ??
May 12 23:32:34 amish named[1755]: #10 0x7f36c303f4bd in ??
May 12 23:32:34 amish named[1755]: exiting (due to assertion failure)
May 12 23:32:34 amish systemd[1]: named.service: main process exited, code=killed, status=6/ABRT
May 12 23:32:34 amish systemd[1]: named.service: control process exited, code=exited status=1
But it happens immediately, not after 4-5 minutes as reported by someone above.
Downgraded back to older version.
Please can you do a debug build of the package as described here: https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces
so that the backtrace contains symbol names instead of "??" and either paste the backtrace here, or send it to <bind9-bugs@isc.org>? Thank you. A surefire way to reproduce (with config files) would also be helpful.
Please can you also do a debug build as described above and report the backtrace with symbols? You can send the mail directly to <bind9-bugs@isc.org>. Thank you.
# file =named
/usr/bin/named: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=02d0879315f197274dbdadecf7c53a3ec3cc9e4c, not stripped
-- Logs begin at sam. 2013-06-08 19:30:26 CEST, end at lun. 2014-05-12 22:53:03 CEST. --
mai 12 17:11:52 achille.seblu.net named[2994]: success resolving '132.214.211.66.zen.spamhaus.org/A' (in 'zen.spamhaus.
mai 12 17:11:56 achille.seblu.net named[2994]: success resolving 'archlinux.org.dbl.spamhaus.org/A' (in 'dbl.spamhaus.o
mai 12 18:03:32 achille.seblu.net named[2994]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
mai 12 18:03:32 achille.seblu.net named[2994]: #0 0x433afd in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #1 0x7ffa1002215a in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #2 0x7ffa100371bf in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #3 0x7ffa109ac0b4 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #4 0x7ffa10956a0a in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #5 0x7ffa1093c5d0 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #6 0x7ffa1093c734 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #7 0x7ffa109b3090 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #8 0x7ffa109c270b in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #9 0x7ffa100444dc in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #10 0x7ffa0fbf4124 in ??
mai 12 18:03:32 achille.seblu.net named[2994]: #11 0x7ffa0f3bc4bd in ??
mai 12 18:03:32 achille.seblu.net named[2994]: exiting (due to assertion failure)
mai 12 18:03:32 achille.seblu.net systemd[1]: named.service: main process exited, code=killed, status=6/ABRT
mai 12 18:03:32 achille.seblu.net rndc[4667]: rndc: connect failed: 127.0.0.1#953: connection refused
mai 12 18:03:32 achille.seblu.net systemd[1]: named.service: control process exited, code=exited status=1
mai 12 18:03:32 achille.seblu.net systemd[1]: Unit named.service entered failed state.
New version with -g3 -ggdb3 -O0 is running. You can find the package here: http://celestia.archlinux.org/~seblu/bind-9.10.0.P1-1.1-x86_64.pkg.tar.xz
[Unit]
Description=Internet domain name server
After=network.target
[Service]
ExecStart=/usr/bin/named -f -u named
ExecReload=/usr/bin/rndc reload
ExecStop=/usr/bin/rndc stop
Restart=always
[Install]
WantedBy=multi-user.target
As such, the stock systemd unit file won't auto-restart named on a crash / exit. If you're running bind, chances are that you want it to auto-restart. DNS is a critical service for networks these days - and as such should auto-recover from most failures (or at least restart!).
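If replacing the whole unit file is not wanted, a drop-in might give the same result. This is only a sketch: the file name restart.conf and the function name are made up for illustration, the directory follows systemd's usual <unit>.d convention, and Restart=always simply mirrors the unit posted above.

```shell
# Sketch: write a systemd drop-in that makes named restart after a crash.
# "restart.conf" is a hypothetical file name; any *.conf name in the
# named.service.d directory is read by systemd.
write_restart_dropin() {
    dir=${1:-/etc/systemd/system/named.service.d}
    mkdir -p "$dir" &&
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
EOF
}

# Usage (as root), then let systemd re-read its configuration:
#   write_restart_dropin
#   systemctl daemon-reload
```

Restart=on-failure would be a narrower choice: it restarts after an abort like the one in this report but not after a clean exit.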
BTW: Does someone have an i686 debug build to try? I don't have a build environment for i686 yet...
EDIT: By the way, it crashes VERY often:
May 13 04:37:07 spin-gw named[19656]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 04:51:32 spin-gw named[19667]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:14:08 spin-gw named[19680]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:24:08 spin-gw named[19695]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:29:42 spin-gw named[19708]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:39:12 spin-gw named[19720]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:49:25 spin-gw named[19732]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 05:50:55 spin-gw named[19745]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:03:09 spin-gw named[19756]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:10:26 spin-gw named[19769]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:34:32 spin-gw named[19781]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 06:44:05 spin-gw named[19795]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 07:39:21 spin-gw named[19808]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:04:17 spin-gw named[19827]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:10:46 spin-gw named[19841]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:10:52 spin-gw named[19853]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:11:37 spin-gw named[19864]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:12:33 spin-gw named[19875]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:13:15 spin-gw named[19886]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:18:08 spin-gw named[19897]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:26:48 spin-gw named[19909]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:38:44 spin-gw named[19921]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 08:39:52 spin-gw named[19933]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:10:54 spin-gw named[19945]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:10:59 spin-gw named[19962]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:11:41 spin-gw named[19973]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:26:49 spin-gw named[19985]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:54:31 spin-gw named[19998]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:58:26 spin-gw named[20018]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 09:58:33 spin-gw named[20030]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:04:52 spin-gw named[20041]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:04:57 spin-gw named[20053]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:05:39 spin-gw named[20064]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:05:42 spin-gw named[20075]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:06:37 spin-gw named[20086]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 13 10:13:50 spin-gw named[20135]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
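To put a number on "very often", the journal could be summarised per hour with something like the following. It is a rough sketch: the function name is made up, and it assumes the syslog-style timestamps shown in the lines above.

```shell
# Sketch: count mem.c:1298 assertion failures per hour.
# Reads log lines on stdin, prints "<month> <day> <hour>:00 <count>".
crashes_per_hour() {
    grep -F 'mem.c:1298: REQUIRE(ptr != ((void *)0)) failed' |
        awk '{ split($3, t, ":"); print $1, $2, t[1] ":00" }' |
        sort | uniq -c |
        awk '{ print $2, $3, $4, $1 }'
}

# Usage:
#   journalctl -u named | crashes_per_hour
```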
Here is trace.log (I don't know how useful it is)
Starting program: /usr/bin/named -f -u named
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff40ff700 (LWP 20374)]
[New Thread 0x7ffff38fe700 (LWP 20375)]
[New Thread 0x7ffff30fd700 (LWP 20376)]
[New Thread 0x7ffff28fc700 (LWP 20377)]
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff40ff700 (LWP 20374)]
0x00007ffff6011d67 in raise () from /usr/lib/libc.so.6
#0 0x00007ffff6011d67 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff6013118 in abort () from /usr/lib/libc.so.6
#2 0x0000000000434808 in ?? ()
#3 0x00007ffff7192110 in isc_categories () from /usr/lib/libisc.so.142
#4 0x00007ffff7192100 in ?? () from /usr/lib/libisc.so.142
#5 0x0000000000000001 in ?? ()
#6 0x0000000b00000001 in ?? ()
#7 0x0000000000000000 in ?? ()
But I found what was causing the crash. (Strangely, the same configuration worked in the previous version.)
zone "multi.uribl.com" {
type forward;
};
I have forwarders (OpenDNS) set globally, so the lines above are meant to tell named not to use forwarders for this domain (multi.uribl.com).
The zone acts as an "exception" to the forwarders.
But somehow this zone statement crashes named. Commenting it out or removing it resolves my problem.
Any idea what's wrong? Thanks in advance.
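One thing that may be worth trying, although nothing in this thread confirms it, is to spell out an empty forwarders list instead of leaving the statement out. As far as I know, the BIND documentation treats an empty list in a forward zone as "no forwarding for this domain". A sketch that prints such a zone block (the function name is made up, and the /etc/named.conf path in the usage note is an assumption):

```shell
# Sketch: print a forward zone with an explicit, empty forwarders list.
# Whether this avoids the parser.c assertion is an untested assumption.
no_forward_zone() {
    cat <<EOF
zone "$1" {
    type forward;
    forwarders { };
};
EOF
}

# Usage: append to the configuration, then validate it before restarting:
#   no_forward_zone multi.uribl.com >> /etc/named.conf
#   named-checkconf /etc/named.conf
```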
After the above was resolved (it looked like the same bug but was actually a different one), I have now also run into the mem.c bug reported here,
i.e. named starts but crashes after some time with mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
@Steven - ok I will.
Please can you also do a debug build as described above and report the backtrace with symbols? You can send the mail directly to <bind9-bugs@isc.org>. Thank you.
I have also identified the config line that causes that problem.
See the bug report (with workaround):
https://bugs.archlinux.org/task/40342?project=5
For this ticket's issue, please see if you are able to do a debug build, make named dump core upon hitting the REQUIRE() assertion (you may have to set ulimit -c on Linux and run it manually outside systemd), open the core dump in gdb and get a usable backtrace (one without ??s).
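The steps above could look roughly like this. It is a sketch under assumptions: the function names are made up, gdb and a debug build of bind are installed, and the core file is called "core". Where the core actually lands depends on kernel.core_pattern and on named's working directory.

```shell
# Sketch: run named by hand with core dumps enabled.
run_named_with_core() {
    ulimit -c unlimited            # lift the core size limit for this shell
    /usr/bin/named -f -u named     # foreground, same arguments as the unit file
}

# Sketch: print a backtrace of every thread from a core file.
# -batch makes gdb run the -ex command and exit without a prompt.
backtrace_from_core() {
    gdb -batch -ex 'thread apply all bt full' "${1:-/usr/bin/named}" "${2:-core}"
}

# Usage (as root, outside systemd):
#   run_named_with_core
#   backtrace_from_core /usr/bin/named ./core > trace.log
```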
#0 0x00007ffff621fd67 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff6221118 in abort () from /usr/lib/libc.so.6
#2 0x0000000000433e77 in assertion_failed (
file=0xc <error: Cannot access memory at address 0xc>, line=-198136064,
type=6912928, cond=0x7ffff430af08 "") at ./main.c:222
#3 0x00007ffff6f3b28a in isc_assertion_failed (file=<optimized out>,
line=<optimized out>, type=<optimized out>, cond=<optimized out>)
at assertions.c:57
#4 0x00007ffff6f50a12 in isc___mem_put (ctx0=0x7fffec047470, ptr=0x0,
size=140737290221856, file=0x0, line=6974288) at mem.c:1298
#5 0x00007ffff78c927c in dns_rdataslab_fromrdataset (rdataset=0x7ffff430b920,
mctx=0x7fffec047470, region=0x7ffff430b460, reservelen=<optimized out>)
at rdataslab.c:334
#6 0x00007ffff7872faa in addrdataset (db=0x7ffff27b0010, node=0x167,
version=0x6, now=1399995217, rdataset=0x7ffff430b920, options=0,
addedrdataset=0x7fffeda941d0) at rbtdb.c:6445
#7 0x00007ffff78587bb in addoptout (message=0x163,
message@entry=0x7fffece3bd08, cache=0x167, node=0x6, covers=65535,
now=6974288, maxttl=0, optout=isc_boolean_false, secure=isc_boolean_false,
addedrdataset=0x7fffeda941d0) at ncache.c:270
#8 0x00007ffff7858934 in dns_ncache_add (
message=message@entry=0x7fffece3bd08, cache=<optimized out>,
node=<optimized out>, covers=<optimized out>, now=<optimized out>,
maxttl=<optimized out>, addedrdataset=0x7fffeda941d0) at ncache.c:105
#9 0x00007ffff78d0498 in ncache_adderesult (message=0x7fffece3bd08,
cache=<optimized out>, node=<optimized out>, covers=<optimized out>,
now=<optimized out>, maxttl=<optimized out>, optout=isc_boolean_false,
secure=isc_boolean_false, ardataset=0x7fffeda941d0,
eresultp=0x7ffff430ce48) at resolver.c:5283
#10 0x00007ffff78ddaa3 in validated (task=0x163, event=0x7ffff25135d0)
at resolver.c:4459
#11 0x00007ffff6f5e2a9 in dispatch (manager=0x7ffff7faa010) at task.c:1122
#12 run (uap=0x7ffff7faa010) at task.c:1294
#13 0x00007ffff6b0d124 in start_thread () from /usr/lib/libpthread.so.0
#14 0x00007ffff62d54bd in clone () from /usr/lib/libc.so.6
We'll update this ticket in a day or two.
kvě 16 14:59:28 Sigyn named[842]: exiting (due to assertion failure)
kvě 16 14:59:28 Sigyn named[842]: #11 0x7fe7b8f2c4bd in ??
kvě 16 14:59:28 Sigyn named[842]: #10 0x7fe7b9764124 in ??
kvě 16 14:59:28 Sigyn named[842]: #9 0x7fe7b9bb44dc in ??
kvě 16 14:59:28 Sigyn named[842]: #8 0x7fe7ba5304c4 in ??
kvě 16 14:59:28 Sigyn named[842]: #7 0x7fe7ba523090 in ??
kvě 16 14:59:28 Sigyn named[842]: #6 0x7fe7ba4ac734 in ??
kvě 16 14:59:28 Sigyn named[842]: #5 0x7fe7ba4ac5d0 in ??
kvě 16 14:59:28 Sigyn named[842]: #4 0x7fe7ba4c6a0a in ??
kvě 16 14:59:28 Sigyn named[842]: #3 0x7fe7ba51c0b4 in ??
kvě 16 14:59:28 Sigyn named[842]: #2 0x7fe7b9ba71bf in ??
kvě 16 14:59:28 Sigyn named[842]: #1 0x7fe7b9b9215a in ??
kvě 16 14:59:28 Sigyn named[842]: #0 0x433afd in ??
kvě 16 14:59:28 Sigyn named[842]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/AAAA/IN': 64.94.123.36#53
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/A/IN': 64.94.123.36#53
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/AAAA/IN': 64.94.123.4#53
kvě 16 14:59:28 Sigyn named[842]: SERVFAIL unexpected RCODE resolving 'images.zvents.com/A/IN': 64.94.123.4#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'secure-us.imrworldwide.com/A/IN': 2600:1401:2::32#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'secure-us.imrworldwide.com/AAAA/IN': 2600:1401:2::32#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'a1-50.akam.net/AAAA/IN': 2600:1401:2::43#53
kvě 16 14:59:27 Sigyn named[842]: network unreachable resolving 'a1-50.akam.net/A/IN': 2600:1401:2::43#53
I had the same problem with the assertion in mem.c (see below).
It crashed in both versions, 9.10.0.P1-1 and 9.10.0.P1-2. I can reproduce it by sending A LOT of DNS requests.
Interestingly, I found that the problem disappears when building with the -O0 flag (the build emits warnings). When I built the package with that flag, the daemon stayed up through the stress test. Without it, named crashes as you can see below.
Hope that helps.
[1]
May 17 13:31:11 localhost named[21639]: mem.c:1298: REQUIRE(ptr != ((void *)0)) failed, back trace
May 17 13:31:11 localhost named[21639]: #0 0x8076a3d in ??
May 17 13:31:11 localhost named[21639]: #1 0xb7419615 in ??
May 17 13:31:11 localhost named[21639]: #2 0xb742f548 in ??
May 17 13:31:11 localhost named[21639]: #3 0xb742f5f9 in ??
May 17 13:31:11 localhost named[21639]: #4 0xb75a7e43 in ??
May 17 13:31:11 localhost named[21639]: #5 0xb755bda7 in ??
May 17 13:31:11 localhost named[21639]: #6 0xb7500186 in ??
May 17 13:31:11 localhost named[21639]: #7 0xb7536cef in ??
May 17 13:31:11 localhost named[21639]: #8 0xb75aeac4 in ??
May 17 13:31:11 localhost named[21639]: #9 0xb75bde19 in ??
May 17 13:31:11 localhost named[21639]: #10 0xb743d625 in ??
May 17 13:31:11 localhost named[21639]: #11 0xb73e4096 in ??
May 17 13:31:11 localhost named[21639]: #12 0xb71a6a3e in ??
May 17 13:31:11 localhost named[21639]: exiting (due to assertion failure)
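For anyone wanting to repeat the -O0 build described above, the flag change could be sketched as below. The function name is made up, and where exactly it gets called in the PKGBUILD is an assumption. GCC uses the last -O option it sees, so the existing levels are dropped only to keep the flag list tidy.

```shell
# Sketch: return the given compiler flags with every -O<level> word
# removed and -O0 appended.
force_O0() {
    out=
    for flag in $1; do
        case $flag in
            -O*) ;;                     # drop existing optimisation level
            *) out="$out $flag" ;;
        esac
    done
    out="$out -O0"
    printf '%s\n' "${out# }"
}

# Usage, e.g. at the top of build() in the PKGBUILD:
#   CFLAGS=$(force_O0 "$CFLAGS")
#   export CFLAGS
```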
According to the latest information from ISC, -O0 seems to be a workaround, but they still see some crashes. They advise using gcc 4.8.0. They have not yet opened a bug in the GCC bug tracker.
Adding to what seblu says: running strings on the /usr/bin/named binary showed that it was still compiled with GCC 4.9, which would explain why the crashes are still seen.
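The check mentioned above can be repeated along these lines. This is a sketch with a made-up function name. GCC normally records a "GCC: (...) version" line in the binary, but several such lines may show up, because every object linked in contributes its own.

```shell
# Sketch: keep only the compiler identification lines from strings output.
gcc_ident_lines() {
    grep '^GCC: ' | sort -u
}

# Usage:
#   strings -a /usr/bin/named | gcc_ident_lines
```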
As explained in #bind on irc.freenode.net, -O0 with GCC 4.9 seems to make the crash go away, but we cannot be sure there are no other issues due to this bug in GCC 4.9. We are not compiler developers and we don't understand what triggers the bug in GCC's code.
A bug report will be filed to GCC's bug tracker soon. We still need to do some analysis of the generated object code before reporting it.
ISC is not advising you to use any particular version of GCC, except noting that the GCC 4.8.x releases in other distros like Fedora 20 do not seem to suffer from this compiler bug. I'll ask our support team to check if a formal announcement is necessary for this issue.
Development team members currently do not use Arch Linux. I've already asked our QA team for an Arch builder setup in our build farm, but it may be a little while before that is ready.
For Arch, I recommend referring this bug to your GCC package maintainers as it could affect other packages too.
Nearly two weeks on with this bug open. Just wondering if we can get a 'state of play' as to what is going on and the current progress towards resolution?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61236
It has been resolved as INVALID by the GCC team, so we have committed fixes to our repository branches to address the crash and work around this optimization issue.
A patch release of BIND will be made by ISC in the next few days to address this problem. If the maintainer of the package urgently wants a patch, please email us at bind9-bugs@isc.org.