FS#60913 - [bind] CoreDump after update to bind 9.13.4-1

Attached to Project: Arch Linux
Opened by BAD+MAD (mat_weiss) - Monday, 26 November 2018, 10:59 GMT
Last edited by Sébastien Luttringer (seblu) - Monday, 14 January 2019, 13:54 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Sébastien Luttringer (seblu)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 9
Private No

Details

Description:

After the update to bind 9.13.4-1, I get core dumps and bind crashes!


Additional info:

journalctl -p 4

systemd-coredump[27428]: Process 25250 (named) of user 40 dumped core.

Stack trace of thread 25252:
#0 0x00007fe19dccad7f raise (libc.so.6)
#1 0x00007fe19dcb5672 abort (libc.so.6)
#2 0x000055e94e9cd02c n/a (named)
#3 0x00007fe19e94fcaa isc_assertion_failed (libisc.so.1304)
#4 0x00007fe19eb0295b dns_resolver_createfetch (libdns.so.1304)
#5 0x00007fe19eb0806e n/a (libdns.so.1304)
#6 0x00007fe19eb0baba n/a (libdns.so.1304)
#7 0x00007fe19e96f349 n/a (libisc.so.1304)
#8 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#9 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25250:
#0 0x00007fe19dccbb4c __sigtimedwait (libc.so.6)
#1 0x00007fe19de68f4d sigwait (libpthread.so.0)
#2 0x00007fe19e978e41 isc_app_ctxrun (libisc.so.1304)
#3 0x00007fe19e979158 isc_app_run (libisc.so.1304)
#4 0x000055e94e9cdfce n/a (named)
#5 0x00007fe19dcb7223 __libc_start_main (libc.so.6)
#6 0x000055e94e9ceb4e n/a (named)

Stack trace of thread 25258:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25257:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25253:
#0 0x00007fe19de64afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e96ef52 n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25259:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25254:
#0 0x00007fe19de64afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e96ef52 n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25256:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25255:
#0 0x00007fe19de64e5b pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e989e09 isc_condition_waituntil (libisc.so.1304)
#2 0x00007fe19e9756f4 n/a (libisc.so.1304)
#3 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#4 0x00007fe19dd8eb23 __clone (libc.so.6)

Stack trace of thread 25251:
#0 0x00007fe19de64afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e96ef52 n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)

This task depends upon

Closed by  Sébastien Luttringer (seblu)
Monday, 14 January 2019, 13:54 GMT
Reason for closing:  Fixed
Comment by BAD+MAD (mat_weiss) - Monday, 26 November 2018, 11:03 GMT
...and the logging for named shows:

26-Nov-2018 11:44:37.149 general: critical: resolver.c:10484: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
26-Nov-2018 11:44:37.149 general: critical: #0 0x55e94e9d74d2 in ??
26-Nov-2018 11:44:37.149 general: critical: #1 0x7fe19e94fcaa in ??
26-Nov-2018 11:44:37.149 general: critical: #2 0x7fe19eb0295b in ??
26-Nov-2018 11:44:37.149 general: critical: #3 0x7fe19eb0806e in ??
26-Nov-2018 11:44:37.149 general: critical: #4 0x7fe19eb0baba in ??
26-Nov-2018 11:44:37.149 general: critical: #5 0x7fe19e96f349 in ??
26-Nov-2018 11:44:37.149 general: critical: #6 0x7fe19de5ea9d in ??
26-Nov-2018 11:44:37.149 general: critical: #7 0x7fe19dd8eb23 in ??
26-Nov-2018 11:44:37.149 general: critical: exiting (due to assertion failure)
Comment by Søren Rindom Andersen (natlampen) - Tuesday, 27 November 2018, 17:18 GMT
I have seen a similar coredump with the latest update.

By default named serves both IPv4 and IPv6, but I do not use IPv6. After adding -4 to the service file I have not seen any issues.

I think the issue could be related to IPv6.
Comment by BAD+MAD (mat_weiss) - Wednesday, 28 November 2018, 07:10 GMT
Thank you for this suggestion! I'll try that.
But then of course we have to re-adjust the service file after every update.

Version 9.13.3-3 does not have this error. Even without "-4" bind works perfectly and there are no coredumps.

On the ISC website https://www.isc.org/downloads/ version 9.13.4 is listed as "unstable development".

Why is this used in Arch Linux instead of offering a stable version as a package?

Comment by BAD+MAD (mat_weiss) - Wednesday, 28 November 2018, 09:12 GMT
In my case, the "-4" did not help! I had to go back to version 9.13.3-3.

-----------------------------------------------------------------------------------

cat /etc/systemd/system/multi-user.target.wants/named.service


[Unit]
Description=Internet domain name server
After=network.target

[Service]
ExecStart=/usr/bin/named -4 -f -u named
ExecReload=/usr/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

-----------------------------------------------------------------------------------

systemctl daemon-reload; \
systemctl restart named

-----------------------------------------------------------------------------------

journalctl -p 4

Nov 28 09:10:08 arch-linux systemd-coredump[30851]: Process 25971 (named) of user 40 dumped core.

Stack trace of thread 25974:
#0 0x00007f6cc277cd7f raise (libc.so.6)
#1 0x00007f6cc2767672 abort (libc.so.6)
#2 0x000055c66fbd202c n/a (named)
#3 0x00007f6cc3401caa isc_assertion_failed (libisc.so.1304)
#4 0x00007f6cc35b495b dns_resolver_createfetch (libdns.so.1304)
#5 0x00007f6cc35ba06e n/a (libdns.so.1304)
#6 0x00007f6cc35bdaba n/a (libdns.so.1304)
#7 0x00007f6cc3421349 n/a (libisc.so.1304)
#8 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#9 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25982:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25979:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25977:
#0 0x00007f6cc2916afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc3420f52 n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25971:
#0 0x00007f6cc277db4c __sigtimedwait (libc.so.6)
#1 0x00007f6cc291af4d sigwait (libpthread.so.0)
#2 0x00007f6cc342ae41 isc_app_ctxrun (libisc.so.1304)
#3 0x00007f6cc342b158 isc_app_run (libisc.so.1304)
#4 0x000055c66fbd2fce n/a (named)
#5 0x00007f6cc2769223 __libc_start_main (libc.so.6)
#6 0x000055c66fbd3b4e n/a (named)

Stack trace of thread 25981:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25978:
#0 0x00007f6cc2916e5b pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc343be09 isc_condition_waituntil (libisc.so.1304)
#2 0x00007f6cc34276f4 n/a (libisc.so.1304)
#3 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#4 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25976:
#0 0x00007f6cc2916afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc3420f52 n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25980:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)

Stack trace of thread 25975:
#0 0x00007f6cc2916afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc3420f52 n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)

-----------------------------------------------------------------------------------

...and the logging for named shows:

28-Nov-2018 09:10:06.655 general: critical: resolver.c:10484: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
28-Nov-2018 09:10:06.656 general: critical: #0 0x55c66fbdc4d2 in ??
28-Nov-2018 09:10:06.656 general: critical: #1 0x7f6cc3401caa in ??
28-Nov-2018 09:10:06.656 general: critical: #2 0x7f6cc35b495b in ??
28-Nov-2018 09:10:06.656 general: critical: #3 0x7f6cc35ba06e in ??
28-Nov-2018 09:10:06.656 general: critical: #4 0x7f6cc35bdaba in ??
28-Nov-2018 09:10:06.656 general: critical: #5 0x7f6cc3421349 in ??
28-Nov-2018 09:10:06.656 general: critical: #6 0x7f6cc2910a9d in ??
28-Nov-2018 09:10:06.656 general: critical: #7 0x7f6cc2840b23 in ??
28-Nov-2018 09:10:06.656 general: critical: exiting (due to assertion failure)
Comment by Søren Rindom Andersen (natlampen) - Wednesday, 28 November 2018, 17:04 GMT
It took a while but eventually the -4 solution I suggested earlier failed for me as well.
Comment by BAD+MAD (mat_weiss) - Thursday, 29 November 2018, 06:33 GMT
Hello Søren,

it seems like nobody else who uses Arch Linux as a DNS server has the same problem.

I think that our machines are broken ;-).

However, my machine is running in a production environment and I really need the DNS server.

I have to prevent updates of bind and bind-tools and stay on version 9.13.3-3.

I can only hope that someday in the future somebody has the same problem and maybe a better idea to solve it.
Comment by Sergej Pupykin (sergej) - Thursday, 29 November 2018, 09:34 GMT
The same "resolver.c:10484: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed" error but on armv7h architecture.
Comment by Neil Darlow (neildarlow) - Friday, 30 November 2018, 09:37 GMT
I use bind as my nameserver and this update is broken. Seems to fail after a few hours with a SIGSEGV for me. I'll have to revert to a previous working version also.
Comment by Christian Wolf (christianlupus) - Friday, 30 November 2018, 14:41 GMT
I can confirm this behavior on my amd64 machine.

I do not know after which time this happens or what triggers it. Unfortunately in my case it takes quite some time before the issue happens. Last time was ~4 days ago.

If anyone has a clue (or idea how to debug), please tell me.
Comment by Jérôme (jerome-rdlv) - Saturday, 01 December 2018, 11:16 GMT
I also have this problem on my machine after updating this morning.

> resolver.c:10484: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
> #0 0x55fe8e77d4d2 in ?
> #1 0x7fee5ac72caa in ??
> #2 0x7fee5ae2595b in ??
> #3 0x7fee5ae2b06e in ??
> #4 0x7fee5ae2eaba in ??
> #5 0x7fee5ac92349 in ??
> #6 0x7fee5a181a9d in ??
> #7 0x7fee5a0b1b23 in ??
> exiting (due to assertion failure)
> Main process exited, code=killed, status=6/ABRT
> Failed with result 'signal'.
> Process 619 (named) of user 40 dumped core.
Comment by loqs (loqs) - Saturday, 01 December 2018, 13:54 GMT
I would suggest creating a bind-git PKGBUILD and bisecting between 9.13.3 and 9.13.4,
or building bind with debug symbols and reporting the issue upstream with a backtrace that includes debug symbols.
Comment by P. Shao (shallpion) - Monday, 03 December 2018, 17:21 GMT
Glad I am not alone. Same issue every few hours. I run BIND in a semi-production environment and it has caused quite some complaints...

Is it possible to revert this back to 9.13.3-3? I am manually ignoring bind for now. Thanks
Comment by Jérôme (jerome-rdlv) - Monday, 03 December 2018, 17:24 GMT
I've been able to revert to bind-9.13.3-3 without any noticeable issues, but I needed to downgrade bind-tools too:
pacman -U /var/cache/pacman/pkg/bind-tools-9.13.3-3-x86_64.pkg.tar.xz /var/cache/pacman/pkg/bind-9.13.3-3-x86_64.pkg.tar.xz
Comment by J. Morse Loyola (irb) - Tuesday, 04 December 2018, 17:36 GMT
Same issue here. Would this be  FS#40304  again? Related to GCC optimizations?
Comment by loqs (loqs) - Tuesday, 04 December 2018, 18:37 GMT
@irb both bind-9.13.3-3 and bind-9.13.4-1 were built with gcc-8.2.1+20180831-1-x86_64 and the failing REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) is years old.
You could build the package locally with optimizations off at the same time as you enable debug symbols to report the issue upstream if you want to test.
Comment by David Ford (FirefighterBlu3) - Tuesday, 04 December 2018, 23:16 GMT
running as a caching/forwarding server, no local domains, seeing this as well
Comment by David C. Rankin (drankinatty) - Friday, 07 December 2018, 19:23 GMT
I and 3 others on the arch-general list can confirm this behavior (see thread: "bind/named dying after 24-48 hrs. "assertion failure"?" from 12/7). Following the update to the latest bind, named seems to die with an "assertion failure" every 24-48 hours. Checking the daemon status reveals, e.g.:

● named.service - Internet domain name server
Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor
preset: disabled)
Active: failed (Result: signal) since Thu 2018-12-06 10:35:51 CST; 15h ago
Process: 23007 ExecStart=/usr/bin/named -f -u named (code=killed, signal=ABRT)
Main PID: 23007 (code=killed, signal=ABRT)

Dec 06 10:35:51 phoinix named[23007]: #1 0x7f078ea0fcaa in ??
Dec 06 10:35:51 phoinix named[23007]: #2 0x7f078ebc295b in ??
Dec 06 10:35:51 phoinix named[23007]: #3 0x7f078ebc806e in ??
Dec 06 10:35:51 phoinix named[23007]: #4 0x7f078ebcbaba in ??
Dec 06 10:35:51 phoinix named[23007]: #5 0x7f078ea2f349 in ??
Dec 06 10:35:51 phoinix named[23007]: #6 0x7f078df1ea9d in ??
Dec 06 10:35:51 phoinix named[23007]: #7 0x7f078de4eb23 in ??
Dec 06 10:35:51 phoinix named[23007]: exiting (due to assertion failure)
Dec 06 10:35:51 phoinix systemd[1]: named.service: Main process exited,
code=killed, status=6/ABRT
Dec 06 10:35:51 phoinix systemd[1]: named.service: Failed with result 'signal'.

A simple restart brings it back to life. The only "guess" I had was it may possibly be due to the systemd timers reported on restart and the managed keys somehow timing out. I take that from the restart output of:

<snip>
Dec 07 02:04:03 phoinix named[26487]: all zones loaded
Dec 07 02:04:03 phoinix named[26487]: running
Dec 07 02:04:04 phoinix named[26487]: managed-keys-zone: Key 19036 for zone .
is now trusted (acceptance timer com>
Dec 07 02:04:04 phoinix named[26487]: managed-keys-zone: Key 20326 for zone .
is now trusted (acceptance timer com>
Dec 07 02:04:04 phoinix named[26487]: resolver priming query complete

Comment by AMM (amish) - Monday, 10 December 2018, 03:07 GMT
If service is critical then a temporary workaround is:

/etc/systemd/system/named.service.d/onfail.conf

[Service]
Restart=on-failure
RestartSec=2s

Customize as per your wish (man systemd.service). Then run:
systemctl daemon-reload

Maybe this should be made the default, because most people who use named/bind would never want to shut the service down unless the system itself is shut down.
Comment by BAD+MAD (mat_weiss) - Monday, 10 December 2018, 07:47 GMT
The problem itself should still be solved.

Monitoring a service to restart it after a crash looks like "Redmond-style" problem solving. ;-)

But of course it's a workaround.

The problem has existed for a long time now. I'm surprised it's not even assigned to anyone!

Comment by BAD+MAD (mat_weiss) - Thursday, 13 December 2018, 15:53 GMT
The same happens with bind 9.13.5-1 and bind-tools 9.13.5-1 (I have to switch back to 9.13.3-3).



journalctl -f -p 4
==================

Dez 13 16:43:59 systemd-coredump[21108]: Process 20018 (named) of user 40 dumped core.

Stack trace of thread 20019:
#0 0x00007f6977ee6d7f raise (libc.so.6)
#1 0x00007f6977ed1672 abort (libc.so.6)
#2 0x0000560281ba002c n/a (named)
#3 0x00007f6978b6bc6a isc_assertion_failed (libisc.so.1305)
#4 0x00007f6978d1ec5b dns_resolver_createfetch (libdns.so.1305)
#5 0x00007f6978d2436e n/a (libdns.so.1305)
#6 0x00007f6978d27dba n/a (libdns.so.1305)
#7 0x00007f6978b8b1c9 n/a (libisc.so.1305)
#8 0x00007f697807aa9d start_thread (libpthread.so.0)
#9 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20027:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20024:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20018:
#0 0x00007f6977ee7b4c __sigtimedwait (libc.so.6)
#1 0x00007f6978084f4d sigwait (libpthread.so.0)
#2 0x00007f6978b94c41 isc_app_ctxrun (libisc.so.1305)
#3 0x00007f6978b94f58 isc_app_run (libisc.so.1305)
#4 0x0000560281ba0fce main (named)
#5 0x00007f6977ed3223 __libc_start_main (libc.so.6)
#6 0x0000560281ba1b4e _start (named)

Stack trace of thread 20022:
#0 0x00007f6978080afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978b8add2 n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20025:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20020:
#0 0x00007f6978080afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978b8add2 n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20023:
#0 0x00007f6978080e5b pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978ba5b99 isc_condition_waituntil (libisc.so.1305)
#2 0x00007f6978b91534 n/a (libisc.so.1305)
#3 0x00007f697807aa9d start_thread (libpthread.so.0)
#4 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20026:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)

Stack trace of thread 20021:
#0 0x00007f6978080afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978b8add2 n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)

-------------------------------------------------------------------------------------------------------------------------------------------

cat /var/named/data/named.run
=============================

13-Dec-2018 16:43:57.994 general: critical: resolver.c:10470: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
13-Dec-2018 16:43:57.994 general: critical: #0 0x560281baa4d2 in ??
13-Dec-2018 16:43:57.994 general: critical: #1 0x7f6978b6bc6a in ??
13-Dec-2018 16:43:57.994 general: critical: #2 0x7f6978d1ec5b in ??
13-Dec-2018 16:43:57.994 general: critical: #3 0x7f6978d2436e in ??
13-Dec-2018 16:43:57.994 general: critical: #4 0x7f6978d27dba in ??
13-Dec-2018 16:43:57.994 general: critical: #5 0x7f6978b8b1c9 in ??
13-Dec-2018 16:43:57.994 general: critical: #6 0x7f697807aa9d in ??
13-Dec-2018 16:43:57.994 general: critical: #7 0x7f6977faab23 in ??
13-Dec-2018 16:43:57.994 general: critical: exiting (due to assertion failure)

-------------------------------------------------------------------------------------------------------------------------------------------
Comment by David C. Rankin (drankinatty) - Saturday, 15 December 2018, 23:48 GMT
Confirmed with 9.13.5-1

17:44 phoinix:~> scs named
● named.service - Internet domain name server
Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Sat 2018-12-15 03:53:57 CST; 13h ago
Process: 18825 ExecStart=/usr/bin/named -f -u named (code=killed, signal=ABRT)
Main PID: 18825 (code=killed, signal=ABRT)

Dec 15 03:53:57 phoinix named[18825]: #1 0x7f7d60aefc6a in ??
Dec 15 03:53:57 phoinix named[18825]: #2 0x7f7d60ca2c5b in ??
Dec 15 03:53:57 phoinix named[18825]: #3 0x7f7d60ca836e in ??
Dec 15 03:53:57 phoinix named[18825]: #4 0x7f7d60cabdba in ??
Dec 15 03:53:57 phoinix named[18825]: #5 0x7f7d60b0f1c9 in ??
Dec 15 03:53:57 phoinix named[18825]: #6 0x7f7d5fffea9d in ??
Dec 15 03:53:57 phoinix named[18825]: #7 0x7f7d5ff2eb23 in ??
Dec 15 03:53:57 phoinix named[18825]: exiting (due to assertion failure)
Dec 15 03:53:57 phoinix systemd[1]: named.service: Main process exited, code=killed, status=6/ABRT
Dec 15 03:53:57 phoinix systemd[1]: named.service: Failed with result 'signal'.
Comment by Nocturne (Nocturne) - Monday, 17 December 2018, 17:07 GMT
Seeing the exact same problems with both 9.13.4-1 and 9.13.5-1, reverted back to 9.13.3-3 and things are working as they should.

Edit:
I have bind running on three different servers. On two of the servers, 9.13.5-1 has been running for about 26 hours without any problems whatsoever. The third server shows the crashes that everyone else here is seeing, forcing me to downgrade to 9.13.3-3. Some differences between the working and non-working servers (if it makes any difference):

Working servers:
-Running bind in a chrooted environment on both working servers.
-One working server is an Intel Core2 Duo CPU, the other working server is an Intel Core i5 CPU.

Non-working server:
-NOT running bind in chrooted environment.
-Non-working server is a Dual Xeon E5405 CPU setup.

How about the rest of you people having issues? What type of CPU? Chroot or non-chroot?
Comment by Jason Carr (jason2) - Tuesday, 18 December 2018, 04:00 GMT
I'm seeing the same issues as well.

bind 9.13.5-1

17-Dec-2018 16:17:54.266 resolver.c:10470: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
17-Dec-2018 16:17:54.266 #0 0x561ea22734d2 in ??
17-Dec-2018 16:17:54.267 #1 0x7f8d6d851c6a in ??
17-Dec-2018 16:17:54.267 #2 0x7f8d6da04c5b in ??
17-Dec-2018 16:17:54.267 #3 0x7f8d6da0a36e in ??
17-Dec-2018 16:17:54.267 #4 0x7f8d6da0ddba in ??
17-Dec-2018 16:17:54.267 #5 0x7f8d6d8711c9 in ??
17-Dec-2018 16:17:54.267 #6 0x7f8d6cd60a9d in ??
17-Dec-2018 16:17:54.267 #7 0x7f8d6cc90b23 in ??
17-Dec-2018 16:17:54.267 exiting (due to assertion failure)

Comment by David Ford (FirefighterBlu3) - Saturday, 22 December 2018, 15:03 GMT
possibly related:

22-Dec-2018 08:58:04.898 dispatch: info: dispatch 0x7fd508984da0: shutting down due to TCP receive error: 8.8.4.4#53: connection reset
22-Dec-2018 09:06:19.202 general: critical: resolver.c:10470: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
Comment by loqs (loqs) - Saturday, 22 December 2018, 16:03 GMT
Are those affected waiting for the bug to be assigned before reporting the issue upstream, producing a backtrace with debug symbols, or bisecting between 9.13.3 and 9.13.4?
Upstream could have been working on the issue for the last four weeks if someone affected had reported it to them.
Comment by BAD+MAD (mat_weiss) - Monday, 24 December 2018, 07:25 GMT
@loqs (loqs)

That's such a wise statement!

But what do you think? Do we know that or not? Sorry, that was a rhetorical question, it does not have to be answered by you.

Not everyone has the technical understanding to do such a thing and not everyone is a kernel programmer.

There are people who use a server for server services and not for testing software.

In a productive environment, the DNS server just has to work and you rarely have time to deal with debugging crashes.

Please let us share your great wisdom and help us to prepare the necessary information.
Comment by loqs (loqs) - Monday, 24 December 2018, 11:40 GMT
https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces
https://wiki.archlinux.org/index.php/Step-by-step_debugging_guide
https://wiki.archlinux.org/index.php/Bisecting_bugs_with_Git
https://gitlab.isc.org/isc-projects/bind9/issues
As there is no bind-git package I would suggest starting with generating a backtrace with debug symbols and reporting that upstream.
I can try creating a bind-git package if needed.
Comment by loqs (loqs) - Thursday, 03 January 2019, 15:52 GMT
Comment by David Ford (FirefighterBlu3) - Thursday, 03 January 2019, 16:37 GMT
I've posted a detailed gdb traceback and @ISC has a patch in progress. In summary: a forwarder fails and the query is retried, but the retry happens in the wrong place with previously allocated data, which triggers the REQUIRE() assertion. The ISC patch will correct this.
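For readers unfamiliar with the failing check: `REQUIRE(fetchp != NULL && *fetchp == NULL)` enforces a common C out-parameter contract, where the caller passes a pointer to a fetch pointer that must still be NULL, and dns_resolver_createfetch() fills it in. On the buggy retry path the previous fetch pointer was never cleared, so the second call trips the assertion. A minimal sketch of that contract (`fetch_t` and `create_fetch()` are hypothetical stand-ins, not BIND's real types):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for BIND's dns_fetch_t. */
typedef struct fetch { int id; } fetch_t;

/* Mimics the contract of dns_resolver_createfetch(): fetchp must be
 * non-NULL and must point at a NULL pointer.  Returns 0 on success,
 * -1 on a contract violation (where BIND's REQUIRE() aborts instead). */
int create_fetch(fetch_t **fetchp) {
    if (fetchp == NULL || *fetchp != NULL)
        return -1;
    *fetchp = malloc(sizeof **fetchp);
    if (*fetchp == NULL)
        return -1;
    (*fetchp)->id = 0;
    return 0;
}
```

A retry that reuses the pointer without first destroying and NULLing the old fetch, as the pre-patch forwarder fallback apparently did, fails the `*fetchp == NULL` half of the check.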
Comment by Ondřej Surý (oerdnj) - Thursday, 03 January 2019, 17:41 GMT
The patch under review is available here: https://gitlab.isc.org/isc-projects/bind9/merge_requests/1293.patch

And while I appreciate the test base of all Arch Linux users, I would strongly recommend against using development releases as the base version for all Arch Linux users. Please stay with the latest stable release (BIND 9.14 would be a good choice in a couple of weeks).
Comment by Sébastien Luttringer (seblu) - Thursday, 03 January 2019, 23:28 GMT
I just discovered this bug report. I'm also running a production dns server and I have no such issue.

@oerdnj: We will stop using the development releases and stick with stables as soon as 9.14 is out.

Comment by AMM (amish) - Friday, 04 January 2019, 06:53 GMT
Since we are already running an unstable version, please apply the patch and release a new version instead of waiting for two weeks (please don't change the epoch and downgrade).

Also as per this note: https://gitlab.isc.org/isc-projects/bind9/issues/797#note_37322

Odd-numbered releases (9.13) are unstable and even-numbered releases (9.14) are stable. But this is not marked clearly on their BIND download page, and hence it creates the impression that 9.13 is the latest stable release.

@seblu
> I just discovered this bug report. I'm also running a production dns server and I have no such issue.

It happens only if you are using forwarders with forward first.
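For reference, a minimal named.conf options block that puts a server on the affected code path might look like this (the forwarder addresses are placeholders, not a recommendation):

```
options {
    // Placeholder upstream resolvers -- substitute your own.
    forwarders { 192.0.2.1; 192.0.2.2; };
    // Try the forwarders first, then fall back to full recursion;
    // the fallback/retry path is where the assertion fires.
    forward first;
};
```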
Comment by Neil Darlow (neildarlow) - Friday, 04 January 2019, 07:29 GMT
If there is no policy dictating that stable versions should be used in preference to development/unstable ones, then the choice is at the discretion of the maintainer. In many cases it isn't an issue, but there is always a risk.

As I understand it, this bug is related to retries associated with forwarders. If you don't employ forwarders, or your connection to your forwarders is supremely reliable, then you might not experience this failure.

Until a fix is released, the solution is to downgrade to 9.13.3, which you can do by fetching the bind and bind-tools packages from archive.archlinux.org.
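For anyone following the downgrade route, a sketch of what that looks like. The archive's /packages/&lt;first letter&gt;/&lt;pkgname&gt;/ layout and the exact file names are assumptions here; verify them in a browser before installing anything:

```shell
# Construct the Arch Linux Archive URLs for the known-good packages.
ver=9.13.3-3
arch=x86_64
base=https://archive.archlinux.org/packages/b
echo "$base/bind/bind-$ver-$arch.pkg.tar.xz"
echo "$base/bind-tools/bind-tools-$ver-$arch.pkg.tar.xz"
# Downgrade both in one transaction so the versions stay in sync:
#   pacman -U <both URLs printed above>
# Then hold them back with 'IgnorePkg = bind bind-tools' in /etc/pacman.conf.
```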
Comment by AMM (amish) - Friday, 04 January 2019, 07:40 GMT
> If there is no policy to dictate that stable versions should be used in preference to development/unstable ones then the choice is at the discretion of the maintainer

https://wiki.archlinux.org/index.php/Arch_Linux

The first line states:

Arch Linux is an independently developed, x86-64 general-purpose GNU/Linux distribution that strives to provide the **latest stable versions** of most software by following a rolling-release model.
Comment by Ondřej Surý (oerdnj) - Friday, 04 January 2019, 07:52 GMT
> If there is no policy dictating that stable versions should be used in preference to development/unstable ones, then the choice is at the discretion of the maintainer. In many cases it isn't an issue, but there is always a risk.

With my BIND team hat on - I am hoping to get to the point where I would recommend running the latest (development) release, because it will be production-stable sometime next year (aka the nginx model), but I wouldn't recommend it right now.

Anyway, while I agree the BIND download page needs some fixing, the first paragraph of the release notes says:

> BIND 9.13 is an unstable development release of BIND.
> This document summarizes new features and functional changes that
> have been introduced on this branch. With each development release
> leading up to the stable BIND 9.14 release, this document will be
> updated with additional features added and bugs fixed.

Also, if the Arch Linux maintainer isn't on our package maintainer list, please contact me, I'll get you added to the list.
Comment by Sébastien Luttringer (seblu) - Friday, 04 January 2019, 12:30 GMT
@oerdnj:

Yes, we will stick to the stable releases. As already mentioned in  FS#59464 , I missed the new release model introduced in 9.13.
Fortunately, the development releases were good enough (until now) not to trigger an epoch version switch.
I'm happy about the upcoming 9.14 release you announced. This will put us back on track.

I'm on isc-os-security@lists.isc.org, is that the maintainer list you're thinking of?

@amish:
Yes it makes sense to apply the patch and push a package to testing.

@mat_weiss:
You wrote like a grumpy customer, which makes me emotionally unable to help you.
Comment by BAD+MAD (mat_weiss) - Friday, 04 January 2019, 12:34 GMT
I can confirm that removing the forwarders option from named.conf allows updating the package; with forwarders configured, the DNS server crashes after a while.
Of course, that does not solve the initial problem. For that, the patch would have to be included in the Arch Linux package, or the package switched to a stable version.
Comment by Neil Darlow (neildarlow) - Friday, 04 January 2019, 13:17 GMT
Let's play nice everyone.

There is a workaround which involves a simple downgrade, and functionality is restored. You can even install 9.13.5 if you don't employ forwarders, although that wouldn't be a recommended solution.
Comment by Eli Schwartz (eschwartz) - Friday, 04 January 2019, 16:25 GMT
@mat_weiss,

I deleted several superbly unhelpful comments from you, which I do not wish to see repeated on the bugtracker. Let this be an end to it.
Comment by Sébastien Luttringer (seblu) - Friday, 04 January 2019, 21:00 GMT
The version bind-9.13.5-2 with the patch is in testing. Let me know if it fixes your issue.
Comment by Sébastien Luttringer (seblu) - Saturday, 05 January 2019, 23:08 GMT
Could I have feedback about the new version? Does it help?
Comment by AMM (amish) - Sunday, 06 January 2019, 00:45 GMT
It has been running for about 10-12 hours now. I haven't seen a crash yet. Whether the patch has fixed the bug or not I can't tell, but at least it hasn't caused any other issues. So maybe we can move it to extra.

PS: Hoping that more people test it and reply here.
Comment by BAD+MAD (mat_weiss) - Monday, 07 January 2019, 06:57 GMT
Packages from "TESTING" have already been running for several hours, with options "forwarders" and "forward first" being used. Currently no crashes.
Comment by BAD+MAD (mat_weiss) - Tuesday, 08 January 2019, 06:20 GMT
Packages from "TESTING" have already been running for more than 24 hours, with options "forwarders" and "forward first" being used. Currently no crashes. 30 clients are using the DNS-Server.
