FS#60913 - [bind] CoreDump after update to bind 9.13.4-1
Attached to Project:
Arch Linux
Opened by BAD+MAD (mat_weiss) - Monday, 26 November 2018, 10:59 GMT
Last edited by Sébastien Luttringer (seblu) - Monday, 14 January 2019, 13:54 GMT
Details
Description:
After the update to version bind 9.13.4-1 I get coredumps and bind crashes!

Additional info: journalctl -p 4

systemd-coredump[27428]: Process 25250 (named) of user 40 dumped core.
Stack trace of thread 25252:
#0 0x00007fe19dccad7f raise (libc.so.6)
#1 0x00007fe19dcb5672 abort (libc.so.6)
#2 0x000055e94e9cd02c n/a (named)
#3 0x00007fe19e94fcaa isc_assertion_failed (libisc.so.1304)
#4 0x00007fe19eb0295b dns_resolver_createfetch (libdns.so.1304)
#5 0x00007fe19eb0806e n/a (libdns.so.1304)
#6 0x00007fe19eb0baba n/a (libdns.so.1304)
#7 0x00007fe19e96f349 n/a (libisc.so.1304)
#8 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#9 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25250:
#0 0x00007fe19dccbb4c __sigtimedwait (libc.so.6)
#1 0x00007fe19de68f4d sigwait (libpthread.so.0)
#2 0x00007fe19e978e41 isc_app_ctxrun (libisc.so.1304)
#3 0x00007fe19e979158 isc_app_run (libisc.so.1304)
#4 0x000055e94e9cdfce n/a (named)
#5 0x00007fe19dcb7223 __libc_start_main (libc.so.6)
#6 0x000055e94e9ceb4e n/a (named)
Stack trace of thread 25258:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25257:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25253:
#0 0x00007fe19de64afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e96ef52 n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25259:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25254:
#0 0x00007fe19de64afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e96ef52 n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25256:
#0 0x00007fe19dd8ee57 epoll_wait (libc.so.6)
#1 0x00007fe19e981c0c n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25255:
#0 0x00007fe19de64e5b pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e989e09 isc_condition_waituntil (libisc.so.1304)
#2 0x00007fe19e9756f4 n/a (libisc.so.1304)
#3 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#4 0x00007fe19dd8eb23 __clone (libc.so.6)
Stack trace of thread 25251:
#0 0x00007fe19de64afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fe19e96ef52 n/a (libisc.so.1304)
#2 0x00007fe19de5ea9d start_thread (libpthread.so.0)
#3 0x00007fe19dd8eb23 __clone (libc.so.6)
26-Nov-2018 11:44:37.149 general: critical: resolver.c:10484: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
26-Nov-2018 11:44:37.149 general: critical: #0 0x55e94e9d74d2 in ??
26-Nov-2018 11:44:37.149 general: critical: #1 0x7fe19e94fcaa in ??
26-Nov-2018 11:44:37.149 general: critical: #2 0x7fe19eb0295b in ??
26-Nov-2018 11:44:37.149 general: critical: #3 0x7fe19eb0806e in ??
26-Nov-2018 11:44:37.149 general: critical: #4 0x7fe19eb0baba in ??
26-Nov-2018 11:44:37.149 general: critical: #5 0x7fe19e96f349 in ??
26-Nov-2018 11:44:37.149 general: critical: #6 0x7fe19de5ea9d in ??
26-Nov-2018 11:44:37.149 general: critical: #7 0x7fe19dd8eb23 in ??
26-Nov-2018 11:44:37.149 general: critical: exiting (due to assertion failure)
By default named listens on both IPv4 and IPv6; however, I do not use IPv6. After adding -4 to the service file I have not seen any issues, so I think the issue could be related to IPv6.
But then of course we have to adjust the service file after every update.
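A systemd drop-in override avoids re-editing the unit after each update, since it lives outside the packaged file. A minimal sketch, assuming the stock ExecStart line from the packaged unit (the drop-in file name is arbitrary):

/etc/systemd/system/named.service.d/ipv4-only.conf
[Service]
# An empty ExecStart= clears the packaged command before redefining it with -4
ExecStart=
ExecStart=/usr/bin/named -4 -f -u named

After creating the file, run systemctl daemon-reload and systemctl restart named; the -4 flag then persists across bind package updates.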
Version 9.13.3-3 does not have this error. Even without "-4" bind works perfectly and there are no coredumps.
On the ISC website https://www.isc.org/downloads/ version 9.13.4 is listed as "unstable development".
Why is this used in Arch Linux instead of offering a stable version as a package?
-----------------------------------------------------------------------------------
cat /etc/systemd/system/multi-user.target.wants/named.service
[Unit]
Description=Internet domain name server
After=network.target
[Service]
ExecStart=/usr/bin/named -4 -f -u named
ExecReload=/usr/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
-----------------------------------------------------------------------------------
systemctl daemon-reload; \
systemctl restart named
-----------------------------------------------------------------------------------
journalctl -p 4
Nov 28 09:10:08 arch-linux systemd-coredump[30851]: Process 25971 (named) of user 40 dumped core.
Stack trace of thread 25974:
#0 0x00007f6cc277cd7f raise (libc.so.6)
#1 0x00007f6cc2767672 abort (libc.so.6)
#2 0x000055c66fbd202c n/a (named)
#3 0x00007f6cc3401caa isc_assertion_failed (libisc.so.1304)
#4 0x00007f6cc35b495b dns_resolver_createfetch (libdns.so.1304)
#5 0x00007f6cc35ba06e n/a (libdns.so.1304)
#6 0x00007f6cc35bdaba n/a (libdns.so.1304)
#7 0x00007f6cc3421349 n/a (libisc.so.1304)
#8 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#9 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25982:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25979:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25977:
#0 0x00007f6cc2916afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc3420f52 n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25971:
#0 0x00007f6cc277db4c __sigtimedwait (libc.so.6)
#1 0x00007f6cc291af4d sigwait (libpthread.so.0)
#2 0x00007f6cc342ae41 isc_app_ctxrun (libisc.so.1304)
#3 0x00007f6cc342b158 isc_app_run (libisc.so.1304)
#4 0x000055c66fbd2fce n/a (named)
#5 0x00007f6cc2769223 __libc_start_main (libc.so.6)
#6 0x000055c66fbd3b4e n/a (named)
Stack trace of thread 25981:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25978:
#0 0x00007f6cc2916e5b pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc343be09 isc_condition_waituntil (libisc.so.1304)
#2 0x00007f6cc34276f4 n/a (libisc.so.1304)
#3 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#4 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25976:
#0 0x00007f6cc2916afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc3420f52 n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25980:
#0 0x00007f6cc2840e57 epoll_wait (libc.so.6)
#1 0x00007f6cc3433c0c n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)
Stack trace of thread 25975:
#0 0x00007f6cc2916afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6cc3420f52 n/a (libisc.so.1304)
#2 0x00007f6cc2910a9d start_thread (libpthread.so.0)
#3 0x00007f6cc2840b23 __clone (libc.so.6)
-----------------------------------------------------------------------------------
...and the logging for named shows:
28-Nov-2018 09:10:06.655 general: critical: resolver.c:10484: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
28-Nov-2018 09:10:06.656 general: critical: #0 0x55c66fbdc4d2 in ??
28-Nov-2018 09:10:06.656 general: critical: #1 0x7f6cc3401caa in ??
28-Nov-2018 09:10:06.656 general: critical: #2 0x7f6cc35b495b in ??
28-Nov-2018 09:10:06.656 general: critical: #3 0x7f6cc35ba06e in ??
28-Nov-2018 09:10:06.656 general: critical: #4 0x7f6cc35bdaba in ??
28-Nov-2018 09:10:06.656 general: critical: #5 0x7f6cc3421349 in ??
28-Nov-2018 09:10:06.656 general: critical: #6 0x7f6cc2910a9d in ??
28-Nov-2018 09:10:06.656 general: critical: #7 0x7f6cc2840b23 in ??
28-Nov-2018 09:10:06.656 general: critical: exiting (due to assertion failure)
It seems like nobody who uses Arch Linux as a DNS server has the same problem. I think that our machines are broken ;-).
However, my machine is running in a production environment and I really need the DNS server. I have to prevent the update of bind and bind-tools and stay on version 9.13.3-3.
I can only hope that someday in the future somebody has the same problem and maybe a better idea to solve it.
I do not know how long it takes until this happens or what triggers it. Unfortunately, in my case it takes quite some time before the issue occurs. Last time was ~4 days ago.
If anyone has a clue (or idea how to debug), please tell me.
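One place to start, since systemd-coredump is already catching the crashes: pull a backtrace from the stored core with coredumpctl. A generic sketch (symbol names will mostly show as n/a until bind is rebuilt with debug symbols):

coredumpctl list named
coredumpctl gdb named
# inside gdb, dump backtraces for every thread:
thread apply all bt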
> resolver.c:10484: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
> #0 0x55fe8e77d4d2 in ?
> #1 0x7fee5ac72caa in ??
> #2 0x7fee5ae2595b in ??
> #3 0x7fee5ae2b06e in ??
> #4 0x7fee5ae2eaba in ??
> #5 0x7fee5ac92349 in ??
> #6 0x7fee5a181a9d in ??
> #7 0x7fee5a0b1b23 in ??
> exiting (due to assertion failure)
> Main process exited, code=killed, status=6/ABRT
> Failed with result 'signal'.
> Process 619 (named) of user 40 dumped core.
Or build bind with debug symbols and report the issue upstream, including a backtrace that resolves to debug symbols.
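A sketch of such a rebuild, assuming the usual ABS workflow with asp; the options array is standard makepkg, the rest is illustrative:

asp checkout bind && cd bind/trunk
# edit the PKGBUILD and add: options=('debug' '!strip')
makepkg -srci
# then reproduce the crash and run "coredumpctl gdb named" for a usable backtrace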
Is it possible to revert this back to 9.13.3-3? I am manually ignoring bind for now. Thanks
pacman -U /var/cache/pacman/pkg/bind-tools-9.13.3-3-x86_64.pkg.tar.xz /var/cache/pacman/pkg/bind-9.13.3-3-x86_64.pkg.tar.xz
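To have pacman enforce the hold instead of skipping bind by hand on every update, an IgnorePkg entry works; a minimal sketch of the relevant lines:

# in the [options] section of /etc/pacman.conf:
IgnorePkg = bind bind-tools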
FS#40304 again? Related to GCC optimizations? You could build the package locally with optimizations off at the same time as you enable debug symbols to report the issue upstream, if you want to test.
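A sketch of overriding the flags for a local test build, assuming a per-user makepkg.conf (sourced after /etc/makepkg.conf, so these values win):

# ~/.makepkg.conf — illustrative flags: no optimization, keep debug info
CFLAGS="-O0 -g"
CXXFLAGS="-O0 -g"
# then rebuild bind with makepkg as shown above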
● named.service - Internet domain name server
   Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Thu 2018-12-06 10:35:51 CST; 15h ago
  Process: 23007 ExecStart=/usr/bin/named -f -u named (code=killed, signal=ABRT)
 Main PID: 23007 (code=killed, signal=ABRT)
Dec 06 10:35:51 phoinix named[23007]: #1 0x7f078ea0fcaa in ??
Dec 06 10:35:51 phoinix named[23007]: #2 0x7f078ebc295b in ??
Dec 06 10:35:51 phoinix named[23007]: #3 0x7f078ebc806e in ??
Dec 06 10:35:51 phoinix named[23007]: #4 0x7f078ebcbaba in ??
Dec 06 10:35:51 phoinix named[23007]: #5 0x7f078ea2f349 in ??
Dec 06 10:35:51 phoinix named[23007]: #6 0x7f078df1ea9d in ??
Dec 06 10:35:51 phoinix named[23007]: #7 0x7f078de4eb23 in ??
Dec 06 10:35:51 phoinix named[23007]: exiting (due to assertion failure)
Dec 06 10:35:51 phoinix systemd[1]: named.service: Main process exited, code=killed, status=6/ABRT
Dec 06 10:35:51 phoinix systemd[1]: named.service: Failed with result 'signal'.
A simple restart brings it back to life. The only guess I have is that it may be related to the timers reported on restart, with the managed keys somehow timing out. I take that from the restart output of:
<snip>
Dec 07 02:04:03 phoinix named[26487]: all zones loaded
Dec 07 02:04:03 phoinix named[26487]: running
Dec 07 02:04:04 phoinix named[26487]: managed-keys-zone: Key 19036 for zone . is now trusted (acceptance timer com>
Dec 07 02:04:04 phoinix named[26487]: managed-keys-zone: Key 20326 for zone . is now trusted (acceptance timer com>
Dec 07 02:04:04 phoinix named[26487]: resolver priming query complete
/etc/systemd/system/named.service.d/onfail.conf
[Service]
Restart=on-failure
RestartSec=2s
Customize as per your wish (man systemd.service). Then run:
systemctl daemon-reload
Maybe this should be made the default, because most people who run named/bind never want the service to stay down unless the system itself is shut down.
Monitoring a service to restart it after a crash looks like a "Redmond-like" approach to problem solving. ;-)
But of course it's a workaround.
The problem has existed for a long time now. I'm surprised that it's not even assigned to anyone!
journalctl -f -p 4
==================
Dez 13 16:43:59 systemd-coredump[21108]: Process 20018 (named) of user 40 dumped core.
Stack trace of thread 20019:
#0 0x00007f6977ee6d7f raise (libc.so.6)
#1 0x00007f6977ed1672 abort (libc.so.6)
#2 0x0000560281ba002c n/a (named)
#3 0x00007f6978b6bc6a isc_assertion_failed (libisc.so.1305)
#4 0x00007f6978d1ec5b dns_resolver_createfetch (libdns.so.1305)
#5 0x00007f6978d2436e n/a (libdns.so.1305)
#6 0x00007f6978d27dba n/a (libdns.so.1305)
#7 0x00007f6978b8b1c9 n/a (libisc.so.1305)
#8 0x00007f697807aa9d start_thread (libpthread.so.0)
#9 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20027:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20024:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20018:
#0 0x00007f6977ee7b4c __sigtimedwait (libc.so.6)
#1 0x00007f6978084f4d sigwait (libpthread.so.0)
#2 0x00007f6978b94c41 isc_app_ctxrun (libisc.so.1305)
#3 0x00007f6978b94f58 isc_app_run (libisc.so.1305)
#4 0x0000560281ba0fce main (named)
#5 0x00007f6977ed3223 __libc_start_main (libc.so.6)
#6 0x0000560281ba1b4e _start (named)
Stack trace of thread 20022:
#0 0x00007f6978080afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978b8add2 n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20025:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20020:
#0 0x00007f6978080afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978b8add2 n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20023:
#0 0x00007f6978080e5b pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978ba5b99 isc_condition_waituntil (libisc.so.1305)
#2 0x00007f6978b91534 n/a (libisc.so.1305)
#3 0x00007f697807aa9d start_thread (libpthread.so.0)
#4 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20026:
#0 0x00007f6977faae57 epoll_wait (libc.so.6)
#1 0x00007f6978b9d9fc n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)
Stack trace of thread 20021:
#0 0x00007f6978080afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6978b8add2 n/a (libisc.so.1305)
#2 0x00007f697807aa9d start_thread (libpthread.so.0)
#3 0x00007f6977faab23 __clone (libc.so.6)
-------------------------------------------------------------------------------------------------------------------------------------------
cat /var/named/data/named.run
=============================
13-Dec-2018 16:43:57.994 general: critical: resolver.c:10470: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
13-Dec-2018 16:43:57.994 general: critical: #0 0x560281baa4d2 in ??
13-Dec-2018 16:43:57.994 general: critical: #1 0x7f6978b6bc6a in ??
13-Dec-2018 16:43:57.994 general: critical: #2 0x7f6978d1ec5b in ??
13-Dec-2018 16:43:57.994 general: critical: #3 0x7f6978d2436e in ??
13-Dec-2018 16:43:57.994 general: critical: #4 0x7f6978d27dba in ??
13-Dec-2018 16:43:57.994 general: critical: #5 0x7f6978b8b1c9 in ??
13-Dec-2018 16:43:57.994 general: critical: #6 0x7f697807aa9d in ??
13-Dec-2018 16:43:57.994 general: critical: #7 0x7f6977faab23 in ??
13-Dec-2018 16:43:57.994 general: critical: exiting (due to assertion failure)
-------------------------------------------------------------------------------------------------------------------------------------------
17:44 phoinix:~> scs named
● named.service - Internet domain name server
Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Sat 2018-12-15 03:53:57 CST; 13h ago
Process: 18825 ExecStart=/usr/bin/named -f -u named (code=killed, signal=ABRT)
Main PID: 18825 (code=killed, signal=ABRT)
Dec 15 03:53:57 phoinix named[18825]: #1 0x7f7d60aefc6a in ??
Dec 15 03:53:57 phoinix named[18825]: #2 0x7f7d60ca2c5b in ??
Dec 15 03:53:57 phoinix named[18825]: #3 0x7f7d60ca836e in ??
Dec 15 03:53:57 phoinix named[18825]: #4 0x7f7d60cabdba in ??
Dec 15 03:53:57 phoinix named[18825]: #5 0x7f7d60b0f1c9 in ??
Dec 15 03:53:57 phoinix named[18825]: #6 0x7f7d5fffea9d in ??
Dec 15 03:53:57 phoinix named[18825]: #7 0x7f7d5ff2eb23 in ??
Dec 15 03:53:57 phoinix named[18825]: exiting (due to assertion failure)
Dec 15 03:53:57 phoinix systemd[1]: named.service: Main process exited, code=killed, status=6/ABRT
Dec 15 03:53:57 phoinix systemd[1]: named.service: Failed with result 'signal'.
Edit:
I have bind running on three different servers. On two of the servers, 9.13.5-1 has been running for about 26 hours without any problems whatsoever. The third server shows the crashes that everyone else here is seeing, forcing me to downgrade to 9.13.3-3. Some differences between the working and non-working servers (if it makes any difference):
Working servers:
- Running bind in a chrooted environment on both working servers.
- One working server is an Intel Core2 Duo CPU, the other is an Intel Core i5 CPU.
Non-working server:
- NOT running bind in a chrooted environment.
- The non-working server is a dual Xeon E5405 CPU setup.
How about the rest of you having issues? What type of CPU? Chroot or non-chroot?
bind 9.13.5-1
17-Dec-2018 16:17:54.266 resolver.c:10470: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
17-Dec-2018 16:17:54.266 #0 0x561ea22734d2 in ??
17-Dec-2018 16:17:54.267 #1 0x7f8d6d851c6a in ??
17-Dec-2018 16:17:54.267 #2 0x7f8d6da04c5b in ??
17-Dec-2018 16:17:54.267 #3 0x7f8d6da0a36e in ??
17-Dec-2018 16:17:54.267 #4 0x7f8d6da0ddba in ??
17-Dec-2018 16:17:54.267 #5 0x7f8d6d8711c9 in ??
17-Dec-2018 16:17:54.267 #6 0x7f8d6cd60a9d in ??
17-Dec-2018 16:17:54.267 #7 0x7f8d6cc90b23 in ??
17-Dec-2018 16:17:54.267 exiting (due to assertion failure)
22-Dec-2018 08:58:04.898 dispatch: info: dispatch 0x7fd508984da0: shutting down due to TCP receive error: 8.8.4.4#53: connection reset
22-Dec-2018 09:06:19.202 general: critical: resolver.c:10470: REQUIRE(fetchp != ((void *)0) && *fetchp == ((void *)0)) failed, back trace
Upstream could have been working on the issue for the last four weeks if someone affected had reported it to them.
That's such a wise statement!
But what do you think, do we know that or not? Sorry, that was a rhetorical question; it does not have to be answered by you.
Not everyone has the technical understanding to do such a thing, and not everyone is a kernel programmer.
There are people who use a server for server services and not for testing software.
In a production environment the DNS server just has to work, and you rarely have time to deal with debugging crashes.
Please share your great wisdom and help us prepare the necessary information.
https://wiki.archlinux.org/index.php/Step-by-step_debugging_guide
https://wiki.archlinux.org/index.php/Bisecting_bugs_with_Git
https://gitlab.isc.org/isc-projects/bind9/issues
As there is no bind-git package I would suggest starting with generating a backtrace with debug symbols and reporting that upstream.
I can try creating a bind-git package if needed.
And while I appreciate the test base of all Arch Linux users, I would strongly recommend not using development releases as the base version for all Arch Linux users. Please stay with the latest stable release (BIND 9.14 would be a good choice in a couple of weeks).
@oerdnj: We will stop using the development releases and stick with stables as soon as 9.14 is out.
Also as per this note: https://gitlab.isc.org/isc-projects/bind9/issues/797#note_37322
Odd-numbered versions (9.13) are unstable development releases and even-numbered versions (9.14) are stable releases. But this is not clearly marked on their BIND download page, which creates the impression that 9.13 is the latest stable release.
@seblu
> I just discovered this bug report. I'm also running a production dns server and I have no such issue.
It happens only if you are using forwarders with "forward first".
As I understand it, this bug is related to retries associated with forwarders. If you don't employ forwarders, or your connection to your forwarders is supremely reliable, then you might not experience this failure.
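For reference, the configuration pattern being described looks like this in named.conf; a hypothetical minimal example with placeholder forwarder addresses:

options {
    // query these servers first...
    forwarders { 8.8.8.8; 8.8.4.4; };
    // ...but fall back to full recursion if they fail ("first" rather than "only")
    forward first;
};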
Until a fix is released, the solution is to downgrade to 9.13.3, which you can do by fetching the bind and bind-tools packages from archive.archlinux.org.
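A sketch of that downgrade, assuming the packages still sit at the usual archive paths (verify the exact file names on archive.archlinux.org first):

pacman -U \
  https://archive.archlinux.org/packages/b/bind/bind-9.13.3-3-x86_64.pkg.tar.xz \
  https://archive.archlinux.org/packages/b/bind-tools/bind-tools-9.13.3-3-x86_64.pkg.tar.xz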
https://wiki.archlinux.org/index.php/Arch_Linux
The first line states:
Arch Linux is an independently developed, x86-64 general-purpose GNU/Linux distribution that strives to provide the **latest stable versions** of most software by following a rolling-release model.
With my BIND team hat on: I am hoping to get to the point where I would recommend running the latest (development) release, because it will be production-stable sometime next year (aka the nginx model), but I wouldn't recommend it right now.
Anyway, while I agree the BIND download page needs some fixing, the first paragraph of the release notes says:
> BIND 9.13 is an unstable development release of BIND.
> This document summarizes new features and functional changes that
> have been introduced on this branch. With each development release
> leading up to the stable BIND 9.14 release, this document will be
> updated with additional features added and bugs fixed.
Also, if the Arch Linux maintainer isn't on our package maintainer list, please contact me, I'll get you added to the list.
Yes, we will stick to the stable releases. As already mentioned in FS#59464, I missed the new release model introduced in 9.13. Fortunately, the development releases were good enough (until now) not to trigger an epoch version switch.
I'm happy about the upcoming release of 9.14 you announced. This will put us back on track.
I'm on isc-os-security@lists.isc.org, is that the maintainer list you're thinking of?
@amish:
Yes, it makes sense to apply the patch and push a package to testing.
@mat_weiss:
You wrote like a grumpy customer, which makes me emotionally unable to help you.
Of course, that does not solve the initial problem. For that, the patch would have to be included in the Arch Linux package, or the package switched to a stable version.
There is a workaround: a simple downgrade restores functionality. You can even install 9.13.5 if you don't employ forwarders, although that wouldn't be a recommended solution.
I deleted several superbly unhelpful comments from you, which I do not wish to see repeated on the bugtracker. Let this be an end to it.
PS: Hoping that more people test it and reply here.