FS#63802 - [unbound] fatal error: sd_notify failed

Attached to Project: Community Packages
Opened by Laurențiu Nicola (lnicola) - Tuesday, 17 September 2019, 10:01 GMT
Last edited by Gaetan Bisson (vesath) - Friday, 20 September 2019, 00:19 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Gaetan Bisson (vesath)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description:
unbound 1.9.3-2 doesn't start:

unbound[273918]: [273918:0] fatal error: sd_notify failed /run/systemd/notify: No such file or directory. Make sure that unbound has access/permission to use the socket presented by systemd.

This happens regardless of the Type setting in the systemd unit, and even if I comment out the hardening settings, but perhaps I'm missing something.
This task depends upon

Closed by  Gaetan Bisson (vesath)
Friday, 20 September 2019, 00:19 GMT
Reason for closing:  Implemented
Additional comments about closing:  unbound-1.9.3-3 in [community]
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 10:07 GMT
I have same problem and don't quite understand the reasoning - User error (missing chroot: "")
Where is it missing? Why it should be added? What if i need chroot?
Comment by Laurențiu Nicola (lnicola) - Tuesday, 17 September 2019, 10:09 GMT
In /etc/unbound/unbound.conf, in the server section. Look at unbount.conf.pacnew.
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 10:14 GMT
Yes that was rhetoric, I know where it should be - I mean why is it missing? I have a server config which i use for ages and this is clearly regression.
Comment by Laurențiu Nicola (lnicola) - Tuesday, 17 September 2019, 10:21 GMT
I assume because it's now trying to tell systemd it started (thus needing to access /run/systemd/notify), and because the systemd hardening facilities are better than a chroot() call.

@bisson I think we can switch to Type=notify, since it's clearly used.
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 10:22 GMT
See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867187 where they added bind-mount (proper solution as to me) to avoid breaking chroot.
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 10:25 GMT
P.S. I maybe mistaking but i don't think patch to do bind-mount internally for type=notify came to systemd.
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 10:38 GMT
and btw setting chroot: "" doesn't fix that:
● unbound.service - Validating, recursive, and caching DNS resolver
Loaded: loaded (/usr/lib/systemd/system/unbound.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2019-09-17 12:33:46 CEST; 2s ago
Docs: man:unbound(8)
Process: 24458 ExecStart=/usr/bin/unbound (code=exited, status=0/SUCCESS)
Main PID: 24458 (code=exited, status=0/SUCCESS)

Sep 17 12:33:45 trx.ruff.mobi systemd[1]: Started Validating, recursive, and caching DNS resolver.
Sep 17 12:33:45 trx.ruff.mobi unbound[24458]: [1568716425] unbound[24458:0] warning: did not exit gracefully last time (24426)
Sep 17 12:33:45 trx.ruff.mobi systemd[1]: unbound.service: Got notification message from PID 24459, but reception only permitted for main PID which is currently not known
Sep 17 12:33:45 trx.ruff.mobi systemd[1]: unbound.service: Got notification message from PID 24459, but reception only permitted for main PID which is currently not known
Sep 17 12:33:46 trx.ruff.mobi systemd[1]: unbound.service: Succeeded.

And setting type to notify gives
● unbound.service - Validating, recursive, and caching DNS resolver
Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: disabled)
Active: failed (Result: protocol) since Tue 2019-09-17 12:36:48 CEST; 11s ago
Docs: man:unbound(8)
Process: 24527 ExecStart=/usr/bin/unbound (code=exited, status=0/SUCCESS)
Main PID: 24527 (code=exited, status=0/SUCCESS)

Sep 17 12:36:47 trx.ruff.mobi systemd[1]: Starting Validating, recursive, and caching DNS resolver...
Sep 17 12:36:48 trx.ruff.mobi systemd[1]: unbound.service: Got notification message from PID 24528, but reception only permitted for main PID which is currently not known
Sep 17 12:36:48 trx.ruff.mobi systemd[1]: unbound.service: Got notification message from PID 24528, but reception only permitted for main PID which is currently not known
Sep 17 12:36:48 trx.ruff.mobi systemd[1]: unbound.service: Failed with result 'protocol'.
Sep 17 12:36:48 trx.ruff.mobi systemd[1]: Failed to start Validating, recursive, and caching DNS resolver.

If i start it manually with -d it works perfectly well though
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 10:43 GMT
So the only way to make it start is to change it to non-forking (-d) and add chroot: "" - then it starts either as simple or as notify
● unbound.service - Validating, recursive, and caching DNS resolver
Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2019-09-17 12:41:56 CEST; 3s ago
Docs: man:unbound(8)
Main PID: 24604 (unbound)
Tasks: 1
Memory: 6.2M
CGroup: /system.slice/unbound.service
└─24604 /usr/bin/unbound -d

Sep 17 12:41:56 trx.ruff.mobi systemd[1]: Started Validating, recursive, and caching DNS resolver.
Sep 17 12:41:57 trx.ruff.mobi unbound[24604]: [1568716917] unbound[24604:0] error: cannot open pidfile /run/unbound.pid: Permission denied
Sep 17 12:41:57 trx.ruff.mobi unbound[24604]: [1568716917] unbound[24604:0] notice: init module 0: subnet
Sep 17 12:41:57 trx.ruff.mobi unbound[24604]: [1568716917] unbound[24604:0] notice: init module 1: validator
Sep 17 12:41:57 trx.ruff.mobi unbound[24604]: [1568716917] unbound[24604:0] notice: init module 2: iterator
Sep 17 12:41:57 trx.ruff.mobi unbound[24604]: [1568716917] unbound[24604:0] info: start of service (unbound 1.9.3).
Comment by Laurențiu Nicola (lnicola) - Tuesday, 17 September 2019, 11:08 GMT
I guess I don't particularly care about chroot, but adding -d and setting Type=notify sounds like a good idea.
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 11:24 GMT
chroot is perhaps subject for upstream bugreport - to disable default chroot when systemd integration is active. In my config i didn't have chroot at all, it is default setting implied by directory config option. While it's ok in general case - with systemd it should be reversed - only explicit chroot is required otherwise it breaks sd_notify.
Unfortunately I don't have account on nllabs bugzilla and didn't find a way to create it.
Comment by helle vaanzinn (glitsj16) - Tuesday, 17 September 2019, 11:24 GMT
I can't add to https://bugs.archlinux.org/task/61163 so I'll respond here. After upgrading unbound to 1.9.3-2 I got the same error. My /etc/unbound/unbound.conf has chroot="/etc/unbound" and doesn't enable systemd socket activation, so technically the follwing falls in the category 'User error (missing chroot: "")' too. When keeping Type=simple the NotifyAccess=main doesn't make sense. If I override that with NotifyAccess=none my unbound happily starts and works as it did before the upgrade. Just something users might try when wanting to keep their perfectly working setup. It's a bit alarming to see packagers forcing configuration on users to get a systemd service file working...
Comment by Ruslan (ruff) - Tuesday, 17 September 2019, 11:53 GMT
Thanks for the tip glitsj16 - i'd rather restore my original config and disable notify as you suggested.
And yes, agree that regression has been categorized as 'user error' is a bit alarming.
Comment by Laurențiu Nicola (lnicola) - Tuesday, 17 September 2019, 11:58 GMT
Note that I (the reporter) requested the issue to be closed with that reason. It's not the maintainer's opinion.
Comment by Harald Koch (hkoch) - Tuesday, 17 September 2019, 12:10 GMT
Requiring chroot: "" in the configuration is a breaking change that should have been announced...
Comment by Andrey (melentye) - Wednesday, 18 September 2019, 13:10 GMT
Not sure if that's a separate issue or not, but, starting with unbound 1.9.3-2, the NetworkManager -> resolvconf -> unbound integration is broken. What I mean is that NM is configured to use resolvconf for DNS management and after upgrading unbound to 1.9.3-2 I now get

Sep 18 14:54:38 psypad NetworkManager[1003]: <info> [1568811278.8660] dns-mgr: Writing DNS information to /usr/bin/resolvconf
Sep 18 14:54:38 psypad NetworkManager[1003]: Job for unbound.service failed.
Sep 18 14:54:38 psypad NetworkManager[1003]: See "systemctl status unbound.service" and "journalctl -xe" for details.
Sep 18 14:54:38 psypad NetworkManager[1003]: <warn> [1568811278.9482] dns-mgr: resolvconf failed with status 256
Sep 18 14:54:38 psypad NetworkManager[1003]: <warn> [1568811278.9484] dns-mgr: could not commit DNS changes: resolvconf failed with status 256

^^ this in NM logs, and

Sep 18 14:54:38 psypad systemd[1]: Reloading Validating, recursive, and caching DNS resolver.
Sep 18 14:54:38 psypad kill[4767]: kill: sending signal to 4047 failed: Operation not permitted
Sep 18 14:54:38 psypad systemd[1]: unbound.service: Control process exited, code=exited, status=1/FAILURE
Sep 18 14:54:38 psypad systemd[1]: Reload failed for Validating, recursive, and caching DNS resolver.

^^ this in unbound logs.

Downgrading unbound to 1.9.3-1 brings things back to normal.
Comment by Konstantin Shalygin (k0ste) - Thursday, 19 September 2019, 09:49 GMT
```
chroot: ""
```

Works for me.
Comment by Jensen McKenzie (your_doomsday) - Thursday, 19 September 2019, 12:48 GMT Comment by Ruslan (ruff) - Thursday, 19 September 2019, 21:40 GMT
new ver 1.9.3-3 still doesn't help much. It again relies on config param 'do-daemonize: no' which when not set results:
unbound.service - Validating, recursive, and caching DNS resolver
Loaded: loaded (/usr/lib/systemd/system/unbound.service; enabled; vendor preset: disabled)
Active: failed (Result: protocol) since Thu 2019-09-19 23:36:26 CEST; 5s ago
Docs: man:unbound(8)
Process: 9511 ExecStart=/usr/bin/unbound (code=exited, status=0/SUCCESS)
Main PID: 9511 (code=exited, status=0/SUCCESS)

Sep 19 23:36:26 trx.ruff.mobi systemd[1]: Starting Validating, recursive, and caching DNS resolver...
Sep 19 23:36:26 trx.ruff.mobi systemd[1]: unbound.service: Got notification message from PID 9513, but reception only permitted for main PID which is currently not known
Sep 19 23:36:26 trx.ruff.mobi systemd[1]: unbound.service: Got notification message from PID 9513, but reception only permitted for main PID which is currently not known
Sep 19 23:36:26 trx.ruff.mobi systemd[1]: unbound.service: Failed with result 'protocol'.
Sep 19 23:36:26 trx.ruff.mobi systemd[1]: Failed to start Validating, recursive, and caching DNS resolver.

so still regression (or - user error)
If you need it to be non-daemon please add -d to service definition instead of requesting users to update their configs.
Comment by Ruslan (ruff) - Thursday, 19 September 2019, 21:44 GMT
[root@trx ~]# diff -u /usr/lib/systemd/system/unbound.service /etc/systemd/system/unbound.service
--- /usr/lib/systemd/system/unbound.service 2019-09-19 21:50:24.000000000 +0200
+++ /etc/systemd/system/unbound.service 2019-09-19 23:42:34.307814953 +0200
@@ -10,7 +10,7 @@

[Service]
ExecReload=/bin/kill -HUP $MAINPID
-ExecStart=/usr/bin/unbound
+ExecStart=/usr/bin/unbound -d
NotifyAccess=main
Type=notify
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_SYS_RESOURCE CAP_NET_ADMIN
[root@trx ~]# systemctl status unbound.service
● unbound.service - Validating, recursive, and caching DNS resolver
Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-09-19 23:42:40 CEST; 35s ago
Docs: man:unbound(8)
Main PID: 9706 (unbound)
Tasks: 1
Memory: 7.9M
CGroup: /system.slice/unbound.service
└─9706 /usr/bin/unbound -d

Sep 19 23:42:40 trx.ruff.mobi systemd[1]: Starting Validating, recursive, and caching DNS resolver...
Sep 19 23:42:40 trx.ruff.mobi unbound[9706]: [1568929360] unbound[9706:0] notice: init module 0: subnet
Sep 19 23:42:40 trx.ruff.mobi unbound[9706]: [1568929360] unbound[9706:0] notice: init module 1: validator
Sep 19 23:42:40 trx.ruff.mobi unbound[9706]: [1568929360] unbound[9706:0] notice: init module 2: iterator
Sep 19 23:42:40 trx.ruff.mobi unbound[9706]: [1568929360] unbound[9706:0] info: start of service (unbound 1.9.3).
Sep 19 23:42:40 trx.ruff.mobi systemd[1]: Started Validating, recursive, and caching DNS resolver.
[root@trx ~]#
Comment by Gaetan Bisson (vesath) - Friday, 20 September 2019, 00:19 GMT
The service file is now provided by upstream. Please submit your requests to them.

Note however that you are expected to merge pacnew files. You cannot expect your setup to be supported if you rely on the stock service file while independently customizing your configuration file.

Loading...