FS#69566 - NSD 4.3.5 problems with unit file

Attached to Project: Community Packages
Opened by Gene (GeneC) - Saturday, 06 February 2021, 15:15 GMT
Last edited by Bruno Pagani (ArchangeGabriel) - Sunday, 07 March 2021, 19:57 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Bruno Pagani (ArchangeGabriel)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description: nsd 4.3.5 fails to run

With the unit file replaced by the prior version, it starts and works.
Using the unit file as packaged is problematic; some additional info is below.

This is what the log says:

service: Changing to the requested working directory failed: Not a directory
service: Failed at step CHDIR spawning /bin/kill: Not a directory
service: Control process exited, code=exited, status=200/CHDIR

----

1) I noted that the unit file has
WorkingDirectory=~

So I tried:
WorkingDirectory=/etc/nsd

but this still fails.

2) I changed RuntimeDirectory=/etc/nsd # it was previously set to: =nsd

Now I can get nsd to start up, but I get these errors:

nsd[10230]: setsockopt(..., IP_TRANSPARENT, ...) failed for tcp: Operation not permitted
nsd[10230]: cannot open pidfile /run/nsd/nsd.pid: No such file or directory
nsd[10230]: cannot overwrite the pidfile /run/nsd/nsd.pid: No such file or directory
nsd[10230]: unable to initgroups nsd: Operation not permitted

3) I put the 4.3.4 nsd.service into
/etc/systemd/system

and then 4.3.5 starts up and works fine (after systemctl daemon-reload).

Thanks for packaging up the new version.

Closed by  Bruno Pagani (ArchangeGabriel)
Sunday, 07 March 2021, 19:57 GMT
Reason for closing:  Fixed
Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 16:50 GMT
Please retry with 4.3.5-2. Also, can you share your nsd config (you can obfuscate the zone)?
Comment by Gene (GeneC) - Saturday, 06 February 2021, 17:02 GMT
Yes, I will try 4.3.5-2 and report back. In the meantime, this is the config on my test server (running as a secondary).

server:
ip-transparent: yes
do-ip4: yes
do-ip6: no
port: 11153

key:
name: "key_20090101"
algorithm: hmac-md5
secret: "xxx"


key:
name: "key_20190101"
algorithm: hmac-sha256
secret: "xxx"

zone:
# removed.

Comment by Gene (GeneC) - Saturday, 06 February 2021, 17:40 GMT
Tested the updated package: it now starts, but still has the ip-transparent problem.
I created /etc/systemd/system/nsd.service.d/local.conf
with the following, but the problem remains (same error).

[Unit]

[Service]
CapabilityBoundingSet=CAP_NET_ADMIN

[Install]


Comment by Gene (GeneC) - Saturday, 06 February 2021, 17:47 GMT
I now recall a similar problem with unbound.
The "fix" is to replace ip-transparent with

ip-freebind: yes

This does not even require additional caps. At least there is no error logged.
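To see the difference between the two options outside nsd, here is a minimal Python sketch (an illustration, not nsd's code) that tries to enable each option on a plain IPv4 TCP socket and reports the errno name on failure. It assumes Linux, where `<linux/in.h>` defines IP_FREEBIND as 15 and IP_TRANSPARENT as 19; the helper name `probe_ipv4_option` is hypothetical. Run as an unprivileged user with no extra capabilities, IP_TRANSPARENT would be expected to fail with EPERM, as in the nsd log above, while IP_FREEBIND should succeed.

```python
import errno
import socket
import sys

# Linux option numbers from <linux/in.h>, used as fallbacks in case this
# Python build does not expose the names on the socket module.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)
IP_TRANSPARENT = getattr(socket, "IP_TRANSPARENT", 19)


def probe_ipv4_option(option, sock_type=socket.SOCK_STREAM):
    """Try to enable an IPv4-level socket option (hypothetical helper).

    Returns "ok" on success, otherwise the errno name such as "EPERM".
    """
    with socket.socket(socket.AF_INET, sock_type) as sock:
        try:
            sock.setsockopt(socket.IPPROTO_IP, option, 1)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    if not sys.platform.startswith("linux"):
        sys.exit("this sketch assumes Linux socket option numbers")
    # Expected to fail with EPERM unless the process holds CAP_NET_RAW
    # or CAP_NET_ADMIN.
    print("IP_TRANSPARENT:", probe_ipv4_option(IP_TRANSPARENT))
    # Expected to succeed without any extra capability.
    print("IP_FREEBIND:", probe_ipv4_option(IP_FREEBIND))
```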

Only remaining issue I see is
unable to initgroups nsd: Operation not permitted

Not sure whether this is benign.

Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 17:49 GMT
Does adding:

AmbientCapabilities=CAP_NET_ADMIN

to the same file help?

If not, can you retry after running this:

sudo setcap cap_net_admin+ep /usr/bin/nsd

(And finally, possibly combining these two changes.)
Comment by Gene (GeneC) - Saturday, 06 February 2021, 17:53 GMT
After replacing ip-transparent with ip-freebind, which does the same thing here, the problem is gone.
I can do additional tests to see if we can get ip-transparent working, for completeness; I will get back to this in a couple of hours.

Is there a way to make the initgroups "not permitted" warning go away? It seems unrelated to cap_net_xxx.
Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 17:53 GMT
So our messages crossed: ip-freebind, or changing the systemd service to only start after the network is up, are indeed two better options, but I wanted to solve the ip-transparent case (though actually I should be able to debug it on my own now).

Regarding initgroups, that’s because nsd is trying to change its group to nsd, supposing it is currently root. Previously, nsd was started as root (and upstream still expects this), but this is actually not necessary: we are fine starting as the nsd user and group. So what I’m going to do is report upstream that nsd should check its group before trying initgroups, so that the message is no longer emitted in our case.
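The check proposed for upstream can be sketched in Python as follows. This only illustrates the idea; it is not NSD's actual C code, and the helper names `needs_identity_switch` and `drop_privileges` are hypothetical. The group list and gid are changed only when the process is not already running as the target user and group, and always before the uid is given up; when systemd has already started the process as nsd:nsd, nothing is attempted, so there is nothing to warn about.

```python
import os
import pwd


def needs_identity_switch(cur_uid, cur_gid, target_uid, target_gid):
    """Return True only if the process is not already the target user/group."""
    return (cur_uid, cur_gid) != (target_uid, target_gid)


def drop_privileges(username):
    """Switch to `username` if necessary; return True if a switch was made.

    Order matters: supplementary groups and gid must be set while the process
    still has the privilege to do so, i.e. before giving up the uid.
    """
    pw = pwd.getpwnam(username)
    if not needs_identity_switch(os.geteuid(), os.getegid(), pw.pw_uid, pw.pw_gid):
        # Already running as the target (e.g. User=nsd / Group=nsd in the
        # unit file): skip initgroups(), which would fail with EPERM here.
        return False
    os.initgroups(username, pw.pw_gid)
    os.setgid(pw.pw_gid)
    os.setuid(pw.pw_uid)
    return True
```

Under a unit file that starts the daemon directly as nsd, `drop_privileges("nsd")` would return False without touching the group list.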
Comment by Gene (GeneC) - Saturday, 06 February 2021, 17:59 GMT
Thanks for the info on initgroups - makes sense now!

On ip-transparent: yes, it would be good to understand that for sure. As best I can tell, cap_net_admin should allow the setsockopt(); I am interested in what might be holding it back.

Thank you again.

Comment by Gene (GeneC) - Saturday, 06 February 2021, 18:46 GMT
If you use certs, the new stricter permissions require the certs to be readable by user nsd:

chown root:nsd /etc/nsd/nsd*.pem
chown nsd:nsd /etc/nsd/nsd*key

This assumes the key file is readable only by its owner; if it is group-readable, the key can be root:nsd as well.
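To check whether user nsd can actually read these files, here is a small Python sketch of the classic POSIX permission-bit check. It is an approximation: the helper names are hypothetical, it ignores ACLs and the root override, and it relies on the `pwd` and `grp` modules, so it assumes a Unix system with an nsd account. Note that the owner class takes precedence: if the file owner is nsd, only the owner bits count, even if the group bits would grant access.

```python
import glob
import grp
import os
import pwd
import stat


def readable_by(mode, owner_uid, owner_gid, uid, gids):
    """Classic owner/group/other read check (ignores ACLs and root)."""
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


def path_readable_by_user(path, username):
    """Resolve `username`'s uid and groups, then apply readable_by to `path`."""
    pw = pwd.getpwnam(username)
    gids = {pw.pw_gid} | {g.gr_gid for g in grp.getgrall() if username in g.gr_mem}
    st = os.stat(path)
    return readable_by(st.st_mode, st.st_uid, st.st_gid, pw.pw_uid, gids)


if __name__ == "__main__":
    # Same glob patterns as the chown commands above.
    for path in sorted(glob.glob("/etc/nsd/nsd*.pem") + glob.glob("/etc/nsd/nsd*key")):
        status = "ok" if path_readable_by_user(path, "nsd") else "NOT readable"
        print(f"{path}: {status} for user nsd")
```

Run it as root (for example with sudo), since /etc/nsd itself may not be listable by an ordinary user.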


Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 18:50 GMT
Indeed, I should probably write about it in a post_install message (and document it on the wiki as well).

But first I need to make ip_transparent work, and apparently cap_net_bind_service as well (doesn’t work here…).
Comment by Gene (GeneC) - Saturday, 06 February 2021, 18:55 GMT
What are you seeing that indicates cap_net_bind_service is not working? As best I can tell, the server is answering queries at this point. Or is that comment related solely to ip-transparent? Even then, I thought nsd comes up OK and responds to queries.
Comment by Gene (GeneC) - Saturday, 06 February 2021, 19:32 GMT
My internal and test machines run nsd on a high port, which doesn't require cap_net_bind_service; my external nsd does use port 53, but it is not running the test version. So I need to change my test machine to bind to a low port to test cap_net_bind_service and see whether it works (I assume it will fail as it does for you). I will try to look at this when I have more time.

Obviously fixing this one is important :)
Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 19:43 GMT
It’s OK, I’ve found the (two!) issues preventing CAP_NET_BIND_SERVICE from working: PrivateUsers=true, which is expected from the documentation, but also ProtectSystem=strict, which I did not expect and am actually looking into.
Comment by Gene (GeneC) - Saturday, 06 February 2021, 19:44 GMT
I would think cap_net_raw would be sufficient for ip-transparent rather than cap_net_admin. I tried it and it also didn't work... since cap_net_admin grants more, perhaps that's expected.
Comment by Gene (GeneC) - Saturday, 06 February 2021, 19:45 GMT
Given your last comment, I wonder if the same issues you found with cap_net_bind_service may be affecting cap_net_raw too.
Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 19:50 GMT
Well at least PrivateUsers=true will (as said above, this is expected, we actually had the issue with gitea before), but maybe ProtectSystem=strict too (in this case try =full instead, since that works for me).
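This conflict can be caught mechanically before restarting the service. Below is a rough Python sketch (hypothetical helper `lint_unit`) that uses the standard `configparser` module as an approximate parser for unit-file syntax; systemd's own parser is not guaranteed to behave identically. It warns when a [Service] section requests ambient capabilities while PrivateUsers= is enabled, since, as noted above, the capabilities then apply only inside the private user namespace.

```python
import configparser

# Spellings systemd accepts as boolean true.
TRUE_VALUES = {"1", "yes", "y", "true", "t", "on"}


def lint_unit(text):
    """Return warnings for capability settings PrivateUsers= would defeat."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # systemd keys are case-sensitive
    parser.read_string(text)
    if not parser.has_section("Service"):
        return []
    service = parser["Service"]
    # With strict=False a repeated key keeps its last value, which is only an
    # approximation of systemd's merging rules for list-valued settings.
    ambient = service.get("AmbientCapabilities", "").strip()
    private_users = service.get("PrivateUsers", "no").strip().lower()
    warnings = []
    if ambient and private_users in TRUE_VALUES:
        warnings.append(
            "AmbientCapabilities=%s is set but PrivateUsers=%s confines "
            "capabilities to a user namespace" % (ambient, private_users)
        )
    return warnings
```

Feeding it the text of the packaged unit plus any drop-in (concatenated) gives a quick sanity check; it says nothing about ProtectSystem=.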
Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 20:18 GMT
I can confirm that CAP_NET_RAW is enough, so I’ll use it. ;)
Comment by Gene (GeneC) - Saturday, 06 February 2021, 21:21 GMT
OK, great.

I am testing now: I changed the test machine to listen on port 53 and put back ip-transparent in place of ip-freebind.

I commented out PrivateUsers=true, added CAP_NET_RAW,
and set ProtectSystem=full.

I get the ip-transparent error and cannot bind to port 53 (permission denied).

Obviously I'm doing something wrong since you have it working.

Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 21:56 GMT
Yes, sorry, you also need AmbientCapabilities=CAP_NET_RAW.

Also I got everything working with ProtectSystem=strict, I’ll push my change shortly, just need to write the post_install message but got diverted a bit by other things requiring my attention. ;)
Comment by Gene (GeneC) - Saturday, 06 February 2021, 22:00 GMT
OK, will test your version when it's pushed.
thanks!
Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 22:21 GMT
Pushed, should be available in a mirror near you soon©®™. ;)
Comment by Gene (GeneC) - Saturday, 06 February 2021, 22:25 GMT
Great, will report back as soon as I've tested.
Comment by Gene (GeneC) - Saturday, 06 February 2021, 22:30 GMT
Tested build 3 and all works fine: both ip-transparent and the privileged port (53) work out of the box now.

Awesome - thank you.

[Aside: I see what I missed in my test. I neglected to add AmbientCapabilities=CAP_NET_BIND_SERVICE; I only added CAP_NET_RAW.]

Thanks again... I'll sign off on the package signoff site as well now.
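The capability requirements worked out in this exchange can be summarized as a small decision rule. The Python sketch below is illustrative only: the names are hypothetical, and the 1024 threshold assumes the Linux default for net.ipv4.ip_unprivileged_port_start. It derives the capabilities nsd needs from a few nsd.conf server options and renders them as a systemd drop-in, listing each capability in both CapabilityBoundingSet= and AmbientCapabilities=, since an ambient capability must also be in the bounding set. It does not reproduce the packaged unit file.

```python
from dataclasses import dataclass

# Linux default for net.ipv4.ip_unprivileged_port_start (assumption).
UNPRIVILEGED_PORT_START = 1024


@dataclass
class ServerOptions:
    """A few nsd.conf "server:" settings relevant to capabilities."""
    port: int = 53  # nsd's default port when "port:" is not set
    ip_transparent: bool = False
    ip_freebind: bool = False  # needs no extra capability


def required_capabilities(opts):
    """Capabilities nsd needs when started directly as an unprivileged user."""
    caps = set()
    if opts.port < UNPRIVILEGED_PORT_START:
        caps.add("CAP_NET_BIND_SERVICE")
    if opts.ip_transparent:
        # CAP_NET_RAW suffices for IP_TRANSPARENT on recent kernels.
        caps.add("CAP_NET_RAW")
    return sorted(caps)


def render_dropin(caps):
    """Render a [Service] drop-in granting `caps`, or "" if none are needed."""
    if not caps:
        return ""
    joined = " ".join(caps)
    return (
        "[Service]\n"
        f"CapabilityBoundingSet={joined}\n"
        f"AmbientCapabilities={joined}\n"
    )


if __name__ == "__main__":
    # Port 53 with ip-transparent enabled, as in the final test setup.
    print(render_dropin(required_capabilities(ServerOptions(port=53, ip_transparent=True))))
```

With a high port and ip-freebind, as in the original test config, no drop-in is needed at all.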



Comment by Bruno Pagani (ArchangeGabriel) - Saturday, 06 February 2021, 22:54 GMT
Perfect, I’ll leave it there for the rest of the weekend at least, so that others can test it and find yet other bugs, after which I’ll move it early next week. ;)
Comment by Gene (GeneC) - Saturday, 06 February 2021, 22:57 GMT
Thanks again for doing all this; it's better and way more secure now, which is really great. I'm feeling confident now and will update everything to this version, including the external-facing one.
Comment by Vladimir (_v_l) - Sunday, 07 February 2021, 05:04 GMT
Hi.

I tried to update nsd from 4.3.4-1 to 4.3.5-3 but it failed to start:

sudo systemctl daemon-reload
sudo systemctl stop nsd
sudo systemctl start nsd
sudo systemctl status nsd

● nsd.service - Name Server Daemon
Loaded: loaded (/usr/lib/systemd/system/nsd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2021-02-07 08:01:36 MSK; 27s ago
Process: 21890 ExecStart=/usr/bin/nsd -d -c /etc/nsd/nsd.conf (code=exited, status=1/FAILURE)
Main PID: 21890 (code=exited, status=1/FAILURE)
CPU: 76ms

Feb 07 08:01:36 node1.bkoty.ru systemd[1]: nsd.service: Scheduled restart job, restart counter is at 5.
Feb 07 08:01:36 node1.bkoty.ru systemd[1]: Stopped Name Server Daemon.
Feb 07 08:01:36 node1.bkoty.ru systemd[1]: nsd.service: Start request repeated too quickly.
Feb 07 08:01:36 node1.bkoty.ru systemd[1]: nsd.service: Failed with result 'exit-code'.
Feb 07 08:01:36 node1.bkoty.ru systemd[1]: Failed to start Name Server Daemon.

sudo journalctl -xe
Feb 07 08:01:36 node1.bkoty.ru audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nsd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 07 08:01:36 node1.bkoty.ru audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nsd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 07 08:01:36 node1.bkoty.ru systemd[1]: nsd.service: Start request repeated too quickly.
Feb 07 08:01:36 node1.bkoty.ru systemd[1]: nsd.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit nsd.service has entered the 'failed' state with result 'exit-code'.
Feb 07 08:01:36 node1.bkoty.ru systemd[1]: Failed to start Name Server Daemon.
░░ Subject: A start job for unit nsd.service has failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit nsd.service has finished with a failure.
░░
░░ The job identifier is 13264 and the job result is failed.

The only difference is the new version of nsd; the configuration is the same. I changed ownership of both *.key and *.pem files to nsd.nsd.

This is my nsd configuration:

server:
server-count: 1
ip-address: 2a0a:2b40::4:14f
ip-address: 2a0a:2b40::4:3a2f
ip-transparent: yes
identity: "BKOTY domain master DNS"
zonesdir: "/etc/nsd"

pattern:
name: "secondary"
notify: 2a01:4f8:c2c:c813::14f NOKEY
provide-xfr: 2a01:4f8:c2c:c813::14f NOKEY
notify: 2a01:4f8:c2c:c813::3a2f NOKEY
provide-xfr: 2a01:4f8:c2c:c813::3a2f NOKEY
outgoing-interface: 2a0a:2b40::4:14f
outgoing-interface: 2a0a:2b40::4:3a2f

zone:
name: "bkoty.ru"
zonefile: "bkoty.ru.forward.signed"
include-pattern: "secondary"

zone:
name: "bkoty.work"
zonefile: "bkoty.work.forward.signed"
include-pattern: "secondary"

remote-control:
control-enable: yes

(Yes, only IPv6 DNS server).
Comment by Bruno Pagani (ArchangeGabriel) - Sunday, 07 February 2021, 09:01 GMT
Can you provide the output of `journalctl -u nsd` (redirect to a file and attach it here or pastebin somewhere and link it here)?

Also, what is the output of:
sudo ls -l /etc/nsd
sudo ls -l /var/db/nsd

I think nsd is trying to read or write into something it cannot, so I need to whitelist the corresponding path.
Comment by Vladimir (_v_l) - Sunday, 07 February 2021, 12:51 GMT
> Also, what is the output of:
> sudo ls -l /etc/nsd

Duh, I should have checked that directory more carefully. I have two kinds of keys: control and server. I changed ownership for the control one, but forgot the server one. After I changed ownership of both keys, I could run nsd 4.3.5 fine.

The only glitch I found is the following message

unable to initgroups nsd: Operation not permitted

in status:

$ sudo systemctl status nsd

● nsd.service - Name Server Daemon
Loaded: loaded (/usr/lib/systemd/system/nsd.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2021-02-07 15:42:06 MSK; 4min 5s ago
Main PID: 23992 (nsd: xfrd)
Tasks: 3 (limit: 2358)
Memory: 141.9M
CPU: 355ms
CGroup: /system.slice/nsd.service
├─23992 /usr/bin/nsd -d -c /etc/nsd/nsd.conf
├─23993 /usr/bin/nsd -d -c /etc/nsd/nsd.conf
└─23994 /usr/bin/nsd -d -c /etc/nsd/nsd.conf

Feb 07 15:42:05 node1.bkoty.ru systemd[1]: Starting Name Server Daemon...
Feb 07 15:42:05 node1.bkoty.ru nsd[23992]: nsd starting (NSD 4.3.5)
Feb 07 15:42:05 node1.bkoty.ru nsd[23992]: [2021-02-07 15:42:05.924] nsd[23992]: notice: nsd starting (NSD 4.3.5)
Feb 07 15:42:05 node1.bkoty.ru nsd[23992]: unable to initgroups nsd: Operation not permitted
Feb 07 15:42:05 node1.bkoty.ru nsd[23992]: [2021-02-07 15:42:05.929] nsd[23992]: warning: unable to initgroups nsd: Operation not permitted
Feb 07 15:42:06 node1.bkoty.ru nsd[23993]: nsd started (NSD 4.3.5), pid 23992
Feb 07 15:42:06 node1.bkoty.ru nsd[23993]: [2021-02-07 15:42:06.191] nsd[23993]: notice: nsd started (NSD 4.3.5), pid 23992
Feb 07 15:42:06 node1.bkoty.ru systemd[1]: Started Name Server Daemon.

Thank you!
Comment by Bruno Pagani (ArchangeGabriel) - Sunday, 07 February 2021, 13:25 GMT
Yeah, that one is expected as said above: nsd is trying to switch group from root (expected default) to nsd (which it already is with the new service file). So this should be harmless, but nevertheless I opened a ticket upstream to remove this try (and the subsequent warning) when changing uid/gid is not required: https://github.com/NLnetLabs/nsd/issues/155.
Comment by Bruno Pagani (ArchangeGabriel) - Wednesday, 17 February 2021, 10:11 GMT
Upstream says you can use "" as username in nsd.conf to get rid of the warning. I’ve not been able to confirm that yet, but more importantly I don’t have a solution to set this for everyone by default.
Comment by Mike Cloaked (mcloaked) - Sunday, 07 March 2021, 19:52 GMT
Confirming that adding:

username: ""

to nsd.conf does allow nsd to start cleanly with no warning: "unable to initgroups nsd: Operation not permitted"

This for me now gives in the output of "systemctl status nsd":

Mar 07 19:41:28 incus systemd[1]: Starting Name Server Daemon...
Mar 07 19:41:28 incus nsd[4467]: nsd starting (NSD 4.3.5)
Mar 07 19:41:28 incus nsd[4467]: [2021-03-07 19:41:28.120] nsd[4467]: notice: nsd starting (NSD 4.3.5)
Mar 07 19:41:28 incus nsd[4468]: nsd started (NSD 4.3.5), pid 4467
Mar 07 19:41:28 incus nsd[4468]: [2021-03-07 19:41:28.144] nsd[4468]: notice: nsd started (NSD 4.3.5), pid 4467
Mar 07 19:41:28 incus systemd[1]: Started Name Server Daemon.

and 4.3.5 is now working without problems.
Comment by Bruno Pagani (ArchangeGabriel) - Sunday, 07 March 2021, 19:57 GMT
Thanks, it should be added to the wiki then (I cannot do it now for reasons that are my own, so if someone does it that’s nice, else I’ll try to remember about that later this year).
