FS#43647 - [nfs-utils] nfs-server fails to start

Attached to Project: Arch Linux
Opened by Gene (GeneC) - Saturday, 31 January 2015, 21:43 GMT
Last edited by Andreas Radke (AndyRTR) - Saturday, 18 April 2015, 07:38 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 6
Private No

Details

Description: nfs-server fails to start


Additional info:

Similar problems noted last year here:

https://lists.archlinux.org/pipermail/arch-general/2014-June/036617.html

Today after reboot nfs-server was not running:

2 units fail to start

rpc-statd.service
nfs-server.service.

-------------------------------------
rpc-statd:
systemctl status rpc-statd

rpc.statd[736]: Version 1.3.2 starting
rpc.statd[736]: Flags: TI-RPC
rpc.statd[736]: Running as root. chown /var/lib/nfs to choose different user
rpc.statd[736]: failed to create RPC listeners, exiting
systemd[1]: rpc-statd.service: control process exited, code=exited status=1
systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
systemd[1]: Unit rpc-statd.service entered failed state.
systemd[1]: rpc-statd.service failed.

-------------------------------------
I am unable to start this by hand either; it continues to fail the same way. I had seen this once a month or so back, but was then able to start it by hand after the machine was up.

-------------------------------------
nfs-server:

systemctl status nfs-server

● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sat 2015-01-31 16:05:32 EST; 5min ago
Process: 743 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
Process: 741 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
Main PID: 743 (code=exited, status=1/FAILURE)

rpc.nfsd[743]: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
rpc.nfsd[743]: rpc.nfsd: unable to set any sockets for nfsd
systemd[1]: nfs-server.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start NFS server and services.
systemd[1]: Unit nfs-server.service entered failed state.
systemd[1]: nfs-server.service failed.


-------------------------------------

The machine is fully updated from the testing repo. I also tried kernel 3.19.rc6, but it did not help.

Going over the bug from June 2014 I tried these:

systemctl restart proc-fs-nfsd.mount

systemctl restart rpcbind
systemctl restart nfs-mountd.service
systemctl restart rpc-statd.service
systemctl restart nfs-idmapd.service
systemctl restart rpc-svcgssd.service
systemctl restart rpc-statd-notify.service
systemctl restart nfs-mountd
systemctl restart rpc-gssd.service rpc-svcgssd.service


rpc-statd still does not start, but it now fails with:
rpc-statd.service start operation timed out. Terminating.

Trying to start nfs-server, it too times out.

After 5 minutes I tried again: systemctl start nfs-server
which now starts. rpc-statd is still not running.

And the server is once again serving NFS.





* package version(s)
nfs-utils 1.3.2-1
linux 3.18.5-1
rpcbind 0.2.2-1

* config and/or log files etc.


Steps to reproduce:

This task depends upon

Closed by  Andreas Radke (AndyRTR)
Saturday, 18 April 2015, 07:38 GMT
Reason for closing:  Fixed
Comment by Gene (GeneC) - Saturday, 31 January 2015, 21:49 GMT
Also, in the journal nfsd logs this:

rpc.nfsd[505]: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
Comment by Mike Cloaked (mcloaked) - Sunday, 01 February 2015, 16:16 GMT
It's possible that this Fedora bug may be related:
https://bugzilla.redhat.com/show_bug.cgi?id=1183992

In the Arch nfs-utils package, the file /usr/lib/systemd/system/nfs-server.service contains the line referenced in that bug report:

[Unit]
Description=NFS server and services
Requires= network.target proc-fs-nfsd.mount rpcbind.target

Is the suggested change from that bug report a possible way forward?
Comment by Tobias Powalowski (tpowa) - Tuesday, 03 February 2015, 12:12 GMT
Please try the 1.3.2-2 package; it should fix rpc-statd startup.
Comment by Gene (GeneC) - Wednesday, 04 February 2015, 03:29 GMT
Thanks Tobias - I will try this as soon as I can (likely the weekend).

gene
Comment by Andreas Radke (AndyRTR) - Thursday, 05 February 2015, 06:02 GMT
Works for me now.
Comment by Markus N. (Markus.N) - Monday, 02 March 2015, 17:23 GMT
  • Field changed: Percent Complete (100% → 0%)
Still failing with 1.3.2-3

Could the root cause be in another package? The reason I ask is:
I have two PCs. When I noticed the NFS server failure on PC 1, I updated PC 2 with --ignore nfs-utils. But PC 2 also failed after the update.
If this is really the case, it would explain why it seems to work for some people and fails for others.
Comment by Andy Clay (andyc) - Monday, 02 March 2015, 22:08 GMT
As noted in the Fedora bug above, https://bugzilla.redhat.com/show_bug.cgi?id=1183992, this can be fixed by changing the 'Requires=' line from rpcbind.target to rpcbind.service in the systemd service files nfs-server.service and rpc-statd.service.

The services then start correctly after boot.

I don't know if this is the real root of the problem, but it may help to point in the right direction.
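
One way to try the change described above without editing the packaged files is to write a modified copy of each unit elsewhere, for example under /etc/systemd/system, where unit files take precedence over those in /usr/lib/systemd/system. The following is only a minimal sketch: the function name `use_rpcbind_service` is hypothetical and not part of nfs-utils, and it touches nothing except the 'Requires=' line.

```shell
#!/bin/sh
# Hypothetical helper (not part of nfs-utils): copy a unit file to a new
# location, replacing rpcbind.target with rpcbind.service on the
# Requires= line only. All other lines are copied unchanged.
use_rpcbind_service() {
    src=$1   # e.g. /usr/lib/systemd/system/nfs-server.service
    dst=$2   # e.g. /etc/systemd/system/nfs-server.service
    sed '/^Requires=/s/rpcbind\.target/rpcbind.service/' "$src" > "$dst"
}
```

After writing the copies for nfs-server.service and rpc-statd.service, `systemctl daemon-reload` makes systemd pick them up before the services are restarted.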
Comment by Markus N. (Markus.N) - Tuesday, 03 March 2015, 08:53 GMT
OK, it looks like that helped. NFS Server is running.
At the moment I can only see the state of the server, because I'm at work and only PC 1 is running at home (and accessible via a VNC session). So I could not test whether PC 2 can access the server on PC 1 correctly.

Maybe I have found something that could point to the root cause. There are three dead symlinks in /etc/systemd/system/multi-user.target.wants:
nfsd.service -> /usr/lib/systemd/system/nfsd.service
rpc-idmapd.service -> /usr/lib/systemd/system/rpc-idmapd.service
rpc-mountd.service -> /usr/lib/systemd/system/rpc-mountd.service

The files are completely missing, not just relocated.
I am not sure if some of them have become obsolete in the meantime, but AFAIR rpc-idmapd is needed to resolve user IDs to user names. Or did I miss an important change in this concept?
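
Dead symlinks like the three listed above can be found with a short check. This is a sketch only; the function name `list_dead_symlinks` is made up for illustration, and the directory is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print every symlink under a directory whose target
# does not exist.
list_dead_symlinks() {
    dir=$1
    # -type l selects symlinks; `test -e` follows the link and fails
    # when the target is missing, so only dead links are printed.
    find "$dir" -type l ! -exec test -e {} \; -print
}
```

For example, `list_dead_symlinks /etc/systemd/system/multi-user.target.wants` would list leftover links from units that no longer exist, so they can be reviewed before being removed.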
Comment by Andreas Radke (AndyRTR) - Tuesday, 03 March 2015, 17:38 GMT
So far no report back from the OP.

Markus' issue looks like a wrong user configuration. This could be a missing .pacnew file merge or something else. This has nothing to do with the initial bug we solved.
Comment by Gene (GeneC) - Tuesday, 03 March 2015, 18:15 GMT
Quite a few updates since I posted originally.

I will be doing further testing this weekend and report back.

I can add that I started seeing some client failures which could be remediated by just doing 'mount -av' by hand after boot. I was also seeing client errors on vers 4.2, but it seemed to fall back to vers 4.1 OK. To avoid the error I forced vers=4.1 in fstab, but this did not always fix the mount-at-boot issue. It could be systemd timing, i.e. perhaps it would work if I waited longer, but I am not sure.

Anyway - will be testing both server side and client side this weekend.

Thanks.
Comment by Andreas Radke (AndyRTR) - Tuesday, 03 March 2015, 20:06 GMT
Please make sure to not use systemd v219! That version is known to break automount at boot.
Comment by Markus N. (Markus.N) - Tuesday, 03 March 2015, 21:41 GMT
Thanks for the hint. I'm currently on v218 and had planned an update today, so I'm going to --ignore systemd.

Update about the NFS-Server on my PC 1:
Being at home now, I mounted the shares from PC 1 on PC 2 and it works. Even user ID mapping works.
Comment by Gene (GeneC) - Tuesday, 03 March 2015, 22:43 GMT
And currently I am on systemd 219-2
Comment by Gene (GeneC) - Wednesday, 04 March 2015, 01:08 GMT
By the way, regarding systemd 219 and automount: I am not using "systemd.automount", just the standard fstab entry, though I did try automount and it did not work.
Comment by Andreas Radke (AndyRTR) - Wednesday, 04 March 2015, 12:21 GMT
Can we close this one? rpcbind has been fixed and needs to be enabled - see our NFS wiki page. (An rpcbind-less, NFSv4-only setup is not currently the default recommended configuration.)

Anything else is a different issue and should be discussed in other bug reports (see FS#43915 for the systemd mount issue).
