FS#43647 - [nfs-utils] nfs-server fails to start
Attached to Project:
Arch Linux
Opened by Gene (GeneC) - Saturday, 31 January 2015, 21:43 GMT
Last edited by Andreas Radke (AndyRTR) - Saturday, 18 April 2015, 07:38 GMT
Details
Description: nfs-server fails to start
Additional info: Similar problems were noted last year here:
https://lists.archlinux.org/pipermail/arch-general/2014-June/036617.html

Today after reboot nfs-server was not running. Two units fail to start: rpc-statd.service and nfs-server.service.

-------------------------------------
rpc-statd:

systemctl status rpc-statd
rpc.statd[736]: Version 1.3.2 starting
rpc.statd[736]: Flags: TI-RPC
rpc.statd[736]: Running as root. chown /var/lib/nfs to choose different user
rpc.statd[736]: failed to create RPC listeners, exiting
systemd[1]: rpc-statd.service: control process exited, code=exited status=1
systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
systemd[1]: Unit rpc-statd.service entered failed state.
systemd[1]: rpc-statd.service failed.
-------------------------------------

I am unable to start this by hand either; it continues to fail the same way. I had seen this once, a month or so back, but was able to start it by hand after the machine was up.

-------------------------------------
nfs-server:

systemctl status nfs-server
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2015-01-31 16:05:32 EST; 5min ago
  Process: 743 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 741 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 743 (code=exited, status=1/FAILURE)

rpc.nfsd[743]: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
rpc.nfsd[743]: rpc.nfsd: unable to set any sockets for nfsd
systemd[1]: nfs-server.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start NFS server and services.
systemd[1]: Unit nfs-server.service entered failed state.
systemd[1]: nfs-server.service failed.
-------------------------------------

Machine is fully updated from testing repo. I did try kernel 3.19.rc6 but it does not help.
Going over the bug from June 2014 I tried these:

systemctl restart proc-fs-nfsd.mount
systemctl restart rpcbind
systemctl restart nfs-mountd.service
systemctl restart rpc-statd.service
systemctl restart nfs-idmapd.service
systemctl restart rpc-svcgssd.service
systemctl restart rpc-statd-notify.service
systemctl restart nfs-mountd
systemctl restart rpc-gssd.service rpc-svcgssd.service

rpc-statd still does not start, but now it fails with:

rpc-statd.service start operation timed out. Terminating.

Trying to start nfs-server: it too times out. After 5 minutes I tried again:

systemctl start nfs-server

which now starts. rpc-statd is still not running, but the server is once again serving NFS.

* package version(s)
nfs-utils 1.3.2-1
linux 3.18.5-1
rpcbind 0.2.2-1

* config and/or log files etc.

Steps to reproduce:
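For anyone wanting to replay the restart sequence from the description, here is a minimal sketch. It is not a fix, only the same commands in the same order with the duplicates dropped. The names NFS_UNITS, DRY_RUN and restart_nfs_units are made up for this example. DRY_RUN defaults to 1, so the commands are only printed unless it is explicitly set to 0 (which needs root).

```shell
#!/bin/sh
# Sketch only: replays the restart order tried in this report.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Units in the order they were restarted in the description (duplicates dropped).
# NFS_UNITS is a hypothetical variable name used only in this sketch.
NFS_UNITS="proc-fs-nfsd.mount rpcbind nfs-mountd.service rpc-statd.service nfs-idmapd.service rpc-svcgssd.service rpc-statd-notify.service rpc-gssd.service"

restart_nfs_units() {
    for unit in $NFS_UNITS; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "systemctl restart $unit"
        else
            # Report a failure but keep going so every unit is attempted.
            systemctl restart "$unit" || echo "FAILED: $unit" >&2
        fi
    done
}

restart_nfs_units
```

Running it with DRY_RUN=0 and then checking "systemctl status rpc-statd nfs-server" would show whether the units came up or timed out as described.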
rpc.nfsd[505]: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
https://bugzilla.redhat.com/show_bug.cgi?id=1183992
The line referenced in that bug report also appears in the Arch nfs-utils unit file, /usr/lib/systemd/system/nfs-server.service:
[Unit]
Description=NFS server and services
Requires= network.target proc-fs-nfsd.mount rpcbind.target
Is the suggested change from that bug report a possible way forward?
gene
Could the root cause be in another package? The reason I'm asking is:
I have two PCs. When I noticed the NFS server failure on PC 1, I updated PC 2 with --ignore nfs-utils. But PC 2 also failed after the update.
If this is really the case, it would explain why it seems to work for some people and fails for others.
The services then start correctly after boot.
I don't know if this is the real root of the problem, but it may help to point in the right direction.
At the moment, I can only see the state of the server, because I'm at work and only PC 1 is running at home (accessible via a VNC session). So I could not test whether PC 2 can access the server on PC 1 correctly.
Maybe I have found something that could point to the root cause. There are three dead symlinks in /etc/systemd/system/multi-user.target.wants:
nfsd.service -> /usr/lib/systemd/system/nfsd.service
rpc-idmapd.service -> /usr/lib/systemd/system/rpc-idmapd.service
rpc-mountd.service -> /usr/lib/systemd/system/rpc-mountd.service
The target files are completely missing; they have not just been relocated.
I am not sure whether some of them have become obsolete in the meantime, but AFAIR rpc-idmapd is needed to resolve user IDs to user names. Or did I miss an important change in this concept?
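To check another machine for the same kind of leftover, the dead-symlink check above can be sketched as below. This only lists dangling links and removes nothing. The function name list_dead_symlinks is made up for this example, and the default directory is the one named in the comment.

```shell
#!/bin/sh
# Sketch only: list symlinks in a directory whose targets no longer exist.
# list_dead_symlinks is a hypothetical helper name, not part of any package.

list_dead_symlinks() {
    dir="${1:-/etc/systemd/system/multi-user.target.wants}"
    for link in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: its target cannot be resolved.
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$link")"
        fi
    done
}

list_dead_symlinks "$@"
```

With no argument it inspects multi-user.target.wants; any other directory can be passed as the first argument.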
Markus' issue looks like a wrong user configuration. This could be a missing .pacnew file merge or something else. This has nothing to do with the initial bug we solved.
I will be doing further testing this weekend and report back.
I can add that I started seeing some client failures which could be remedied by just running 'mount -av' by hand after boot. I was also seeing client errors with vers 4.2, but it seemed to fall back to vers 4.1 OK. To avoid the error I forced vers=4.1 in fstab, but this did not always fix the mount-at-boot issue. It could be systemd timing, i.e. perhaps it would work if I waited longer, but I am not sure.
Anyway - will be testing both server side and client side this weekend.
Thanks.
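If the client-side failures really are a timing issue, the manual 'mount -av' workaround mentioned above could be tested with a small retry loop. This is a sketch under that assumption, not a confirmed fix. The retry helper is made up for this example, and the attempt count and delay in the usage line are arbitrary.

```shell
#!/bin/sh
# Sketch only: retry a command a few times with a pause in between.
# Usage: retry <attempts> <delay-seconds> <command...>
# retry is a hypothetical helper written for this example.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
        # Pause only if another attempt will follow.
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (run as root; not executed here):
#   retry 5 10 mount -av
```

If the mounts succeed on a later attempt but not the first, that would point toward ordering or timing at boot rather than the NFS version negotiation.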
Update about the NFS-Server on my PC 1:
Being at home now, I mounted the shares from PC 1 on PC 2 and it works. Even user ID mapping works.
Anything else is something different and should be discussed in other bug reports (see FS#43915 for the systemd mount issue).