FS#63833 - [linux] 5.3 prevents nfs-server from starting

Attached to Project: Arch Linux
Opened by James Harvey (jamespharvey20) - Thursday, 19 September 2019, 03:41 GMT
Last edited by freswa (frederik) - Friday, 21 February 2020, 22:17 GMT
Task Type: Bug Report
Category: Packages: Testing
Status: Closed
Assigned To: No-one
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

Disclaimer: If this weren't in testing, I wouldn't be reporting it at this stage. I don't know whether heftig is planning to release 5.3 soon, whether it's still a way off, or whether he'll wait for one or more minor revisions. So I at least wanted to say something so that heftig could take this into account.

Description: After running -Syu, installing testing/linux 5.3, and rebooting, nfs-server.service never finishes starting: it just hangs forever, and clients can never connect. Downgrading only the linux package fixes it.

I'm specifically using NFS v4 over IPOIB (IP-over-InfiniBand).

In about 4 hours I'll be able to start working on this and gather more details. Is this IPOIB-specific, or is it also broken over plain Ethernet? Is it limited to NFS v4? I'll also file an upstream bug report, of course.

This is an nfs server that's been running without issue for well over a year.

Additional info:
* testing/linux 5.3
* I found this NFSv4 breakage on 5.3, but I don't think it's the same issue, and a fix was supposedly included in 5.3-rc8: https://lore.kernel.org/linux-nfs/20190828102256.3nhyb2ngzitwd7az@XZHOUW.usersys.redhat.com/
* I also found this NFSv4 breakage on 5.3, but I don't think it's the same issue; upstream suggested it might be user error, and the reporter never responded: https://lkml.org/lkml/2019/8/19/393

Steps to reproduce:
Unknown so far. There should be more comments from me within a few hours.

Closed by freswa (frederik) - Friday, 21 February 2020, 22:17 GMT
Reason for closing: Upstream
Comment by James Harvey (jamespharvey20) - Thursday, 19 September 2019, 13:09 GMT
Posted to linux-nfs. It should be visible at https://www.spinics.net/lists/linux-nfs/ under the subject "5.3.0 Regression: rpc.nfsd v4 uninterruptible sleep for 5+ minutes w/o rpc-statd/etc", but that archive appears to be lagging behind by about a day. In the meantime, my email with extra diagnostics can be viewed at http://ix.io/1pNg, although replies of course won't be shown there.

Typically it's "only" a delay of exactly 5 minutes before it starts, though sometimes a bit longer. So, if you're patient enough, it doesn't actually prevent it from starting.

It also only happens when forcing v4-only operation. By this, I mean setting "vers3=n" in the "[nfsd]" section of /etc/nfs.conf, and then masking gssproxy, nfs-blkmap, rpc-statd, and rpcbind (both the service and the socket) with systemctl. There have been a few discussions on the forums about running v4 without all the extra services, and it has been suggested at least once that this be added to the wiki, but it never was, so I don't know how many people this will actually affect. My guess is not many.
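For reference, the nfs.conf part of that v4-only setup amounts to roughly the fragment below. This is a sketch assuming the stock nfs-utils /etc/nfs.conf layout; everything else in the file can stay at its defaults (as noted later, the hang reproduces with only this key set), and the unit masking is done separately with systemctl, not in this file.

```ini
# /etc/nfs.conf: disable NFSv3 so rpc.nfsd serves NFSv4 only
[nfsd]
vers3=n
```

With this in place, gssproxy, nfs-blkmap, rpc-statd, and rpcbind (service and socket) are then masked so that none of them start alongside nfs-server.service.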

The easy fix is simply to temporarily unmask those services and the socket.

It has nothing to do with the InfiniBand/IPOIB/rdma kernel modules.

It occurs even after commenting out everything in /etc/exports and /etc/nfs.conf except for "vers3=n", and rebooting.

Also note that Flyspray mangled one of the links in my original report; neither of those links is related to this bug.

EDIT 2019-10-10: bcodding has sent a patch to linux-nfs that is said to fix the issue. It hasn't been released yet. The most current version is "[PATCH v3] SUNRPC: fix race to sk_err after xs_error_report".
