FS#71775 - NFS regression experienced with 5.13.x kernels (server side)
Attached to Project:
Arch Linux
Opened by Mike Javorski (javmorin) - Sunday, 08 August 2021, 22:59 GMT
Last edited by Jan Alexander Steffens (heftig) - Monday, 20 September 2021, 17:58 GMT
Details
Description:
I have been experiencing NFS file access hangs with multiple releases of the 5.13.x Linux kernel. In each case, all file transfers freeze for 5-10 seconds and then resume. This seems worse when reading through many files sequentially (jumping between and seeking within video files often provokes it).

My server:
- Arch Linux w/ an arch kernel package
- filesystems exported with "rw,sync,no_subtree_check,insecure" options

Client:
- Arch Linux w/ latest provided "arch" kernel (5.13.9-arch1-1 at writing)
- NFS mounted via /net autofs with "soft,nodev,nosuid" options (ver=4.2 is indicated in mount)

I have tried the 5.13.x kernel several times since the first stable release (most recently with 5.13.9-arch1-1), all with similar results. Each time, I am forced to downgrade the linux package to a 5.12.x kernel (5.12.15-arch1 as of writing) to clear up the transfer issues and stabilize performance. No other changes are made between tests. I have confirmed the freezing behavior using both ext4 and btrfs filesystems exported from this server.

At this point I would appreciate some guidance on what to provide in order to diagnose and resolve this issue. I don't have a lot of kernel debugging experience, so instruction would be helpful.

Additional info:
* linux 5.13.x-arch vs 5.12.15-arch1-1
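For anyone who wants to put numbers on the freezes described above, here is a minimal Python sketch that reads a file sequentially and records every read that takes longer than a threshold. It is not part of the original report: the function name `find_stalls`, the 1 MiB chunk size, and the 1-second threshold are assumptions chosen for illustration.

```python
import time
from typing import BinaryIO, Callable, List, Tuple


def find_stalls(
    stream: BinaryIO,
    chunk_size: int = 1 << 20,      # assumed 1 MiB reads
    threshold_s: float = 1.0,       # assumed: anything slower counts as a stall
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[int, float]]:
    """Read `stream` to EOF; return (byte offset, seconds) for each slow read."""
    stalls: List[Tuple[int, float]] = []
    offset = 0
    while True:
        start = clock()
        chunk = stream.read(chunk_size)
        elapsed = clock() - start
        if elapsed >= threshold_s:
            stalls.append((offset, elapsed))
        if not chunk:               # empty read means end of file
            break
        offset += len(chunk)
    return stalls
```

To use it, open a large file on the NFS mount with `open(path, "rb", buffering=0)` and pass the file object to `find_stalls`. Data already held in the client's page cache is served locally and will not show stalls, so a file that has not been read recently gives the most meaningful result.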
Closed by Jan Alexander Steffens (heftig)
Monday, 20 September 2021, 17:58 GMT
Reason for closing: Fixed
Additional comments about closing: linux 5.14.6.arch1-1
Try reproducing using 5.13.9 without the three commits Arch added as requested by upstream [1]?
[1] https://lore.kernel.org/linux-nfs/CAOv1SKCmdtchm5Z2NU80o49tkrHpAkPFaHKj4-vLDN5bZNCz-Q%40mail.gmail.com/
https://drive.google.com/file/d/19hRre_IeAHomEdZnoOQAySbTLOZIhY2z/view?usp=sharing linux-loqs-5.12rc4.r70.gb73ac6808b0f-1-x86_64.pkg.tar.zst (last commit of nfsd pull for 5.13)
If the first kernel is good and the second is bad, that would narrow it down to 70 commits. The kernels are unpatched Linux mainline, with -loqs appended so you can install them alongside the linux kernel package.
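As a rough guide to how much testing 70 commits implies, the sketch below computes the worst-case number of kernels to build and test when halving the range each time. It assumes a single commit introduced the regression and that every build can be tested cleanly; the helper name `bisect_steps` is made up for this example, and an actual `git bisect` run may take a step more or fewer.

```python
def bisect_steps(candidates: int) -> int:
    """Worst-case number of test builds needed to isolate one bad commit
    among `candidates` commits by repeated halving (ceil(log2(n)))."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats
    return (candidates - 1).bit_length()


print(bisect_steps(70))  # prints 7
```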
Thank you for your help.
Here is a link to that cap file if you are interested: https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing
I am hoping that Neil may come back with some insights as well.
I've also noticed that it takes a while before the issue becomes noticeable, typically after several GB of data have been transferred, and the more data is transferred, the more frequent and longer the stalls become. Remounting on the client or restarting nfsd on the server does not help, IIRC.
HTH
The patch from https://lore.kernel.org/linux-nfs/20210825193314.354079-1-trond.myklebust@hammerspace.com/ seems to have resolved the freezing issue for me.
I have asked if there is any possibility of it being back-ported to linux-stable/5.13, but I don't know if that's a possibility, or the timing of same. Maybe the archlinux devs can do that for the arch kernel in the meantime?
It's already merged into 5.14 (likely to be final this weekend) so that should resolve the trouble too when the archlinux kernel is updated.
I will leave it up until this patch (or a similar solution) lands in the main linux package.
This further patch can be found here: https://lore.kernel.org/linux-nfs/162915504980.9892.4132343755469951234@noble.neil.brown.name/T/#md4e6e4300ed2a36260eca0d8befb7744732df3fe
If anyone should want to test it, but not want to deal with the recompile time, here is yet another kernel package I have built which includes both of these fixes: https://drive.google.com/file/d/19R7oECtlCLixGqMM_99kYtNGp-M7veUY/view?usp=sharing
I don't believe this second patch has been pushed for inclusion upstream yet, so it will likely miss the initial 5.14 release if that happens this weekend.
I will update once it's actually merged/released on stable (unless the Arch devs decide to backport it before then).
I will continue monitoring upstream to make sure it lands in 5.14 proper, but at this point Arch users should be all set.