FS#30442 - [NFS4] rpc.idmapd Crash with Kernel 3.4.x

Attached to Project: Arch Linux
Opened by Carlos Candeias (jcci) - Tuesday, 26 June 2012, 03:08 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 04 July 2012, 06:42 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
NFS4 idmapd crashes after an unknown number of write accesses.
The problem is confirmed for all 3.4.x kernels, including 3.4.4-2, on Arch x86_64. It has now happened on 3 computers with different hardware.
After the crash, the /usr/sbin/rpc.idmapd process is still present, but it cannot be killed or restarted. Consequently NFS is dead and the computer almost freezes. Rebooting from a tty is possible in most cases (unmounting the NFS4 shares fails, of course).
Kernel 3.3.8 works fine.


Additional info:
* package version(s)
All 3.4.x kernels, including 3.4.4-2

* config and/or log files etc.
This is the kernel log (from messages.log)

Jun 26 10:09:07 anyhost kernel: [ 8012.955208] PGD 3ff7f9067 PUD 3ff7f8067 PMD 80000003fe0001e3
Jun 26 10:09:07 anyhost kernel: [ 8012.955255] Oops: 0011 [#1] PREEMPT SMP
Jun 26 10:09:07 anyhost kernel: [ 8012.955291] CPU 5
Jun 26 10:09:07 anyhost kernel: [ 8012.955307] Modules linked in: fuse tun cpufreq_conservative nfsd exportfs snd_hda_codec_hdmi snd_hda_codec_realtek mxm_wmi usbhid hid microcode aesni_intel aes_x86_64 aes_generic snd_hda_intel ghash_clmulni_intel cryptd snd_hda_codec snd_hwdep snd_pcm snd_page_alloc serio_raw snd_timer r8169 i2c_i801 iTCO_wdt pcspkr snd mii iTCO_vendor_support soundcore mei(C) kvm_intel wmi kvm evdev coretemp acpi_cpufreq mperf processor nfs nfs_acl lockd auth_rpcgss sunrpc fscache crc32c_intel i915 video button i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core btrfs crc32c libcrc32c zlib_deflate ext4 crc16 jbd2 mbcache ehci_hcd xhci_hcd usbcore usb_common sr_mod cdrom sd_mod ahci libahci libata scsi_mod
Jun 26 10:09:07 anyhost kernel: [ 8012.955891]
Jun 26 10:09:07 anyhost kernel: [ 8012.955905] Pid: 493, comm: rpc.idmapd Tainted: G WC 3.4.4-2-ARCH #1 Gigabyte Technology Co., Ltd. Z68A-D3H-B3/Z68A-D3H-B3
Jun 26 10:09:07 anyhost kernel: [ 8012.955990] RIP: 0010:[<ffffea000ea80fc0>] [<ffffea000ea80fc0>] 0xffffea000ea80fbf
Jun 26 10:09:07 anyhost kernel: [ 8012.956047] RSP: 0018:ffff8803e1e27d40 EFLAGS: 00010246
Jun 26 10:09:07 anyhost kernel: [ 8012.956083] RAX: ffff8803e8a44df0 RBX: ffff8803a39b2f00 RCX: ffff8803e8944f00
Jun 26 10:09:07 anyhost kernel: [ 8012.956131] RDX: 0000000000000005 RSI: ffff8803e1e27de9 RDI: ffff8803e8a44de0
Jun 26 10:09:07 anyhost kernel: [ 8012.956201] RBP: ffff8803e1e27d88 R08: 2222222222222222 R09: 2222222222222222
Jun 26 10:09:07 anyhost kernel: [ 8012.956267] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803e8944f00
Jun 26 10:09:07 anyhost kernel: [ 8012.956316] R13: ffff8803e8a44de0 R14: ffff8803e1e27de9 R15: 0000000000000005
Jun 26 10:09:07 anyhost kernel: [ 8012.956366] FS: 00007f2d8e852700(0000) GS:ffff8803ffd40000(0000) knlGS:0000000000000000
Jun 26 10:09:07 anyhost kernel: [ 8012.956422] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 10:09:07 anyhost kernel: [ 8012.956461] CR2: ffffea000ea80fc0 CR3: 00000003e1eaf000 CR4: 00000000000407e0
Jun 26 10:09:07 anyhost kernel: [ 8012.956510] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 26 10:09:07 anyhost kernel: [ 8012.956558] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 26 10:09:07 anyhost kernel: [ 8012.956608] Process rpc.idmapd (pid: 493, threadinfo ffff8803e1e26000, task ffff8803e89b1fc0)
Jun 26 10:09:07 anyhost kernel: [ 8012.956664] Stack:
Jun 26 10:09:07 anyhost kernel: [ 8012.956679] ffffffff811f027c ffff8803e1e27db0 ffff8803e1e27db0 ffff8803e837dfa0
Jun 26 10:09:07 anyhost kernel: [ 8012.956739] ffff8803e8944f00 ffff8803e8a44de0 ffff8803e1e27de9 0000000000000005
Jun 26 10:09:07 anyhost kernel: [ 8012.956798] ffff8803a39b2f00 ffff8803e1e27dd8 ffffffff811f0383 ffff8803e1e27de8
Jun 26 10:09:07 anyhost kernel: [ 8012.956858] Call Trace:
Jun 26 10:09:07 anyhost kernel: [ 8012.956883] [<ffffffff811f027c>] ? __key_instantiate_and_link+0x5c/0x100
Jun 26 10:09:07 anyhost kernel: [ 8012.956931] [<ffffffff811f0383>] key_instantiate_and_link+0x63/0xa0
Jun 26 10:09:07 anyhost kernel: [ 8012.956987] [<ffffffffa03aa7cd>] idmap_pipe_downcall+0x1bd/0x1e0 [nfs]
Jun 26 10:09:07 anyhost kernel: [ 8012.957040] [<ffffffffa033dcc9>] rpc_pipe_write+0x69/0x90 [sunrpc]
Jun 26 10:09:07 anyhost kernel: [ 8012.957085] [<ffffffff8116e868>] vfs_write+0xa8/0x180
Jun 26 10:09:07 anyhost kernel: [ 8012.957122] [<ffffffff8116ebaa>] sys_write+0x4a/0xa0
Jun 26 10:09:07 anyhost kernel: [ 8012.957159] [<ffffffff8146a8e9>] system_call_fastpath+0x16/0x1b
Jun 26 10:09:07 anyhost kernel: [ 8012.957199] Code: 00 00 00 ff ff ff ff 03 00 00 00 e0 41 29 0f 00 ea ff ff 60 b8 8e 0e 00 ea ff ff 88 79 ef e2 03 88 ff ff 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 00 d4 13 ea 03
Jun 26 10:09:07 anyhost kernel: [ 8012.957580] RSP <ffff8803e1e27d40>
Jun 26 10:09:07 anyhost kernel: [ 8012.957609] CR2: ffffea000ea80fc0
Jun 26 10:09:07 anyhost kernel: [ 8013.025034] ---[ end trace a7919e7f17c0a727 ]---
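
As a reading aid for the oops above, here is a minimal Python sketch that pulls the call chain out of such a log. The function name `call_trace` and the regular expression are illustrative assumptions, not part of any kernel tooling; the sample lines are an excerpt of the log in this report. A leading "?" marks a frame the kernel's unwinder considers unreliable.

```python
import re

# Matches a frame such as
#   "[<ffffffffa03aa7cd>] idmap_pipe_downcall+0x1bd/0x1e0 [nfs]"
# with an optional "? " marker and an optional "[module]" suffix.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)

def call_trace(log_text):
    """Return (symbol, module, reliable) tuples for the frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME_RE.search(line)
        if m is None:
            break  # the first non-frame line ends the trace
        module = m.group(3) or "kernel"  # no "[module]" suffix: built into the kernel image
        frames.append((m.group(2), module, m.group(1) is None))
    return frames

# Excerpt of the log from this report.
SAMPLE = """\
Jun 26 10:09:07 anyhost kernel: [ 8012.956858] Call Trace:
Jun 26 10:09:07 anyhost kernel: [ 8012.956883] [<ffffffff811f027c>] ? __key_instantiate_and_link+0x5c/0x100
Jun 26 10:09:07 anyhost kernel: [ 8012.956931] [<ffffffff811f0383>] key_instantiate_and_link+0x63/0xa0
Jun 26 10:09:07 anyhost kernel: [ 8012.956987] [<ffffffffa03aa7cd>] idmap_pipe_downcall+0x1bd/0x1e0 [nfs]
Jun 26 10:09:07 anyhost kernel: [ 8012.957580] RSP <ffff8803e1e27d40>
"""

if __name__ == "__main__":
    for symbol, module, reliable in call_trace(SAMPLE):
        print(f"{symbol:32s} {module:8s} {'' if reliable else '(unreliable)'}")
```

Read this way, the trace shows rpc.idmapd writing to the RPC pipe, the nfs module handling the downcall, and the fault occurring while the kernel key subsystem instantiates a key.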

Steps to reproduce:
We use the computers with the home directory and all working directories on NFS4. Any "save", "save as", or rename can potentially cause a crash.

Closed by  Tobias Powalowski (tpowa)
Wednesday, 04 July 2012, 06:42 GMT
Reason for closing:  Fixed
Comment by Tobias Powalowski (tpowa) - Tuesday, 26 June 2012, 15:30 GMT
Have you also tried the new keyutils package with the new config?
Comment by Carlos Candeias (jcci) - Wednesday, 27 June 2012, 00:36 GMT
Keyutils is installed, but I don't see which settings to make in the /etc/request-key.conf file. Unfortunately, the Wiki doesn't help either.
So NFS4 is running with the default settings, as it has for some years.
However, the "error.log" shows a lot of these:
request-key: Cannot find command to construct key 994854967
The number at the end changes all the time.

But keyutils does not depend on anything, so what is missing, and why does NFS crash?
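
One way to check the "what is missing" question is to look for a matching rule in /etc/request-key.conf. The sketch below is a hedged example only: it assumes the missing piece is a `create` rule for the `id_resolver` key type, which is what nfs-utils documents for its nfsidmap helper. The thread itself never says which rule the later keyutils update added, and the helper name `has_id_resolver_rule` is made up for illustration.

```python
def has_id_resolver_rule(conf_text):
    """Return True if a non-comment line is a 'create id_resolver ...' rule.

    Assumed request-key.conf layout: OP TYPE DESCRIPTION CALLOUT-INFO PROGRAM [ARG...]
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' lines are comments
        fields = line.split()
        # ASSUMPTION: "id_resolver" is the key type used by the NFSv4 idmapper.
        if len(fields) >= 5 and fields[0] == "create" and fields[1] == "id_resolver":
            return True
    return False

if __name__ == "__main__":
    # Path taken from the comment above; adjust if your system differs.
    with open("/etc/request-key.conf") as fh:
        print("id_resolver rule present:", has_id_resolver_rule(fh.read()))
```

If the function returns False on an affected machine, that would be consistent with the "Cannot find command to construct key" messages, though it does not by itself prove the cause of the crash.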
Comment by Tobias Powalowski (tpowa) - Thursday, 28 June 2012, 09:37 GMT
Please contact the upstream developers; no Arch dev has issues with NFS4 at the moment.
Comment by Carlos Candeias (jcci) - Friday, 29 June 2012, 05:30 GMT
A new version of keyutils arrived today. At least the request-key error message is gone.
I will now try one workstation with the current kernel and keyutils and report back.
I understand it is hard to handle a problem you cannot reproduce, but every computer in our company shows the very same problem, and NFS is a core function of Linux. So this is not about a special configuration, unless some requirement for the NFS server changed.
A quick Google search confirms that I'm not alone: http://comments.gmane.org/gmane.linux.kernel/1306434
It is suggested there that using keyutils solves the problem, but keyutils didn't work properly until 30 minutes ago.
Comment by Carlos Candeias (jcci) - Wednesday, 04 July 2012, 00:15 GMT
Confirmed now: the rpc.idmapd trouble is gone with keyutils 1.5.5-3. The previous versions produced many log errors like "request-key: Cannot find command to construct key 994854967", which indicate an unstable NFS4.
So I believe this bug report can be closed.
Thanks Tobias!
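
For anyone who wants to check whether a machine still shows the indicator described above, here is a small Python sketch that counts the request-key errors in a log. Only the message text comes from this report; the function name and the timestamp prefix in the usage example are hypothetical.

```python
import re
from collections import Counter

# Message text as quoted in this report; the trailing number is the key ID.
ERR_RE = re.compile(r"request-key: Cannot find command to construct key (\d+)")

def count_request_key_errors(lines):
    """Return (total_errors, distinct_key_ids) for an iterable of log lines."""
    ids = Counter()
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            ids[m.group(1)] += 1
    return sum(ids.values()), len(ids)

if __name__ == "__main__":
    # Hypothetical log lines: the timestamp/host prefix is invented for the example.
    demo = [
        "Jun 27 08:00:01 anyhost request-key: Cannot find command to construct key 994854967",
        "Jun 27 08:00:02 anyhost request-key: Cannot find command to construct key 123456789",
        "Jun 27 08:00:03 anyhost kernel: unrelated message",
    ]
    total, distinct = count_request_key_errors(demo)
    print(f"{total} errors for {distinct} distinct keys")
```

A count of zero after the keyutils update would match the outcome reported here.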
