FS#37886 - [nfs-utils] rpc.gssd fails with a kernel oops

Attached to Project: Arch Linux
Opened by fiat500 (fiat500) - Saturday, 23 November 2013, 15:54 GMT
Last edited by Tobias Powalowski (tpowa) - Sunday, 12 January 2014, 13:06 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
rpc.gssd faults with a kernel oops, and stopping the rpc-gssd service doesn't kill the offending process. Shutdown does not complete cleanly and requires a hard power-off.

Additional info:
Gives a kernel oops similar to:
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
BUG: unable to handle kernel NULL pointer dereference at 0000000000000c68
IP: [<ffffffffa0a3900c>] put_pipe_version+0x1c/0x80 [auth_rpcgss]
PGD 7b2fa067 PUD 764f1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss oid_registry nfs_acl xfs crc32c libcrc32c arc4 b43 bcma mac80211 cfg80211 joydev iTCO_wdt iTCO_vendor_support coretemp hp_wmi sparse_keymap ssb rfkill mmc_core pcmcia microcode pcmcia_core pcspkr psmouse serio_raw i2c_i801 lpc_ich 8139too snd_hda_codec_conexant snd_hda_intel 8139cp snd_hda_codec mii snd_hwdep snd_pcm snd_page_alloc snd_timer snd wmi thermal shpchp evdev battery ac processor soundcore nfs lockd sunrpc fscache ext4 crc16 mbcache jbd2 hid_generic usbhid hid sd_mod sr_mod cdrom ata_generic pata_acpi ahci libahci ata_piix libata ehci_pci scsi_mod uhci_hcd ehci_hcd usbcore usb_common i915 video button i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core
CPU: 0 PID: 822 Comm: rpc.gssd Not tainted 3.12.0-1-ARCH #1
Hardware name:
task: ffff88007a60c580 ti: ffff880076646000 task.ti: ffff880076646000
RIP: 0010:[<ffffffffa0a3900c>] [<ffffffffa0a3900c>] put_pipe_version+0x1c/0x80 [auth_rpcgss]
RSP: 0000:ffff880076647e38 EFLAGS: 00010202
RAX: ffff88007a60c580 RBX: 0000000000000001 RCX: 00000000c0000100
RDX: ffff880076647fd8 RSI: ffff88007a60c580 RDI: 0000000000000000
RBP: ffff880076647e48 R08: ffff880076646000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000010 R14: ffff880053defa00 R15: ffff880053dc0d20
FS: 00007f1aa8c4d740(0000) GS:ffff88007ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000c68 CR3: 000000007650c000 CR4: 00000000000007f0
Stack:
ffff880053defa00 ffff880059cfe5c8 ffff880076647e60 ffffffffa0a3909d
ffff8800766865c0 ffff880076647ec0 ffffffffa0a39fe8 0000000000000000
0000000000000065 ffff880059cfe5c8 ffff8800766865c4 0000000100317f41
Call Trace:
[<ffffffffa0a3909d>] gss_release_msg+0x2d/0x80 [auth_rpcgss]
[<ffffffffa0a39fe8>] gss_pipe_downcall+0x248/0x540 [auth_rpcgss]
[<ffffffffa048a866>] rpc_pipe_write+0x56/0x70 [sunrpc]
[<ffffffff811a3a8d>] vfs_write+0xbd/0x1e0
[<ffffffff811a44e9>] SyS_write+0x49/0xa0
[<ffffffff814fabad>] system_call_fastpath+0x1a/0x1f
Code: 31 c0 e8 26 07 ab e0 31 c0 eb b4 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 41 54 49 89 fc 53 8b 1d ab 42 a6 ff e8 c6 8b 64 e0 85 db <49> 8b 84 24 68 0c 00 00 74 4b 3b 18 77 47 83 eb 01 48 63 db 48
RIP [<ffffffffa0a3900c>] put_pipe_version+0x1c/0x80 [auth_rpcgss]
RSP <ffff880076647e38>
CR2: 0000000000000c68
---[ end trace be1777d1c894f03f ]---
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=18002 jiffies, g=299650, c=299649, q=7598)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=72007 jiffies, g=299650, c=299649, q=25163)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=126013 jiffies, g=299650, c=299649, q=42479)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=180018 jiffies, g=299650, c=299649, q=59833)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=234023 jiffies, g=299650, c=299649, q=79646)
INFO: Stall ended before state dump start
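
For anyone comparing their own log against this one, here is a rough Python sketch that pulls the headline fields out of an oops in the format quoted above (fault address, faulting function and module, the task, and the call trace). The function name summarize_oops and the regular expressions are my own assumptions based on this one log, not a general oops parser, and SAMPLE is just a trimmed excerpt of the trace above.

```python
import re

# Trimmed excerpt of the oops quoted above; paste the full log in practice.
SAMPLE = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000c68
IP: [<ffffffffa0a3900c>] put_pipe_version+0x1c/0x80 [auth_rpcgss]
CPU: 0 PID: 822 Comm: rpc.gssd Not tainted 3.12.0-1-ARCH #1
Call Trace:
[<ffffffffa0a3909d>] gss_release_msg+0x2d/0x80 [auth_rpcgss]
[<ffffffffa0a39fe8>] gss_pipe_downcall+0x248/0x540 [auth_rpcgss]
[<ffffffffa048a866>] rpc_pipe_write+0x56/0x70 [sunrpc]
[<ffffffff811a3a8d>] vfs_write+0xbd/0x1e0
CR2: 0000000000000c68
"""

# "symbol+0xOFFSET/0xLENGTH" as printed in the IP line and the call trace.
SYMBOL = r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"


def summarize_oops(log):
    """Return the headline fields of an oop log (hypothetical helper).

    Fields that cannot be found are returned as None (or an empty list
    for the call trace).
    """
    fault = re.search(r"NULL pointer dereference at ([0-9a-f]+)", log)
    ip = re.search(r"^IP: \[<[0-9a-f]+>\] " + SYMBOL + r" \[(\w+)\]",
                   log, re.MULTILINE)
    task = re.search(r"PID: (\d+) Comm: (\S+)", log)
    # Each frame is "[<address>] symbol+0x../0x.." plus an optional "[module]".
    frames = re.findall(
        r"^[ \t]*\[<[0-9a-f]+>\] " + SYMBOL + r"(?: \[(\w+)\])?[ \t]*$",
        log, re.MULTILINE)
    return {
        "fault_address": int(fault.group(1), 16) if fault else None,
        "faulting_function": ip.group(1) if ip else None,
        "module": ip.group(2) if ip else None,
        "pid": int(task.group(1)) if task else None,
        "comm": task.group(2) if task else None,
        # List of (symbol, module) pairs; module is "" for core kernel symbols.
        "call_trace": frames,
    }


if __name__ == "__main__":
    for key, value in summarize_oops(SAMPLE).items():
        print(key, value)
```

The call trace comes back as (symbol, module) pairs in the order they were printed, which makes it easy to check whether a different report fails in the same auth_rpcgss path.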

Steps to reproduce:
Mount an NFS share from an NFS server on a client that has rpc-gssd running, using the default package configuration (not sure if this makes a difference). The mount never completes and the kernel oopses. Disabling rpc-gssd with systemctl and rebooting clears up the problem, apart from a slight delay when mounting the NFS share (the same delay that was already occurring without rpc-gssd).

Closed by Tobias Powalowski (tpowa)
Sunday, 12 January 2014, 13:06 GMT
Reason for closing: Fixed
Additional comments about closing: 3.12.7-2
Comment by Dave Reisner (falconindy) - Saturday, 23 November 2013, 16:22 GMT
This needs to be reported upstream.

The delay you notice when rpc.gssd isn't running is expected: http://www.spinics.net/lists/linux-nfs/msg38268.html
Comment by Olaf the Lost Viking (OlafLostViking) - Wednesday, 11 December 2013, 17:36 GMT
This still happens with nfs-utils 1.2.9-1 on amd64 3.12.3-1-ARCH within a Xen domU, but not in my dom0 with the same software versions.

Is jelly using Xen, too?
Comment by Rob (Painless) - Sunday, 15 December 2013, 21:08 GMT
I've managed to avoid both the mount delay and running an unneeded service (rpc.gssd) by blacklisting the rpcsec_gss_krb5 module. This seems to work fine on i686. (See https://bugzilla.redhat.com/show_bug.cgi?id=1001934 - third suggested workaround.)
Comment by fiat500 (fiat500) - Sunday, 15 December 2013, 23:26 GMT
I second the blacklisting finding, on x86_64. My attempts to stall it through the NFS configuration files were to no avail.
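
For reference, the blacklisting workaround mentioned above could be sketched like this in Python. The `blacklist <module>` line and the /etc/modprobe.d directory are standard modprobe.d conventions; the file name blacklist-rpcsec_gss_krb5.conf and the helper names are just assumptions for illustration, and I have not checked that this matches the exact wording of the Red Hat workaround. Writing to /etc/modprobe.d needs root, and a module that is already loaded stays loaded until it is removed or the machine is rebooted.

```python
from pathlib import Path


def blacklist_conf(module):
    """Return modprobe.d file content that blacklists one module."""
    return (
        f"# Keep {module} from being auto-loaded through its module aliases.\n"
        f"blacklist {module}\n"
    )


def write_blacklist(module, directory="/etc/modprobe.d"):
    """Write the blacklist file and return its path.

    The file name is a hypothetical choice; modprobe reads any *.conf
    file in the directory.
    """
    path = Path(directory) / f"blacklist-{module}.conf"
    path.write_text(blacklist_conf(module))
    return path


if __name__ == "__main__":
    # Print the content instead of writing it, so this is safe to run
    # without root.
    print(blacklist_conf("rpcsec_gss_krb5"), end="")
```

Note that `blacklist` only stops alias-based auto-loading; an explicit `modprobe rpcsec_gss_krb5` would still load the module.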
