firefly-linux-kernel-4.4.55.git
Jeff Layton [Wed, 21 Mar 2012 13:52:08 +0000 (09:52 -0400)]
nfsd: add notifier to handle mount/unmount of rpc_pipefs sb

In the event that rpc_pipefs isn't mounted when nfsd starts, we
must register a notifier to handle creating the dentry once it
is mounted, and to remove the dentry on unmount.
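To make the mount/unmount handling concrete, here is a minimal userspace sketch of the notifier pattern the message describes. The chain, the event names and `nfsd_pipefs_event()` are hypothetical stand-ins; the kernel's rpc_pipefs notifier interface is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes for the pipe filesystem being (un)mounted. */
enum pipefs_event { PIPEFS_MOUNT, PIPEFS_UMOUNT };

/* A notifier is a callback plus a link to the next one in the chain. */
struct notifier {
	int (*call)(struct notifier *nb, enum pipefs_event ev);
	struct notifier *next;
};

static struct notifier *chain;	/* all registered notifiers */

static void notifier_register(struct notifier *nb)
{
	nb->next = chain;
	chain = nb;
}

/* Invoked by the (simulated) filesystem on mount and on unmount. */
static void notify_all(enum pipefs_event ev)
{
	for (struct notifier *nb = chain; nb != NULL; nb = nb->next)
		nb->call(nb, ev);
}

/* nfsd's side: whether the "cld" pipe dentry currently exists. */
static bool cld_pipe_exists;

static int nfsd_pipefs_event(struct notifier *nb, enum pipefs_event ev)
{
	(void)nb;
	switch (ev) {
	case PIPEFS_MOUNT:
		cld_pipe_exists = true;		/* create the pipe dentry */
		break;
	case PIPEFS_UMOUNT:
		cld_pipe_exists = false;	/* remove the pipe dentry */
		break;
	}
	return 0;
}

static struct notifier nfsd_nb = { .call = nfsd_pipefs_event };
```

The point of registering up front is that nfsd does not need rpc_pipefs to be mounted when it starts; it simply reacts when the mount and unmount events arrive.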

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 21 Mar 2012 13:52:07 +0000 (09:52 -0400)]
nfsd: add the infrastructure to handle the cld upcall

...and add a mechanism for switching between the "legacy" tracker and
the new one. The decision is made by looking to see whether the
v4recoverydir exists. If it does, then the legacy client tracker is
used.

If it does not, then the kernel will create a "cld" pipe in rpc_pipefs.
That pipe is used to talk to a daemon for handling the upcall.
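The selection rule above can be sketched in userspace with stat(2). The ops table, the two init stubs and `choose_tracker()` are hypothetical; this is only an analogue of the decision, not nfsd's own lookup code.

```c
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical, cut-down client tracking ops table. */
struct tracking_ops {
	const char *name;
	int (*init)(void);
};

static int legacy_init(void) { return 0; }	/* would use the recovery dir */
static int cld_init(void) { return 0; }		/* would create the "cld" pipe */

static const struct tracking_ops legacy_ops = { "legacy", legacy_init };
static const struct tracking_ops cld_ops = { "cld", cld_init };

/* Use the legacy tracker only if the recovery directory already exists;
 * otherwise fall back to the pipe-based upcall tracker. */
static const struct tracking_ops *choose_tracker(const char *recoverydir)
{
	struct stat st;

	if (stat(recoverydir, &st) == 0 && S_ISDIR(st.st_mode))
		return &legacy_ops;
	return &cld_ops;
}
```

Callers only ever see the returned ops pointer, so switching trackers does not disturb them.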

Most of the data structures for the new client tracker are handled on a
per-namespace basis, so this upcall should be essentially ready for
containerization. For now however, nfsd just starts it by calling the
initialization and exit functions for init_net.

I'm making the assumption that at some point in the future we'll be able
to determine the net namespace from the nfs4_client. Until then, this
patch hardcodes init_net in those places. I've sprinkled some "FIXME"
comments around that code to attempt to make it clear where we'll need
to fix that up later.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 21 Mar 2012 13:52:06 +0000 (09:52 -0400)]
nfsd: add a header describing upcall to nfsdcld

The daemon takes a versioned binary struct. Hopefully this should allow
us to revise the struct later if it becomes necessary.
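A minimal sketch of why a leading version field helps, assuming a hypothetical message layout (the field names, sizes and MSG_VERSION value are illustrative, not copied from the nfsdcld header): the reader checks the version before interpreting anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_VERSION 1	/* hypothetical current protocol version */

/* Hypothetical upcall message; the version byte comes first so a reader
 * can reject layouts it does not understand. */
struct upcall_msg {
	uint8_t  vers;
	uint8_t  cmd;
	int16_t  status;
	uint32_t xid;
	uint16_t name_len;
	uint8_t  name[64];
} __attribute__((packed));

/* Returns 0 if buf holds a message this side can parse, -1 otherwise. */
static int parse_msg(const void *buf, size_t len, struct upcall_msg *out)
{
	if (len < 1)
		return -1;
	if (((const uint8_t *)buf)[0] != MSG_VERSION)
		return -1;		/* unknown version: refuse to guess */
	if (len != sizeof(*out))
		return -1;
	memcpy(out, buf, sizeof(*out));
	if (out->name_len > sizeof(out->name))
		return -1;
	return 0;
}
```

A later revision of the struct would bump the version, and an old reader would reject it cleanly instead of misparsing it.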

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 21 Mar 2012 13:52:05 +0000 (09:52 -0400)]
nfsd: add a per-net-namespace struct for nfsd

Eventually, we'll need this when nfsd gets containerized fully. For
now, create a struct on a per-net-namespace basis that will just hold
a pointer to the cld_net structure. That struct will hold all of the
per-net data that we need for the cld tracker.

Eventually we can add other pernet objects to struct nfsd_net.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 21 Mar 2012 13:52:04 +0000 (09:52 -0400)]
sunrpc: create nfsd dir in rpc_pipefs

Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid
upcall.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 21 Mar 2012 20:42:43 +0000 (16:42 -0400)]
nfsd: add nfsd4_client_tracking_ops struct and a way to set it

Abstract out the mechanism that we use to track clients into a set of
client name tracking functions.

This gives us a mechanism to plug in a new set of client tracking
functions without disturbing the callers. It also gives us a way to
decide on what tracking scheme to use at runtime.

For now, this just looks like pointless abstraction, but later we'll
add a new alternate scheme for tracking clients on stable storage.

Note too that this patch anticipates the eventual containerization
of this code by passing in struct net pointers in places. No attempt
is made to containerize the legacy client tracker however.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 21 Mar 2012 13:52:02 +0000 (09:52 -0400)]
nfsd: convert nfs4_client->cl_cb_flags to a generic flags field

We'll need a way to flag the nfs4_client as already being recorded on
stable storage so that we don't continually upcall. Currently, that's
recorded in the cl_firststate field of the client struct. Using an
entire u32 to store a flag is rather wasteful though.

The cl_cb_flags field is only using 2 bits right now, so repurpose that
to a generic flags field. Rename NFSD4_CLIENT_KILL to
NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
flags. Add a mask that we can use for existing checks that look to see
whether any flags are set, so that the new flags don't interfere.

Convert all references to cl_firststate to the NFSD4_CLIENT_STABLE flag,
and add a new NFSD4_CLIENT_RECLAIM_COMPLETE flag.
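A userspace sketch of the flag layout described above. The bit positions, and the name NFSD4_CLIENT_CB_UPDATE for the second callback bit, are assumptions; the kernel would use atomic set_bit()/test_bit() rather than the plain bit operations shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers (positions assumed for this sketch). */
enum {
	NFSD4_CLIENT_STABLE,		/* client recorded on stable storage */
	NFSD4_CLIENT_RECLAIM_COMPLETE,	/* RECLAIM_COMPLETE received */
	NFSD4_CLIENT_CB_UPDATE,		/* callback flag (name assumed) */
	NFSD4_CLIENT_CB_KILL,		/* callback flag */
};

/* Only the callback bits: used where code used to ask "any flag set?". */
#define NFSD4_CLIENT_CB_FLAG_MASK \
	((1UL << NFSD4_CLIENT_CB_UPDATE) | (1UL << NFSD4_CLIENT_CB_KILL))

static void set_flag(unsigned long *flags, int bit) { *flags |= 1UL << bit; }
static bool test_flag(unsigned long flags, int bit) { return flags & (1UL << bit); }

/* Non-callback bits such as STABLE must not trigger callback work. */
static bool cb_work_pending(unsigned long flags)
{
	return (flags & NFSD4_CLIENT_CB_FLAG_MASK) != 0;
}
```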

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Wed, 21 Mar 2012 20:42:14 +0000 (16:42 -0400)]
Merge nfs containerization work from Trond's tree

The nfs containerization work is a prerequisite for Jeff Layton's reboot
recovery rework.

Vivek Trivedi [Thu, 15 Mar 2012 18:28:52 +0000 (23:58 +0530)]
NFS: fix sb->s_id in nfs debug prints

The NFS bdi flush thread is named in ps output as "flush-<major number
in decimal>:<minor number in decimal>".
For example:
$ ps aux | grep flush
 2079 root         0 SW   [flush-0:18]
                                 ^^^^

nfs_bdi_register()
==> bdi_register_dev()
==> bdi_register(bdi, NULL, "%u:%u", MAJOR(dev), MINOR(dev));
                             ^^^^^

However, NFS stores the major:minor numbers in sb->s_id in hex:

nfs_initialise_sb()
==>         snprintf(sb->s_id, sizeof(sb->s_id),
                 "%x:%x", MAJOR(sb->s_dev), MINOR(sb->s_dev));
                  ^^^^^

If we enable NFS debug prints using the command:
$ rpcdebug -m nfs -s all

and write to a file:
$ dd if=/dev/zero of=<NFS Mount>/testfile.txt bs=32768 count=1

Without Patch:
[ 2431.032000] NFS:     0 initiated write call (req 0:12/40, 32768 bytes @ offset 0)
                                                    ^^^^

With Patch:
[ 2431.032000] NFS:     0 initiated write call (req 0:18/40, 32768 bytes @ offset 0)
                                                    ^^^^

We should store the NFS sb->s_id in decimal to avoid confusion between the
NFS flush thread name (in ps output) and NFS debug prints.
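The mismatch comes down to the two format strings quoted above. This sketch (the helper names are hypothetical) formats the same device number both ways:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The bdi flush-thread style: "%u:%u" (decimal). */
static void fmt_dev_decimal(char *buf, size_t len, unsigned major, unsigned minor)
{
	snprintf(buf, len, "%u:%u", major, minor);
}

/* The pre-patch sb->s_id style: "%x:%x" (hex). */
static void fmt_dev_hex(char *buf, size_t len, unsigned major, unsigned minor)
{
	snprintf(buf, len, "%x:%x", major, minor);
}
```

With major 0 and minor 18, the hex form yields "0:12" and the decimal form "0:18", which is exactly the discrepancy between the two debug lines.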

Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tom Tucker [Mon, 20 Feb 2012 19:07:57 +0000 (13:07 -0600)]
xprtrdma: Remove assumption that each segment is <= PAGE_SIZE

The xprtrdma FRMR mapping logic assumes that a segment is <= PAGE_SIZE.
This is not true for NFS4.
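The faulty assumption is easiest to see with the page arithmetic. This is not the xprtrdma FRMR code; it is a hedged sketch, assuming a 4096-byte page, of how many pages a segment actually spans once it may be longer than a page or start mid-page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the example */

/* Pages touched by a segment starting at byte offset `off` (not
 * necessarily page-aligned) and `len` bytes long. */
static size_t seg_page_count(unsigned long off, size_t len)
{
	if (len == 0)
		return 0;
	unsigned long first = off / SKETCH_PAGE_SIZE;
	unsigned long last = (off + len - 1) / SKETCH_PAGE_SIZE;
	return last - first + 1;
}
```

A mapping loop that reserves one page entry per segment undercounts as soon as this returns more than 1.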

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tom Tucker [Mon, 20 Feb 2012 19:07:42 +0000 (13:07 -0600)]
xprtrdma: The transport should not bug-check when a dup reply is received

The client-side RDMA transport will bug-check if it receives a duplicate
reply; instead, we should simply drop the duplicate reply.
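A minimal sketch of the intended behaviour, using a hypothetical XID-keyed request table rather than the real xprtrdma reply-matching code: a second reply for the same request is dropped and reported to the caller instead of triggering a BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending-request entry. */
struct req {
	uint32_t xid;
	bool replied;	/* a reply has already been matched to this request */
};

/* Returns true if the reply was accepted, false if it was dropped. */
static bool handle_reply(struct req *reqs, size_t n, uint32_t xid)
{
	for (size_t i = 0; i < n; i++) {
		if (reqs[i].xid != xid)
			continue;
		if (reqs[i].replied)
			return false;	/* duplicate: drop it */
		reqs[i].replied = true;
		return true;
	}
	return false;			/* no matching request: drop it */
}
```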

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Sachin Bhamare [Tue, 20 Mar 2012 03:47:58 +0000 (20:47 -0700)]
pnfs-obj: autologin: Add support for protocol autologin

The pnfs-objects protocol mandates that we autologin into devices not
present in the system, according to information specified in the
get_device_info returned from the server.

The protocol specifies two login hints:
1. An IP address:port combination
2. A string URI which is constructed as a URL with a protocol prefix
   followed by :// and a string as address. For each protocol prefix
   the string-address format might be different.

We only support the second option. The first option is just redundant
to the second one.
NOTE: The kernel part of autologin does not parse the URI string. It
just channels it to a user-mode script. So any new login protocol only
requires an update to the user-mode script, which is part of the
nfs-utils package; the kernel need not change.

We implement the autologin by using the call_usermodehelper() API.
(Thanks to Steve Dickson <steved@redhat.com> for pointing it out)
So no running daemon or special setup is needed.

We add the osd_login_prog kernel module parameter, which defaults to:
/sbin/osd_login

The kernel tries to upcall the program specified in osd_login_prog. If the
file is not found or the execution fails, the kernel will disable any further
upcalls by zeroing out osd_login_prog, until the admin re-enables it by
setting the osd_login_prog parameter to a proper program.
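The disable-on-failure policy can be sketched in userspace with posix_spawn(3) and waitpid(2) standing in for call_usermodehelper(). The argument vector passed to the helper, the example URI and the `try_autologin()`/`set_login_prog()` names are hypothetical; the real command-line API is the one documented in pnfs.txt.

```c
#include <assert.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Userspace stand-in for the osd_login_prog module parameter. */
static char login_prog[256] = "/sbin/osd_login";

/* What the admin does to (re-)enable upcalls. */
static void set_login_prog(const char *path)
{
	strncpy(login_prog, path, sizeof(login_prog) - 1);
	login_prog[sizeof(login_prog) - 1] = '\0';
}

/* Run the login helper once. On any failure, clear login_prog so that
 * later calls give up immediately until set_login_prog() is called. */
static int try_autologin(const char *uri)
{
	char *argv[] = { login_prog, (char *)uri, NULL };	/* hypothetical argv */
	pid_t pid;
	int status;

	if (login_prog[0] == '\0')
		return -1;			/* upcalls disabled */

	if (posix_spawn(&pid, login_prog, NULL, NULL, argv, environ) != 0 ||
	    waitpid(pid, &status, 0) < 0 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		login_prog[0] = '\0';		/* disable further upcalls */
		return -1;
	}
	return 0;
}
```

Treating both a spawn error and a non-zero exit status as failure covers C libraries that report a missing executable either way.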

Also add text about the osd_login program command line API to:
Documentation/filesystems/nfs/pnfs.txt
and documentation of the new osd_login_prog module parameter to:
Documentation/kernel-parameters.txt

TODO: Add a timeout option in case the osd_login program gets stuck.

Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 19 Mar 2012 18:54:42 +0000 (14:54 -0400)]
NFS: Remove nfs4_setup_sequence from generic rename code

This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 19 Mar 2012 18:54:41 +0000 (14:54 -0400)]
NFS: Remove nfs4_setup_sequence from generic unlink code

This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 19 Mar 2012 18:54:40 +0000 (14:54 -0400)]
NFS: Remove nfs4_setup_sequence from generic read code

This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 19 Mar 2012 18:54:39 +0000 (14:54 -0400)]
NFS: Remove nfs4_setup_sequence from generic write code

This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.
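The four commits above apply the same refactoring. A hedged sketch of the idea, with a hypothetical ops table and hook names (not necessarily the ones the NFS client uses): the generic path calls a per-version hook, and only the NFSv4 hook does sequence setup.

```c
#include <assert.h>

/* Hypothetical per-version operations table. */
struct rpc_ops {
	int version;
	void (*read_prepare)(int *sequence_setups);
};

static void nfs3_read_prepare(int *sequence_setups)
{
	(void)sequence_setups;		/* v2/v3: nothing to do */
}

static void nfs4_read_prepare(int *sequence_setups)
{
	(*sequence_setups)++;		/* stands in for nfs4_setup_sequence() */
}

static const struct rpc_ops nfs3_ops = { 3, nfs3_read_prepare };
static const struct rpc_ops nfs4_ops = { 4, nfs4_read_prepare };

/* Generic read path: contains no NFSv4-specific calls. */
static void generic_read_prepare(const struct rpc_ops *ops, int *sequence_setups)
{
	ops->read_prepare(sequence_setups);
}
```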

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 20 Mar 2012 18:12:46 +0000 (14:12 -0400)]
NFS: Fix more NFS debug related build warnings

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 20 Mar 2012 13:22:00 +0000 (09:22 -0400)]
SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined

Stephen Rothwell reports:
net/sunrpc/rpcb_clnt.c: In function 'rpcb_enc_mapping':
net/sunrpc/rpcb_clnt.c:820:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_getport':
net/sunrpc/rpcb_clnt.c:837:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_set':
net/sunrpc/rpcb_clnt.c:860:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_enc_getaddr':
net/sunrpc/rpcb_clnt.c:892:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_getaddr':
net/sunrpc/rpcb_clnt.c:914:19: warning: unused variable 'task' [-Wunused-variable]
fs/lockd/svclock.c:49:20: warning: 'nlmdbg_cookie2a' declared 'static' but never defined [-Wunused-function]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stephen Rothwell [Tue, 20 Mar 2012 08:26:42 +0000 (19:26 +1100)]
nfs: non void functions must return a value

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 20 Mar 2012 23:20:53 +0000 (19:20 -0400)]
SUNRPC: Kill compiler warning when RPC_DEBUG is unset

Loads of these:

linux/net/sunrpc/rpcb_clnt.c:942:2: warning: suggest braces around
  empty body in ‘do’ statement [-Wempty-body]

show up when I unset CONFIG_PROC_SYSCTL.  Seen with

  gcc (GCC) 4.6.1 20110908 (Red Hat 4.6.1-9)

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Fri, 2 Mar 2012 22:13:50 +0000 (17:13 -0500)]
NFSD: Fix nfs4_verifier memory alignment

Clean up due to code review.

The nfs4_verifier's data field is not guaranteed to be u32-aligned.
Casting an array of chars to a u32 * is generally considered
hazardous.

We can fix most of this by using a __be32 array to generate the
verifier's contents and then byte-copying it into the verifier field.

However, there is one spot where there is a backwards compatibility
constraint: the do_nfsd_create() call expects a verifier which is
32-bit aligned.  Fix this spot by forcing the alignment of the create
verifier in the nfsd4_open args structure.

Also, sizeof(nfs4_verifier) is the size of the in-core verifier data
structure, but NFS4_VERIFIER_SIZE is the number of octets in an XDR'd
verifier.  The two are not interchangeable, even if they happen to
have the same value.
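A userspace sketch of the first fix, assuming the verifier is built from two 32-bit values (the seconds/nanoseconds inputs and names are illustrative): the words are assembled in an aligned uint32_t array in network byte order and then memcpy()'d into the byte array, so the possibly unaligned data field is never accessed through a 32-bit pointer.

```c
#include <arpa/inet.h>	/* htonl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VERIFIER_SIZE 8	/* octets in the on-the-wire verifier */

/* In-core verifier: a plain byte array, so no alignment is guaranteed. */
typedef struct { unsigned char data[VERIFIER_SIZE]; } verifier_t;

/* Build the words in an aligned array, then byte-copy them instead of
 * casting v->data to (uint32_t *). */
static void gen_verifier(verifier_t *v, uint32_t sec, uint32_t nsec)
{
	uint32_t verf[2];

	verf[0] = htonl(sec);
	verf[1] = htonl(nsec);
	memcpy(v->data, verf, sizeof(v->data));
}
```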

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Tue, 20 Mar 2012 19:11:17 +0000 (15:11 -0400)]
NFSD: Fix warnings when NFSD_DEBUG is not defined

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Sun, 18 Mar 2012 18:07:42 +0000 (14:07 -0400)]
SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG

This allows us to turn on/off the dprintk() debugging interfaces for
those distributions that don't ship the 'rpcdebug' utility.
It also allows us to add Kbuild dependencies. Specifically, we already
know that dprintk() in general relies on CONFIG_SYSCTL. Now it turns out
that the NFS dprintks depend on CONFIG_CRC32 after we added support
for the filehandle hash.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sat, 17 Mar 2012 15:59:30 +0000 (11:59 -0400)]
NFS: Use cond_resched_lock() to reduce latencies in the commit scans

Ensure that we conditionally drop the inode->i_lock when it is safe
to do so in the commit loops.
We do so after locking the nfs_page, but before removing it from the
commit list. We can then use list_safe_reset_next to recover the loop
after the lock is retaken.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 19 Mar 2012 20:17:18 +0000 (16:17 -0400)]
NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner

It is quite possible for the release_lockowner RPC call to race with the
close RPC call, in which case, we cannot dereference lsp->ls_state in
order to find the nfs_server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Tue, 20 Mar 2012 16:51:24 +0000 (12:51 -0400)]
NFS: ncommit count is being double decremented

The decrement is handled by each call to nfs_request_remove_commit_list,
no need to do it again in nfs_scan_commit.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 19 Mar 2012 17:39:35 +0000 (13:39 -0400)]
SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()

The problem is that for the case of priority queues, we
have to assume that __rpc_remove_wait_queue_priority will move new
elements from the tk_wait.links lists into the queue->tasks[] list.
We therefore cannot use list_for_each_entry_safe() on queue->tasks[],
since that will skip these new tasks that __rpc_remove_wait_queue_priority
is adding.

Without this fix, rpc_wake_up and rpc_wake_up_status will both fail
to wake up all tasks on priority wait queues, which can result
in some nasty hangs.
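The skipped-task problem can be reproduced with a toy model. Everything below is hypothetical and much simpler than the sunrpc priority queues: removing a task promotes the first task of its `links` chain into its place, and the loop that cached `next` beforehand never visits the promoted task.

```c
#include <assert.h>
#include <stddef.h>

/* Toy priority wait queue. A queued task may carry a chain of further
 * tasks (`links`, linked through their `next` fields). */
struct task {
	struct task *next;	/* next task on the queue (or in a chain) */
	struct task *links;	/* tasks waiting behind this one */
	int woken;
};

/* Wake and remove the task that *pp points to, promoting the first
 * task of its chain into the same position. */
static void remove_task(struct task **pp)
{
	struct task *t = *pp;
	struct task *promoted = t->links;

	if (promoted != NULL) {
		promoted->links = promoted->next;	/* rest of the chain */
		promoted->next = t->next;
		*pp = promoted;
	} else {
		*pp = t->next;
	}
	t->woken = 1;
}

/* Buggy: caches `next` before removal (the "safe" iteration pattern). */
static void wake_all_cached_next(struct task **head)
{
	struct task *t = *head;

	while (t != NULL) {
		struct task *next = t->next;
		struct task **pp = head;

		while (*pp != t)		/* find t's slot again */
			pp = &(*pp)->next;
		remove_task(pp);
		t = next;
	}
}

/* Correct: keep taking the head until the queue is really empty. */
static void wake_all(struct task **head)
{
	while (*head != NULL)
		remove_task(head);
}
```

For a queue A then B, where A carries a chain holding A1, wake_all_cached_next() wakes A and B but leaves A1 queued and asleep; wake_all() wakes all three.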

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
J. Bruce Fields [Mon, 19 Mar 2012 16:34:39 +0000 (12:34 -0400)]
nfsd: merge cookie collision fixes from ext4 tree

These changes fix readdir loops on ext4 filesystems with dir_index
turned on.  I'm pulling them from Ted's tree as I'd like to give them
some extra nfsd testing, and expect to be applying (potentially
conflicting) patches to the same code before the next merge window.

From the nfs-ext4-premerge branch of

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Bernd Schubert [Mon, 19 Mar 2012 02:44:50 +0000 (22:44 -0400)]
nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)

Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.

NOTE: This patch got rather complex as Christoph asked to set the
filp->f_mode flag in the open call or immediately after dentry_open()
in nfsd_open() to avoid races.
Personally I still do not see a reason for that and in my opinion
FMODE_32BITHASH/FMODE_64BITHASH flags could be set in nfsd_readdir(), as it
follows directly after nfsd_open() without a chance of races.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields<bfields@redhat.com>
Bernd Schubert [Mon, 19 Mar 2012 02:44:49 +0000 (22:44 -0400)]
nfsd: rename 'int access' to 'int may_flags' in nfsd_open()

Just rename this variable, as the next patch will add a flag and
'access' as variable name would not be correct any more.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields<bfields@redhat.com>
Fan Yong [Mon, 19 Mar 2012 02:44:40 +0000 (22:44 -0400)]
ext4: return 32/64-bit dir name hash according to usage type

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir().  However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.  This still needs
integration on the NFS side.
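A sketch of the cookie packing, modeled on the description above; the function names and the exact bit layout are assumptions rather than a copy of ext4's helpers. It also assumes that the low bit of the major hash is always clear, so shifting it out loses nothing. The NFSv2-only-gets-32-bit rule comes from the nfsd commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NFSv2 can only carry 32-bit cookies; later versions get 64-bit ones. */
static bool want_64bit_cookies(int nfs_version)
{
	return nfs_version > 2;
}

/* Pack a directory entry's (major, minor) hash into a seek offset.
 * Shifting out the low major bit keeps the 32-bit form within a
 * non-negative 31-bit range. */
static uint64_t hash2pos(uint32_t major, uint32_t minor, bool use_64bit)
{
	if (!use_64bit)
		return major >> 1;		/* minor hash is lost */
	return ((uint64_t)(major >> 1) << 32) | minor;
}
```

Two entries whose major hashes collide get the same 32-bit cookie, which is what lets a readdir resend the same entries; with 64-bit cookies the minor hash keeps them apart.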

Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
(blame me if something is not correct)

Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Sachin Prabhu [Fri, 16 Mar 2012 19:25:52 +0000 (19:25 +0000)]
Try using machine credentials for RENEW calls

Using user credentials for RENEW calls will fail when the user
credentials have expired.

To avoid this, try using the machine credentials when making RENEW
calls. If no machine credentials have been set, fall back to using user
credentials as before.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 16 Mar 2012 17:52:45 +0000 (13:52 -0400)]
NFSv4.1: Fix a few issues in filelayout_commit_pagelist

- Fix a race in which NFS_I(inode)->commits_outstanding could potentially
  go to zero (triggering a call to nfs_commit_clear_lock()) before we're
  done sending out all the commit RPC calls.

- If nfs_commitdata_alloc fails, there is no reason why we shouldn't
  try to send off all the commits-to-ds.

- Simplify the error handling.

- Change pnfs_commit_list() to always return either
  PNFS_ATTEMPTED or PNFS_NOT_ATTEMPTED.
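One common way to close the first race is to hold an extra count for the duration of the send loop. The sketch below is hypothetical (plain ints instead of atomics, RPCs that complete instantly) and is not claimed to be the exact change made in filelayout_commit_pagelist.

```c
#include <assert.h>

/* Hypothetical model: the clear-lock step must run exactly once, after
 * every commit RPC has completed, never while we are still sending. */
struct inode_state {
	int commits_outstanding;
	int clear_lock_calls;
};

static void commit_done(struct inode_state *s)
{
	if (--s->commits_outstanding == 0)
		s->clear_lock_calls++;	/* stands in for nfs_commit_clear_lock() */
}

/* Send n commits, each completing immediately (worst case for the race).
 * The extra count keeps the counter above zero until all are sent. */
static void send_commits(struct inode_state *s, int n)
{
	s->commits_outstanding++;		/* bias: still sending */
	for (int i = 0; i < n; i++) {
		s->commits_outstanding++;
		commit_done(s);			/* RPC completes at once */
	}
	commit_done(s);				/* drop the bias */
}
```

Without the bias, the first instant completion would drop the count to zero and run the clear step while commits were still being sent.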

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Thu, 15 Mar 2012 21:16:40 +0000 (17:16 -0400)]
NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code

Move more pnfs-isms out of the generic commit code.

Bugfixes:

- filelayout_scan_commit_lists doesn't need to get/put the lseg.
  In fact since it is run under the inode->i_lock, the lseg_put()
  can deadlock.

- Ensure that we distinguish between what needs to be done for
  commit-to-data server and what needs to be done for commit-to-MDS
  using the new flag PG_COMMIT_TO_DS. Otherwise we may end up calling
  put_lseg() on a bucket for a struct nfs_page that got written
  through the MDS.

- Fix a case where we were using list_del() on an nfs_page->wb_list
  instead of list_del_init().

- filelayout_initiate_commit needs to call filelayout_commit_release
  on error instead of the mds_ops->rpc_release(). Otherwise it won't
  clear the commit lock.

Cleanups:

- Let the files layout manage the commit lists for the pNFS case.
  Don't expose stuff like pnfs_choose_commit_list, and the fact
  that the commit buckets hold references to the layout segment
  in common code.

- Cast out the put_lseg() calls for the struct nfs_read/write_data->lseg
  into the pNFS layer from whence they came.

- Let the pNFS layer manage the NFS_INO_PNFS_COMMIT bit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Thu, 15 Mar 2012 01:55:01 +0000 (21:55 -0400)]
NFS: Fix a compile error when !defined NFS_DEBUG

We should use the 'ifdebug' wrapper rather than trying to inline
tests of nfs_debug, so that the code compiles correctly when we
don't define NFS_DEBUG.
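A sketch of the idea behind an ifdebug-style wrapper, with hypothetical names (`SKETCH_DEBUG`, `sketch_debug_mask`, `DBG_PAGECACHE`): when debugging is compiled out, the guard becomes `if (0)`, so the guarded code is still type-checked but the optimizer can drop it.

```c
#include <assert.h>
#include <stdio.h>

#ifdef SKETCH_DEBUG
static unsigned int sketch_debug_mask;
# define ifdebug(fac)	if (sketch_debug_mask & (fac))
#else
/* The mask variable does not exist here, so the guard is a constant. */
# define ifdebug(fac)	if (0)
#endif

#define DBG_PAGECACHE 0x1u	/* hypothetical facility bit */

static int debug_lines;

static void process(int npages)
{
	ifdebug(DBG_PAGECACHE) {
		debug_lines++;
		fprintf(stderr, "processing %d pages\n", npages);
	}
}
```

Without SKETCH_DEBUG, an open-coded `if (sketch_debug_mask & DBG_PAGECACHE)` would not compile, because the mask only exists in debug builds; going through the wrapper avoids that.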

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
William Dauchy [Wed, 14 Mar 2012 11:32:04 +0000 (12:32 +0100)]
NFSv4: Rate limit the state manager for lock reclaim warning messages

Add rate limiting to `Lock reclaim failed` messages, since they could
fill up system logs.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Boaz Harrosh [Wed, 14 Mar 2012 03:44:26 +0000 (20:44 -0700)]
pnfs-obj: Uglify objio_segment allocation for the sake of the principle :-(

At some point in the past, Linus Torvalds wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> commit a84a79e4d369a73c0130b5858199e949432da4c6 upstream.
>
> The size is always valid, but variable-length arrays generate worse code
> for no good reason (unless the function happens to be inlined and the
> compiler sees the length for the simple constant it is).
>
> Also, there seems to be some code generation problem on POWER, where
> Henrik Bakken reports that register r28 can get corrupted under some
> subtle circumstances (interrupt happening at the wrong time?).  That all
> indicates some seriously broken compiler issues, but since variable
> length arrays are bad regardless, there's little point in trying to
> chase it down.
>
> "Just don't do that, then".

Since then any use of "variable length arrays" has become blasphemous.
Even in perfectly good, beautiful, perfectly safe code like the one
below where the variable length arrays are only used as a sizeof()
parameter, for type-safe dynamic structure allocations. GCC is not
executing any stack allocation code.

I have produced a small file which defines two functions main1(unsigned numdevs)
and main2(unsigned numdevs). main1 uses code as before with call to malloc
and main2 uses code as of after this patch. I compiled it as:
gcc -O2 -S see_asm.c
and here is what I get:

<see_asm.s>
main1:
.LFB7:
.cfi_startproc
mov %edi, %edi
leaq 4(%rdi,%rdi), %rdi
salq $3, %rdi
jmp malloc
.cfi_endproc
.LFE7:
.size main1, .-main1
.p2align 4,,15
.globl main2
.type main2, @function
main2:
.LFB8:
.cfi_startproc
mov %edi, %edi
addq $2, %rdi
salq $4, %rdi
jmp malloc
.cfi_endproc
.LFE8:
.size main2, .-main2
.section .text.startup,"ax",@progbits
.p2align 4,,15
</see_asm.s>

*Exact* same code !!!

So please seriously consider not accepting this patch and leave the
perfectly good code intact.
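For reference, the "after" style being objected to looks roughly like this: one allocation whose array offsets are computed by hand instead of taking sizeof() of a struct with variable-length array members (a GCC extension). The struct names are hypothetical stand-ins for objio_segment and its per-device arrays.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the layout header and per-device data. */
struct comp { uint64_t obj_id; uint32_t dev_index; };
struct dev;
struct seg {
	unsigned numdevs;
	struct dev **ods;	/* numdevs device pointers */
	struct comp *comps;	/* numdevs components */
};

static size_t align_up(size_t n, size_t a) { return (n + a - 1) / a * a; }

/* Header followed by both arrays in a single zeroed allocation. */
static struct seg *seg_alloc(unsigned numdevs)
{
	size_t ods_off = align_up(sizeof(struct seg), alignof(struct dev *));
	size_t comps_off = align_up(ods_off + numdevs * sizeof(struct dev *),
				    alignof(struct comp));
	char *p = calloc(1, comps_off + numdevs * sizeof(struct comp));

	if (p == NULL)
		return NULL;
	struct seg *s = (struct seg *)p;
	s->numdevs = numdevs;
	s->ods = (struct dev **)(p + ods_off);
	s->comps = (struct comp *)(p + comps_off);
	return s;
}
```

The hand-written offsets are exactly the bookkeeping the VLA-in-struct version let the compiler do; a production version would also check numdevs for multiplication overflow.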

CC: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bernd Schubert [Wed, 14 Mar 2012 02:51:38 +0000 (22:51 -0400)]
fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash

Those flags are supposed to be set by NFS readdir() to tell ext3/ext4
to use 32-bit (NFSv2) or 64-bit hash values (offsets) in seekdir().

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Dan Carpenter [Tue, 13 Mar 2012 17:18:48 +0000 (20:18 +0300)]
NFS: null dereference in dev_remove()

In commit 5ffaf85541 "NFS: replace global bl_wq with per-net one" we
made "msg" a pointer instead of a struct stored in stack memory.  But we
forgot to change the memset() here, so we're still clearing stack memory
instead of clearing the struct like we intended.  It will lead to a kernel
crash.
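The bug class is easy to reproduce in userspace. In this sketch (struct and function names hypothetical), the commented-out call is the mistake described above and the live call is the fix:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type that used to live on the stack. */
struct pipe_msg {
	void *data;
	size_t len;
	int status;
};

static struct pipe_msg *msg_alloc(void)
{
	struct pipe_msg *msg = malloc(sizeof(*msg));

	if (msg == NULL)
		return NULL;
	/*
	 * Wrong once msg became a pointer:
	 *	memset(&msg, 0, sizeof(msg));
	 * That zeroes the local pointer variable (so msg becomes NULL and the
	 * next dereference crashes) and leaves the struct uninitialized.
	 */
	memset(msg, 0, sizeof(*msg));	/* right: clear the struct itself */
	return msg;
}
```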

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 12 Mar 2012 22:01:48 +0000 (18:01 -0400)]
NFSv4: Rate limit the state manager warning messages

Prevent the state manager from filling up system logs when recovery
fails on the server.
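The kernel offers ratelimited printk helpers such as printk_ratelimited() for this; the sketch below only illustrates the underlying interval/burst idea with explicit timestamps, and does not reproduce the kernel's implementation or its default limits.

```c
#include <assert.h>
#include <stdbool.h>

/* Allow at most `burst` messages per `interval` seconds. */
struct ratelimit {
	long interval;	/* window length in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages allowed in this window */
	int missed;	/* messages suppressed in this window */
};

static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* start a new window */
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A state manager stuck in a failing recovery loop then emits at most `burst` warnings per window instead of one per retry.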

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Trond Myklebust [Mon, 12 Mar 2012 17:29:05 +0000 (13:29 -0400)]
SUNRPC: Don't use variable length automatic arrays in kernel code

Replace the variable length array in the RPCSEC_GSS crypto code with
a fixed length one. The size should be bounded by
GSS_KRB5_MAX_BLOCKSIZE, so use that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 12 Mar 2012 15:33:00 +0000 (11:33 -0400)]
NFS: Check return value from rpc_queue_upcall()

This function could fail to queue the upcall if rpc.idmapd is not running,
causing a warning message to be printed.  Instead, I want to check the
return value and revoke the key if the upcall can't be run.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 12 Mar 2012 15:28:24 +0000 (11:28 -0400)]
NFS: Only define some function when v4.1 is enabled

Now that the nfs4_cb_match_client() function is static, gcc notices that
it is only used when CONFIG_NFS_V4_1 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 11 Mar 2012 19:22:54 +0000 (15:22 -0400)]
SUNRPC: Fix a few sparse warnings

net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
(different address spaces)
 - svc_partial_recvfrom now takes a struct kvec, so the variable
   save_iovbase needs to be an ordinary (void *)

Make a bunch of variables in net/sunrpc/xprtsock.c static

Fix a couple of "warning: symbol 'foo' was not declared. Should it be
static?" reports.

Fix a couple of conflicting function declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 11 Mar 2012 17:11:00 +0000 (13:11 -0400)]
NFS: Fix a number of sparse warnings

Fix a number of "warning: symbol 'foo' was not declared. Should it be
static?" conditions.

Fix 2 cases of "warning: Using plain integer as NULL pointer"

fs/nfs/delegation.c:263:31: warning: restricted fmode_t degrades to integer
  - We want to allow upgrades to a WRITE delegation, but should otherwise
    consider servers that hand out duplicate delegations to be borken.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stanislav Kinsbursky [Sun, 11 Mar 2012 14:20:31 +0000 (18:20 +0400)]
NFS: replace global bl_wq with per-net one

This queue is used for sleeping in the kernel, and it has to be per-net since
we don't want to wake any waiters other than those in our network namespace.
Moving the wait queue to per-net data is easy, but some way to handle upcall
timeouts has to be provided: when a message is destroyed because of a timeout,
the tasks waiting for that message to be delivered should be woken up. Thus,
some data is required to locate the right wait queue. The chosen solution
replaces the rpc_pipe_msg object with a newly introduced bl_pipe_msg object,
containing the rpc_pipe_msg and the proper wait queue.
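The key step is getting from the generic message back to the wrapper that holds the right wait queue. A sketch using the usual container_of() idiom; the struct layouts and `msg_destroy()` are simplified, hypothetical versions of what the message describes.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic message type known to the pipe layer (simplified). */
struct pipe_msg { const char *data; size_t len; };

/* Hypothetical per-namespace wait queue. */
struct wait_queue { int wakeups; };

/* Wrapper: the generic message plus the wait queue to wake. */
struct bl_msg {
	struct pipe_msg msg;
	struct wait_queue *wq;
};

/* The destroy callback only receives the generic message; container_of()
 * recovers the wrapper, and from it the right namespace's wait queue. */
static void msg_destroy(struct pipe_msg *msg)
{
	struct bl_msg *bl = container_of(msg, struct bl_msg, msg);

	bl->wq->wakeups++;	/* stands in for wake_up(wq) */
}
```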

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stanislav Kinsbursky [Sun, 11 Mar 2012 14:20:23 +0000 (18:20 +0400)]
NFS: replace global bl_mount_reply with per-net one

This global variable is used for the blocklayout downcall and thus can be
corrupted if multiple network namespaces exist.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Thu, 8 Mar 2012 22:29:35 +0000 (17:29 -0500)]
NFS: remove nfs_inode radix tree

The radix tree is only being used to compile lists of reqs needing commit.
It is simpler to just put the reqs directly into a list.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Thu, 8 Mar 2012 22:29:34 +0000 (17:29 -0500)]
NFS: remove NFS_PAGE_TAG_LOCKED

The last real use of this tag was removed by
commit 7f2f12d963 ("NFS: Simplify nfs_wb_page()").

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sat, 10 Mar 2012 16:23:15 +0000 (11:23 -0500)]
NFSv4.0: Re-establish the callback channel on NFS4ERR_CB_PATHDOWN

When the NFSv4.0 server tells us that it can no longer talk to us
on the callback channel, we should attempt a new SETCLIENTID in
order to re-transmit the callback channel information.

Note that as long as we do not change the boot verifier, this is
a safe procedure; the server is required to keep our state.

Also move the function nfs_handle_cb_pathdown to fs/nfs/nfs4state.c,
and change the name in order to mark it as being specific to NFSv4.0.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Fri, 9 Mar 2012 22:02:28 +0000 (17:02 -0500)]
nfsd4: make sure set CB_PATH_DOWN sequence flag set

Make sure this is set whenever there is no callback channel.

If a client does not set up a callback channel at all, then it will get
this flag set from the very start.  That's OK, it can just ignore the
flag if it doesn't care.  If a client does care, I think it's better to
inform it of the problem as early as possible.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Thu, 8 Mar 2012 22:42:01 +0000 (17:42 -0500)]
NFSv4: Clean up nfs4_select_rw_stateid()

Ensure that we select delegation stateids first, then
lock stateids and then open stateids.
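The selection order amounts to a simple fallback chain. This is an illustrative sketch only: `struct open_file_state`, its `have_*` flags and `select_rw_stateid()` are hypothetical and far simpler than the real nfs4_select_rw_stateid().

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stateid: 32-bit seqid plus 12 opaque bytes. */
struct stateid {
	uint32_t seqid;
	char other[12];
};

/* Hypothetical per-file state: delegation and lock stateids may be absent. */
struct open_file_state {
	int have_delegation;
	int have_lock;
	struct stateid delegation, lock, open;
};

/* Prefer a delegation stateid, then a lock stateid, then the open stateid. */
static void select_rw_stateid(const struct open_file_state *st,
			      struct stateid *dst)
{
	if (st->have_delegation)
		*dst = st->delegation;
	else if (st->have_lock)
		*dst = st->lock;
	else
		*dst = st->open;
}
```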

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 8 Mar 2012 22:16:12 +0000 (17:16 -0500)]
NFS: Don't copy read delegation stateids in setattr

The server will just return an NFS4ERR_OPENMODE anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Thu, 8 Mar 2012 16:03:53 +0000 (11:03 -0500)]
NFSv4.1 cleanup DS stateid error handling

The error handler's nfs4_state parameter is never NULL in the pNFS case,
as the open_context must carry an nfs4_state.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 7 Mar 2012 21:39:06 +0000 (16:39 -0500)]
NFSv4: Return the delegation if the server returns NFS4ERR_OPENMODE

If a setattr() fails because of an NFS4ERR_OPENMODE error, it is
probably due to us holding a read delegation. Ensure that the
recovery routines return that delegation in this case.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Trond Myklebust [Wed, 7 Mar 2012 18:49:12 +0000 (13:49 -0500)]
NFSv4: Don't free the nfs4_lock_state until after the release_lockowner

Otherwise we can end up with sequence id problems if the client reuses
the owner_id before the server has processed the release_lockowner.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Wed, 7 Mar 2012 15:49:41 +0000 (10:49 -0500)]
NFSv4.1 handle DS stateid errors

Handle DS READ and WRITE stateid errors by recovering the stateid on the MDS.

NFS4ERR_OLD_STATEID is ignored, as the client always sends a
stateid sequence id of zero for DS READ and WRITE.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Weston Andros Adamson [Wed, 7 Mar 2012 02:58:20 +0000 (21:58 -0500)]
NFS: add fh_crc to debug output

Print the filehandle CRC in two debug messages.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Weston Andros Adamson [Wed, 7 Mar 2012 01:46:43 +0000 (20:46 -0500)]
NFS: add filehandle crc for debug display

Match Wireshark's CRC-32 hash of the filehandle, for easier debugging.
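A minimal bitwise sketch of the hash, assuming the variant Wireshark displays is the common IEEE CRC-32 (reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final inversion, as used by zlib). `fh_crc32()` is a hypothetical name; the kernel would use its own crc32 library routines rather than this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 over the raw filehandle bytes, one bit at a time. */
static uint32_t fh_crc32(const unsigned char *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			/* -(crc & 1u) is all-ones when the low bit is set */
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}
```

Printed as `0x%08x`, the result gives a short fixed-width tag that is easier to compare against a packet capture than the full handle.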

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Fri, 27 Jan 2012 21:49:55 +0000 (16:49 -0500)]
nfsd4: reduce do_open_lookup() stack usage

I get 320 bytes for struct svc_fh on x86_64, which is really a little
large to be putting on the stack; kmalloc() it instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 27 Jan 2012 21:26:02 +0000 (16:26 -0500)]
nfsd4: delay setting current filehandle till success

Compound processing stops on error, so the current filehandle won't be
used on error.  Thus the order here doesn't really matter.  It'll be
more convenient to do it later, though.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Benny Halevy [Fri, 24 Feb 2012 01:40:52 +0000 (17:40 -0800)]
nfsd41: free_session/free_client must be called under the client_lock

The session client is manipulated under the client_lock, hence
both free_session and nfsd4_del_conns must be called under this lock.

This patch adds a BUG_ON that checks this condition in the
respective functions and adds the missing locking.

The nfsd4_{get,put}_session helpers were moved to the C file that uses
them, to prevent their use from other files, and an unlocked version of
nfsd4_put_session is provided for external use from nfs4xdr.c.

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Benny Halevy [Tue, 21 Feb 2012 22:16:54 +0000 (14:16 -0800)]
nfsd41: refactor nfsd4_deleg_xgrade_none_ext logic out of nfsd4_process_open2

Handle the case where the NFSv4.1 client asked to upgrade or downgrade
its delegations and the server returns no delegation.

In this case, op_delegate_type is set to NFS4_OPEN_DELEGATE_NONE_EXT
and op_why_no_deleg is set to WND4_NOT_SUPP_UPGRADE or
WND4_NOT_SUPP_DOWNGRADE, respectively.

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Benny Halevy [Tue, 21 Feb 2012 22:16:44 +0000 (14:16 -0800)]
nfsd41: refactor nfs4_open_deleg_none_ext logic out of nfs4_open_delegation

When a 4.1 client asks for a delegation and the server returns none,
op_delegate_type is set to NFS4_OPEN_DELEGATE_NONE_EXT
and op_why_no_deleg is set to either WND4_CONTENTION or WND4_RESOURCE.
Or, if the client sent an NFS4_SHARE_WANT_CANCEL (which it is not supposed
to ever do until our server supports delegation signaling),
op_why_no_deleg is set to WND4_CANCELLED.

Note that for WND4_CONTENTION and WND4_RESOURCE, the xdr layer is hard-coded
at this time to encode boolean FALSE for ond_server_will_push_deleg /
ond_server_will_signal_avail.
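The wire form of such a reply can be sketched as XDR: the delegation type, then the why_no_delegation4 reason, then, only for WND4_CONTENTION and WND4_RESOURCE, one boolean. The constant values below are as I understand them from RFC 5661's XDR definitions; `put_be32()` and `encode_deleg_none_ext()` are hypothetical userspace helpers, not nfsd's encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the RFC 5661 XDR (subset). */
enum { OPEN_DELEGATE_NONE_EXT = 3 };
enum why_no_delegation4 {
	WND4_NOT_WANTED = 0,
	WND4_CONTENTION = 1,
	WND4_RESOURCE = 2,
	WND4_NOT_SUPP_UPGRADE = 5,
	WND4_NOT_SUPP_DOWNGRADE = 6,
	WND4_CANCELLED = 7,
};

/* Store a 32-bit value big-endian, as XDR requires. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Encode an open_delegation4 of type NONE_EXT; returns bytes written. */
static size_t encode_deleg_none_ext(uint8_t *buf, enum why_no_delegation4 why)
{
	uint8_t *p = buf;

	p = put_be32(p, OPEN_DELEGATE_NONE_EXT);
	p = put_be32(p, (uint32_t)why);
	switch (why) {
	case WND4_CONTENTION:	/* bool ond_server_will_push_deleg */
	case WND4_RESOURCE:	/* bool ond_server_will_signal_avail */
		p = put_be32(p, 0);	/* hard-coded FALSE */
		break;
	default:		/* void arm: nothing more to encode */
		break;
	}
	return (size_t)(p - buf);
}
```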

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Dan Carpenter [Tue, 21 Feb 2012 07:28:04 +0000 (10:28 +0300)]
svcrdma: silence a Sparse warning

Sparse complains that the function declaration and the
implementation aren't annotated the same way.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 6 Mar 2012 20:52:04 +0000 (15:52 -0500)]
nfsd4: fix recovery-entry leak nfsd startup failure

Fix another leak on the error path.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Mon, 5 Mar 2012 16:42:36 +0000 (11:42 -0500)]
nfsd4: fix recovery-dir leak on nfsd startup failure

The current code never calls nfsd4_shutdown_recdir if nfs4_state_start
returns an error. Also, it's better to go ahead and consolidate these
functions since one is just a trivial wrapper around the other.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 6 Mar 2012 19:43:36 +0000 (14:43 -0500)]
nfsd4: purge stable client records with insufficient state

To escape having your stable storage record purged at the end of the
grace period, it's not sufficient to simply have performed a
setclientid_confirm; you also need to meet the same requirements as
someone creating a new record: either you should have done an open or
open reclaim (in the 4.0 case) or a reclaim_complete (in the 4.1 case).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 6 Mar 2012 19:35:16 +0000 (14:35 -0500)]
nfsd4: don't set cl_firststate on first reclaim in 4.1 case

We set cl_firststate when we first decide that a client will be
permitted to reclaim state on next boot.  This happens:

- for new 4.0 clients, when they confirm their first open
- for returning 4.0 clients, when they reclaim their first open
- for 4.1+ clients, when they perform reclaim_complete

We also use cl_firststate to decide whether a reclaim_complete has
already been performed, in the 4.1+ case.

We were setting it on 4.1 open reclaims, which caused spurious
COMPLETE_ALREADY errors on RECLAIM_COMPLETE from an nfs4.1 client with
anything to reclaim.
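The rules listed above, including this fix, can be summarized as a small predicate. This is a hypothetical sketch: the event names and `sets_firststate()` are illustrative and not nfsd's API, and it encodes only the three cases listed plus the 4.1 change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events; names are illustrative, not nfsd's. */
enum client_event {
	EV_OPEN_CONFIRM,	/* new 4.0 client confirms its first open */
	EV_OPEN_RECLAIM,	/* client reclaims an open after a reboot */
	EV_RECLAIM_COMPLETE,	/* 4.1+ client sends RECLAIM_COMPLETE */
};

/* Does this event mark the client as allowed to reclaim on next boot? */
static bool sets_firststate(unsigned int minorversion, enum client_event ev)
{
	if (minorversion == 0)
		return ev == EV_OPEN_CONFIRM || ev == EV_OPEN_RECLAIM;
	/*
	 * 4.1+: only RECLAIM_COMPLETE. Setting it on an open reclaim would
	 * make the later RECLAIM_COMPLETE look like a duplicate.
	 */
	return ev == EV_RECLAIM_COMPLETE;
}
```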

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Mon, 5 Mar 2012 16:40:12 +0000 (11:40 -0500)]
NFSv4: Add a helper encode_uint64

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 5 Mar 2012 16:27:16 +0000 (11:27 -0500)]
NFSv4: More xdr cleanups

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 5 Mar 2012 01:49:32 +0000 (20:49 -0500)]
NFSv4: Cleanup - convert more functions to use encode_op_hdr

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Fri, 2 Mar 2012 22:14:31 +0000 (17:14 -0500)]
NFS: Fix nfs4_verifier memory alignment

Clean up due to code review.

The nfs4_verifier's data field is not guaranteed to be u32-aligned.
Casting an array of chars to a u32 * is generally considered
hazardous.

Fix this by using a __be32 array to generate a verifier's contents,
and then byte-copy the contents into the verifier field.  The contents
of a verifier, for all intents and purposes, are opaque bytes.  Only
local code that generates a verifier need know the actual content and
format.  Everyone else compares the full byte array for exact
equality.

Also, sizeof(nfs4_verifier) is the size of the in-core verifier data
structure, but NFS4_VERIFIER_SIZE is the number of octets in an XDR'd
verifier.  The two are not interchangeable, even if they happen to
have the same value.
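A minimal userspace sketch of the approach, assuming POSIX htonl() as a stand-in for the kernel's cpu_to_be32() and a verifier seeded from a seconds/nanoseconds pair (an assumption for illustration). `init_verifier()` and `verifier_equal()` are hypothetical names. Note that the copy length is taken from the verifier's own data field, not from the size of the helper array.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8	/* octets in an XDR'd verifier4 */

/* In-core verifier: opaque bytes, with no alignment guarantee. */
typedef struct {
	char data[NFS4_VERIFIER_SIZE];
} nfs4_verifier;

/*
 * Build the contents in a properly aligned big-endian word array, then
 * byte-copy them, instead of casting verf->data to a uint32_t *.
 */
static void init_verifier(nfs4_verifier *verf, uint32_t sec, uint32_t nsec)
{
	uint32_t words[2];

	words[0] = htonl(sec);
	words[1] = htonl(nsec);
	memcpy(verf->data, words, sizeof(verf->data));
}

/* Everyone else treats the verifier as opaque and compares all bytes. */
static int verifier_equal(const nfs4_verifier *a, const nfs4_verifier *b)
{
	return memcmp(a->data, b->data, sizeof(a->data)) == 0;
}
```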

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:57 +0000 (18:13 -0500)]
NFSv4: Add a encode op helper

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:57 +0000 (18:13 -0500)]
NFSv4: Add a helper for encoding NFSv4 sequence ids

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:57 +0000 (18:13 -0500)]
NFSv4: Minor clean ups for encode_string()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:57 +0000 (18:13 -0500)]
NFSv4: Simplify the struct nfs4_stateid

Replace the union with the common struct stateid4 as defined in both
RFC3530 and RFC5661. This makes it easier to access the sequence id,
which in turn will make implementing support for parallel OPEN calls
easier.
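A sketch of the flat layout, assuming the seqid is kept in network byte order as received. `stateid_seqid()` and `stateid_is_newer()` are hypothetical helpers; the second is only an example of the kind of comparison parallel OPEN handling might need, using wraparound (serial-number) arithmetic.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define NFS4_STATEID_OTHER_SIZE 12

/*
 * Mirrors the XDR stateid4 from RFC 3530/RFC 5661:
 *   struct stateid4 { uint32_t seqid; opaque other[12]; };
 */
typedef struct {
	uint32_t seqid;				/* big-endian, as on the wire */
	char other[NFS4_STATEID_OTHER_SIZE];	/* opaque to the client */
} nfs4_stateid;

/* With a named field there is no union to pick apart. */
static uint32_t stateid_seqid(const nfs4_stateid *s)
{
	return ntohl(s->seqid);
}

/* Is s1 newer than s2? The signed difference handles seqid wraparound. */
static int stateid_is_newer(const nfs4_stateid *s1, const nfs4_stateid *s2)
{
	return (int32_t)(stateid_seqid(s1) - stateid_seqid(s2)) > 0;
}
```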

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:56 +0000 (18:13 -0500)]
NFSv4: Add helpers for basic copying of stateids

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:56 +0000 (18:13 -0500)]
NFSv4: Rename nfs4_copy_stateid()

It is really a function for selecting the correct stateid to use in a
read or write situation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:56 +0000 (18:13 -0500)]
NFSv4: Add a helper for encoding stateids

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:56 +0000 (18:13 -0500)]
NFSv4: Add a helper for encoding opaque data

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:56 +0000 (18:13 -0500)]
NFSv4: Rename encode_stateid() to encode_open_stateid()

The current version of encode_stateid really only applies to open stateids.
You can't use it for locks, delegations or layouts.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:56 +0000 (18:13 -0500)]
NFSv4: Further clean-ups of delegation stateid validation

Change the name to reflect what we're really doing: testing two
stateids for whether or not they match according to the rules in
RFC3530 and RFC5661.
Move the code from callback_proc.c to nfs4proc.c.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:13:56 +0000 (18:13 -0500)]
NFSv4.1: Fix matching of the stateids when returning a delegation

nfs41_validate_delegation_stateid is broken if we supply a stateid with
a non-zero sequence id. Instead of trying to match the sequence id,
the function assumes that we always want to error. While this is
true for a delegation callback, it is not true in general.
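A minimal sketch of NFSv4.1-style stateid matching, assuming a simplified stateid type. `stateid_match()` is a hypothetical name, and treating a zero seqid on either side as a wildcard is my reading of RFC 5661, not code taken from this patch: the `other` fields must be identical, and a non-zero seqid must then match exactly instead of being rejected outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
	uint32_t seqid;		/* byte order is irrelevant for ==, != 0 */
	char other[12];
} nfs4_stateid;

static bool stateid_match(const nfs4_stateid *s1, const nfs4_stateid *s2)
{
	if (memcmp(s1->other, s2->other, sizeof(s1->other)) != 0)
		return false;		/* different state entirely */
	if (s1->seqid == s2->seqid)
		return true;		/* exact match */
	/* A zero seqid means "whatever the current one is". */
	return s1->seqid == 0 || s2->seqid == 0;
}
```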

Also fix a typo in nfs4_callback_recall.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 6 Mar 2012 00:56:44 +0000 (19:56 -0500)]
NFS: Properly handle the case where the delegation is revoked

If we know that the delegation stateid is bad or revoked, we need to
remove that delegation as soon as possible, and then mark all the
stateids that relied on that delegation for recovery. We cannot use
the delegation as part of the recovery process.

Also note that NFSv4.1 uses a different error code (NFS4ERR_DELEG_REVOKED)
to indicate that the delegation was revoked.

Finally, ensure that setlk() and setattr() can both recover safely from
a revoked delegation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Trond Myklebust [Tue, 6 Mar 2012 15:14:35 +0000 (10:14 -0500)]
NFS: Fix a typo in _nfs_display_fhandle

The check for 'fh == NULL' needs to come _before_ we dereference
fh.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 4 Mar 2012 23:12:57 +0000 (18:12 -0500)]
NFS: Fix a compile issue when !CONFIG_NFS_V4_1

The attempt to display the implementation ID needs to be conditional on
whether or not CONFIG_NFS_V4_1 is defined.

Reported-by: Bryan Schumaker <Bryan.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 5 Mar 2012 19:58:15 +0000 (14:58 -0500)]
NFS: Undo changes to idmap.h

When compiled without NFS v4 configured, these functions won't be
defined and the compiler will yell.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sat, 3 Mar 2012 20:04:15 +0000 (15:04 -0500)]
Merge commit 'nfs-for-3.3-4' into nfs-for-next

Conflicts:
fs/nfs/nfs4proc.c

Back-merge of the upstream kernel in order to fix a conflict with the
slotid type conversion and implementation id patches...

Chuck Lever [Fri, 2 Mar 2012 21:58:56 +0000 (16:58 -0500)]
NFS: Reduce debugging noise from encode_compound_hdr

Get rid of

  encode_compound: tag=

when XDR debugging is enabled.  The current Linux client never sets
compound tags.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:02:05 +0000 (17:02 -0500)]
NFS: Request fh_expire_type attribute in "server caps" operation

The fh_expire_type file attribute is a filesystem wide attribute that
consists of flags that indicate what characteristics file handles
on this FSID have.

Our client doesn't support volatile file handles.  It should find
out early (say, at mount time) whether the server is going to play
shenanigans with file handles during a migration.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:01:57 +0000 (17:01 -0500)]
NFS: Introduce NFS_ATTR_FATTR_V4_LOCATIONS

The Linux NFS client must distinguish between referral events (which
it currently supports) and migration events (which it does not yet
support).

In both types of events, an fs_locations array is returned.  But upper
layers, not the XDR layer, should make the distinction between a
referral and a migration.  There really isn't a way for an XDR decoder
function to distinguish the two, in general.

Slightly adjust the FATTR flags returned by decode_fs_locations()
to set NFS_ATTR_FATTR_V4_LOCATIONS only if a non-empty locations
array was returned from the server.  Then have logic in nfs4proc.c
distinguish whether the locations array is for a referral or
something else.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:01:48 +0000 (17:01 -0500)]
NFS: Simplify arguments of encode_renew()

Clean up: pass just the clientid4 to encode_renew().  This enables it
to be used by callers who might not have a full nfs_client.
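As a rough sketch of what the narrower interface allows: RENEW's argument is just the XDR-encoded opcode followed by the 64-bit clientid4, so a standalone encoder needs nothing else. The helpers here are hypothetical userspace stand-ins (the kernel writes into an xdr_stream), and OP_RENEW = 30 is taken from the RFC 3530 operation list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_RENEW 30	/* operation number from RFC 3530 */

typedef uint64_t clientid4;

/* XDR unsigned int: 4 bytes, big-endian. */
static unsigned char *encode_uint32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
	return p + 4;
}

/* XDR unsigned hyper: most significant 32 bits first. */
static unsigned char *encode_uint64(unsigned char *p, uint64_t v)
{
	p = encode_uint32(p, (uint32_t)(v >> 32));
	return encode_uint32(p, (uint32_t)v);
}

/* Needs only the clientid, not a whole client structure. */
static size_t encode_renew(unsigned char *buf, clientid4 clid)
{
	unsigned char *p = encode_uint32(buf, OP_RENEW);

	p = encode_uint64(p, clid);
	return (size_t)(p - buf);
}
```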

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:01:31 +0000 (17:01 -0500)]
NFS: Add a client-side function to display NFS file handles

For debugging, introduce a simplistic function to print NFS file
handles on the system console.  The main function is hooked into the
dprintk debugging facility, but you can directly call the helper,
_nfs_display_fhandle(), if you want to print a handle unconditionally.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:01:23 +0000 (17:01 -0500)]
NFS: Make clientaddr= optional

For NFSv4 mounts, the clientaddr= mount option has always been
required.  Now we have rpc_localaddr() in the kernel, which was
modeled after the same logic in the mount.nfs command that constructs
the clientaddr= mount option.  If user space doesn't provide a
clientaddr= mount option, the kernel can now construct its own.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:01:14 +0000 (17:01 -0500)]
SUNRPC: Add API to acquire source address

NFSv4.0 clients must send endpoint information for their callback
service to NFSv4.0 servers during their first contact with a server.
Traditionally on Linux, user space provides the callback endpoint IP
address via the "clientaddr=" mount option.

During an NFSv4 migration event, it is possible that an FSID may be
migrated to a destination server that is accessible via a different
source IP address than the source server was.  The client must update
callback endpoint information on the destination server so that it can
maintain leases and allow delegation.

Without a new "clientaddr=" option from user space, however, the
kernel itself must construct an appropriate IP address for the
callback update.  Provide an API in the RPC client for upper layer
RPC consumers to acquire a source address for a remote endpoint.

The mechanism used by the mount.nfs command is copied: set up a
connected UDP socket to the designated remote, then scrape the source
address off the socket.  We are careful to select the correct network
namespace when setting up the temporary UDP socket.
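The same trick can be sketched in userspace. This is a simplified IPv4-only analogue, not the kernel's rpc_localaddr(): `get_source_addr()` is a hypothetical helper, and it implicitly uses the calling process's network namespace rather than selecting one. connect() on a UDP socket sends nothing; it only makes the stack pick a route and bind a source address, which getsockname() then reads back.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 and fills @local with the source address used for @remote. */
static int get_source_addr(const struct sockaddr_in *remote,
			   struct sockaddr_in *local)
{
	socklen_t len = sizeof(*local);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int ret = -1;

	if (fd < 0)
		return -1;
	memset(local, 0, sizeof(*local));
	if (connect(fd, (const struct sockaddr *)remote, sizeof(*remote)) == 0 &&
	    getsockname(fd, (struct sockaddr *)local, &len) == 0)
		ret = 0;
	close(fd);	/* the socket was only needed for the route lookup */
	return ret;
}
```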

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 1 Mar 2012 22:01:05 +0000 (17:01 -0500)]
SUNRPC: Move clnt->cl_server into struct rpc_xprt

When the cl_xprt field is updated, the cl_server field will also have
to change.  Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 1 Mar 2012 22:00:56 +0000 (17:00 -0500)]
SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field

A migration event will replace the rpc_xprt used by an rpc_clnt.  To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().

Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: fix lockdep splats and layering violations ]
[ cel: forward ported to 3.4 ]
[ cel: remove rpc_max_reqs(), add rpc_net_ns() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:00:40 +0000 (17:00 -0500)]
NFS: Add debugging messages to NFSv4's CLOSE procedure

CLOSE is new with NFSv4.  Sometimes it's important to know the timing
of this operation compared to things like lease renewal.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 1 Mar 2012 22:00:31 +0000 (17:00 -0500)]
NFS: Clean up debugging in decode_pathname()

I noticed recently that decode_attr_fs_locations() is not generating
very pretty debugging output.  The pathname components each appear on
a separate line of output, though that does not appear to be the
intended display behavior.  The preferred way to generate continued
lines of output on the console is to use pr_cont().

Note that incoming pathname4 components contain a string that is not
necessarily NUL-terminated.  I did actually see some trailing garbage
on the console.  In addition to correcting the line continuation
problem, add a string precision format specifier to ensure that each
component string is displayed properly, and that vsnprintf() does
not Oops.
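The precision trick can be illustrated in userspace with snprintf(); the kernel code uses dprintk()/pr_cont() instead. `struct component` and `format_pathname()` are hypothetical simplifications of a decoded pathname4. With "%.*s", the formatter reads at most `len` bytes of each component, so it never runs past the end looking for a NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A counted component as decoded off the wire: NOT NUL-terminated. */
struct component {
	unsigned int len;
	const char *data;
};

/* Format components as "/a/b/c"; returns the length, or -1 on overflow. */
static int format_pathname(char *buf, size_t size,
			   const struct component *comps, unsigned int ncomps)
{
	size_t used = 0;

	if (size == 0)
		return -1;
	buf[0] = '\0';
	for (unsigned int i = 0; i < ncomps; i++) {
		int n = snprintf(buf + used, size - used, "/%.*s",
				 (int)comps[i].len, comps[i].data);
		if (n < 0 || (size_t)n >= size - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	return (int)used;
}
```

For example, components pointing into the wire buffer "exportXXXhome" with lengths 6 and 4 format as "/export/home", with the trailing bytes of each component ignored.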

Someone pointed out that allowing incoming network data to possibly
generate a console line of unbounded length may not be such a good
idea.  Since this output will rarely be enabled, and there is a hard
upper bound (NFS4_PATHNAME_MAXCOMPONENTS) in our implementation, this
is probably not a major concern.

It might be useful to additionally sanity-check the length of each
incoming component, however.  RFC 3530bis15 does not suggest a maximum
number of UTF-8 characters per component for either the pathname4 or
component4 types.  However, we could invent one that is appropriate
for our implementation.

Another possibility is to scrap all of this and print these pathnames
in upper layers after a reasonable amount of sanity checking in the
XDR layer.  This would give us an opportunity to allocate a full
buffer so that the whole pathname would be output via a single
dprintk.

Introduced by commit 7aaa0b3b: "NFSv4: convert fs-locations-components
to conform to RFC3530," (June 9, 2006).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>