firefly-linux-kernel-4.4.55.git
J. Bruce Fields [Fri, 12 Apr 2013 22:10:56 +0000 (18:10 -0400)]
nfsd4: better error return to indicate SSV non-support

As 4.1 becomes less experimental and SSV still isn't implemented, we
have to admit it's not going to be, and return some sensible error
rather than just saying "our server's broken".  Discussion in the ietf
group hasn't turned up any objections to using NFS4ERR_ENC_ALG_UNSUPP
for that purpose.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 15 Apr 2013 20:03:46 +0000 (16:03 -0400)]
nfsd: fix EXDEV checking in rename

We again check for the EXDEV a little later on, so the first check is
redundant.  This check is also slightly racier, since a badly timed
eviction from the export cache could leave us with the two fh_export
pointers pointing to two different cache entries which each refer to the
same underlying export.

It's better to compare vfsmounts as the later check does, but that
leaves a minor security hole in the case where the two exports refer to
two different directories, especially if (for example) they have
different root-squashing options.

So, compare ex_path.dentry too.

Reported-by: Joe Habermann <joe.habermann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
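The combined comparison described in this commit can be sketched in user-space C. This is only an illustration: `struct export_path` and `same_export_root` are hypothetical stand-ins, not the kernel's `struct path` or the actual nfsd rename code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct path: a mount plus the
 * dentry that the export is rooted at. */
struct export_path {
    const void *mnt;     /* stands in for struct vfsmount * */
    const void *dentry;  /* stands in for struct dentry * */
};

/* A rename may proceed only if both filehandles resolve to the same mount
 * AND the same export root. Comparing mounts alone would treat two exports
 * of different directories on one filesystem as the same export. */
bool same_export_root(const struct export_path *from,
                      const struct export_path *to)
{
    return from->mnt == to->mnt && from->dentry == to->dentry;
}
```

A caller would return EXDEV whenever `same_export_root()` is false.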
Simo Sorce [Fri, 25 May 2012 22:09:56 +0000 (18:09 -0400)]
SUNRPC: Use gssproxy upcall for server RPCGSS authentication.

The main advantage of this new upcall mechanism is that it can handle
big tickets as seen in Kerberos implementations where tickets carry
authorization data like the MS-PAC buffer with AD or the Posix Authorization
Data being discussed in IETF on the krbwg working group.

The Gssproxy program is used to perform the accept_sec_context call on the
kernel's behalf. The code is changed to also pass the input buffer straight
to the upcall mechanism to avoid allocating and copying many pages, as tokens
can be as big as 64KiB (potentially more in future).

Signed-off-by: Simo Sorce <simo@redhat.com>
[bfields: containerization, negotiation api]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Simo Sorce [Fri, 25 May 2012 22:09:55 +0000 (18:09 -0400)]
SUNRPC: Add RPC based upcall mechanism for RPCGSS auth

This patch implements a sunrpc client to use the services of the gssproxy
userspace daemon.

In particular, it allows calls to be performed in user space using an RPC
call instead of custom hand-coded upcall/downcall messages.

Currently only accept_sec_context is implemented, as that is all that is
needed for the server case.

File server modules like NFS and CIFS can use full gssapi services this way,
once init_sec_context is also implemented.

For the NFS server case, this code allows lifting the 2k maximum size of
krb5 tickets. This limit prevents legitimate Kerberos deployments from using
krb5 authentication with the Linux NFS server, as their tickets are normally
many kilobytes large.

It will also allow lifting the limit on the size of the credential set
(uid,gid,gids) passed down from user space for users that have very many groups
associated. Currently the downcall mechanism used by rpc.svcgssd is limited
to around 2k secondary groups of the 65k allowed by kernel structures.

Signed-off-by: Simo Sorce <simo@redhat.com>
[bfields: containerization, concurrent upcalls, misc. fixes and cleanup]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Simo Sorce [Fri, 25 May 2012 22:09:53 +0000 (18:09 -0400)]
SUNRPC: conditionally return endtime from import_sec_context

We expose this parameter for a future caller.
It will be used to extract the endtime from the gss-proxy upcall mechanism,
in order to set the rsc cache expiration time.

Signed-off-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 11 Apr 2013 19:06:36 +0000 (15:06 -0400)]
SUNRPC: allow disabling idle timeout

In the gss-proxy case we don't want to have to reconnect at random--we
want to connect only on gss-proxy startup when we can steal gss-proxy's
context to do the connect in the right namespace.

So, provide a flag that allows the rpc_create caller to turn off the
idle timeout.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Feb 2013 15:14:22 +0000 (10:14 -0500)]
SUNRPC: attempt AF_LOCAL connect on setup

In the gss-proxy case, setup time is when I know I'll have the right
namespace for the connect.

In other cases, it might be useful to get any connection errors
earlier--though actually in practice it doesn't make any difference for
rpcbind.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 26 Apr 2013 15:37:29 +0000 (11:37 -0400)]
Merge Trond's nfs-for-next

Merging Trond's nfs-for-next branch, mainly to get
b7993cebb841b0da7a33e9d5ce301a9fd3209165 "SUNRPC: Allow rpc_create() to
request that TCP slots be unlimited", which a small piece of the
gss-proxy work depends on.

Bryan Schumaker [Fri, 19 Apr 2013 20:09:38 +0000 (16:09 -0400)]
nfsd: Decode and send 64bit time values

The seconds field of an nfstime4 structure is 64bit, but we are assuming
that the first 32bits are zero-filled.  So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
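The difference between the two decodings can be sketched in user-space C. This is a minimal illustration, not the nfsd XDR code: the function names are hypothetical, and it assumes the two 32-bit words have already been converted from network byte order.

```c
#include <stdint.h>

/* Combine the two 32-bit XDR words of an nfstime4 seconds field (high
 * word first) into one signed 64-bit value. The final cast relies on the
 * two's-complement conversion that mainstream compilers provide. */
int64_t decode_seconds64(uint32_t hi, uint32_t lo)
{
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* The buggy variant: assume the high word is zero and keep only the low
 * word, which turns a pre-epoch time into a large positive one. */
int64_t decode_seconds_low_only(uint32_t hi, uint32_t lo)
{
    (void)hi;
    return (int64_t)lo;
}
```

For a time one second before the epoch, both wire words are 0xffffffff: the first function yields -1, while the second yields 4294967295.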
Trond Myklebust [Mon, 22 Apr 2013 15:29:51 +0000 (11:29 -0400)]
NFSv4: Ensure that we clear the NFS_OPEN_STATE flag when appropriate

We should always clear it before initiating file recovery.
Also ensure that we clear it after a CLOSE and/or after TEST_STATEID fails.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 21 Apr 2013 22:01:06 +0000 (18:01 -0400)]
LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot

After a server reboot, the reclaimer thread will recover all the existing
locks. For locks that are blocked, however, it will change the value
of block->b_status to nlm_lck_denied_grace_period in order to signal that
they need to wake up and resend the original blocking lock request.

Due to a bug, however, the block->b_status never gets reset after the
blocked locks have been woken up, and so the process goes into an
infinite loop of resends until the blocked lock is satisfied.

Reported-by: Marc Eshel <eshel@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Trond Myklebust [Sat, 20 Apr 2013 05:30:53 +0000 (01:30 -0400)]
NFSv4: Ensure the LOCK call cannot use the delegation stateid

Defensive patch to ensure that we copy the state->open_stateid, which
can never be set to the delegation stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sat, 20 Apr 2013 05:25:45 +0000 (01:25 -0400)]
NFSv4: Use the open stateid if the delegation has the wrong mode

Fix nfs4_select_rw_stateid() so that it chooses the open stateid
(or an all-zero stateid) if the delegation does not match the selected
read/write mode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
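The selection rule can be sketched as follows. This is a simplified user-space illustration under stated assumptions: the types, the `MODE_*` bits, and `select_rw_stateid` are hypothetical and are not the client's real nfs4_select_rw_stateid(), and the all-zero stateid fallback for a file with no open state is left out.

```c
#include <stdbool.h>

#define MODE_READ  0x1u
#define MODE_WRITE 0x2u

struct stateid { unsigned int id; };

/* Hypothetical, simplified view of the state a client holds for a file. */
struct file_state {
    bool has_delegation;
    unsigned int delegation_mode;   /* MODE_* bits the delegation grants */
    struct stateid delegation;
    struct stateid open;
};

/* Return the delegation stateid only when it grants every requested mode
 * bit; otherwise fall back to the open stateid. */
const struct stateid *select_rw_stateid(const struct file_state *st,
                                        unsigned int requested)
{
    if (st->has_delegation &&
        (st->delegation_mode & requested) == requested)
        return &st->delegation;
    return &st->open;
}
```

With a read-only delegation, a write request therefore gets the open stateid rather than the delegation stateid.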
Bryan Schumaker [Fri, 19 Apr 2013 20:09:37 +0000 (16:09 -0400)]
nfs: Send atime and mtime as a 64bit value

RFC 3530 says that the seconds value of a nfstime4 structure is a 64bit
value, but we are instead sending a 32-bit 0 and then a 32bit conversion
of the 64bit Linux value.  This means that if we try to set atime to a
value before the epoch (touch -t 196001010101) the client will only send
part of the new value due to lost precision.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
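On the sending side, the fix amounts to putting the real high word on the wire instead of a constant zero. A minimal user-space sketch, with a hypothetical function name and without the byte-order conversion a real XDR encoder performs:

```c
#include <stdint.h>

/* Split a signed 64-bit seconds value into the two 32-bit words that XDR
 * places on the wire, high word first. For negative (pre-epoch) values
 * the high word is 0xffffffff, not 0. */
void encode_seconds64(int64_t sec, uint32_t words[2])
{
    uint64_t u = (uint64_t)sec;

    words[0] = (uint32_t)(u >> 32);
    words[1] = (uint32_t)(u & 0xffffffffu);
}
```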
Fengguang Wu [Wed, 17 Apr 2013 02:14:15 +0000 (22:14 -0400)]
nfsd4: put_client_renew_locked can be static

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Wed, 17 Apr 2013 01:29:03 +0000 (21:29 -0400)]
nfsd4: remove unused macro

Clean up a piece I forgot to remove in
9411b1d4c7df26dca6bc6261b5dc87a5b4c81e5c "nfsd4: cleanup handling of
nfsv4.0 closed stateid's".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Tue, 16 Apr 2013 22:42:34 +0000 (18:42 -0400)]
NFSv4: Record the OPEN create mode used in the nfs4_opendata structure

If we're doing NFSv4.1 against a server that has persistent sessions,
then we should not need to call SETATTR in order to reset the file
attributes immediately after doing an exclusive create.

Note that since the create mode depends on the type of session that
has been negotiated with the server, we should not choose the
mode until after we've got a session slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fanchaoting [Thu, 11 Apr 2013 13:24:13 +0000 (21:24 +0800)]
nfsd4: remove some useless code

The "list_empty(&oo->oo_owner.so_stateids)" is always true, so remove it.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 9 Apr 2013 21:02:51 +0000 (17:02 -0400)]
nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED

A 4.1 server must notify a client that has had any state revoked using
the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag.  The client can figure
out exactly which state is the problem using TEST_STATEID and then free
it using FREE_STATEID.  The status flag will be unset once all such
revoked stateids are freed.

Our server's only recallable state is delegations.  So we keep with each
4.1 client a list of delegations that have timed out and been recalled,
but haven't yet been freed by FREE_STATEID.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
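The flag's lifetime can be sketched in user-space C. The flag value 0x40 is the one RFC 5661 assigns; everything else (`struct client`, the counter standing in for the per-client list, both function names) is a hypothetical simplification, not the nfsd implementation.

```c
#include <stdint.h>

#define SEQ4_STATUS_RECALLABLE_STATE_REVOKED 0x00000040u  /* RFC 5661 */

/* Hypothetical client record: a counter stands in for the per-client
 * list of revoked delegations not yet freed by FREE_STATEID. */
struct client { unsigned int nr_revoked; };

/* Status flags to return in a SEQUENCE reply for this client. */
uint32_t sequence_status_flags(const struct client *c)
{
    uint32_t flags = 0;

    if (c->nr_revoked > 0)
        flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
    return flags;
}

/* Called when the client frees one revoked stateid. */
void free_revoked_stateid(struct client *c)
{
    if (c->nr_revoked > 0)
        c->nr_revoked--;
}
```

The flag stays set across SEQUENCE replies until the last revoked stateid has been freed.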
Trond Myklebust [Sun, 14 Apr 2013 15:49:51 +0000 (11:49 -0400)]
NFSv4.1: Set the RPC_CLNT_CREATE_INFINITE_SLOTS flag for NFSv4.1 transports

This ensures that the RPC layer doesn't override the NFS session
negotiation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 14 Apr 2013 15:42:00 +0000 (11:42 -0400)]
SUNRPC: Allow rpc_create() to request that TCP slots be unlimited

This is mainly for use by NFSv4.1, where the session negotiation
ultimately wants to decide how many RPC slots we can fill.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 14 Apr 2013 14:49:37 +0000 (10:49 -0400)]
SUNRPC: Fix a livelock problem in the xprt->backlog queue

This patch ensures that we throttle new RPC requests if there are
requests already waiting in the xprt->backlog queue. The reason for
doing this is to fix livelock issues that can occur when an existing
(high priority) task is waiting in the backlog queue, gets woken up
by xprt_free_slot(), but a new task then steals the slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
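The throttling idea can be illustrated with a toy slot allocator. This is a sketch under stated assumptions, not the SUNRPC code: `struct xprt_model` and `try_alloc_slot` are hypothetical, and the backlog is reduced to a counter.

```c
#include <stdbool.h>

/* Hypothetical model of a transport: free slots plus a count of tasks
 * already waiting in the backlog queue. */
struct xprt_model { int free_slots; int backlog_len; };

/* A new request may take a slot only when nobody is already queued;
 * otherwise it joins the backlog, so a freshly woken waiter cannot have
 * its slot stolen by a newcomer. */
bool try_alloc_slot(struct xprt_model *x, bool from_backlog)
{
    if (!from_backlog && x->backlog_len > 0) {
        x->backlog_len++;        /* queue behind the earlier waiters */
        return false;
    }
    if (x->free_slots == 0) {
        if (!from_backlog)
            x->backlog_len++;    /* no slot available: start waiting */
        return false;
    }
    x->free_slots--;
    if (from_backlog)
        x->backlog_len--;        /* the waiter leaves the queue */
    return true;
}
```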
Trond Myklebust [Fri, 12 Apr 2013 19:04:51 +0000 (15:04 -0400)]
NFSv4: Fix handling of revoked delegations by setattr

Currently, _nfs4_do_setattr() will use the delegation stateid if no
writeable open file stateid is available.
If the server revokes that delegation stateid, then the call to
nfs4_handle_exception() will fail to handle the error due to the
lack of a struct nfs4_state, and will just convert the error into
an EIO.

This patch just removes the requirement that we must have a
struct nfs4_state in order to invalidate the delegation and
retry.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Thu, 11 Apr 2013 13:28:45 +0000 (09:28 -0400)]
NFSv4 release the sequence id in the return on close case

Otherwise we deadlock if state recovery is initiated while we
sleep.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Wed, 10 Apr 2013 19:36:48 +0000 (15:36 -0400)]
nfs: remove unnecessary check for NULL inode->i_flock from nfs_delegation_claim_locks

The second check was added in commit 65b62a29 but it will never be true.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 9 Apr 2013 21:42:28 +0000 (17:42 -0400)]
nfsd4: clean up validate_stateid

The logic here is better expressed with a switch statement.

While we're here, CLOSED stateids (or stateids of an unknown type--which
would indicate a server bug) should probably return nfserr_bad_stateid,
though this behavior shouldn't affect any non-buggy client.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
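The shape of such a switch can be sketched as below. The enum and function name are hypothetical simplifications rather than the nfsd source; the numeric error value is the one the NFSv4 RFCs assign to NFS4ERR_BAD_STATEID.

```c
/* Hypothetical, simplified set of stateid types. */
enum stid_type { STID_OPEN, STID_LOCK, STID_DELEG, STID_CLOSED };

#define NFS4_OK              0
#define NFS4ERR_BAD_STATEID  10025

/* Map a stateid type to a status code. Closed stateids and unknown types
 * share the bad-stateid result; an unknown type would indicate a server
 * bug. */
int validate_stid_type(int type)
{
    switch (type) {
    case STID_OPEN:
    case STID_LOCK:
    case STID_DELEG:
        return NFS4_OK;
    case STID_CLOSED:
    default:
        return NFS4ERR_BAD_STATEID;
    }
}
```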
J. Bruce Fields [Tue, 9 Apr 2013 15:34:36 +0000 (11:34 -0400)]
nfsd4: check backchannel attributes on create_session

Make sure the client gives us an adequate backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 8 Apr 2013 20:44:14 +0000 (16:44 -0400)]
nfsd4: fix forechannel attribute negotiation

Negotiation of the 4.1 session forechannel attributes is a mess.  Fix:

- Move it all into check_forechannel_attrs instead of spreading
  it between that, alloc_session, and init_forechannel_attrs.
- Set a minimum "slotsize" so that our drc memory limits apply
  even for small maxresponsesize_cached.  This also fixes some
  bugs when slotsize becomes <= 0.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
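The effect of a minimum slot size can be shown with a small clamp. Both constants and the function name are hypothetical, chosen only for illustration; the real values and layout live in the nfsd session code.

```c
#include <stdint.h>

#define HDR_OVERHEAD   64u   /* hypothetical per-reply header size */
#define MIN_SLOT_SIZE  32u   /* hypothetical floor for a cached slot */

/* Compute the per-slot cache size from the negotiated
 * maxresponsesize_cached. Subtracting the header naively could reach
 * zero or wrap around; the floor keeps the memory accounting meaningful
 * for very small requests. */
uint32_t slot_size(uint32_t maxresp_cached)
{
    uint32_t payload = maxresp_cached > HDR_OVERHEAD
                     ? maxresp_cached - HDR_OVERHEAD : 0;

    return payload < MIN_SLOT_SIZE ? MIN_SLOT_SIZE : payload;
}
```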
J. Bruce Fields [Mon, 8 Apr 2013 19:42:12 +0000 (15:42 -0400)]
nfsd4: cleanup check_forechannel_attrs

Pass this struct by reference, not by value, and return an error instead
of a boolean to allow for future additions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 29 Mar 2013 00:37:14 +0000 (20:37 -0400)]
nfsd4: don't close read-write opens too soon

Don't actually close any opens until we don't need them at all.

This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Sun, 7 Apr 2013 17:28:16 +0000 (13:28 -0400)]
nfsd4: release lockowners on last unlock in 4.1 case

In the 4.1 case we're supposed to release lockowners as soon as they're
no longer used.

It would probably be more efficient to reference count them, but that's
slightly fiddly due to the need to have callbacks from locks.c to take
into account lock merging and splitting.

For most cases just scanning the inode's lock list on unlock for
matching locks will be sufficient.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
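The scan-on-unlock approach can be sketched with a plain linked list. The types and the function name are hypothetical; the kernel walks the inode's lock list rather than a structure like this one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock record on a per-file list. */
struct lock { const void *owner; struct lock *next; };

/* After an unlock, walk the file's lock list; the lockowner can be
 * released only if none of the remaining locks belongs to it. */
bool owner_has_locks(const struct lock *list, const void *owner)
{
    for (const struct lock *l = list; l != NULL; l = l->next)
        if (l->owner == owner)
            return true;
    return false;
}
```

The scan is linear in the number of locks on the file, which is the trade-off the commit message accepts in place of reference counting.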
J. Bruce Fields [Fri, 22 Mar 2013 22:03:49 +0000 (18:03 -0400)]
nfsd4: more sessions/open-owner-replay cleanup

More logic that's unnecessary in the 4.1 case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 22 Mar 2013 21:44:19 +0000 (17:44 -0400)]
nfsd4: no need for replay_owner in sessions case

The replay_owner will never be used in the sessions case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Sun, 7 Apr 2013 17:21:08 +0000 (13:21 -0400)]
nfsd4: remove some redundant comments

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Wei Yongjun [Tue, 9 Apr 2013 06:15:31 +0000 (14:15 +0800)]
nfsd: use kmem_cache_free() instead of kfree()

memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Tue, 9 Apr 2013 01:49:53 +0000 (21:49 -0400)]
NFS: Ensure that NFS file unlock waits for readahead to complete

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 9 Apr 2013 01:38:12 +0000 (21:38 -0400)]
NFS: Add functionality to allow waiting on all outstanding reads to complete

This will later allow NFS locking code to wait for readahead to complete
before releasing byte range locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 8 Apr 2013 21:50:28 +0000 (17:50 -0400)]
NFSv4: Handle timeouts correctly when probing for lease validity

When we send a RENEW or SEQUENCE operation in order to probe if the
lease is still valid, we want it to be able to time out since the
lease we are probing is likely to time out too. Currently, because
we use soft mount semantics for these RPC calls, the return value
is EIO, which causes the state manager to exit with an "unhandled
error" message.
This patch changes the call semantics, so that the RPC layer returns
ETIMEDOUT instead of EIO. We then have the state manager default to
a simple retry instead of exiting.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Mon, 1 Apr 2013 20:37:12 +0000 (16:37 -0400)]
nfsd4: cleanup handling of nfsv4.0 closed stateid's

Closed stateid's are kept around a little while to handle close replays
in the 4.0 case.  So we stash the last-used stateid in the
oo_last_closed_stid field of the open owner.  We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented.  But we don't want to do that on the close itself; so we
set the NFS4_OO_PURGE_CLOSE flag on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.

This is unnecessarily baroque.

Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.

The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Wed, 3 Apr 2013 23:27:52 +0000 (19:27 -0400)]
NFSv4: Fix CB_RECALL_ANY to only return delegations that are not in use

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 3 Apr 2013 23:23:58 +0000 (19:23 -0400)]
NFSv4: Clean up nfs_expire_all_delegations

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 3 Apr 2013 23:04:58 +0000 (19:04 -0400)]
NFSv4: Fix nfs_server_return_all_delegations

If the state manager thread is already running, we may end up
racing with it in nfs_client_return_marked_delegations. Better to
just allow the state manager thread to do the job.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 3 Apr 2013 18:33:49 +0000 (14:33 -0400)]
NFSv4: Be less aggressive about returning delegations for open files

Currently, if the application that holds the file open isn't doing
I/O, we may end up returning the delegation. This means that we can
no longer cache the file as aggressively, and often also that we
multiply the state that both the server and the client need to track.

This patch adds a check for open files to the routine that scans
for delegations that are unreferenced.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 1 Apr 2013 19:56:46 +0000 (15:56 -0400)]
NFSv4: Clean up delegation recall error handling

Unify the error handling in nfs4_open_delegation_recall and
nfs4_lock_delegation_recall.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 1 Apr 2013 19:40:44 +0000 (15:40 -0400)]
NFSv4: Clean up nfs4_open_delegation_recall

Make it symmetric with nfs4_lock_delegation_recall

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 1 Apr 2013 18:47:22 +0000 (14:47 -0400)]
NFSv4: Clean up nfs4_lock_delegation_recall

All error cases are handled by the switch() statement, meaning that the
call to nfs4_handle_exception() is unreachable.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 1 Apr 2013 19:34:05 +0000 (15:34 -0400)]
NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_open_delegation_recall

A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted.  However,
the spec does not require the server to grant the open in this
instance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Trond Myklebust [Mon, 1 Apr 2013 18:27:29 +0000 (14:27 -0400)]
NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_lock_delegation_recall

A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted.  However,
the spec does not require the server to grant the lock in this
instance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Paul Bolle [Sat, 9 Mar 2013 16:02:31 +0000 (17:02 +0100)]
sunrpc: drop "select NETVM"

The Kconfig entry for SUNRPC_SWAP selects NETVM. That select statement
was added in commit a564b8f0398636ba30b07c0eaebdef7ff7837249 ("nfs:
enable swap on NFS"). But there's no Kconfig symbol NETVM. It apparently
was only used in development versions of the swap over nfs
functionality but never entered mainline. Anyhow, it is a nop and can
safely be dropped.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Mon, 25 Mar 2013 11:59:57 +0000 (07:59 -0400)]
nfs: allow the v4.1 callback thread to freeze

The v4.1 callback thread has set_freezable() at the top, but it doesn't
ever try to freeze within the loop. Have it call try_to_freeze() at the
top of the loop. If a freeze event occurs, recheck kthread_should_stop()
after thawing.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Thu, 21 Mar 2013 19:49:47 +0000 (15:49 -0400)]
nfsd4: remove unused nfs4_check_deleg argument

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Mar 2013 19:19:33 +0000 (15:19 -0400)]
nfsd4: make del_recall_lru per-network-namespace

If nothing else this simplifies the nfs4_state_shutdown_net logic a tad.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Mar 2013 15:21:50 +0000 (11:21 -0400)]
nfsd4: shut down more of delegation earlier

Once we've unhashed the delegation, it's only hanging around for the
benefit of an outstanding recall, which only needs the encoded
filehandle, stateid, and dl_retries counter.  No point keeping the file
around any longer, or keeping it hashed.

This also fixes a race: calls to idr_remove should really be serialized
by the caller, but the nfs4_put_delegation call from the callback code
isn't taking the state lock.

(Better might be to cancel the callback before destroying the
delegation, and remove any need for reference counting--but I don't see
an easy way to cancel an rpc call.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Mar 2013 14:59:29 +0000 (10:59 -0400)]
nfsd4: minor cb_recall simplification

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Alexey Khoroshilov [Fri, 22 Mar 2013 20:36:44 +0000 (00:36 +0400)]
SUNRPC/cache: add module_put() on error path in cache_open()

If kmalloc() fails in cache_open(), the module cd->owner is left locked.
The patch adds module_put(cd->owner) on this path.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
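The error-path pattern can be sketched in user-space C. All names here (`struct module_ref`, `mod_get`, `mod_put`, `open_reader`) are hypothetical stand-ins for try_module_get(), module_put() and cache_open(); a size of 0 is used to simulate the allocation failure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a module with a reference count. */
struct module_ref { int refs; };

int  mod_get(struct module_ref *m) { m->refs++; return 1; }
void mod_put(struct module_ref *m) { m->refs--; }

/* Open-style routine: take a module reference, then allocate. If the
 * allocation fails, the reference must be dropped before returning, or
 * the module can never be unloaded. */
int open_reader(struct module_ref *owner, size_t size, void **out)
{
    if (!mod_get(owner))
        return -ENODEV;
    *out = size ? malloc(size) : NULL;   /* size 0 simulates failure */
    if (*out == NULL) {
        mod_put(owner);                  /* the step the patch adds */
        return -ENOMEM;
    }
    return 0;
}
```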
fanchaoting [Wed, 27 Mar 2013 08:31:18 +0000 (16:31 +0800)]
nfsd: remove /proc/fs/nfs when create /proc/fs/nfs/exports error

When creating /proc/fs/nfs/exports fails, we should remove /proc/fs/nfs.
Otherwise, it may cause a memory leak.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Reviewed-by: chendt.fnst <chendt.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fanchaoting [Mon, 1 Apr 2013 13:07:22 +0000 (21:07 +0800)]
nfsd: don't run get_file if nfs4_preprocess_stateid_op return error

We should return the error status directly when nfs4_preprocess_stateid_op
returns an error.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Tue, 2 Apr 2013 13:01:59 +0000 (09:01 -0400)]
nfsd: convert the file_hashtbl to a hlist

We only ever traverse the hash chains in the forward direction, so a
double pointer list head isn't really necessary.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
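The space saving comes from the bucket head being a single pointer. A minimal user-space sketch in the style of the kernel's hlist follows; the type and function names are shortened, hypothetical versions and not the definitions from include/linux/list.h.

```c
#include <stddef.h>

/* Singly headed chain: the bucket head is one pointer, and each node
 * stores the address of the pointer that points at it, so removal needs
 * no walk back through the chain. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

/* Insert n at the front of the chain. */
void hadd_head(struct hnode *n, struct hhead *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n from whichever chain it is on. */
void hdel(struct hnode *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
}
```

A table of `struct hhead` buckets takes half the memory of one built from two-pointer list heads, and forward traversal works the same way.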
J. Bruce Fields [Tue, 19 Mar 2013 16:05:39 +0000 (12:05 -0400)]
nfsd4: don't destroy in-use session

This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY.  This simplifies things somewhat and
helps meet a coming 4.2 requirement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
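The fail-instead-of-defer behaviour can be sketched as follows. `struct session_model` and `destroy_session_model` are hypothetical; the error value is the one the NFSv4 RFCs assign to NFS4ERR_DELAY.

```c
#define NFS4_OK        0
#define NFS4ERR_DELAY  10008

/* Hypothetical session record with a count of in-flight users. */
struct session_model { int refcount; int destroyed; };

/* Refuse to destroy a session that is still referenced: the caller gets
 * NFS4ERR_DELAY and is expected to retry, so no deferred-free logic is
 * needed. */
int destroy_session_model(struct session_model *s)
{
    if (s->refcount > 0)
        return NFS4ERR_DELAY;
    s->destroyed = 1;
    return NFS4_OK;
}
```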
J. Bruce Fields [Tue, 2 Apr 2013 02:23:49 +0000 (22:23 -0400)]
nfsd4: don't destroy in-use clients

When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.

The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.

This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.

The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end.  But this still
leaves some races in the current code.

I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY.  This is a case that should never happen
anyway.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 18 Mar 2013 21:31:30 +0000 (17:31 -0400)]
nfsd4: simplify bind_conn_to_session locking

The locking here is very fiddly, and there's no reason for us to be
setting cstate->session, since this is the only op in the compound.
Let's just take the state lock and drop the reference counting.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 23:55:33 +0000 (19:55 -0400)]
nfsd4: fix destroy_session race

destroy_session uses the session and client without continuously holding
any reference or locks.

Put the whole thing under the state lock for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 22:24:52 +0000 (18:24 -0400)]
nfsd4: clientid lookup cleanup

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 22:20:01 +0000 (18:20 -0400)]
nfsd4: destroy_clientid simplification

I'm not sure what the check for clientid expiry was meant to do here.

The check for a matching session is redundant given the previous check
for state: a client without state is, in particular, a client without
sessions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 22:12:03 +0000 (18:12 -0400)]
nfsd4: remove some dprintk's

E.g. printk's that just report the return value from an op are
uninteresting as we already do that in the main proc_compound loop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 12 Mar 2013 21:36:17 +0000 (17:36 -0400)]
nfsd4: STALE_STATEID cleanup

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 12 Mar 2013 14:12:37 +0000 (10:12 -0400)]
nfsd4: warn on odd create_session state

This should never happen.

(Note: the comparable case in setclientid_confirm *can* happen, since
updating a client record can result in both confirmed and unconfirmed
records with the same clientid.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: fix bug on nfs4 stateid deallocation
ycnian@gmail.com [Mon, 11 Mar 2013 00:46:14 +0000 (08:46 +0800)]
nfsd: fix bug on nfs4 stateid deallocation

NFS4_OO_PURGE_CLOSE is not handled properly. To avoid a memory leak, the nfs4
stateid pointed to by oo_last_closed_stid is freed in nfsd4_close(), but
NFS4_OO_PURGE_CLOSE is not cleared at the same time. As a result, the stateid
released in THIS close procedure may be freed immediately in the subsequent
encoding function.
Sorry that the Signed-off-by was forgotten in the last version.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: remove unused macro in nfsv4
Yanchuan Nian [Mon, 11 Mar 2013 02:43:26 +0000 (10:43 +0800)]
nfsd: remove unused macro in nfsv4

lk_rflags is never used anywhere, and rflags is not defined in struct
nfsd4_lock.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd4: fix use-after-free of 4.1 client on connection loss
J. Bruce Fields [Fri, 8 Mar 2013 14:30:43 +0000 (09:30 -0500)]
nfsd4: fix use-after-free of 4.1 client on connection loss

Once we drop the lock here there's nothing keeping the client around:
the only lock still held is the xpt_lock on this socket, but this socket
no longer has any connection with the client so there's no way for other
code to know we're still using the client.

The solution is simple: all nfsd4_probe_callback does is set a few
variables and queue some work, so there's no reason we can't just keep
it under the lock.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd4: fix race on client shutdown
J. Bruce Fields [Thu, 7 Mar 2013 22:26:18 +0000 (17:26 -0500)]
nfsd4: fix race on client shutdown

Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL.  An xpt_user
callback that encounters such a session may then crash:

[  303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
[  303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] PGD 37811067 PUD 3d498067 PMD 0
[  303.959061] Oops: 0002 [#8] PREEMPT SMP
[  303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
[  303.959061] CPU 0
[  303.959061] Pid: 264, comm: nfsd Tainted: G      D      3.8.0-ARCH+ #156 Bochs Bochs
[  303.959061] RIP: 0010:[<ffffffff81481a8e>]  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] RSP: 0018:ffff880037877dd8  EFLAGS: 00010202
[  303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
[  303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
[  303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
[  303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
[  303.959061] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  303.959061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
[  303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
[  303.959061] Stack:
[  303.959061]  ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
[  303.959061]  ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
[  303.959061]  0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
[  303.959061] Call Trace:
[  303.959061]  [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
[  303.959061]  [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
[  303.959061]  [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
[  303.959061]  [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
[  303.959061]  [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
[  303.959061]  [<ffffffff8107ae00>] kthread+0xc0/0xd0
[  303.959061]  [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061]  [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
[  303.959061] RIP  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061]  RSP <ffff880037877dd8>
[  303.959061] CR2: 0000000000000318
[  304.001218] ---[ end trace 2d809cd4a7931f5a ]---
[  304.001903] note: nfsd[264] exited with preempt_count 2

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd4: handle seqid-mutating open errors from xdr decoding
J. Bruce Fields [Thu, 28 Feb 2013 20:51:49 +0000 (12:51 -0800)]
nfsd4: handle seqid-mutating open errors from xdr decoding

If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER.  But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing in that case: we have to at least look up the
owner so we can find the seqid to bump.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd4: remove BUG_ON
J. Bruce Fields [Thu, 28 Feb 2013 19:55:46 +0000 (11:55 -0800)]
nfsd4: remove BUG_ON

This BUG_ON just crashes the thread a little earlier than it would
otherwise--it doesn't seem useful.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: scale up the number of DRC hash buckets with cache size
Jeff Layton [Wed, 27 Mar 2013 14:15:39 +0000 (10:15 -0400)]
nfsd: scale up the number of DRC hash buckets with cache size

We've now increased the size of the duplicate reply cache by quite a
bit, but the number of hash buckets has not changed. So, we've gone from
an average hash chain length of 16 in the old code to 4096 when the
cache is its largest. Change the code to scale out the number of buckets
with the max size of the cache.

At the same time, we also need to fix the hash function since the
existing one isn't really suitable when there are more than 256 buckets.
Move instead to use the stock hash_32 function for this. Testing on a
machine that had 2048 buckets showed that this gave a smaller
longest:average ratio than the existing hash function:

The formula here is longest hash bucket searched divided by average
number of entries per bucket at the time that we saw that longest
bucket:

    old hash: 68/(39258/2048) == 3.547404
    hash_32:  45/(33773/2048) == 2.728807
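
As a rough userspace illustration of the two ideas above (a multiplicative
hash that keeps the top bits, and the longest:average ratio), here is a
minimal sketch. The names sketch_hash_32 and chain_ratio are hypothetical, and
the multiplier is an assumed golden-ratio-style constant, not necessarily the
one the kernel's hash_32 uses in any given release.

```c
#include <stdint.h>

/* Assumed multiplier for illustration only; the kernel's own constant
 * may differ between releases. */
#define SKETCH_GOLDEN_RATIO_PRIME_32 0x9e370001u

/*
 * Multiplicative hash: multiply, then keep the top `bits` bits of the
 * 32-bit product. `bits` must be in the range 1..32.
 */
static uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
	uint32_t hash = val * SKETCH_GOLDEN_RATIO_PRIME_32;

	return hash >> (32 - bits);
}

/*
 * Ratio used in the commit message: longest chain searched, divided by
 * the average number of entries per bucket at that time.
 */
static double chain_ratio(unsigned int longest, unsigned int entries,
			  unsigned int buckets)
{
	return (double)longest / ((double)entries / (double)buckets);
}
```

With 2048 buckets the hash is called with bits == 11, and
chain_ratio(68, 39258, 2048) reproduces the "old hash" figure quoted above.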

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: keep stats on worst hash balancing seen so far
Jeff Layton [Wed, 27 Mar 2013 14:15:39 +0000 (10:15 -0400)]
nfsd: keep stats on worst hash balancing seen so far

The typical case with the DRC is a cache miss, so if we keep track of
the max number of entries that we've ever walked over in a search, then
we should have a reasonable estimate of the longest hash chain that
we've ever seen.

With that, we'll also keep track of the total size of the cache when we
see the longest chain. In the case of a tie, we prefer to track the
smallest total cache size in order to properly gauge the worst-case
ratio of max vs. avg chain length.
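
The bookkeeping described above can be sketched roughly as follows. The
struct and function names are hypothetical and this is a userspace
illustration of the tie-breaking rule, not the patch itself.

```c
/* Hypothetical stats record for illustration. */
struct drc_stats {
	unsigned int longest_chain;           /* most entries walked in one search */
	unsigned int longest_chain_cachesize; /* total cache entries at that time */
};

/*
 * Record the result of one cache search. A strictly longer walk replaces
 * both fields; on a tie, keep the smaller cache size so that the
 * max-vs-average ratio reflects the worst case.
 */
static void drc_note_search(struct drc_stats *st, unsigned int walked,
			    unsigned int cache_entries)
{
	if (walked > st->longest_chain) {
		st->longest_chain = walked;
		st->longest_chain_cachesize = cache_entries;
	} else if (walked == st->longest_chain &&
		   cache_entries < st->longest_chain_cachesize) {
		st->longest_chain_cachesize = cache_entries;
	}
}
```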

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: add new reply_cache_stats file in nfsdfs
Jeff Layton [Wed, 27 Mar 2013 14:15:38 +0000 (10:15 -0400)]
nfsd: add new reply_cache_stats file in nfsdfs

For presenting statistics relating to the duplicate reply cache.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: track memory utilization by the DRC
Jeff Layton [Wed, 27 Mar 2013 14:15:38 +0000 (10:15 -0400)]
nfsd: track memory utilization by the DRC

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: break out comparator into separate function
Jeff Layton [Wed, 27 Mar 2013 14:15:37 +0000 (10:15 -0400)]
nfsd: break out comparator into separate function

Break out the function that compares the rqstp and checksum against a
reply cache entry. While we're at it, track the efficacy of the checksum
over the NFS data by tracking the cases where we would have incorrectly
matched a DRC entry if we had not tracked it or the length.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago nfsd: eliminate one of the DRC cache searches
Jeff Layton [Wed, 27 Mar 2013 14:15:37 +0000 (10:15 -0400)]
nfsd: eliminate one of the DRC cache searches

The most common case is to do a search of the cache, followed by an
insert. In the case where we have to allocate an entry off the slab,
then we end up having to redo the search, which is wasteful.

Better optimize the code for the common case by eliminating the initial
search of the cache and always preallocating an entry. In the case of a
cache hit, we'll end up just freeing that entry but that's preferable to
an extra search.
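
A minimal userspace sketch of the "preallocate, search once, free on hit"
pattern follows. It uses a plain singly linked list and malloc/free as
stand-ins; all names are hypothetical and locking is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

struct sketch_entry {
	uint32_t xid;
	struct sketch_entry *next;
};

struct sketch_cache {
	struct sketch_entry *head;
};

/*
 * Returns 1 on a cache hit, 0 on a miss (a new entry was inserted),
 * and -1 if the up-front allocation failed.
 */
static int sketch_cache_lookup(struct sketch_cache *c, uint32_t xid)
{
	/* Allocate first, so a miss never needs a second search. */
	struct sketch_entry *fresh = malloc(sizeof(*fresh));
	struct sketch_entry *e;

	if (!fresh)
		return -1;

	for (e = c->head; e; e = e->next) {
		if (e->xid == xid) {
			free(fresh); /* hit: the spare entry is not needed */
			return 1;
		}
	}

	/* Miss: the preallocated entry goes straight into the cache. */
	fresh->xid = xid;
	fresh->next = c->head;
	c->head = fresh;
	return 0;
}

/* Release every entry; the cache is empty afterwards. */
static void sketch_cache_free(struct sketch_cache *c)
{
	while (c->head) {
		struct sketch_entry *e = c->head;

		c->head = e->next;
		free(e);
	}
}
```

The cost is one wasted allocation per hit, which the commit message argues is
cheaper than a repeated search when misses dominate.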

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago NFSv4: Fix Oopses in the fs_locations code
Trond Myklebust [Wed, 27 Mar 2013 15:54:45 +0000 (11:54 -0400)]
NFSv4: Fix Oopses in the fs_locations code

If the server sends us a pathname with more components than the client
limit of NFS4_PATHNAME_MAXCOMPONENTS, more server entries than the client
limit of NFS4_FS_LOCATION_MAXSERVERS, or more fs_locations entries in total
than the client limit of NFS4_FS_LOCATIONS_MAXENTRIES, then we will currently
Oops because the limit checks are done _after_ we've decoded the data into
the arrays.
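
The ordering problem can be illustrated with a small userspace decoder that
checks the count before storing anything. SKETCH_MAXCOMPONENTS, the struct
and the flat array of 32-bit words are hypothetical stand-ins, not the real
XDR code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAXCOMPONENTS 4 /* hypothetical stand-in for the real limit */

struct sketch_pathname {
	uint32_t ncomponents;
	uint32_t components[SKETCH_MAXCOMPONENTS];
};

/*
 * Decode "count, then count words" from wire[0..nwords).
 * Returns 0 on success, -1 if the input is truncated or exceeds the limit.
 */
static int sketch_decode_pathname(const uint32_t *wire, size_t nwords,
				  struct sketch_pathname *out)
{
	uint32_t n, i;

	if (nwords < 1)
		return -1;
	n = wire[0];

	/* Limit check comes BEFORE any store into the fixed-size array. */
	if (n > SKETCH_MAXCOMPONENTS)
		return -1;
	if ((size_t)n > nwords - 1)
		return -1;

	for (i = 0; i < n; i++)
		out->components[i] = wire[1 + i];
	out->ncomponents = n;
	return 0;
}
```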

Reported-by: fanchaoting<fanchaoting@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4: Fix another reboot recovery race
Trond Myklebust [Thu, 28 Mar 2013 18:01:33 +0000 (14:01 -0400)]
NFSv4: Fix another reboot recovery race

If the open_context for the file is not yet fully initialised,
then open recovery cannot succeed, and since nfs4_state_find_open_context
returns an ENOENT, we end up treating the file as being irrecoverable.

What we really want to do is just defer the recovery until later.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4: Add a mapping for NFS4ERR_FILE_OPEN in nfs4_map_errors
Trond Myklebust [Sat, 23 Mar 2013 19:22:45 +0000 (15:22 -0400)]
NFSv4: Add a mapping for NFS4ERR_FILE_OPEN in nfs4_map_errors

With unlink being an asynchronous operation in the sillyrename case, it
expects nfs4_async_handle_error() to map the error correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago nfsd4: reject "negative" acl lengths
J. Bruce Fields [Tue, 26 Mar 2013 18:11:13 +0000 (14:11 -0400)]
nfsd4: reject "negative" acl lengths

Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.

The symptom seen was a warning when we attempt a kmalloc with an
excessive size.
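
To see how a "negative" length slips past an upper-bound-only check, consider
this small sketch. SKETCH_ACL_MAX and both helpers are hypothetical; treating
the count as unsigned is one common way to close the gap and is shown here as
an assumption, not as a description of the actual patch.

```c
#include <stdint.h>

#define SKETCH_ACL_MAX 170 /* hypothetical limit for illustration */

/* Problem pattern: signed count with only an upper bound. */
static int sketch_nace_ok_signed(int32_t nace)
{
	return nace <= SKETCH_ACL_MAX;
}

/*
 * With an unsigned count, a wire value that would read as negative is a
 * very large number instead, so the same upper bound rejects it.
 */
static int sketch_nace_ok_unsigned(uint32_t nace)
{
	return nace <= SKETCH_ACL_MAX;
}
```

A count of -1 passes the signed check, and would later be turned into an
enormous allocation size, while the unsigned check rejects it.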

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago NFSv4.1: Use CLAIM_DELEG_CUR_FH opens when available
Trond Myklebust [Mon, 18 Mar 2013 16:50:59 +0000 (12:50 -0400)]
NFSv4.1: Use CLAIM_DELEG_CUR_FH opens when available

Now that we do CLAIM_FH opens, we may run into situations where we
get a delegation but don't have perfect knowledge of the file path.
When returning the delegation, we might therefore not be able to
use CLAIM_DELEGATE_CUR opens to convert the delegation into OPEN
stateids and locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4.1: Enable open-by-filehandle
Trond Myklebust [Fri, 15 Mar 2013 20:44:28 +0000 (16:44 -0400)]
NFSv4.1: Enable open-by-filehandle

Sometimes, we actually _want_ to do open-by-filehandle, for instance
when recovering opens after a network partition, or when called
from nfs4_file_open.
Enable that functionality using a new capability, NFS_CAP_ATOMIC_OPEN_V1,
which is only enabled for NFSv4.1 servers that support it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4.1: Add xdr support for CLAIM_FH and CLAIM_DELEG_CUR_FH opens
Trond Myklebust [Fri, 15 Mar 2013 19:39:06 +0000 (15:39 -0400)]
NFSv4.1: Add xdr support for CLAIM_FH and CLAIM_DELEG_CUR_FH opens

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4: Clean up nfs4_opendata_alloc in preparation for NFSv4.1 open modes
Trond Myklebust [Fri, 15 Mar 2013 18:57:33 +0000 (14:57 -0400)]
NFSv4: Clean up nfs4_opendata_alloc in preparation for NFSv4.1 open modes

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4.1: Select the "most recent locking state" for read/write/setattr stateids
Trond Myklebust [Sun, 17 Mar 2013 19:31:15 +0000 (15:31 -0400)]
NFSv4.1: Select the "most recent locking state" for read/write/setattr stateids

Follow the practice described in section 8.2.2 of RFC5661: When sending a
read/write or setattr stateid, set the seqid field to zero in order to
signal that the NFS server should apply the most recent locking state.
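
As a small illustration of the rule above, the helper below copies a stateid
and zeroes only its seqid before it is sent. The struct layout (a 32-bit
seqid plus 12 opaque bytes) and the names are assumptions made for this
sketch.

```c
#include <stdint.h>

/* Assumed layout for illustration: seqid followed by 12 opaque bytes. */
struct sketch_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/*
 * Return the stateid to put on the wire for READ/WRITE/SETATTR:
 * same opaque part, seqid forced to zero ("use the most recent state").
 */
static struct sketch_stateid sketch_io_stateid(struct sketch_stateid current)
{
	current.seqid = 0;
	return current;
}
```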

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4: Prepare for minorversion-specific nfs_server capabilities
Trond Myklebust [Fri, 15 Mar 2013 20:11:57 +0000 (16:11 -0400)]
NFSv4: Prepare for minorversion-specific nfs_server capabilities

Clean up the setting of the nfs_server->caps, by shoving it all
into nfs4_server_common_setup().
Then add an 'initial capabilities' field into struct nfs4_minor_version_ops.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4: Resend the READ/WRITE RPC call if a stateid change causes an error
Trond Myklebust [Sun, 17 Mar 2013 00:54:34 +0000 (20:54 -0400)]
NFSv4: Resend the READ/WRITE RPC call if a stateid change causes an error

Adds logic to ensure that if the server returns a BAD_STATEID,
or other state related error, then we check if the stateid has
already changed. If it has, then rather than start state recovery,
we should just resend the failed RPC call with the new stateid.

Allow nfs4_select_rw_stateid to notify that the stateid is unstable by
having it return -EWOULDBLOCK if an RPC is underway that might change the
stateid.
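
The decision described in the first paragraph can be sketched as a comparison
between the stateid that was sent and the one currently held. The types and
names below are hypothetical and the real client has more cases than this.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout for illustration. */
struct sketch_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

enum sketch_action {
	SKETCH_RESEND,  /* stateid already changed: retry with the new one */
	SKETCH_RECOVER, /* stateid unchanged: start state recovery */
};

static int sketch_stateid_equal(const struct sketch_stateid *a,
				const struct sketch_stateid *b)
{
	return a->seqid == b->seqid &&
	       memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/* Called after the server returned a state-related error for `sent`. */
static enum sketch_action
sketch_on_stateid_error(const struct sketch_stateid *sent,
			const struct sketch_stateid *current)
{
	if (!sketch_stateid_equal(sent, current))
		return SKETCH_RESEND;
	return SKETCH_RECOVER;
}
```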

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4: The stateid must remain the same for replayed RPC calls
Trond Myklebust [Sun, 17 Mar 2013 19:52:00 +0000 (15:52 -0400)]
NFSv4: The stateid must remain the same for replayed RPC calls

If we replay a READ or WRITE call, we should not be changing the
stateid. Currently, we may end up doing so, because the stateid
is only selected at xdr encode time.

This patch ensures that we select the stateid after we get an NFSv4.1
session slot, and that we keep that same stateid across retries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFS: __nfs_find_lock_context needs to check ctx->lock_context for a match too
Trond Myklebust [Fri, 15 Mar 2013 22:11:31 +0000 (18:11 -0400)]
NFS: __nfs_find_lock_context needs to check ctx->lock_context for a match too

Currently, we're forcing an unnecessary duplication of the
initial nfs_lock_context in calls to nfs_get_lock_context, since
__nfs_find_lock_context ignores the ctx->lock_context.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFS: Don't accept more reads/writes if the open context recovery failed
Trond Myklebust [Mon, 18 Mar 2013 23:45:14 +0000 (19:45 -0400)]
NFS: Don't accept more reads/writes if the open context recovery failed

If the state recovery failed, we want to ensure that the application
doesn't try to use the same file descriptor for more reads or writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago NFSv4: Fail I/O if the state recovery fails irrevocably
Trond Myklebust [Thu, 14 Mar 2013 20:57:48 +0000 (16:57 -0400)]
NFSv4: Fail I/O if the state recovery fails irrevocably

If state recovery fails with an ESTALE or an ENOENT, then we shouldn't
keep retrying. Instead, mark the stateid as being invalid and
fail the I/O with an EIO error.
For other operations such as POSIX and BSD file locking, truncate
etc, fail with an EBADF to indicate that this file descriptor is no
longer valid.
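
The policy above amounts to two small decisions, sketched here with
hypothetical helper names and kernel-style negative errno values.

```c
#include <errno.h>

enum sketch_op_kind {
	SKETCH_OP_IO,    /* read or write */
	SKETCH_OP_OTHER, /* locking, truncate, etc. */
};

/* Recovery errors after which retrying is pointless. */
static int sketch_recovery_is_fatal(int err)
{
	return err == -ESTALE || err == -ENOENT;
}

/* Error to hand back once the stateid has been marked invalid. */
static int sketch_error_for_invalid_state(enum sketch_op_kind kind)
{
	return kind == SKETCH_OP_IO ? -EIO : -EBADF;
}
```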

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago SUNRPC: Report network/connection errors correctly for SOFTCONN rpc tasks
Trond Myklebust [Mon, 4 Mar 2013 22:29:33 +0000 (17:29 -0500)]
SUNRPC: Report network/connection errors correctly for SOFTCONN rpc tasks

In the case of a SOFTCONN rpc task, we really want to ensure that it
reports errors like ENETUNREACH back to the caller. Currently, only
some of these errors are being reported back (connect errors are not),
and they are being converted by the RPC layer into EIO.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
Trond Myklebust [Mon, 25 Mar 2013 15:23:40 +0000 (11:23 -0400)]
SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked

We need to be careful when testing task->tk_waitqueue in
rpc_wake_up_task_queue_locked, because it can be changed while we
are holding the queue->lock.
By adding appropriate memory barriers, we can ensure that it is safe to
test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
11 years ago nfsd: fix bad offset use
Kent Overstreet [Fri, 22 Mar 2013 18:18:24 +0000 (11:18 -0700)]
nfsd: fix bad offset use

vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended.

Introduced by face15025ffdf664de95e86ae831544154d26c9c "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".
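
The pitfall is easy to reproduce in userspace: a write helper that advances
the position through a pointer, followed by a range computation that must use
the original offset. Everything below (sketch_write, write_then_range, the
types) is a hypothetical stand-in for the kernel calls named above.

```c
#include <stddef.h>

typedef long long sketch_off_t;

struct sketch_range {
	sketch_off_t start;
	sketch_off_t end; /* inclusive */
};

/* Stand-in for a write that advances *pos by the bytes written. */
static size_t sketch_write(sketch_off_t *pos, size_t len)
{
	*pos += (sketch_off_t)len;
	return len;
}

/*
 * Write `len` bytes at `offset` (len > 0 assumed) and return the byte
 * range that should be synced afterwards.
 */
static struct sketch_range write_then_range(sketch_off_t offset, size_t len)
{
	sketch_off_t pos = offset; /* working copy the write may advance */
	size_t written = sketch_write(&pos, len);
	struct sketch_range r;

	/* Use the saved starting offset, not the advanced position. */
	r.start = offset;
	r.end = offset + (sketch_off_t)written - 1;
	return r;
}
```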

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11 years ago NFSv4.1: Add a helper pnfs_commit_and_return_layout
Trond Myklebust [Wed, 20 Mar 2013 17:23:33 +0000 (13:23 -0400)]
NFSv4.1: Add a helper pnfs_commit_and_return_layout

In order to be able to safely return the layout in nfs4_proc_setattr,
we need to block new uses of the layout, wait for all outstanding
users of the layout to complete, commit the layout and then return it.

This patch adds a helper in order to do all this safely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
11 years ago NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
Trond Myklebust [Wed, 20 Mar 2013 17:03:00 +0000 (13:03 -0400)]
NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn

Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires
you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout
segments.
The only two sites that need to do this are the ones that call
pnfs_return_layout() without first doing a layout commit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
11 years ago NFSv4.1: Fix a race in pNFS layoutcommit
Trond Myklebust [Wed, 20 Mar 2013 16:34:32 +0000 (12:34 -0400)]
NFSv4.1: Fix a race in pNFS layoutcommit

We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the
NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations
where the two are out of sync.
The first half of the problem is to ensure that pnfs_layoutcommit_inode
clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg.
We still need to keep the reference to those segments until the RPC call
is finished, so in order to make it clear _where_ those references come
from, we add a helper pnfs_list_write_lseg_done() that cleans up after
pnfs_list_write_lseg.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org