firefly-linux-kernel-4.4.55.git
10 years agonfsd: Add reference counting to the lock and open stateids
Trond Myklebust [Wed, 30 Jul 2014 01:34:06 +0000 (21:34 -0400)]
nfsd: Add reference counting to the lock and open stateids

When we remove the client_mutex, we'll need to be able to ensure that
these objects aren't destroyed while we're not holding locks.

Add a ->free() callback to the struct nfs4_stid, so that we can
release a reference to the stid without caring about the contents.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: print status when nfsd4_open fails to open file it just created
Jeff Layton [Wed, 30 Jul 2014 01:37:44 +0000 (21:37 -0400)]
nfsd: print status when nfsd4_open fails to open file it just created

It's possible for nfsd to fail opening a file that it has just created.
When that happens, we throw a WARN but it doesn't include any info about
the error code. Print the status code to give us a bit more info.

Our QA group hit some of these warnings under some very heavy stress
testing. My suspicion is that they hit the file-max limit, but it's hard
to know for sure. Go ahead and add a -ENFILE mapping to
nfserr_serverfault to make the error more distinct (and correct).

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC: Allow svc_reserve() to notify TCP socket that space has been freed
Trond Myklebust [Fri, 25 Jul 2014 03:59:33 +0000 (23:59 -0400)]
SUNRPC: Allow svc_reserve() to notify TCP socket that space has been freed

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC: svc_tcp_write_space: don't clear SOCK_NOSPACE prematurely
Trond Myklebust [Fri, 25 Jul 2014 03:59:32 +0000 (23:59 -0400)]
SUNRPC: svc_tcp_write_space: don't clear SOCK_NOSPACE prematurely

If requests are queued in the socket inbuffer waiting for an
svc_tcp_has_wspace() requirement to be satisfied, then we do not want
to clear the SOCK_NOSPACE flag until we've satisfied that requirement.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC: Reduce contention in svc_xprt_enqueue()
Trond Myklebust [Fri, 25 Jul 2014 03:59:31 +0000 (23:59 -0400)]
SUNRPC: Reduce contention in svc_xprt_enqueue()

Ensure that all calls to svc_xprt_enqueue() except svc_xprt_received()
check the value of XPT_BUSY, before attempting to grab spinlocks etc.
This is to avoid situations such as the following "perf" trace,
which shows heavy contention on the pool spinlock:

    54.15%            nfsd  [kernel.kallsyms]        [k] _raw_spin_lock_bh
                      |
                      --- _raw_spin_lock_bh
                         |
                         |--71.43%-- svc_xprt_enqueue
                         |          |
                         |          |--50.31%-- svc_reserve
                         |          |
                         |          |--31.35%-- svc_xprt_received
                         |          |
                         |          |--18.34%-- svc_tcp_data_ready
...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: remove dl_fh field from struct nfs4_delegation
Jeff Layton [Fri, 25 Jul 2014 11:34:27 +0000 (07:34 -0400)]
nfsd: remove dl_fh field from struct nfs4_delegation

Now that the nfs4_file has a filehandle in it, we no longer need to
keep a per-delegation copy of it. Switch to using the one in the
nfs4_file instead.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: give block_delegation and delegation_blocked its own spinlock
Jeff Layton [Fri, 25 Jul 2014 11:34:26 +0000 (07:34 -0400)]
nfsd: give block_delegation and delegation_blocked its own spinlock

The state lock can be fairly heavily contended, and there's no reason
that nfs4_file lookups and delegation_blocked should be mutually
exclusive.  Let's give the new block_delegation code its own spinlock.
It does mean that we'll need to take a different lock in the delegation
break code, but that's not generally as critical to performance.

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: clean up nfs4_set_delegation
Jeff Layton [Fri, 25 Jul 2014 11:34:25 +0000 (07:34 -0400)]
nfsd: clean up nfs4_set_delegation

Move the alloc_init_deleg call into nfs4_set_delegation and change the
function to return a pointer to the delegation or an IS_ERR return. This
allows us to skip allocating a delegation if the file has already
experienced a lease conflict.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: clean up arguments to nfs4_open_delegation
Jeff Layton [Fri, 25 Jul 2014 11:34:24 +0000 (07:34 -0400)]
nfsd: clean up arguments to nfs4_open_delegation

No need to pass in a net pointer since we can derive that.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: drop unused stp arg to alloc_init_deleg
Jeff Layton [Fri, 25 Jul 2014 11:34:23 +0000 (07:34 -0400)]
nfsd: drop unused stp arg to alloc_init_deleg

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Convert delegation counter to an atomic_long_t type
Trond Myklebust [Fri, 25 Jul 2014 11:34:22 +0000 (07:34 -0400)]
nfsd: Convert delegation counter to an atomic_long_t type

We want to convert to an atomic type so that we don't need to lock
across the call to alloc_init_deleg(). Then convert to a long type so
that we match the size of 'max_delegations'.

None of this is a problem today, but it will be once we remove
client_mutex protection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock
Jeff Layton [Fri, 25 Jul 2014 11:34:21 +0000 (07:34 -0400)]
nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock

Currently, both destroy_revoked_delegation and revoke_delegation
manipulate the cl_revoked list without any locking aside from the
client_mutex. Ensure that the clp->cl_lock is held when manipulating it,
except for the list walking in destroy_client. At that point, the client
should no longer be in use, and so it should be safe to walk the list
without any locking. That also means that we don't need to do the
list_splice_init there either.

Also, the fact that revoke_delegation deletes dl_recall_lru list_head
without any locking makes it difficult to know whether it's doing so
safely in all cases. Move the list_del_init calls into the callers, and
add a WARN_ON in the event that t's passed a delegation that has a
non-empty list_head.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: fully unhash delegations when revoking them
Jeff Layton [Fri, 25 Jul 2014 11:34:20 +0000 (07:34 -0400)]
nfsd: fully unhash delegations when revoking them

Ensure that the delegations cannot be found by the laundromat etc once
we add them to the various 'revoke' lists.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: simplify stateid allocation and file handling
Trond Myklebust [Fri, 25 Jul 2014 11:34:19 +0000 (07:34 -0400)]
nfsd: simplify stateid allocation and file handling

Don't allow stateids to clear the open file pointer until they are
being destroyed. In a later patches we'll want to rely on the fact that
we have a valid file pointer when dealing with the stateid and this
will save us from having to do a lot of NULL pointer checks before
doing so.

Also, move to allocating stateids with kzalloc and get rid of the
explicit zeroing of fields.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Do not let nfs4_file pin the struct inode
Jeff Layton [Wed, 23 Jul 2014 20:17:41 +0000 (16:17 -0400)]
nfsd: Do not let nfs4_file pin the struct inode

Remove the fi_inode field in struct nfs4_file in order to remove the
possibility of struct nfs4_file pinning the inode when it does not have
any open state.

The only place we still need to get to an inode is in check_for_locks,
so change it to use find_any_file and use the inode from any that it
finds. If it doesn't find one, then just assume there aren't any.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: nfs4_check_fh - make it actually check the filehandle
Trond Myklebust [Wed, 23 Jul 2014 20:17:40 +0000 (16:17 -0400)]
nfsd: nfs4_check_fh - make it actually check the filehandle

...instead of just checking the inode that corresponds to it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Use the filehandle to look up the struct nfs4_file instead of inode
Trond Myklebust [Wed, 23 Jul 2014 20:17:39 +0000 (16:17 -0400)]
nfsd: Use the filehandle to look up the struct nfs4_file instead of inode

This makes more sense anyway since an inode pointer value can change
even when the filehandle doesn't.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Store the filehandle with the struct nfs4_file
Trond Myklebust [Wed, 23 Jul 2014 20:17:38 +0000 (16:17 -0400)]
nfsd: Store the filehandle with the struct nfs4_file

For use when we may not have a struct inode.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: convert comma to semicolon
Himangi Saraogi [Wed, 23 Jul 2014 14:42:31 +0000 (20:12 +0530)]
nfsd4: convert comma to semicolon

Replace a comma between expression statements by a semicolon. This changes
the semantics of the code, but given the current indentation appears to be
what is intended.

A simplified version of the Coccinelle semantic patch that performs this
transformation is as follows:
// <smpl>
@r@
expression e1,e2;
@@

 e1
-,
+;
 e2;
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agosvcrdma: Double the default credit limit
Chuck Lever [Tue, 22 Jul 2014 21:48:04 +0000 (17:48 -0400)]
svcrdma: Double the default credit limit

The RDMA credit limit controls how many concurrent RPCs are allowed
per connection.

An NFS/RDMA client and server exchange their credit limits in the
RPC/RDMA headers. The Linux client and the Solaris client and server
allow 32 credits. The Linux server allows only 16, which limits its
performance.

Set the server's default credit limit to 32, like the other well-
known implementations, so the out-of-the-shrinkwrap performance of
the Linux server is better.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: ensure that st_access_bmap and st_deny_bmap are initialized to 0
Jeff Layton [Wed, 23 Jul 2014 17:46:49 +0000 (13:46 -0400)]
nfsd: ensure that st_access_bmap and st_deny_bmap are initialized to 0

Open stateids must be initialized with the st_access_bmap and
st_deny_bmap set to 0, so that nfs4_get_vfs_file can properly record
their state in old_access_bmap and old_deny_bmap.

This bug was introduced in commit baeb4ff0e502 (nfsd: make deny mode
enforcement more efficient and close races in it) and was causing the
refcounts to end up incorrect when nfs4_get_vfs_file returned an error
after bumping the refcounts. This made it impossible to unmount the
underlying filesystem after running pynfs tests that involve deny modes.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agosvcrdma: Add zero padding if the client doesn't send it
Chuck Lever [Tue, 22 Jul 2014 20:00:40 +0000 (16:00 -0400)]
svcrdma: Add zero padding if the client doesn't send it

See RFC 5666 section 3.7: clients don't have to send zero XDR
padding.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=246
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: bump dl_time when unhashing delegation
Jeff Layton [Tue, 22 Jul 2014 17:52:06 +0000 (13:52 -0400)]
nfsd: bump dl_time when unhashing delegation

There's a potential race between a lease break and DELEGRETURN call.

Suppose a lease break comes in and queues the workqueue job for a
delegation, but it doesn't run just yet. Then, a DELEGRETURN comes in
finds the delegation and calls destroy_delegation on it to unhash it and
put its primary reference.

Next, the workqueue job runs and queues the delegation back onto the
del_recall_lru list, issues the CB_RECALL and puts the final reference.
With that, the final reference to the delegation is put, but it's still
on the LRU list.

When we go to unhash a delegation, it's because we intend to get rid of
it soon afterward, so we don't want lease breaks to mess with it once
that occurs. Fix this by bumping the dl_time whenever we unhash a
delegation, to ensure that lease breaks don't monkey with it.

I believe this is a regression due to commit 02e1215f9f7 (nfsd: Avoid
taking state_lock while holding inode lock in nfsd_break_one_deleg).
Prior to that, the state_lock was held in the lm_break callback itself,
and that would have prevented this race.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Move the delegation reference counter into the struct nfs4_stid
Trond Myklebust [Mon, 21 Jul 2014 13:34:58 +0000 (09:34 -0400)]
nfsd: Move the delegation reference counter into the struct nfs4_stid

We will want to add reference counting to the lock stateid and open
stateids too in later patches.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: fix race that grants unrecallable delegation
Jeff Layton [Mon, 21 Jul 2014 13:34:57 +0000 (09:34 -0400)]
nfsd: fix race that grants unrecallable delegation

If nfs4_setlease succesfully acquires a new delegation, then another
task breaks the delegation before we reach hash_delegation_locked, then
the breaking task will see an empty fi_delegations list and do nothing.
The client will receive an open reply incorrectly granting a delegation
and will never receive a recall.

Move more of the delegation fields to be protected by the fi_lock. It's
more granular than the state_lock and in later patches we'll want to
be able to rely on it in addition to the state_lock.

Attempt to acquire a delegation. If that succeeds, take the spinlocks
and then check to see if the file has had a conflict show up since then.
If it has, then we assume that the lease is no longer valid and that
we shouldn't hand out a delegation.

There's also one more potential (but very unlikely) problem. If the
lease is broken before the delegation is hashed, then it could leak.
In the event that the fi_delegations list is empty, reset the
fl_break_time to jiffies so that it's cleaned up ASAP by
the normal lease handling code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: CREATE_SESSION should update backchannel immediately
J. Bruce Fields [Fri, 18 Jul 2014 19:06:47 +0000 (15:06 -0400)]
nfsd4: CREATE_SESSION should update backchannel immediately

nfsd4_probe_callback kicks off some work that will eventually run
nfsd4_process_cb_update and update the session flags.  In theory we
could process a following SEQUENCE call before that update happens
resulting in flags that don't accurately represent, for example, the
lack of a backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC: xdr_get_next_encode_buffer should be declared static
Trond Myklebust [Sat, 12 Jul 2014 22:01:02 +0000 (18:01 -0400)]
SUNRPC: xdr_get_next_encode_buffer should be declared static

Quell another sparse warning.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agosvcrdma: Select NFSv4.1 backchannel transport based on forward channel
Chuck Lever [Wed, 16 Jul 2014 19:38:32 +0000 (15:38 -0400)]
svcrdma: Select NFSv4.1 backchannel transport based on forward channel

The current code always selects XPRT_TRANSPORT_BC_TCP for the back
channel, even when the forward channel was not TCP (eg, RDMA). When
a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
code when trying to send CB_NULL.

Instead, construct the transport protocol number from the forward
channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
not support bi-directional RPC will not have registered a "BC"
transport, causing create_backchannel_client() to fail immediately.

Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: zero op arguments beyond the 8th compound op
J. Bruce Fields [Thu, 17 Jul 2014 20:20:39 +0000 (16:20 -0400)]
nfsd4: zero op arguments beyond the 8th compound op

The first 8 ops of the compound are zeroed since they're a part of the
argument that's zeroed by the

memset(rqstp->rq_argp, 0, procp->pc_argsize);

in svc_process_common().  But we handle larger compounds by allocating
the memory on the fly in nfsd4_decode_compound().  Other than code
recently fixed by 01529e3f8179 "NFSD: Fix memory leak in encoding denied
lock", I don't know of any examples of code depending on this
initialization. But it definitely seems possible, and I'd rather be
safe.

Compounds this long are unusual so I'm much more worried about failure
in this poorly tested cases than about an insignificant performance hit.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: silence sparse warning about accessing credentials
Jeff Layton [Tue, 15 Jul 2014 16:59:36 +0000 (12:59 -0400)]
nfsd: silence sparse warning about accessing credentials

sparse says:

    fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces)
    fs/nfsd/auth.c:31:38:    expected struct cred const *cred
    fs/nfsd/auth.c:31:38:    got struct cred const [noderef] <asn:4>*real_cred

Add a new accessor for the ->real_cred and use that to fetch the
pointer. Accessing current->real_cred directly is actually quite safe
since we know that they can't go away so this is mostly a cosmetic fixup
to silence sparse.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Ensure stateids remain unique until they are freed
Trond Myklebust [Wed, 16 Jul 2014 14:31:59 +0000 (10:31 -0400)]
nfsd: Ensure stateids remain unique until they are freed

Add an extra delegation state to allow the stateid to remain in the idr
tree until the last reference has been released. This will be necessary
to ensure uniqueness once the client_mutex is removed.

[jlayton: reset the sc_type under the state_lock in unhash_delegation]

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: nfs4_alloc_init_lease should take a nfs4_file arg
Jeff Layton [Wed, 16 Jul 2014 14:31:58 +0000 (10:31 -0400)]
nfsd: nfs4_alloc_init_lease should take a nfs4_file arg

No need to pass the delegation pointer in here as it's only used to get
the nfs4_file pointer.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg
Jeff Layton [Wed, 16 Jul 2014 14:31:57 +0000 (10:31 -0400)]
nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg

state_lock is a heavily contended global lock. We don't want to grab
that while simultaneously holding the inode->i_lock.

Add a new per-nfs4_file lock that we can use to protect the
per-nfs4_file delegation list. Hold that while walking the list in the
break_deleg callback and queue the workqueue job for each one.

The workqueue job can then take the state_lock and do the list
manipulations without the i_lock being held prior to starting the
rpc call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: eliminate nfsd4_init_callback
Jeff Layton [Wed, 16 Jul 2014 14:31:56 +0000 (10:31 -0400)]
nfsd: eliminate nfsd4_init_callback

It's just an obfuscated INIT_WORK call. Just make the work_func_t a
non-static symbol and use a normal INIT_WORK call.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Fix bad checking of space for padding in splice read
Kinglong Mee [Wed, 9 Jul 2014 13:51:27 +0000 (21:51 +0800)]
NFSD: Fix bad checking of space for padding in splice read

Note that the caller has already reserved space for count and eof, so
xdr->p has already moved past them, only the padding remains.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes dc97618ddd (nfsd4: separate splice and readv cases)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Check acl returned from get_acl/posix_acl_from_mode
Kinglong Mee [Wed, 9 Jul 2014 13:54:16 +0000 (21:54 +0800)]
NFSD: Check acl returned from get_acl/posix_acl_from_mode

Commit 4ac7249ea5 (nfsd: use get_acl and ->set_acl)
don't check the acl returned from get_acl()/posix_acl_from_mode().

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agosvcrdma: send_write() must not overflow the device's max sge
Steve Wise [Wed, 9 Jul 2014 18:49:15 +0000 (13:49 -0500)]
svcrdma: send_write() must not overflow the device's max sge

Function send_write() must stop creating sges when it reaches the device
max and return the amount sent in the RDMA Write to the caller.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: cleanup and rename nfs4_check_open
Jeff Layton [Thu, 10 Jul 2014 18:07:35 +0000 (14:07 -0400)]
nfsd: cleanup and rename nfs4_check_open

Rename it to better describe what it does, and have it just return the
stateid instead of a __be32 (which is now always nfs_ok). Also, do the
search for an existing stateid after the delegation check, to reduce
cleanup if the delegation check returns error.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: make deny mode enforcement more efficient and close races in it
Jeff Layton [Thu, 10 Jul 2014 18:07:34 +0000 (14:07 -0400)]
nfsd: make deny mode enforcement more efficient and close races in it

The current enforcement of deny modes is both inefficient and scattered
across several places, which makes it hard to guarantee atomicity. The
inefficiency is a problem now, and the lack of atomicity will mean races
once the client_mutex is removed.

First, we address the inefficiency. We have to track deny modes on a
per-stateid basis to ensure that open downgrades are sane, but when the
server goes to enforce them it has to walk the entire list of stateids
and check against each one.

Instead of doing that, maintain a per-nfs4_file deny mode. When a file
is opened, we simply set any deny bits in that mode that were specified
in the OPEN call. We can then use that unified deny mode to do a simple
check to see whether there are any conflicts without needing to walk the
entire stateid list.

The only time we'll need to walk the entire list of stateids is when a
stateid that has a deny mode on it is being released, or one is having
its deny mode downgraded. In that case, we must walk the entire list and
recalculate the fi_share_deny field. Since deny modes are pretty rare
today, this should be very rare under normal workloads.

To address the potential for races once the client_mutex is removed,
protect fi_share_deny with the fi_lock. In nfs4_get_vfs_file, check to
make sure that any deny mode we want to apply won't conflict with
existing access. If that's ok, then have nfs4_file_get_access check that
new access to the file won't conflict with existing deny modes.

If that also passes, then get file access references, set the correct
access and deny bits in the stateid, and update the fi_share_deny field.
If opening the file or truncating it fails, then unwind the whole mess
and return the appropriate error.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: always hold the fi_lock when bumping fi_access refcounts
Jeff Layton [Thu, 10 Jul 2014 18:07:33 +0000 (14:07 -0400)]
nfsd: always hold the fi_lock when bumping fi_access refcounts

Once we remove the client_mutex, there's an unlikely but possible race
that could occur. It will be possible for nfs4_file_put_access to race
with nfs4_file_get_access. The refcount will go to zero (briefly) and
then bumped back to one. If that happens we set ourselves up for a
use-after-free and the potential for a lock to race onto the i_flock
list as a filp is being torn down.

Ensure that we can safely bump the refcount on the file by holding the
fi_lock whenever that's done. The only place it currently isn't is in
get_lock_access.

In order to ensure atomicity with finding the file, use the
find_*_file_locked variants and then call get_lock_access to get new
access references on the nfs4_file under the same lock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: clean up reset_union_bmap_deny
Jeff Layton [Thu, 10 Jul 2014 18:07:32 +0000 (14:07 -0400)]
nfsd: clean up reset_union_bmap_deny

Fix the "deny" argument type, and start the loop at 1. The 0 iteration
is always a noop.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: set stateid access and deny bits in nfs4_get_vfs_file
Jeff Layton [Thu, 10 Jul 2014 18:07:31 +0000 (14:07 -0400)]
nfsd: set stateid access and deny bits in nfs4_get_vfs_file

Cleanup -- ensure that the stateid bits are set at the same time that
the file access refcounts are incremented. Keeping them coherent like
this makes it easier to ensure that we account for all of the
references.

Since the initialization of the st_*_bmap fields is done when it's
hashed, we go ahead and hash the stateid before getting access to the
file and unhash it if that function returns error. This will be
necessary anyway in a follow-on patch that will overhaul deny mode
handling.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: shrink st_access_bmap and st_deny_bmap
Jeff Layton [Thu, 10 Jul 2014 18:07:30 +0000 (14:07 -0400)]
nfsd: shrink st_access_bmap and st_deny_bmap

We never use anything above bit #3, so an unsigned long for each is
wasteful. Shrink them to a char each, and add some WARN_ON_ONCE calls if
we try to set or clear bits that would go outside those sizes.

Note too that because atomic bitops work on unsigned longs, we have to
abandon their use here. That shouldn't be a problem though since we
don't really care about the atomicity in this code anyway. Using them
was just a convenient way to flip bits.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: remove nfs4_file_put_fd
Jeff Layton [Thu, 10 Jul 2014 18:07:29 +0000 (14:07 -0400)]
nfsd: remove nfs4_file_put_fd

...and replace it with a simple swap call.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: refactor nfs4_file_get_access and nfs4_file_put_access
Jeff Layton [Thu, 10 Jul 2014 18:07:28 +0000 (14:07 -0400)]
nfsd: refactor nfs4_file_get_access and nfs4_file_put_access

Have them take NFS4_SHARE_ACCESS_* flags instead of an open mode. This
spares the callers from having to convert it themselves.

This also allows us to simplify these functions as we no longer need
to do the access_to_omode conversion in either one.

Note too that this patch eliminates the WARN_ON in
__nfs4_file_get_access. It's valid for now, but in a later patch we'll
be bumping the refcounts prior to opening the file in order to close
some races, at which point we'll need to remove it anyway.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: clean up helper __release_lock_stateid
Trond Myklebust [Thu, 10 Jul 2014 18:07:27 +0000 (14:07 -0400)]
nfsd: clean up helper __release_lock_stateid

Use filp_close instead of open coding. filp_close does a bit more than
just release the locks and put the filp. It also calls ->flush and
dnotify_flush, both of which should be done here anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Add locking to the nfs4_file->fi_fds[] array
Trond Myklebust [Thu, 10 Jul 2014 18:07:26 +0000 (14:07 -0400)]
nfsd: Add locking to the nfs4_file->fi_fds[] array

Preparation for removal of the client_mutex, which currently protects
this array. While we don't actually need the find_*_file_locked variants
just yet, a later patch will. So go ahead and add them now to reduce
future churn in this code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Add fine grained protection for the nfs4_file->fi_stateids list
Trond Myklebust [Thu, 10 Jul 2014 18:07:25 +0000 (14:07 -0400)]
nfsd: Add fine grained protection for the nfs4_file->fi_stateids list

Access to this list is currently serialized by the client_mutex. Add
finer grained locking around this list in preparation for its removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: reduce some spinlocking in put_client_renew
Jeff Layton [Tue, 8 Jul 2014 18:02:50 +0000 (14:02 -0400)]
nfsd: reduce some spinlocking in put_client_renew

No need to take the lock unless the count goes to 0.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: close potential race between delegation break and laundromat
Jeff Layton [Tue, 8 Jul 2014 18:02:49 +0000 (14:02 -0400)]
nfsd: close potential race between delegation break and laundromat

Bruce says:

    There's also a preexisting expire_client/laundromat vs break race:

    - expire_client/laundromat adds a delegation to its local
      reaplist using the same dl_recall_lru field that a delegation
      uses to track its position on the recall lru and drops the
      state lock.

    - a concurrent break_lease adds the delegation to the lru.

    - expire/client/laundromat then walks it reaplist and sees the
      lru head as just another delegation on the list....

Fix this race by checking the dl_time under the state_lock. If we find
that it's not 0, then we know that it has already been queued to the LRU
list and that we shouldn't queue it again.

In the case of destroy_client, we must also ensure that we don't hit
similar races by ensuring that we don't move any delegations to the
reaplist with a dl_time of 0. Just bump the dl_time by one before we
drop the state_lock. We're destroying the delegations anyway, so a 1s
difference there won't matter.

The fault injection code also requires a bit of surgery here:

First, in the case of nfsd_forget_client_delegations, we must prevent
the same sort of race vs. the delegation break callback. For that, we
just increment the dl_time to ensure that a delegation callback can't
race in while we're working on it.

We can't do that for nfsd_recall_client_delegations, as we need to have
it actually queue the delegation, and that won't happen if we increment
the dl_time. The state lock is held over that function, so we don't need
to worry about these sorts of races there.

There is one other potential bug nfsd_recall_client_delegations though.
Entries on the victims list are not dequeued before calling
nfsd_break_one_deleg. That's a potential list corruptor, so ensure that
we do that there.

Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Fix memory leak in encoding denied lock
Kinglong Mee [Mon, 7 Jul 2014 14:10:56 +0000 (22:10 +0800)]
NFSD: Fix memory leak in encoding denied lock

Commit 8c7424cff6 (nfsd4: don't try to encode conflicting owner if low on space)
forgot free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL
in nfsd4_encode_lock_denied.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid()
Trond Myklebust [Mon, 30 Jun 2014 15:48:47 +0000 (11:48 -0400)]
nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid()

lookup_clientid is preferable to find_confirmed_client since it's able
to use the cached client in the compound state.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Always use lookup_clientid() in nfsd4_process_open1
Trond Myklebust [Mon, 30 Jun 2014 15:48:46 +0000 (11:48 -0400)]
nfsd: Always use lookup_clientid() in nfsd4_process_open1

In later patches, we'll be moving the stateowner table into the
nfs4_client, and by doing this we ensure that we have a cached
nfs4_client pointer.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Convert nfsd4_process_open1() to work with lookup_clientid()
Trond Myklebust [Mon, 30 Jun 2014 15:48:45 +0000 (11:48 -0400)]
nfsd: Convert nfsd4_process_open1() to work with lookup_clientid()

...and have alloc_init_open_stateowner just use the cstate->clp pointer
instead of passing in a clp separately. This allows us to use the
cached nfs4_client pointer in the cstate instead of having to look it
up again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Allow struct nfsd4_compound_state to cache the nfs4_client
Jeff Layton [Mon, 30 Jun 2014 15:48:44 +0000 (11:48 -0400)]
nfsd: Allow struct nfsd4_compound_state to cache the nfs4_client

We want to use the nfsd4_compound_state to cache the nfs4_client in
order to optimise away extra lookups of the clid.

In the v4.0 case, we use this to ensure that we only have to look up the
client at most once per compound for each call into lookup_clientid. For
v4.1+ we set the pointer in the cstate during SEQUENCE processing so we
should never need to do a search for it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: add a nfserrno mapping for -E2BIG to nfserr_fbig
Jeff Layton [Thu, 3 Jul 2014 19:15:54 +0000 (15:15 -0400)]
nfsd: add a nfserrno mapping for -E2BIG to nfserr_fbig

I saw this pop up with some pynfs testing:

    [  123.609992] nfsd: non-standard errno: -7

...and -7 is -E2BIG. I think what happened is that XFS returned -E2BIG
due to some xattr operations with the ACL10 pynfs TEST (I guess it has
limited xattr size?).

Add a better mapping for that error since it's possible that we'll need
it. How about we convert it to NFSERR_FBIG? As Bruce points out, they
both have "BIG" in the name so it must be good.

Also, turn the printk in this function into a WARN() so that we can get
a bit more information about situations that don't have proper mappings.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: properly convert return from commit_metadata to __be32
Jeff Layton [Thu, 3 Jul 2014 11:54:19 +0000 (07:54 -0400)]
nfsd: properly convert return from commit_metadata to __be32

Commit 2a7420c03e504 (nfsd: Ensure that nfsd_create_setattr commits
files to stable storage), added a couple of calls to commit_metadata,
but doesn't convert their return codes to __be32 in the appropriate
places.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Cleanup - Let nfsd4_lookup_stateid() take a cstate argument
Trond Myklebust [Mon, 30 Jun 2014 15:48:43 +0000 (11:48 -0400)]
nfsd: Cleanup - Let nfsd4_lookup_stateid() take a cstate argument

The cstate already holds information about the session, and hence
the client id, so it makes more sense to pass that information
rather than the current practice of passing a 'minor version' number.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Don't get a session reference without a client reference
Trond Myklebust [Mon, 30 Jun 2014 15:48:42 +0000 (11:48 -0400)]
nfsd: Don't get a session reference without a client reference

If the client were to disappear from underneath us while we're holding
a session reference, things would be bad. This cleanup helps ensure
that it cannot, which will be a possibility when the client_mutex is
removed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: clean up nfsd4_release_lockowner
Jeff Layton [Mon, 30 Jun 2014 15:48:41 +0000 (11:48 -0400)]
nfsd: clean up nfsd4_release_lockowner

Now that we know that we won't have several lockowners with the same,
owner->data, we can simplify nfsd4_release_lockowner and get rid of
the lo_list in the process.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: NFSv4 lock-owners are not associated to a specific file
Trond Myklebust [Mon, 30 Jun 2014 15:48:40 +0000 (11:48 -0400)]
nfsd: NFSv4 lock-owners are not associated to a specific file

Just like open-owners, lock-owners are associated with a name, a clientid
and, in the case of minor version 0, a sequence id. There is no association
to a file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Allow lockowners to hold several stateids
Jeff Layton [Mon, 30 Jun 2014 15:48:39 +0000 (11:48 -0400)]
nfsd: Allow lockowners to hold several stateids

A lockowner can have more than one lock stateid. For instance, if a
process has more than one file open and has locks on both, then the same
lockowner has more than one stateid associated with it. Change it so
that this reality is better reflected by the objects that nfsd uses.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: lock owners are not per open stateid
Trond Myklebust [Mon, 30 Jun 2014 15:48:38 +0000 (11:48 -0400)]
nfsd: lock owners are not per open stateid

In the NFSv4 spec, lock stateids are per-file objects. Lockowners are not.
This patch replaces the current list of lock owners in the open stateids
with a list of lock stateids.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: clean up nfsd4_close_open_stateid
Trond Myklebust [Mon, 30 Jun 2014 15:48:37 +0000 (11:48 -0400)]
nfsd: clean up nfsd4_close_open_stateid

Minor cleanup that should introduce no behavioral changes.

Currently this function just unhashes the stateid and leaves the caller
to do the work of the CLOSE processing.

Change nfsd4_close_open_stateid so that it handles doing all of the work
of closing a stateid. Move the handling of the unhashed stateid into it
instead of doing that work in nfsd4_close. This will help isolate some
coming changes to stateid handling from nfsd4_close.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: declare v4.1+ openowners confirmed on creation
Jeff Layton [Mon, 30 Jun 2014 15:48:36 +0000 (11:48 -0400)]
nfsd: declare v4.1+ openowners confirmed on creation

There's no need to confirm an openowner in v4.1 and above, so we can
go ahead and set NFS4_OO_CONFIRMED when we create openowners in
those versions. This will also be necessary when we remove the
client_mutex, as it'll be possible for two concurrent opens to race
in versions >4.0.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Cleanup nfs4svc_encode_compoundres
Trond Myklebust [Mon, 30 Jun 2014 15:48:35 +0000 (11:48 -0400)]
nfsd: Cleanup nfs4svc_encode_compoundres

Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c
instead of open coding in nfs4svc_encode_compoundres.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: nfs4_preprocess_seqid_op should only set *stpp on success
Trond Myklebust [Mon, 30 Jun 2014 15:48:34 +0000 (11:48 -0400)]
nfsd: nfs4_preprocess_seqid_op should only set *stpp on success

Not technically a bugfix, since nothing tries to use the return pointer
if this function doesn't return success, but it could be a problem
with some coming changes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: add a new /proc/fs/nfsd/max_connections file
Jeff Layton [Wed, 2 Jul 2014 20:11:22 +0000 (16:11 -0400)]
nfsd: add a new /proc/fs/nfsd/max_connections file

Currently, the maximum number of connections that nfsd will allow
is based on the number of threads spawned. While this is fine for a
default, there really isn't a clear relationship between the two.

The number of threads corresponds to the number of concurrent requests
that we want to allow the server to process at any given time. The
connection limit corresponds to the maximum number of clients that we
want to allow the server to handle. These are two entirely different
quantities.

Break the dependency on increasing threads in order to allow for more
connections, by adding a new per-net parameter that can be set to a
non-zero value. The default is still to base it on the number of threads,
so there should be no behavior change for anyone who doesn't use it.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Ensure that nfsd_create_setattr commits files to stable storage
Trond Myklebust [Tue, 1 Jul 2014 22:27:53 +0000 (18:27 -0400)]
nfsd: Ensure that nfsd_create_setattr commits files to stable storage

Since nfsd_create_setattr strips the mode from the struct iattr, it
is quite possible that it will optimise away the call to nfsd_setattr
altogether.
If this is the case, then we never call commit_metadata() on the
newly created file.

Also ensure that both nfsd_setattr() and nfsd_create_setattr() fail
when the call to commit_metadata fails.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Remove iattr parameter from nfsd_symlink()
Kinglong Mee [Tue, 1 Jul 2014 09:48:02 +0000 (17:48 +0800)]
NFSD: Remove iattr parameter from nfsd_symlink()

Commit db2e747b1499 (vfs: remove mode parameter from vfs_symlink())
have remove mode parameter from vfs_symlink.
So that, iattr isn't needed by nfsd_symlink now, just remove it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Protect addition to the file_hashtbl
Trond Myklebust [Mon, 30 Jun 2014 15:48:31 +0000 (11:48 -0400)]
nfsd: Protect addition to the file_hashtbl

Current code depends on the client_mutex to guarantee a single struct
nfs4_file per inode in the file_hashtbl and make addition atomic with
respect to lookup.  Rely instead on the state_Lock, to make it easier to
stop taking the client_mutex here later.

To prevent an i_lock/state_lock inversion, change nfsd4_init_file to
use ihold instead if igrab. That's also more efficient anyway as we
definitely hold a reference to the inode at that point.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: fix file access refcount leak when nfsd4_truncate fails
Christoph Hellwig [Mon, 30 Jun 2014 15:48:30 +0000 (11:48 -0400)]
nfsd: fix file access refcount leak when nfsd4_truncate fails

nfsd4_process_open2 will currently will get access to the file, and then
call nfsd4_truncate to (possibly) truncate it. If that operation fails
though, then the access references will never be released as the
nfs4_ol_stateid is never initialized.

Fix by moving the nfsd4_truncate call into nfs4_get_vfs_file, ensuring
that the refcounts are properly put if the truncate fails.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Avoid warning message when compile at i686 arch
Kinglong Mee [Sun, 29 Jun 2014 11:18:17 +0000 (19:18 +0800)]
NFSD: Avoid warning message when compile at i686 arch

fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_readv':
>> fs/nfsd/nfs4xdr.c:3137:148: warning: comparison of distinct pointer types lacks a cast [enabled by default]
thislen = min(len, ((void *)xdr->end - (void *)xdr->p));

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: replace defer_free by svcxdr_tmpalloc
J. Bruce Fields [Tue, 24 Jun 2014 21:43:45 +0000 (17:43 -0400)]
nfsd4: replace defer_free by svcxdr_tmpalloc

Avoid an extra allocation for the tmpbuf struct itself, and stop
ignoring some allocation failures.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: remove nfs4_acl_new
J. Bruce Fields [Tue, 24 Jun 2014 21:51:21 +0000 (17:51 -0400)]
nfsd4: remove nfs4_acl_new

This is a not-that-useful kmalloc wrapper.  And I'd like one of the
callers to actually use something other than kmalloc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: define svcxdr_dupstr to share some common code
J. Bruce Fields [Tue, 24 Jun 2014 21:06:51 +0000 (17:06 -0400)]
nfsd4: define svcxdr_dupstr to share some common code

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: remove unused defer_free argument
J. Bruce Fields [Tue, 24 Jun 2014 20:51:12 +0000 (16:51 -0400)]
nfsd4: remove unused defer_free argument

28e05dd8457c "knfsd: nfsd4: represent nfsv4 acl with array instead of
linked list" removed the last user that wanted a custom free function.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: rename cr_linkname->cr_data
J. Bruce Fields [Tue, 24 Jun 2014 19:06:41 +0000 (15:06 -0400)]
nfsd4: rename cr_linkname->cr_data

The name of a link is currently stored in cr_name and cr_namelen, and
the content in cr_linkname and cr_linklen.  That's confusing.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: let nfsd_symlink assume null-terminated data
J. Bruce Fields [Fri, 20 Jun 2014 15:52:21 +0000 (11:52 -0400)]
nfsd: let nfsd_symlink assume null-terminated data

Currently nfsd_symlink has a weird hack to serve callers who don't
null-terminate symlink data: it looks ahead at the next byte to see if
it's zero, and copies it to a new buffer to null-terminate if not.

That means callers don't have to null-terminate, but they *do* have to
ensure that the byte following the end of the data is theirs to read.

That's a bit subtle, and the NFSv4 code actually got this wrong.

So let's just throw out that code and let callers pass null-terminated
strings; we've already fixed them to do that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: make NFSv2 null terminate symlink data
J. Bruce Fields [Fri, 20 Jun 2014 15:49:49 +0000 (11:49 -0400)]
nfsd: make NFSv2 null terminate symlink data

It's simple enough for NFSv2 to null-terminate the symlink data.

A bit weird (it depends on knowing that we've already read the following
byte, which is either padding or part of the mode), but no worse than
the conditional kstrdup it otherwise relies on in nfsd_symlink().

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: fix rare symlink decoding bug
J. Bruce Fields [Thu, 19 Jun 2014 20:44:48 +0000 (16:44 -0400)]
nfsd: fix rare symlink decoding bug

An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.

The vfs, on the other hand, wants null-terminated data.

The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.

The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.

But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.

Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.

In the NFSv2/v3 cases this actually appears to be safe:

- nfs3svc_decode_symlinkargs explicitly null-terminates the data
  (after first checking its length and copying it to a new
  page).
- NFSv2 limits symlinks to 1k.  The buffer holding the rpc
  request is always at least a page, and the link data (and
  previous fields) have maximum lengths that prevent the request
  from reaching the end of a page.

In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.

The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though.  It should
really either do the copy itself every time or just require a
null-terminated string.

Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: properly handle embedded newlines in fault_injection input
Jeff Layton [Wed, 18 Jun 2014 19:00:19 +0000 (15:00 -0400)]
nfsd: properly handle embedded newlines in fault_injection input

Currently rpc_pton() fails to handle the case where you echo an address
into the file, as it barfs on the newline. Ensure that we NULL out the
first occurrence of any newline.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: fix return of nfs4_acl_write_who
Jeff Layton [Tue, 17 Jun 2014 10:14:08 +0000 (06:14 -0400)]
nfsd: fix return of nfs4_acl_write_who

AFAICT, the only way to hit this error is to pass this function a bogus
"who" value. In that case, we probably don't want to return -1 as that
could get sent back to the client. Turn this into nfserr_serverfault,
which is a more appropriate error for a server bug like this.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: add appropriate __force directives to filehandle generation code
Jeff Layton [Tue, 17 Jun 2014 11:44:14 +0000 (07:44 -0400)]
nfsd: add appropriate __force directives to filehandle generation code

The filehandle structs all use host-endian values, but will sometimes
stuff big-endian values into those fields. This is OK since these
values are opaque to the client, but it confuses sparse. Add __force to
make it clear that we are doing this intentionally.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: nfsd_splice_read and nfsd_readv should return __be32
Jeff Layton [Tue, 17 Jun 2014 11:44:13 +0000 (07:44 -0400)]
nfsd: nfsd_splice_read and nfsd_readv should return __be32

The callers expect a __be32 return and the functions they call return
__be32, so having these return int is just wrong. Also, nfsd_finish_read
can be made static.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: clean up sparse endianness warnings in nfscache.c
Jeff Layton [Tue, 17 Jun 2014 11:44:12 +0000 (07:44 -0400)]
nfsd: clean up sparse endianness warnings in nfscache.c

We currently hash the XID to determine a hash bucket to use for the
reply cache entry, which is fed into hash_32 without byte-swapping it.
Add __force to make sparse happy, and add some comments to explain
why.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: add __force to opaque verifier field casts
Jeff Layton [Tue, 17 Jun 2014 11:44:11 +0000 (07:44 -0400)]
nfsd: add __force to opaque verifier field casts

sparse complains that we're stuffing non-byte-swapped values into
__be32's here. Since they're supposed to be opaque, it doesn't matter
much. Just add __force to make sparse happy.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Using exp_get for export getting
Kinglong Mee [Tue, 10 Jun 2014 14:06:44 +0000 (22:06 +0800)]
NFSD: Using exp_get for export getting

Don't using cache_get besides export.h, using exp_get for export.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Using path_get when assigning path for export
Kinglong Mee [Tue, 10 Jun 2014 14:04:43 +0000 (22:04 +0800)]
NFSD: Using path_get when assigning path for export

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok
Kinglong Mee [Tue, 10 Jun 2014 10:29:39 +0000 (18:29 +0800)]
SUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok

rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Using min/max/min_t/max_t for calculate
Kinglong Mee [Tue, 10 Jun 2014 10:08:19 +0000 (18:08 +0800)]
NFSD: Using min/max/min_t/max_t for calculate

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoLinux 3.16-rc2
Linus Torvalds [Sun, 22 Jun 2014 05:02:54 +0000 (19:02 -1000)]
Linux 3.16-rc2

10 years agoMerge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sun, 22 Jun 2014 05:01:15 +0000 (19:01 -1000)]
Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux

Pull i2c new drivers from Wolfram Sang:
 "Here is a pull request from i2c hoping for the "new driver" rule.

  Originally, I wanted to send this request during the merge window, but
  code checkers with very recent additions complained, so a few fixups
  were needed.  So, some more time went by and I merged rc1 to get a
  stable base"

So the "new driver" rule is really about drivers that people absolutely
need for the kernel to work on new hardware, which is not so much the
case for i2c.  So I considered not pulling this, but eventually
relented.

Just for FYI: the whole (and only) point of "new drivers" is not that
new drivers cannot regress things (they can, and they have - by
triggering badly tested code on machines that never triggered that code
before), but because they can bring to life machines that otherwise
wouldn't be useful at all without the drivers.

So the new driver rule is for essential things that actual consumers
would care about, ie devices like networking or disk drivers that matter
to normal people (not server people - they run old kernels anyway, so
mainlining new drivers is irrelevant for them).

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: sun6-p2wi: fix call to snprintf
  i2c: rk3x: add NULL entry to the end of_device_id array
  i2c: sun6i-p2wi: use proper return value in probe
  i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller support
  i2c: sunxi: add P2WI DT bindings documentation
  i2c: rk3x: add driver for Rockchip RK3xxx SoC I2C adapter

10 years agoMerge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux
Linus Torvalds [Sun, 22 Jun 2014 02:40:30 +0000 (16:40 -1000)]
Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux

Pull file locking fixes from Jeff Layton:
 "File locking related bugfixes

  Nothing too earth-shattering here.  A fix for a potential regression
  due to a patch in pile #1, and the addition of a memory barrier to
  prevent a race condition between break_deleg and generic_add_lease"

* tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux:
  locks: set fl_owner for leases back to current->files
  locks: add missing memory barrier in break_deleg

10 years agoMerge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Linus Torvalds [Sun, 22 Jun 2014 02:38:16 +0000 (16:38 -1000)]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild

Pull kbuild fixes from Michal Marek:
 "There are three fixes for regressions caused by the relative paths
  series: deb-pkg, tar-pkg and *docs did not work with O=.

  Plus, there is a fix for the linux-headers deb package and a fixed
  typo.  These are not regression fixes but are safe enough"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: fix a typo in a kbuild document
  builddeb: fix missing headers in linux-headers package
  Documentation: Fix DocBook build with relative $(srctree)
  kbuild: Fix tar-pkg with relative $(objtree)
  deb-pkg: Fix for relative paths

10 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux...
Linus Torvalds [Sun, 22 Jun 2014 00:21:43 +0000 (14:21 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
 "This fixes some lockups in btrfs reported with rc1.  It probably has
  some performance impact because it is backing off our spinning locks
  more often and switching to a blocking lock.  I'll be able to nail
  that down next week, but for now I want to get the lockups taken care
  of.

  Otherwise some more stack reduction and assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix wrong error handle when the device is missing or is not writeable
  Btrfs: fix deadlock when mounting a degraded fs
  Btrfs: use bio_endio_nodec instead of open code
  Btrfs: fix NULL pointer crash when running balance and scrub concurrently
  btrfs: Skip scrubbing removed chunks to avoid -ENOENT.
  Btrfs: fix broken free space cache after the system crashed
  Btrfs: make free space cache write out functions more readable
  Btrfs: remove unused wait queue in struct extent_buffer
  Btrfs: fix deadlocks with trylock on tree nodes

10 years agoMerge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Sun, 22 Jun 2014 00:20:38 +0000 (14:20 -1000)]
Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
 "Fixes for a new regression from the xdr encoding rewrite, and a
  delegation problem we've had for a while (made somewhat more annoying
  by the vfs delegation support added in 3.13)"

* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
  NFSD: fix bug for readdir of pseudofs
  NFSD: Don't hand out delegations for 30 seconds after recalling them.

10 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 21 Jun 2014 17:07:17 +0000 (07:07 -1000)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "This is larger than usual: the main reason are the ARM symbol lookup
  speedups that came in late and were hard to resist.

  There's also a kprobes fix and various tooling fixes, plus the minimal
  re-enablement of the mmap2 support interface"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/kprobes: Fix build errors and blacklist context_track_user
  perf tests: Add test for closing dso objects on EMFILE error
  perf tests: Add test for caching dso file descriptors
  perf tests: Allow reuse of test_file function
  perf tests: Spawn child for each test
  perf tools: Add dso__data_* interface descriptons
  perf tools: Allow to close dso fd in case of open failure
  perf tools: Add file size check and factor dso__data_read_offset
  perf tools: Cache dso data file descriptor
  perf tools: Add global count of opened dso objects
  perf tools: Add global list of opened dso objects
  perf tools: Add data_fd into dso object
  perf tools: Separate dso data related variables
  perf tools: Cache register accesses for unwind processing
  perf record: Fix to honor user freq/interval properly
  perf timechart: Reflow documentation
  perf probe: Improve error messages in --line option
  perf probe: Improve an error message of perf probe --vars mode
  perf probe: Show error code and description in verbose mode
  perf probe: Improve error message for unknown member of data structure
  ...

10 years agoMerge branch 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 21 Jun 2014 17:06:02 +0000 (07:06 -1000)]
Merge branch 'locking-urgent-for-linus.patch' of git://git./linux/kernel/git/tip/tip

Pull rtmutex fixes from Thomas Gleixner:
 "Another three patches to make the rtmutex code more robust.  That's
  the last urgent fallout from the big futex/rtmutex investigation"

* 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rtmutex: Plug slow unlock race
  rtmutex: Detect changes in the pi lock chain
  rtmutex: Handle deadlock detection smarter

10 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 21 Jun 2014 16:47:01 +0000 (06:47 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 patches from Martin Schwidefsky:
 "A couple of bug fixes, a debug change for qdio, an update for the
  default config, and one small extension.

  The watchdog module based on diagnose 0x288 is converted to the
  watchdog API and it now works under LPAR as well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ccwgroup: use ccwgroup_ungroup wrapper
  s390/ccwgroup: fix an uninitialized return code
  s390/ccwgroup: obtain extra reference for asynchronous processing
  qdio: Keep device-specific dbf entries
  s390/compat: correct ucontext layout for high gprs
  s390/cio: set device name as early as possible
  s390: update default configuration
  s390: avoid format strings leaking into names
  s390/airq: silence lockdep warning
  s390/watchdog: add support for LPAR operation (diag288)
  s390/watchdog: use watchdog API
  s390/sclp_vt220: Enable ASCII console per default
  s390/qdio: replace shift loop by ilog2
  s390/cio: silence lockdep warning
  s390/uaccess: always load the kernel ASCE after task switch
  s390/ap_bus: Make modules parameters visible in sysfs