NeilBrown [Mon, 4 Aug 2014 06:24:00 +0000 (16:24 +1000)]
SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
current_cred() can only be changed by 'current', and
cred->group_info is never changed. If a new group_info is
needed, a new 'cred' is created.
Consequently it is always safe to access
current_cred()->group_info
without taking any further references.
So drop the refcounting and the incorrect rcu_dereference().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 4 Aug 2014 06:24:00 +0000 (16:24 +1000)]
NFS: fix two problems in lookup_revalidate in RCU-walk
1/ rcu_dereference isn't correct: that field isn't
RCU protected. It could potentially change at any time
so ACCESS_ONCE might be justified.
changes to ->d_parent are protected by ->d_seq. However
that isn't always checked after ->d_revalidate is called,
so it is safest to keep the double-check that ->d_parent
hasn't changed at the end of these functions.
2/ in nfs4_lookup_revalidate, "->d_parent" was forgotten.
So 'parent' was not the parent of 'dentry'.
This fails safe is the context is that dentry->d_inode is
NULL, and the result of parent->d_inode being NULL is
that ECHILD is returned, which is always safe.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 14 Jul 2014 01:28:20 +0000 (11:28 +1000)]
NFS: allow lockless access to access_cache
The access cache is used during RCU-walk path lookups, so it is best
to avoid locking if possible as taking a lock kills concurrency.
The rbtree is not rcu-safe and cannot easily be made so.
Instead we simply check the last (i.e. most recent) entry on the LRU
list. If this doesn't match, then we return -ECHILD and retry in
lock/refcount mode.
This requires freeing the nfs_access_entry struct with rcu, and
requires using rcu access primatives when adding entries to the lru, and
when examining the last entry.
Calling put_rpccred before kfree_rcu looks a bit odd, but as
put_rpccred already provides rcu protection, we know that the cred will
not actually be freed until the next grace period, so any concurrent
access will be safe.
This patch provides about 5% performance improvement on a stat-heavy
synthetic work load with 4 threads on a 2-core CPU.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 14 Jul 2014 01:28:20 +0000 (11:28 +1000)]
NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
It fails with -ECHILD rather than make an RPC call.
This allows nfs_lookup_revalidate to call it in RCU-walk mode.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 14 Jul 2014 01:28:20 +0000 (11:28 +1000)]
NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
This requires nfs_check_verifier to take an rcu_walk flag, and requires
an rcu version of nfs_revalidate_inode which returns -ECHILD rather
than making an RPC call.
With this, nfs_lookup_revalidate can call nfs_neg_need_reval in
RCU-walk mode.
We can also move the LOOKUP_RCU check past the nfs_check_verifier()
call in nfs_lookup_revalidate.
If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from
doing a full check, they return a status indicating that a revalidation
is required. As this revalidation will not be possible in RCU_WALK
mode, -ECHILD will ultimately be returned, which is the desired result.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 14 Jul 2014 01:28:20 +0000 (11:28 +1000)]
NFS: support RCU_WALK in nfs_permission()
nfs_permission makes two calls which are not always safe in RCU_WALK,
rpc_lookup_cred and nfs_do_access.
The second can easily be made rcu-safe by aborting with -ECHILD before
making the RPC call.
The former can be made rcu-safe by calling rpc_lookup_cred_nonblock()
instead.
As this will almost always succeed, we use it even when RCU_WALK
isn't being used as it still saves some spinlocks in a common case.
We only fall back to rpc_lookup_cred() if rpc_lookup_cred_nonblock()
fails and MAY_NOT_BLOCK isn't set.
This optimisation (always trying rpc_lookup_cred_nonblock()) is
particularly important when a security module is active.
In that case inode_permission() may return -ECHILD from
security_inode_permission() even though ->permission() succeeded in
RCU_WALK mode.
This leads to may_lookup() retrying inode_permission after performing
unlazy_walk(). The spinlock that rpc_lookup_cred() takes is often
more expensive than anything security_inode_permission() does, so that
spinlock becomes the main bottleneck.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 14 Jul 2014 01:28:20 +0000 (11:28 +1000)]
sunrpc/auth: allow lockless (rcu) lookup of credential cache.
The new flag RPCAUTH_LOOKUP_RCU to credential lookup avoids locking,
does not take a reference on the returned credential, and returns
-ECHILD if a simple lookup was not possible.
The returned value can only be used within an rcu_read_lock protected
region.
The main user of this is the new rpc_lookup_cred_nonblock() which
returns a pointer to the current credential which is only rcu-safe (no
ref-count held), and might return -ECHILD if allocation was required.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 14 Jul 2014 01:28:20 +0000 (11:28 +1000)]
NFS: prepare for RCU-walk support but pushing tests later in code.
nfs_lookup_revalidate, nfs4_lookup_revalidate, and nfs_permission
all need to understand and handle RCU-walk for NFS to gain the
benefits of RCU-walk for cached information.
Currently these functions all immediately return -ECHILD
if the relevant flag (LOOKUP_RCU or MAY_NOT_BLOCK) is set.
This patch pushes those tests later in the code so that we only abort
immediately before we enter rcu-unsafe code. As subsequent patches
make that rcu-unsafe code rcu-safe, several of these new tests will
disappear.
With this patch there are several paths through the code which will no
longer return -ECHILD during an RCU-walk. However these are mostly
error paths or other uninteresting cases.
A noteworthy change in nfs_lookup_revalidate is that we don't take
(or put) the reference to ->d_parent when LOOKUP_RCU is set.
Rather we rcu_dereference ->d_parent, and check that ->d_inode
is not NULL. We also check that ->d_parent hasn't changed after
all the tests.
In nfs4_lookup_revalidate we simply avoid testing LOOKUP_RCU on the
path that only calls nfs_lookup_revalidate() as that function
already performs the required test.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 14 Jul 2014 01:28:20 +0000 (11:28 +1000)]
NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
nfs4_lookup_revalidate only uses 'parent' to get 'dir', and only
uses 'dir' if 'inode == NULL'.
So we don't need to find out what 'parent' or 'dir' is until we
know that 'inode' is NULL.
By moving 'dget_parent' inside the 'if', we can reduce the number of
call sites for 'dput(parent)'.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Alexey Khoroshilov [Thu, 17 Jul 2014 23:11:45 +0000 (03:11 +0400)]
NFS: add checks for returned value of try_module_get()
There is a couple of places in client code where returned value
of try_module_get() is ignored. As a result there is a small chance
to premature unload module because of unbalanced refcounting.
The patch adds error handling in that places.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Fri, 18 Jul 2014 00:42:19 +0000 (20:42 -0400)]
nfs: clear_request_commit while holding i_lock
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Fri, 18 Jul 2014 00:42:18 +0000 (20:42 -0400)]
pnfs: add pnfs_put_lseg_async
This is useful when lsegs need to be released while holding locks.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Fri, 18 Jul 2014 00:42:17 +0000 (20:42 -0400)]
pnfs: find swapped pages on pnfs commit lists too
nfs_page_find_head_request_locked looks through the regular nfs commit lists
when the page is swapped out, but doesn't look through the pnfs commit lists.
I'm not sure if anyone has hit any issues caused by this.
Suggested-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Fri, 18 Jul 2014 00:42:16 +0000 (20:42 -0400)]
nfs: fix comment and add warn_on for PG_INODE_REF
Fix the comment in nfs_page.h for PG_INODE_REF to reflect that it's no longer
set only on head requests. Also add a WARN_ON_ONCE in nfs_inode_remove_request
as PG_INODE_REF should always be set.
Suggested-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Fri, 18 Jul 2014 00:42:15 +0000 (20:42 -0400)]
nfs: check wait_on_bit_lock err in page_group_lock
Return errors from wait_on_bit_lock from nfs_page_group_lock.
Add a bool argument @wait to nfs_page_group_lock. If true, loop over
wait_on_bit_lock until it returns cleanly. If false, return the error
from wait_on_bit_lock.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 16 Jul 2014 10:52:22 +0000 (06:52 -0400)]
sunrpc: remove "ec" argument from encrypt_v2 operation
It's always 0.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 16 Jul 2014 10:52:21 +0000 (06:52 -0400)]
sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
Fix the endianness handling in gss_wrap_kerberos_v1 and drop the memset
call there in favor of setting the filler bytes directly.
In gss_wrap_kerberos_v2, get rid of the "ec" variable which is always
zero, and drop the endianness conversion of 0. Sparse handles 0 as a
special case, so it's not necessary.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 16 Jul 2014 10:52:20 +0000 (06:52 -0400)]
sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
Use u16 pointer in setup_token and setup_token_v2. None of the fields
are actually handled as __be16, so this simplifies the code a bit. Also
get rid of some unneeded pointer increments.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 16 Jul 2014 10:52:19 +0000 (06:52 -0400)]
sunrpc: fix RCU handling of gc_ctx field
The handling of the gc_ctx pointer only seems to be partially RCU-safe.
The assignment and freeing are done using RCU, but many places in the
code seem to dereference that pointer without proper RCU safeguards.
Fix them to use rcu_dereference and to rcu_read_lock/unlock, and to
properly handle the case where the pointer is NULL.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 16 Jul 2014 10:52:18 +0000 (06:52 -0400)]
sunrpc: remove __rcu annotation from struct gss_cl_ctx->gc_gss_ctx
Commit
5b22216e11f7 (nfs: __rcu annotations) added a __rcu annotation to
the gc_gss_ctx field. I see no rationale for adding that though, as that
field does not seem to be managed via RCU at all.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 21 Jul 2014 03:28:28 +0000 (13:28 +1000)]
NFS: nfs4_do_open should add negative results to the dcache.
If you have an NFSv4 mounted directory which does not container 'foo'
and:
ls -l foo
ssh $server touch foo
cat foo
then the 'cat' will fail (usually, depending a bit on the various
cache ages). This is correct as negative looks are cached by default.
However with the same initial conditions:
cat foo
ssh $server touch foo
cat foo
will usually succeed. This is because an "open" does not add a
negative dentry to the dcache, while a "lookup" does.
This can have negative performance effects. When "gcc" searches for
an include file, it will try to "open" the file in every director in
the search path. Without caching of negative "open" results, this
generates much more traffic to the server than it should (or than
NFSv3 does).
The root of the problem is that _nfs4_open_and_get_state() will call
d_add_unique() on a positive result, but not on a negative result.
Compare with nfs_lookup() which calls d_materialise_unique on both
a positive result and on ENOENT.
This patch adds a call d_add() in the ENOENT case for
_nfs4_open_and_get_state() and also calls nfs_set_verifier().
With it, many fewer "open" requests for known-non-existent files are
sent to the server.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Andrey Utkin [Sat, 26 Jul 2014 11:58:01 +0000 (14:58 +0300)]
nfs3_list_one_acl(): check get_acl() result with IS_ERR_OR_NULL
There was a check for result being not NULL. But get_acl() may return
NULL, or ERR_PTR, or actual pointer.
The purpose of the function where current change is done is to "list
ACLs only when they are available", so any error condition of get_acl()
mustn't be elevated, and returning 0 there is still valid.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81111
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 74adf83f5d77 (nfs: only show Posix ACLs in listxattr if actually...)
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Sun, 3 Aug 2014 21:04:51 +0000 (17:04 -0400)]
Merge branch 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
* 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits)
xprtrdma: Handle additional connection events
xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
xprtrdma: Make rpcrdma_ep_disconnect() return void
xprtrdma: Schedule reply tasklet once per upcall
xprtrdma: Allocate each struct rpcrdma_mw separately
xprtrdma: Rename frmr_wr
xprtrdma: Disable completions for LOCAL_INV Work Requests
xprtrdma: Disable completions for FAST_REG_MR Work Requests
xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external()
xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request
xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect
xprtrdma: Properly handle exhaustion of the rb_mws list
xprtrdma: Chain together all MWs in same buffer pool
xprtrdma: Back off rkey when FAST_REG_MR fails
xprtrdma: Unclutter struct rpcrdma_mr_seg
xprtrdma: Don't invalidate FRMRs if registration fails
xprtrdma: On disconnect, don't ignore pending CQEs
xprtrdma: Update rkeys after transport reconnect
xprtrdma: Limit data payload size for ALLPHYSICAL
xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs
...
Trond Myklebust [Mon, 21 Jul 2014 17:53:48 +0000 (13:53 -0400)]
NFS: Enforce an upper limit on the number of cached access call
This may be used to limit the number of cached credentials building up
inside the access cache.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Mon, 21 Jul 2014 17:32:42 +0000 (13:32 -0400)]
SUNRPC: Enforce an upper limit on the number of cached credentials
In some cases where the credentials are not often reused, we may want
to limit their total number just in order to make the negative lookups
in the hash table more manageable.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Chuck Lever [Tue, 29 Jul 2014 21:26:12 +0000 (17:26 -0400)]
xprtrdma: Handle additional connection events
Commit
38ca83a5 added RDMA_CM_EVENT_TIMEWAIT_EXIT. But that status
is relevant only for consumers that re-use their QPs on new
connections. xprtrdma creates a fresh QP on reconnection, so that
event should be explicitly ignored.
Squelch the alarming "unexpected CM event" message.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:26:04 +0000 (17:26 -0400)]
xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
Clean up.
RPCRDMA_PERSISTENT_REGISTRATION was a compile-time switch between
RPCRDMA_REGISTER mode and RPCRDMA_ALLPHYSICAL mode. Since
RPCRDMA_REGISTER has been removed, there's no need for the extra
conditional compilation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:25:55 +0000 (17:25 -0400)]
xprtrdma: Make rpcrdma_ep_disconnect() return void
Clean up: The return code is used only for dprintk's that are
already redundant.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:25:46 +0000 (17:25 -0400)]
xprtrdma: Schedule reply tasklet once per upcall
Minor optimization: grab rpcrdma_tk_lock_g and disable hard IRQs
just once after clearing the receive completion queue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:25:38 +0000 (17:25 -0400)]
xprtrdma: Allocate each struct rpcrdma_mw separately
Currently rpcrdma_buffer_create() allocates struct rpcrdma_mw's as
a single contiguous area of memory. It amounts to quite a bit of
memory, and there's no requirement for these to be carved from a
single piece of contiguous memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:25:29 +0000 (17:25 -0400)]
xprtrdma: Rename frmr_wr
Clean up: Name frmr_wr after the opcode of the Work Request,
consistent with the send and local invalidation paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:25:20 +0000 (17:25 -0400)]
xprtrdma: Disable completions for LOCAL_INV Work Requests
Instead of relying on a completion to change the state of an FRMR
to FRMR_IS_INVALID, set it in advance. If an error occurs, a completion
will fire anyway and mark the FRMR FRMR_IS_STALE.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:25:12 +0000 (17:25 -0400)]
xprtrdma: Disable completions for FAST_REG_MR Work Requests
Instead of relying on a completion to change the state of an FRMR
to FRMR_IS_VALID, set it in advance. If an error occurs, a completion
will fire anyway and mark the FRMR FRMR_IS_STALE.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:25:03 +0000 (17:25 -0400)]
xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external()
Any FRMR arriving in rpcrdma_register_frmr_external() is now
guaranteed to be either invalid, or to be targeted by a queued
LOCAL_INV that will invalidate it before the adapter processes
the FAST_REG_MR being built here.
The problem with current arrangement of chaining a LOCAL_INV to the
FAST_REG_MR is that if the transport is not connected, the LOCAL_INV
is flushed and the FAST_REG_MR is flushed. This leaves the FRMR
valid with the old rkey. But rpcrdma_register_frmr_external() has
already bumped the in-memory rkey.
Next time through rpcrdma_register_frmr_external(), a LOCAL_INV and
FAST_REG_MR is attempted again because the FRMR is still valid. But
the rkey no longer matches the hardware's rkey, and a memory
management operation error occurs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:24:54 +0000 (17:24 -0400)]
xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request
When a LOCAL_INV Work Request is flushed, it leaves an FRMR in the
VALID state. This FRMR can be returned by rpcrdma_buffer_get(), and
must be knocked down in rpcrdma_register_frmr_external() before it
can be re-used.
Instead, capture these in rpcrdma_buffer_get(), and reset them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:24:45 +0000 (17:24 -0400)]
xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect
FAST_REG_MR Work Requests update a Memory Region's rkey. Rkey's are
used to block unwanted access to the memory controlled by an MR. The
rkey is passed to the receiver (the NFS server, in our case), and is
also used by xprtrdma to invalidate the MR when the RPC is complete.
When a FAST_REG_MR Work Request is flushed after a transport
disconnect, xprtrdma cannot tell whether the WR actually hit the
adapter or not. So it is indeterminant at that point whether the
existing rkey is still valid.
After the transport connection is re-established, the next
FAST_REG_MR or LOCAL_INV Work Request against that MR can sometimes
fail because the rkey value does not match what xprtrdma expects.
The only reliable way to recover in this case is to deregister and
register the MR before it is used again. These operations can be
done only in a process context, so handle it in the transport
connect worker.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:24:36 +0000 (17:24 -0400)]
xprtrdma: Properly handle exhaustion of the rb_mws list
If the rb_mws list is exhausted, clean up and return NULL so that
call_allocate() will delay and try again.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:24:28 +0000 (17:24 -0400)]
xprtrdma: Chain together all MWs in same buffer pool
During connection loss recovery, need to visit every MW in a
buffer pool. Any MW that is in use by an RPC will not be on the
rb_mws list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:24:19 +0000 (17:24 -0400)]
xprtrdma: Back off rkey when FAST_REG_MR fails
If posting a FAST_REG_MR Work Reqeust fails, revert the rkey update
to avoid subsequent IB_WC_MW_BIND_ERR completions.
Suggested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:24:09 +0000 (17:24 -0400)]
xprtrdma: Unclutter struct rpcrdma_mr_seg
Clean ups:
- make it obvious that the rl_mw field is a pointer -- allocated
separately, not as part of struct rpcrdma_mr_seg
- promote "struct {} frmr;" to a named type
- promote the state enum to a named type
- name the MW state field the same way other fields in
rpcrdma_mw are named
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:24:01 +0000 (17:24 -0400)]
xprtrdma: Don't invalidate FRMRs if registration fails
If FRMR registration fails, it's likely to transition the QP to the
error state. Or, registration may have failed because the QP is
_already_ in ERROR.
Thus calling rpcrdma_deregister_external() in
rpcrdma_create_chunks() is useless in FRMR mode: the LOCAL_INVs just
get flushed.
It is safe to leave existing registrations: when FRMR registration
is tried again, rpcrdma_register_frmr_external() checks if each FRMR
is already/still VALID, and knocks it down first if it is.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:23:52 +0000 (17:23 -0400)]
xprtrdma: On disconnect, don't ignore pending CQEs
xprtrdma is currently throwing away queued completions during
a reconnect. RPC replies posted just before connection loss, or
successful completions that change the state of an FRMR, can be
missed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:23:43 +0000 (17:23 -0400)]
xprtrdma: Update rkeys after transport reconnect
Various reports of:
rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0
ep
ffff8800bfd3e848
Ensure that rkeys in already-marshalled RPC/RDMA headers are
refreshed after the QP has been replaced by a reconnect.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=249
Suggested-by: Selvin Xavier <Selvin.Xavier@Emulex.Com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:23:34 +0000 (17:23 -0400)]
xprtrdma: Limit data payload size for ALLPHYSICAL
When the client uses physical memory registration, each page in the
payload gets its own array entry in the RPC/RDMA header's chunk list.
Therefore, don't advertise a maximum payload size that would require
more array entries than can fit in the RPC buffer where RPC/RDMA
headers are built.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:23:25 +0000 (17:23 -0400)]
xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs
Ensure ia->ri_id remains valid while invoking dma_unmap_page() or
posting LOCAL_INV during a transport reconnect. Otherwise,
ia->ri_id->device or ia->ri_id->qp is NULL, which triggers a panic.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=259
Fixes: ec62f40 'xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting'
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 29 Jul 2014 21:23:17 +0000 (17:23 -0400)]
xprtrdma: Fix panic in rpcrdma_register_frmr_external()
seg1->mr_nsegs is not yet initialized when it is used to unmap
segments during an error exit. Use the same unmapping logic for
all error exits.
"if (frmr_wr.wr.fast_reg.length < len) {" used to be a BUG_ON check.
The broken code will never be executed under normal operation.
Fixes: c977dea (xprtrdma: Remove BUG_ON() call sites)
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Yan Burman [Thu, 19 Jun 2014 13:06:30 +0000 (16:06 +0300)]
xprtrdma: Fix DMA-API-DEBUG warning by checking dma_map result
Fix the following warning when DMA-API debug is enabled by checking ib_dma_map_single result:
[ 1455.345548] ------------[ cut here ]------------
[ 1455.346863] WARNING: CPU: 3 PID: 3929 at /home/yanb/kernel/net-next/lib/dma-debug.c:1140 check_unmap+0x4e5/0x990()
[ 1455.349350] mlx4_core 0000:00:07.0: DMA-API: device driver failed to check map error[device address=0x000000007c9f2090] [size=2656 bytes] [mapped as single]
[ 1455.349350] Modules linked in: xprtrdma netconsole configfs nfsv3 nfs_acl ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm autofs4 auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc dm_mirror dm_region_hash dm_log microcode pcspkr mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_en ipv6 ptp pps_core vxlan mlx4_core virtio_balloon cirrus ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea i2c_piix4 i2c_core button ext3 jbd virtio_blk virtio_net virtio_pci virtio_ring virtio uhci_hcd ata_generic ata_piix libata
[ 1455.349350] CPU: 3 PID: 3929 Comm: mount.nfs Not tainted 3.15.0-rc1-dbg+ #13
[ 1455.349350] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 1455.349350]
0000000000000474 ffff880069dcf628 ffffffff8151c341 ffffffff817b69d8
[ 1455.349350]
ffff880069dcf678 ffff880069dcf668 ffffffff8105b5fc 0000000069dcf658
[ 1455.349350]
ffff880069dcf778 ffff88007b0c9f00 ffffffff8255ec40 0000000000000a60
[ 1455.349350] Call Trace:
[ 1455.349350] [<
ffffffff8151c341>] dump_stack+0x52/0x81
[ 1455.349350] [<
ffffffff8105b5fc>] warn_slowpath_common+0x8c/0xc0
[ 1455.349350] [<
ffffffff8105b6e6>] warn_slowpath_fmt+0x46/0x50
[ 1455.349350] [<
ffffffff812e6305>] check_unmap+0x4e5/0x990
[ 1455.349350] [<
ffffffff81521fb0>] ? _raw_spin_unlock_irq+0x30/0x60
[ 1455.349350] [<
ffffffff812e6a0a>] debug_dma_unmap_page+0x5a/0x60
[ 1455.349350] [<
ffffffffa0389583>] rpcrdma_deregister_internal+0xb3/0xd0 [xprtrdma]
[ 1455.349350] [<
ffffffffa038a639>] rpcrdma_buffer_destroy+0x69/0x170 [xprtrdma]
[ 1455.349350] [<
ffffffffa03872ff>] xprt_rdma_destroy+0x3f/0xb0 [xprtrdma]
[ 1455.349350] [<
ffffffffa04a95ff>] xprt_destroy+0x6f/0x80 [sunrpc]
[ 1455.349350] [<
ffffffffa04a9625>] xprt_put+0x15/0x20 [sunrpc]
[ 1455.349350] [<
ffffffffa04a899a>] rpc_free_client+0x8a/0xe0 [sunrpc]
[ 1455.349350] [<
ffffffffa04a8a58>] rpc_release_client+0x68/0xa0 [sunrpc]
[ 1455.349350] [<
ffffffffa04a9060>] rpc_shutdown_client+0xb0/0xc0 [sunrpc]
[ 1455.349350] [<
ffffffffa04a8f5d>] ? rpc_ping+0x5d/0x70 [sunrpc]
[ 1455.349350] [<
ffffffffa04a91ab>] rpc_create_xprt+0xbb/0xd0 [sunrpc]
[ 1455.349350] [<
ffffffffa04a9273>] rpc_create+0xb3/0x160 [sunrpc]
[ 1455.349350] [<
ffffffff81129749>] ? __probe_kernel_read+0x69/0xb0
[ 1455.349350] [<
ffffffffa053851c>] nfs_create_rpc_client+0xdc/0x100 [nfs]
[ 1455.349350] [<
ffffffffa0538cfa>] nfs_init_client+0x3a/0x90 [nfs]
[ 1455.349350] [<
ffffffffa05391c8>] nfs_get_client+0x478/0x5b0 [nfs]
[ 1455.349350] [<
ffffffffa0538e50>] ? nfs_get_client+0x100/0x5b0 [nfs]
[ 1455.349350] [<
ffffffff81172c6d>] ? kmem_cache_alloc_trace+0x24d/0x260
[ 1455.349350] [<
ffffffffa05393f3>] nfs_create_server+0xf3/0x4c0 [nfs]
[ 1455.349350] [<
ffffffffa0545ff0>] ? nfs_request_mount+0xf0/0x1a0 [nfs]
[ 1455.349350] [<
ffffffffa031c0c3>] nfs3_create_server+0x13/0x30 [nfsv3]
[ 1455.349350] [<
ffffffffa0546293>] nfs_try_mount+0x1f3/0x230 [nfs]
[ 1455.349350] [<
ffffffff8108ea21>] ? get_parent_ip+0x11/0x50
[ 1455.349350] [<
ffffffff812d6343>] ? __this_cpu_preempt_check+0x13/0x20
[ 1455.349350] [<
ffffffff810d632b>] ? try_module_get+0x6b/0x190
[ 1455.349350] [<
ffffffffa05449f7>] nfs_fs_mount+0x187/0x9d0 [nfs]
[ 1455.349350] [<
ffffffffa0545940>] ? nfs_clone_super+0x140/0x140 [nfs]
[ 1455.349350] [<
ffffffffa0543b20>] ? nfs_auth_info_match+0x40/0x40 [nfs]
[ 1455.349350] [<
ffffffff8117e360>] mount_fs+0x20/0xe0
[ 1455.349350] [<
ffffffff811a1c16>] vfs_kern_mount+0x76/0x160
[ 1455.349350] [<
ffffffff811a29a8>] do_mount+0x428/0xae0
[ 1455.349350] [<
ffffffff811a30f0>] SyS_mount+0x90/0xe0
[ 1455.349350] [<
ffffffff8152af52>] system_call_fastpath+0x16/0x1b
[ 1455.349350] ---[ end trace
f1f31572972e211d ]---
Signed-off-by: Yan Burman <yanb@mellanox.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Mon, 21 Jul 2014 04:04:16 +0000 (21:04 -0700)]
Linux 3.16-rc6
Linus Torvalds [Mon, 21 Jul 2014 03:44:53 +0000 (20:44 -0700)]
Merge tag 'staging-3.16-rc6' of git://git./linux/kernel/git/gregkh/staging
Pull more IIO driver fixes from Greg KH:
"Here are two IIO driver fixes for 3.16-rc6 that resolve some reported
issues"
* tag 'staging-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: mma8452: Use correct acceleration units.
iio:core: Handle error when mask type is not separate
Linus Torvalds [Mon, 21 Jul 2014 03:44:18 +0000 (20:44 -0700)]
Merge tag 'usb-3.16-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are two USB patches that resolve some reported issues, one with
an odd HUB, and one in the chipidea driver"
* tag 'usb-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: Check if port status is equal to RxDetect
usb: chipidea: udc: Disable auto ZLP generation on ep0
Linus Torvalds [Mon, 21 Jul 2014 03:43:46 +0000 (20:43 -0700)]
Merge tag 'driver-core-3.16-rc6' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core fix that reverts an older patch that has
been causing a number of reported problems with the platform devices.
This revert has been in linux-next for a while with no reported issues"
* tag 'driver-core-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
platform_get_irq: Revert to platform_get_resource if of_irq_get fails
Linus Torvalds [Mon, 21 Jul 2014 03:43:14 +0000 (20:43 -0700)]
Merge tag 'char-misc-3.16-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here's a single hyper-v driver fix for a reported issue"
* tag 'char-misc-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: hv_fcopy: fix a race condition for SMP guest
Linus Torvalds [Mon, 21 Jul 2014 03:39:28 +0000 (20:39 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull intel drm fixes from Dave Airlie:
"Intel fixes came in late, but since I debugged one of them I'll send
them on,
Two reverts, a quirk and one warn regression"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
Revert "drm/i915: reverse dp link param selection, prefer fast over wide again"
drm/i915: Track the primary plane correctly when reassigning planes
drm/i915: Ignore VBT backlight presence check on HP Chromebook 14
Revert "drm/i915: Don't set the 8to6 dither flag when not scaling"
Linus Torvalds [Mon, 21 Jul 2014 03:28:04 +0000 (20:28 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
"Four fixes, all discovered by Trinity"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: segv: Save regs only in case of a kernel mode fault
um: Fix hung task in fix_range_common()
um: Ensure that a stub page cannot get unmapped
Revert "um: Fix wait_stub_done() error handling"
Linus Torvalds [Mon, 21 Jul 2014 03:21:05 +0000 (20:21 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"We have two more fixes in my for-linus branch.
I was hoping to also include a fix for a btrfs deadlock with
compression enabled, but we're still nailing that one down"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: test for valid bdev before kobj removal in btrfs_rm_device
Btrfs: fix abnormal long waiting in fsync
Linus Torvalds [Mon, 21 Jul 2014 02:55:44 +0000 (19:55 -0700)]
Merge tag 'nfs-for-3.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
"Apologies for the relative lateness of this pull request, however the
commits fix some issues with the NFS read/write code updates in
3.16-rc1 that can cause serious Oopsing when using small r/wsize. The
delay was mainly due to extra testing to make sure that the fixes
behave correctly.
Highlights include;
- Stable fix for an NFSv3 posix ACL regression
- Multiple fixes for regressions to the NFS generic read/write code:
- Fix page splitting bugs that come into play when a small
rsize/wsize read/write needs to be sent again (due to error
conditions or page redirty)
- Fix nfs_wb_page_cancel, which is called by the "invalidatepage"
method
- Fix 2 compile warnings about unused variables
- Fix a performance issue affecting unstable writes"
* tag 'nfs-for-3.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Don't reset pg_moreio in __nfs_pageio_add_request
NFS: Remove 2 unused variables
nfs: handle multiple reqs in nfs_wb_page_cancel
nfs: handle multiple reqs in nfs_page_async_flush
nfs: change find_request to find_head_request
nfs: nfs_page should take a ref on the head req
nfs: mark nfs_page reqs with flag for extra ref
nfs: only show Posix ACLs in listxattr if actually present
Richard Weinberger [Sun, 20 Jul 2014 11:39:27 +0000 (13:39 +0200)]
um: segv: Save regs only in case of a kernel mode fault
...otherwise me lose user mode regs and the resulting
stack trace is useless.
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Sun, 20 Jul 2014 11:16:20 +0000 (13:16 +0200)]
um: Fix hung task in fix_range_common()
If do_ops() fails we have to release current->mm->mmap_sem
otherwise the failing task will never terminate.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Sun, 20 Jul 2014 11:09:15 +0000 (13:09 +0200)]
um: Ensure that a stub page cannot get unmapped
Trinity discovered an execution path such that a task
can unmap his stub page.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Sun, 20 Jul 2014 10:56:34 +0000 (12:56 +0200)]
Revert "um: Fix wait_stub_done() error handling"
This reverts commit
0974a9cadc7886f7baaa458bb0c89f5c5f9d458e.
The real for for that issue is to release current->mm->mmap_sem in
fix_range_common().
Signed-off-by: Richard Weinberger <richard@nod.at>
Eric Sandeen [Mon, 7 Jul 2014 17:34:49 +0000 (12:34 -0500)]
btrfs: test for valid bdev before kobj removal in btrfs_rm_device
commit
99994cd btrfs: dev delete should remove sysfs entry
added a btrfs_kobj_rm_device, which dereferences device->bdev...
right after we check whether device->bdev might be NULL.
I don't honestly know if it's possible to have a NULL device->bdev
here, but assuming that it is (given the test), we need to move
the kobject removal to be under that test.
(Coverity spotted this)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Liu Bo [Thu, 17 Jul 2014 08:08:36 +0000 (16:08 +0800)]
Btrfs: fix abnormal long waiting in fsync
xfstests generic/127 detected this problem.
With commit
7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, now fsync will only flush
data within the passed range. This is the cause of the above problem,
-- btrfs's fsync has a stage called 'sync log' which will wait for all the
ordered extents it've recorded to finish.
In xfstests/generic/127, with mixed operations such as truncate, fallocate,
punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will
mmap, and then msync. And I find that msync will wait for quite a long time
(about 20s in my case), thanks to ftrace, it turns out that the previous
fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the
range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants,
there can be some ordered extents created but not getting corresponding pages
flushed, then they're left in memory until we fsync which runs into the
stage 'sync log', and fsync will just wait for the system writeback thread
to flush those pages and get ordered extents finished, so the latency is
inevitable.
This adds a flush similar to btrfs_start_ordered_extent() in
btrfs_wait_logged_extents() to fix that.
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Linus Torvalds [Sat, 19 Jul 2014 16:27:55 +0000 (06:27 -1000)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"The locking department delivers:
- A rather large and intrusive bundle of fixes to address serious
performance regressions introduced by the new rwsem / mcs
technology. Simpler solutions have been discussed, but they would
have been ugly bandaids with more risk than doing the right thing.
- Make the rwsem spin on owner technology opt-in for architectures
and enable it only on the known to work ones.
- A few fixes to the lockdep userspace library"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER
locking/mutex: Disable optimistic spinning on some architectures
locking/rwsem: Reduce the size of struct rw_semaphore
locking/rwsem: Rename 'activity' to 'count'
locking/spinlocks/mcs: Micro-optimize osq_unlock()
locking/spinlocks/mcs: Introduce and use init macro and function for osq locks
locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead
locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node()
locking/rwsem: Allow conservative optimistic spinning when readers have lock
tools/liblockdep: Account for bitfield changes in lockdeps lock_acquire
tools/liblockdep: Remove debug print left over from development
tools/liblockdep: Fix comparison of a boolean value with a value of 2
Linus Torvalds [Sat, 19 Jul 2014 16:26:43 +0000 (06:26 -1000)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"Prevent a possible divide by zero in the debugging code"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix possible divide by zero in avg_atom() calculation
Linus Torvalds [Sat, 19 Jul 2014 16:26:01 +0000 (06:26 -1000)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three patches addressing shortcomings in the ARM gic interrupt chip
driver"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: gic: Fix core ID calculation when topology is read from DT
irqchip: gic: Add binding probe for ARM GIC400
irqchip: gic: Add support for cortex a7 compatible string
Linus Torvalds [Sat, 19 Jul 2014 16:25:03 +0000 (06:25 -1000)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for a long standing issue in the alarm timer subsystem,
which was noticed recently when people finally started to use alarm
timers for serious work"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Fix bug where relative alarm timers were treated as absolute
Linus Torvalds [Sat, 19 Jul 2014 16:23:27 +0000 (06:23 -1000)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU fixes from Thomas Gleixner:
"Two RCU patches:
- Address a serious performance regression on open/close caused by
commit
ac1bea85781e ("Make cond_resched() report RCU quiescent
states")
- Export RCU debug functions. Not a regression, but enablement to
address a serious recursion bug in the sl*b allocators in 3.17"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Reduce overhead of cond_resched() checks for RCU
rcu: Export debug_init_rcu_head() and and debug_init_rcu_head()
Linus Torvalds [Sat, 19 Jul 2014 06:49:47 +0000 (20:49 -1000)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A smaller set of fixes this week, and all regression fixes:
- a handful of issues fixed on at91 with common clock conversion
- a set of fixes for Marvell mvebu (SMP, coherency, PM)
- a clock fix for i.MX6Q.
- ... and a SMP/hotplug fix for Exynos"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: EXYNOS: Fix core ID used by platsmp and hotplug code
ARM: at91/dt: add missing clocks property to pwm node in sam9x5.dtsi
ARM: at91/dt: fix usb0 clocks definition in sam9n12 dtsi
ARM: at91: at91sam9x5: correct typo error for ohci clock
ARM: clk-imx6q: parent lvds_sel input from upstream clock gates
ARM: mvebu: Fix coherency bus notifiers by using separate notifiers
ARM: mvebu: Fix the operand list in the inline asm of armada_370_xp_pmsu_idle_enter
ARM: mvebu: fix SMP boot for Armada 38x and Armada 375 Z1 in big endian
Dave Airlie [Sat, 19 Jul 2014 06:48:38 +0000 (16:48 +1000)]
Merge tag 'drm-intel-fixes-2014-07-18' of git://anongit.freedesktop.org/drm-intel
But in any case nothing really shocking in
here, 2 reverts, 1 quirk and a regression fix a WARN.
* tag 'drm-intel-fixes-2014-07-18' of git://anongit.freedesktop.org/drm-intel:
Revert "drm/i915: reverse dp link param selection, prefer fast over wide again"
drm/i915: Track the primary plane correctly when reassigning planes
drm/i915: Ignore VBT backlight presence check on HP Chromebook 14
Revert "drm/i915: Don't set the 8to6 dither flag when not scaling"
Linus Torvalds [Sat, 19 Jul 2014 06:46:55 +0000 (20:46 -1000)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"A couple of key fixes and a few less critical ones. The main ones
are:
- add a .bss section to the PE/COFF headers when building with EFI
stub
- invoke the correct paravirt magic when building the espfix page
tables
Unfortunately both of these areas also have at least one additional
fix each still in thie pipeline, but which are not yet ready to push"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Remove unused variable "polling"
x86/espfix/xen: Fix allocation of pages for paravirt page tables
x86/efi: Include a .bss section within the PE/COFF headers
efi: fdt: Do not report an error during boot if UEFI is not available
efi/arm64: efistub: remove local copy of linux_banner
Linus Torvalds [Sat, 19 Jul 2014 06:39:34 +0000 (20:39 -1000)]
Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband
Pull infiniband/rdma fixes from Roland Dreier:
- cxgb4 hardware driver regression fixes
- mlx5 hardware driver regression fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx5: Enable "block multicast loopback" for kernel consumers
RDMA/cxgb4: Call iwpm_init() only once
mlx5_core: Fix possible race between mr tree insert/delete
RDMA/cxgb4: Initialize the device status page
RDMA/cxgb4: Clean up connection on ARP error
RDMA/cxgb4: Fix skb_leak in reject_cr()
Linus Torvalds [Sat, 19 Jul 2014 06:37:24 +0000 (20:37 -1000)]
Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"More fallout from module tests and code inspection.
Fixes to temperature limit write operations in adt7470 driver. Also,
dashes are not allowed in hwmon 'name' attributes. Fix drivers where
necessary"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (adt7470) Fix writes to temperature limit registers
hwmon: (da9055) Don't use dash in the name attribute
hwmon: (da9052) Don't use dash in the name attribute
Linus Torvalds [Sat, 19 Jul 2014 06:36:13 +0000 (20:36 -1000)]
Merge tag 'iommu-fixes-v3.16-rc5' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"A couple of fixes for the Freescale PAMU driver queued up:
- fix PAMU window size check.
- fix the device domain attach condition.
- fix the error condition during iommu group"
* tag 'iommu-fixes-v3.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/fsl: Fix the error condition during iommu group
iommu/fsl: Fix the device domain attach condition.
iommu/fsl: Fix PAMU window size check.
Linus Torvalds [Sat, 19 Jul 2014 06:28:27 +0000 (20:28 -1000)]
Merge tag 'pm+acpi-3.16-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are a few recent regression fixes, a revert of the ACPI video
commit I promised, a system resume fix related to request_firmware(),
an ACPI video quirk for one more Win8-oriented BIOS, an ACPI device
enumeration documentation update and a few fixes for ARM cpufreq
drivers.
Specifics:
- Fix for a recently introduced NULL pointer dereference in the core
system suspend code occuring when platforms without ACPI attempt to
use the "freeze" sleep state from Zhang Rui.
- Fix for a recently introduced build warning in cpufreq headers from
Brian W Hart.
- Fix for a 3.13 cpufreq regression related to sysem resume that
triggers on some systems with multiple CPU clusters from Viresh
Kumar.
- Fix for a 3.4 regression in request_firmware() resulting in
WARN_ON()s on some systems during system resume from Takashi Iwai.
- Revert of the ACPI video commit that changed the default value of
the video.brightness_switch_enabled command line argument to 0 as
it has been reported to break existing setups.
- ACPI device enumeration documentation update to take recent code
changes into account and make the documentation match the code
again from Darren Hart.
- Fixes for the sa1110, imx6q, kirkwood, and cpu0 cpufreq drivers
from Linus Walleij, Nicolas Del Piano, Quentin Armitage, Viresh
Kumar.
- New ACPI video blacklist entry for HP ProBook 4540s from Hans de
Goede"
* tag 'pm+acpi-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: make table sentinel macros unsigned to match use
cpufreq: move policy kobj to policy->cpu at resume
cpufreq: cpu0: OPPs can be populated at runtime
cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD
cpufreq: imx6q: Select PM_OPP
cpufreq: sa1110: set memory type for h3600
ACPI / video: Add use_native_backlight quirk for HP ProBook 4540s
PM / sleep: fix freeze_ops NULL pointer dereferences
PM / sleep: Fix request_firmware() error at resume
Revert "ACPI / video: change acpi-video brightness_switch_enabled default to 0"
ACPI / documentation: Remove reference to acpi_platform_device_ids from enumeration.txt
Linus Torvalds [Sat, 19 Jul 2014 06:27:23 +0000 (20:27 -1000)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One nouveau deadlock fix, one qxl irq handling fix, and a set of
radeon pageflipping changes that fix regressions in pageflipping since
-rc1 along with a leak and backlight fix.
The pageflipping fixes are a bit bigger than I'd like, but there has
been a few people focused on testing them"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: Make classic pageflip completion path less racy.
drm/radeon: Add missing vblank_put in pageflip ioctl error path.
drm/radeon: Remove redundant fence unref in pageflip path.
drm/radeon: Complete page flip even if waiting on the BO fence fails
drm/radeon: Move pinning the BO back to radeon_crtc_page_flip()
drm/radeon: Prevent too early kms-pageflips triggered by vblank.
drm/radeon: set default bl level to something reasonable
drm/radeon: avoid leaking edid data
drm/qxl: return IRQ_NONE if it was not our irq
drm/nouveau/therm: fix a potential deadlock in the therm monitoring code
Linus Torvalds [Sat, 19 Jul 2014 06:26:46 +0000 (20:26 -1000)]
Merge tag 'random_for_linus_stable' of git://git./linux/kernel/git/tytso/random
Pull /dev/random fix from Ted Ts'o:
"Fix a BUG splat found by trinity"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: check for increase of entropy_count because of signed conversion
Linus Torvalds [Sat, 19 Jul 2014 06:25:54 +0000 (20:25 -1000)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This push fixes a boot hang in virt guests when the virtio RNG is
enabled"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: virtio - ensure reads happen after successful probe
hwrng: fetch randomness only after device init
Hannes Frederic Sowa [Fri, 18 Jul 2014 21:26:41 +0000 (17:26 -0400)]
random: check for increase of entropy_count because of signed conversion
The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
actually increase entropy_count if during assignment of the unsigned
expression on the RHS (mind the -=) we reduce the value modulo
2^width(int) and assign it to entropy_count. Trinity found this.
[ Commit modified by tytso to add an additional safety check for a
negative entropy_count -- which should never happen, and to also add
an additional paranoia check to prevent overly large count values to
be passed into urandom_read(). ]
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Tomasz Figa [Tue, 15 Jul 2014 17:59:18 +0000 (02:59 +0900)]
ARM: EXYNOS: Fix core ID used by platsmp and hotplug code
When CPU topology is specified in device tree, cpu_logical_map() does
not return core ID anymore, but rather full MPIDR value. This breaks
existing calculation of PMU register offsets on Exynos SoCs.
This patch fixes the problem by adjusting the code to use only core ID
bits of the value returned by cpu_logical_map() to allow CPU topology to
be specified in device tree on Exynos SoCs.
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Fri, 18 Jul 2014 21:40:17 +0000 (14:40 -0700)]
Merge tag 'imx-fixes-3.16-2' of git://git./linux/kernel/git/shawnguo/linux into fixes
Merge "ARM: imx: fixes for 3.16, 2nd take" from Shawn Guo:
The i.MX fixes for 3.16, 2nd take:
It fixes a hard machine hang regression for boards where only pcie is
active but no sata, as the latest imx6-pcie driver is no longer enabling
the upstream clock directly but only lvds clk out.
* tag 'imx-fixes-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: clk-imx6q: parent lvds_sel input from upstream clock gates
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Fri, 18 Jul 2014 21:39:18 +0000 (14:39 -0700)]
Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixes
Merge "at91: fixes for 3.16 #2" from Nicolas Ferre:
Second AT91 fixes series for 3.16
- fix clock definitions after the move to CCF for:
* at91sam9n12 (ohci)
* at91sam9x5 (ohci, pwm)
* tag 'at91-fixes' of git://github.com/at91linux/linux-at91:
ARM: at91/dt: add missing clocks property to pwm node in sam9x5.dtsi
ARM: at91/dt: fix usb0 clocks definition in sam9n12 dtsi
ARM: at91: at91sam9x5: correct typo error for ohci clock
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Fri, 18 Jul 2014 21:38:28 +0000 (14:38 -0700)]
Merge tag 'mvebu-fixes-3.16-3' of git://git.infradead.org/linux-mvebu into fixes
Merge "mvebu fixes for v3.16 (round 3)" from Jason Cooper:
- Fix SMP boot on 38x/375 in big endian
- Fix operand list for pmsu on 370/XP
- Fix coherency bus notifiers
* tag 'mvebu-fixes-3.16-3' of git://git.infradead.org/linux-mvebu:
ARM: mvebu: Fix coherency bus notifiers by using separate notifiers
ARM: mvebu: Fix the operand list in the inline asm of armada_370_xp_pmsu_idle_enter
ARM: mvebu: fix SMP boot for Armada 38x and Armada 375 Z1 in big endian
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Fri, 18 Jul 2014 16:26:04 +0000 (06:26 -1000)]
Merge tag 'gfs2-fixes' of git://git./linux/kernel/git/steve/gfs2-3.0-fixes
Pull gfs2 fixes from Steven Whitehouse:
"This patch set contains two minor docs/spelling fixes, some fixes for
flock, a change to use GFP_NOFS to avoid recursion on a rarely used
code path and a fix for a race relating to the glock lru"
* tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes
GFS2: memcontrol: Spelling s/invlidate/invalidate/
GFS2: Allow caching of glocks for flock
GFS2: Allow flocks to use normal glock dq rather than dq_wait
GFS2: replace count*size kzalloc by kcalloc
GFS2: Use GFP_NOFS when allocating glocks
GFS2: Fix race in glock lru glock disposal
GFS2: Only wait for demote when last holder is dequeued
Linus Torvalds [Fri, 18 Jul 2014 16:25:05 +0000 (06:25 -1000)]
Merge tag 'dm-3.16-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Fix the dm-thinp and dm-cache targets to disallow changing the data
device's block size"
* tag 'dm-3.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache metadata: do not allow the data block size to change
dm thin metadata: do not allow the data block size to change
Linus Torvalds [Fri, 18 Jul 2014 16:23:34 +0000 (06:23 -1000)]
Merge tag 'upstream-3.16-rc6' of git://git.infradead.org/linux-ubifs
Pull UBI fixes from Artem Bityutskiy:
"Two UBI fastmap-related fixes for v3.16:
- fix UBI fastmap support which we broke in 3.16-rc1 by reversing the
volumes RB-tree sorting criteria.
- make sure that we scrub all PEBs where we see bit-flips - we were
missing some of them when the fastmap feature was enabled"
* tag 'upstream-3.16-rc6' of git://git.infradead.org/linux-ubifs:
UBI: fastmap: do not miss bit-flips
UBI: fix the volumes tree sorting criteria
Linus Torvalds [Fri, 18 Jul 2014 16:21:43 +0000 (06:21 -1000)]
Merge tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Dave Chinner:
"Fixes for low memory perforamnce regressions and a quota inode
handling regression.
These are regression fixes for issues recently introduced - the change
in the stack switch location is fairly important, so I've held off
sending this update until I was sure that it still addresses the stack
usage problem the original solved. So while the commits in the xfs
tree are recent, it has been under tested for several weeks now"
* tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: null unused quota inodes when quota is on
xfs: refine the allocation stack switch
Revert "xfs: block allocation work needs to be kswapd aware"
Boris BREZILLON [Thu, 17 Jul 2014 19:03:58 +0000 (21:03 +0200)]
ARM: at91/dt: add missing clocks property to pwm node in sam9x5.dtsi
The pwm driver requires a clocks property referencing the pwm peripheral
clk.
Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Boris BREZILLON [Mon, 14 Jul 2014 06:39:27 +0000 (08:39 +0200)]
ARM: at91/dt: fix usb0 clocks definition in sam9n12 dtsi
udphs_clk (USB Device Controller clock) is referenced instead of
uhphs_clk (USB Host Controller clock).
Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Bo Shen [Mon, 14 Jul 2014 03:08:14 +0000 (11:08 +0800)]
ARM: at91: at91sam9x5: correct typo error for ohci clock
Correct the typo error for the second "uhphs_clk".
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tomasz Figa [Thu, 17 Jul 2014 15:23:44 +0000 (17:23 +0200)]
irqchip: gic: Fix core ID calculation when topology is read from DT
Certain GIC implementation, namely those found on earlier, single
cluster, Exynos SoCs, have registers mapped without per-CPU banking,
which means that the driver needs to use different offset for each CPU.
Currently the driver calculates the offset by multiplying value returned
by cpu_logical_map() by CPU offset parsed from DT. This is correct when
CPU topology is not specified in DT and aforementioned function returns
core ID alone. However when DT contains CPU topology, the function
changes to return cluster ID as well, which is non-zero on mentioned
SoCs and so breaks the calculation in GIC driver.
This patch fixes this by masking out cluster ID in CPU offset
calculation so that only core ID is considered. Multi-cluster Exynos
SoCs already have banked GIC implementations, so this simple fix should
be enough.
Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
Fixes: db0d4db22a78d ("ARM: gic: allow GIC to support non-banked setups")
Cc: <stable@vger.kernel.org> # v3.3+
Link: https://lkml.kernel.org/r/1405610624-18722-1-git-send-email-t.figa@samsung.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Fabian Frederick [Wed, 2 Jul 2014 20:05:27 +0000 (22:05 +0200)]
GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes
Cc: cluster-devel@redhat.com
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Geert Uytterhoeven [Sun, 29 Jun 2014 10:21:39 +0000 (12:21 +0200)]
GFS2: memcontrol: Spelling s/invlidate/invalidate/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: cluster-devel@redhat.com
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Thu, 26 Jun 2014 14:47:48 +0000 (10:47 -0400)]
GFS2: Allow caching of glocks for flock
This patch removes the GLF_NOCACHE flag from the glocks associated with
flocks. There should be no good reason not to cache glocks for flocks:
they only force the glock to be demoted before they can be reacquired,
which can slow down performance and even cause glock hangs, especially
in cases where the flocks are held in Shared (SH) mode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Thu, 26 Jun 2014 14:46:25 +0000 (10:46 -0400)]
GFS2: Allow flocks to use normal glock dq rather than dq_wait
This patch allows flock glocks to use a non-blocking dequeue rather
than dq_wait. It also reverts the previous patch I had posted regarding
dq_wait. The reverted patch isn't necessarily a bad idea, but I decided
this might avoid unforeseen side effects, and was therefore safer.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fabian Frederick [Wed, 25 Jun 2014 18:40:45 +0000 (20:40 +0200)]
GFS2: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.
Cc: cluster-devel@redhat.com
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Mon, 23 Jun 2014 13:50:20 +0000 (14:50 +0100)]
GFS2: Use GFP_NOFS when allocating glocks
Normally GFP_KERNEL is ok here, but there is now a rarely used code path
relating to deallocation of unlinked inodes (in certain corner cases)
which if hit at times of memory shortage can cause recursion while
trying to free memory.
One solution would be to try and move the gfs2_glock_get() call so
that it is no longer called while another glock is held, but that
doesn't look at all easy, so GFP_NOFS is the best solution for the
time being.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Mon, 23 Jun 2014 13:43:32 +0000 (14:43 +0100)]
GFS2: Fix race in glock lru glock disposal
We must not leave items on the LRU list with GLF_LOCK set, since
they can be removed if the glock is brought back into use, which
may then potentially result in a hang, waiting for GLF_LOCK to
clear.
It doesn't happen very often, since it requires a glock that has
not been used for a long time to be brought back into use at the
same moment that the shrinker is part way through disposing of
glocks.
The fix is to set GLF_LOCK at a later time, when we already know
that the other locks can be obtained. Also, we now only release
the lru_lock in case a resched is needed, rather than on every
iteration.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Fri, 20 Jun 2014 13:36:41 +0000 (09:36 -0400)]
GFS2: Only wait for demote when last holder is dequeued
Function gfs2_glock_dq_wait is supposed to dequeue a glock and then
wait for the lock to be demoted. The problem is, if this is a shared
lock, its demote will depend on the other holders, which means you
might end up waiting forever because the other process is blocked.
This problem is especially apparent when dealing with nested flocks.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Lucas Stach [Thu, 17 Jul 2014 10:20:14 +0000 (12:20 +0200)]
ARM: clk-imx6q: parent lvds_sel input from upstream clock gates
The i.MX6 reference manual doesn't make a clear distinction
between the fixed clock divider and the enable gate for the
pcie and sata reference clocks. This lead to the lvds mux
inputs in the imx6q clk driver to be parented from the
ref clock (which is the divider) instead of the actual gate,
which in turn prevents the upstream clock to actually be
enabled when lvds clk out is active.
This fixes a hard machine hang regression in kernel 3.16 for
boards where only pcie is active but no sata, as with this
kernel version the imx6-pcie driver is no longer enabling
the upstream clock directly but only lvds clk out.
Reported-by: Arne Ruhnau <arne.ruhnau@target-sg.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Arne Ruhnau <arne.ruhnau@target-sg.com>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Dexuan Cui [Wed, 16 Jul 2014 07:00:45 +0000 (00:00 -0700)]
Drivers: hv: hv_fcopy: fix a race condition for SMP guest
We should schedule the 5s "timer work" before starting the data transfer,
otherwise, the data transfer code may finish so fast on another
virtual cpu that when the code(fcopy_write()) trying to cancel the 5s
"timer work" can occasionally fail because the "timer work" may haven't
been scheduled yet and as a result the fcopy process will be aborted
wrongly by fcopy_work_func() in 5s.
Thank Liz Zhang <lizzha@microsoft.com> for the initial investigation
on the bug.
This addresses https://bugzilla.redhat.com/show_bug.cgi?id=
1118123
Tested-by: Liz Zhang <lizzha@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>