firefly-linux-kernel-4.4.55.git
12 years agosvcrpc: support multiple-fragment rpc's
J. Bruce Fields [Mon, 3 Dec 2012 20:50:38 +0000 (15:50 -0500)]
svcrpc: support multiple-fragment rpc's

Over TCP, RPC's are preceded by a single 4-byte field telling you how
long the rpc is (in bytes).  The spec also allows you to send an RPC in
multiple such records (the high bit of the length field is used to tell
you whether this is the final record).
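
As a rough illustration of the record marking described above, here is a
minimal user-space sketch of decoding one 4-byte fragment header.  The
helper names are hypothetical and are not taken from the kernel source;
only the layout (high bit marks the final fragment, low 31 bits carry the
length) comes from the text.

```c
#include <stdint.h>

/* Assemble the big-endian 4-byte fragment header into a host integer. */
static uint32_t marker_decode(const unsigned char hdr[4])
{
	return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
	       ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
}

/* High bit set: this is the final fragment of the record. */
static int marker_is_last(uint32_t marker)
{
	return (marker & 0x80000000u) != 0;
}

/* Low 31 bits: length in bytes of the fragment data that follows. */
static uint32_t marker_len(uint32_t marker)
{
	return marker & 0x7fffffffu;
}
```

A receiver would keep reading fragments and appending their data until
marker_is_last() reports the final one.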

We've survived for years without supporting this because in practice the
clients we care about don't use it.  But the userland rpc libraries do,
and every now and then an experimental client will run into this.  (Most
recently I noticed it while trying to write a pynfs check.)  And we're
really on the wrong side of the spec here--let's fix this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agosvcrpc: track rpc data length separately from sk_tcplen
J. Bruce Fields [Mon, 3 Dec 2012 21:45:35 +0000 (16:45 -0500)]
svcrpc: track rpc data length separately from sk_tcplen

Keep a separate field, sk_datalen, that tracks only the data contained
in a fragment, not including the fragment header.

For now, this is always just max(0, sk_tcplen - 4), but after we allow
multiple fragments sk_datalen will accumulate the total rpc data size
while sk_tcplen only tracks progress receiving the current fragment.
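
The relationship between the two counters can be sketched roughly as
follows.  This is a simplified user-space model, not the kernel code:
struct rx_state and both helpers are hypothetical, and the real code
updates its counters as data arrives rather than once per fragment.

```c
#include <stddef.h>

/* Hypothetical mirror of the two counters named in the commit message. */
struct rx_state {
	size_t tcplen;   /* bytes received for the current fragment, header included */
	size_t datalen;  /* rpc data bytes accumulated so far, headers excluded */
};

/* Single-fragment case: max(0, tcplen - 4), written without underflow. */
static size_t datalen_single(size_t tcplen)
{
	return tcplen > 4 ? tcplen - 4 : 0;
}

/* Multi-fragment case: fold a finished fragment into the running total
 * and restart per-fragment progress. */
static void fragment_done(struct rx_state *rx)
{
	rx->datalen += datalen_single(rx->tcplen);
	rx->tcplen = 0;
}
```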

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agosvcrpc: fix off-by-4 error in "incomplete TCP record" dprintk
J. Bruce Fields [Mon, 3 Dec 2012 21:35:35 +0000 (16:35 -0500)]
svcrpc: fix off-by-4 error in "incomplete TCP record" dprintk

The full reclen doesn't include the fragment header, but sk_tcplen does.
Fix this to make it an apples-to-apples comparison.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agosvcrpc: delay minimum-rpc-size check till later
J. Bruce Fields [Mon, 3 Dec 2012 21:30:42 +0000 (16:30 -0500)]
svcrpc: delay minimum-rpc-size check till later

Soon we want to support multiple fragments, in which case it may be
legal for a single fragment to be smaller than 8 bytes, so we'll want to
delay this check till we've reached the last fragment.

Also fix an outdated comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agosvcrpc: don't byte-swap sk_reclen in place
J. Bruce Fields [Mon, 3 Dec 2012 21:11:13 +0000 (16:11 -0500)]
svcrpc: don't byte-swap sk_reclen in place

Byte-swapping in place is always a little dubious.

Let's instead define this field to always be big-endian, and do the
swapping on demand where we need it.
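
A minimal sketch of the "always big-endian, swap on demand" idea, in
user-space C.  struct rec_hdr and its accessors are hypothetical names
used only for illustration; the kernel field and helpers may look
different.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* ntohl */

/* The stored field is never modified after it is read off the wire. */
struct rec_hdr {
	uint32_t reclen_be;  /* always big-endian */
};

/* Swap on demand: fragment length in host byte order. */
static uint32_t rec_len(const struct rec_hdr *h)
{
	return ntohl(h->reclen_be) & 0x7fffffffu;
}

/* Swap on demand: is this the final fragment? */
static int rec_is_last(const struct rec_hdr *h)
{
	return (ntohl(h->reclen_be) & 0x80000000u) != 0;
}
```

Because the accessors never write back, there is no window in which the
field holds a value of ambiguous byte order.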

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Forget state for a specific client
Bryan Schumaker [Thu, 29 Nov 2012 16:40:46 +0000 (11:40 -0500)]
NFSD: Forget state for a specific client

Write the client's ip address to any state file and all appropriate
state for that client will be forgotten.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Add a custom file operations structure for fault injection
Bryan Schumaker [Thu, 29 Nov 2012 16:40:45 +0000 (11:40 -0500)]
NFSD: Add a custom file operations structure for fault injection

Controlling the read and write functions allows me to add in "forget
client w.x.y.z", since we won't be limited to reading and writing only
u64 values.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Reading a fault injection file prints a state count
Bryan Schumaker [Thu, 29 Nov 2012 16:40:44 +0000 (11:40 -0500)]
NFSD: Reading a fault injection file prints a state count

I also log basic information that I can figure out about the type of
state (such as number of locks for each client IP address).  This can be
useful for checking that state was actually dropped and later for
checking if the client was able to recover.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Fault injection operations take a per-client forget function
Bryan Schumaker [Thu, 29 Nov 2012 16:40:43 +0000 (11:40 -0500)]
NFSD: Fault injection operations take a per-client forget function

The eventual goal is to forget state based on ip address, so it makes
sense to call this function in a for-each-client loop until the correct
amount of state is forgotten.  I also use this patch as an opportunity
to rename the forget function from "func()" to "forget()".

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Clean up forgetting and recalling delegations
Bryan Schumaker [Thu, 29 Nov 2012 16:40:42 +0000 (11:40 -0500)]
NFSD: Clean up forgetting and recalling delegations

Once I have a client, I can easily use its delegation list rather than
searching the file hash table for delegations to remove.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Clean up forgetting openowners
Bryan Schumaker [Thu, 29 Nov 2012 16:40:41 +0000 (11:40 -0500)]
NFSD: Clean up forgetting openowners

Using "forget_n_state()" forces me to implement the code needed to
forget a specific client's openowners.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Clean up forgetting locks
Bryan Schumaker [Thu, 29 Nov 2012 16:40:40 +0000 (11:40 -0500)]
NFSD: Clean up forgetting locks

I use the new "forget_n_state()" function to iterate through each client
first when searching for locks.  This may slow down forgetting locks a
little bit, but it implements most of the code needed to forget a
specified client's locks.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Clean up forgetting clients
Bryan Schumaker [Thu, 29 Nov 2012 16:40:39 +0000 (11:40 -0500)]
NFSD: Clean up forgetting clients

I added in a generic for-each loop that takes a pass over the client_lru
list for the current net namespace and calls some function.  The next few
patches will update other operations to use this function as well.  A value
of 0 still means "forget everything that is found".
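
The shape of such a loop might look like the sketch below.  It is a
user-space model under stated assumptions: struct client stands in for
the real client structure, an array stands in for the client_lru list,
and the function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a client; has_state is 1 if there is
 * something left to forget. */
struct client {
	int has_state;
};

/* Per-client callback: returns how much state it forgot. */
typedef size_t (*forget_fn)(struct client *clp);

/* Call forget() on each client until max items are forgotten.
 * max == 0 means "forget everything that is found". */
static size_t for_each_client(struct client *clients, size_t nclients,
			      size_t max, forget_fn forget)
{
	size_t count = 0;

	for (size_t i = 0; i < nclients; i++) {
		count += forget(&clients[i]);
		if (max != 0 && count >= max)
			break;
	}
	return count;
}

/* Example callback: forget one unit of state per client. */
static size_t forget_client(struct client *clp)
{
	if (!clp->has_state)
		return 0;
	clp->has_state = 0;
	return 1;
}
```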

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Lock state before calling fault injection function
Bryan Schumaker [Thu, 29 Nov 2012 16:40:38 +0000 (11:40 -0500)]
NFSD: Lock state before calling fault injection function

Each function touches state in some way, so getting the lock earlier
can help simplify code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: discard some unused nfsd4_verify xdr code
J. Bruce Fields [Fri, 30 Nov 2012 22:24:18 +0000 (17:24 -0500)]
nfsd4: discard some unused nfsd4_verify xdr code

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoNFSD: Fold fault_inject.h into state.h
Bryan Schumaker [Tue, 27 Nov 2012 14:35:10 +0000 (09:35 -0500)]
NFSD: Fold fault_inject.h into state.h

There were only a small number of functions in this file and since they
all affect stored state I think it makes sense to put them in state.h
instead.  I also dropped most static inline declarations since there are
no callers when fault injection is not enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make NFSv4 grace time per net
Stanislav Kinsbursky [Tue, 27 Nov 2012 11:11:49 +0000 (14:11 +0300)]
nfsd: make NFSv4 grace time per net

Grace time is a part of NFSv4 state engine, which is constructed per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make NFSv4 lease time per net
Stanislav Kinsbursky [Tue, 27 Nov 2012 11:11:44 +0000 (14:11 +0300)]
nfsd: make NFSv4 lease time per net

Lease time is a part of NFSv4 state engine, which is constructed per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: remove redundant declarations
Stanislav Kinsbursky [Tue, 27 Nov 2012 11:42:20 +0000 (14:42 +0300)]
nfsd: remove redundant declarations

This is a cleanup patch. Functions nfsd_pool_stats_open() and
nfsd_pool_stats_release() are declared in fs/nfsd/nfsd.h.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: recovery - make in_grace per net
Stanislav Kinsbursky [Mon, 26 Nov 2012 13:16:30 +0000 (16:16 +0300)]
nfsd: recovery - make in_grace per net

The in_grace flag is part of the client tracking state, which is network
namespace aware. So let's replace the global static variable with a per-net one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: recovery - make rec_file per net
Stanislav Kinsbursky [Mon, 26 Nov 2012 13:16:25 +0000 (16:16 +0300)]
nfsd: recovery - make rec_file per net

Opening and closing of this file is done in the client tracking init and exit
operations.
Client tracking is already done in network namespace context. So let's make
this file opened and closed per network context - this will simplify its
management.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: call state init and shutdown twice
Stanislav Kinsbursky [Mon, 26 Nov 2012 12:22:18 +0000 (15:22 +0300)]
nfsd: call state init and shutdown twice

Split NFSv4 state init and shutdown into two different calls: a per-net one and
a generic one.
The per-net init/shutdown pair has to be called for every namespace, the generic
pair only once, on NFSd kthreads start and shutdown respectively.

Refresh of diff-nfsd-call-state-init-twice

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: cleanup NFSd state start a bit
Stanislav Kinsbursky [Mon, 26 Nov 2012 12:22:13 +0000 (15:22 +0300)]
nfsd: cleanup NFSd state start a bit

This patch renames nfs4_state_start_net() into nfs4_state_create_net(), where
get_net() is now performed.
It also introduces a new nfs4_state_start_net(), which is now responsible for
state creation and for initializing all per-net data, and which is now called
from nfs4_state_start().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: cleanup NFSd state shutdown a bit
Stanislav Kinsbursky [Mon, 26 Nov 2012 12:22:08 +0000 (15:22 +0300)]
nfsd: cleanup NFSd state shutdown a bit

This patch renames __nfs4_state_shutdown_net() into nfs4_state_shutdown_net(),
__nfs4_state_shutdown() into nfs4_state_shutdown_net() and moves all network
related shutdown operations to nfs4_state_shutdown_net().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make delegations shutdown network namespace aware
Stanislav Kinsbursky [Mon, 26 Nov 2012 12:22:03 +0000 (15:22 +0300)]
nfsd: make delegations shutdown network namespace aware

NFSv4 delegations are stored in a global list. But they depend on nfs4_client,
which is already network namespace aware.
State shutdown and the laundromat are done per network namespace as well.
So delegation unhashing has to be done in network namespace context.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make client_lock per net
Stanislav Kinsbursky [Mon, 26 Nov 2012 12:21:58 +0000 (15:21 +0300)]
nfsd: make client_lock per net

This lock protects the client lru list and session hash table, which are
allocated per network namespace already.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: remove state lock from nfs4_state_shutdown
Stanislav Kinsbursky [Wed, 21 Nov 2012 15:07:38 +0000 (18:07 +0300)]
nfsd4: remove state lock from nfs4_state_shutdown

Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant.

This function is called by the last NFSd thread on its exit, and the state lock
actually protects two functions (del_recall_lru is protected by recall_lock):
1) nfsd4_client_tracking_exit
2) __nfs4_state_shutdown_net

"nfsd4_client_tracking_exit" doesn't require state lock protection, because its
state can be modified only by tracker callbacks.
Here they are:
1) create: is called only from nfsd4_proc_compound.
2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat.
3) check: is called only from nfsd4_proc_compound.
4) grace_done: is called only from nfs4_laundromat.

nfsd4_proc_compound is called only by an NFSd kthread, which is exiting right
now.
nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled
already.

"__nfs4_state_shutdown_net" also doesn't require state lock protection,
because all NFSd kthreads are dead, and no race can happen with NFSd start,
because "nfsd_up" flag is still set.
Moreover, all NFSd shutdown is protected with global nfsd_mutex.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: remove state lock from nfsd4_load_reboot_recovery_data
J. Bruce Fields [Fri, 16 Nov 2012 16:45:12 +0000 (11:45 -0500)]
nfsd4: remove state lock from nfsd4_load_reboot_recovery_data

That function is only called under nfsd_mutex: we know that because the
only caller is nfsd_svc, via

        nfsd_svc
          nfsd_startup
            nfs4_state_start
              nfsd4_client_tracking_init
                client_tracking_ops->init == nfsd4_load_reboot_recovery_data

The shared state accessed here includes:

        - user_recovery_dirname: used here, modified only by
          nfs4_reset_recoverydir, which can be verified to only be
          called under nfsd_mutex.
        - filesystem state, protected by i_mutex (handwaving slightly
          here)
        - rec_file, reclaim_str_hashtbl, reclaim_str_hashtbl_size: other
          than here, used only from code called from nfsd or laundromat
          threads, both of which should be started only after this runs
          (see nfsd_svc) and stopped before this could run again (see
          nfsd_shutdown, called from nfsd_last_thread).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: return badname, not inval, on "." or "..", or "/"
J. Bruce Fields [Sun, 25 Nov 2012 21:31:00 +0000 (16:31 -0500)]
nfsd4: return badname, not inval, on "." or "..", or "/"

The spec requires badname, not inval, in these cases.

Some callers want us to return enoent, but I can see no justification
for that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: downgrade some fs/nfsd/nfs4state.c BUG's
J. Bruce Fields [Sun, 25 Nov 2012 19:48:10 +0000 (14:48 -0500)]
nfsd4: downgrade some fs/nfsd/nfs4state.c BUG's

Linus has pointed out that indiscriminate use of BUG's can make it
harder to diagnose bugs because they can bring a machine down, often
before we manage to get any useful debugging information to the logs.
(Consider, for example, a BUG() that fires in a workqueue, or while
holding a spinlock).

Most of these BUG's won't do much more than kill an nfsd thread, but it
would still probably be safer to get out the warning without dying.

There's still more of this to do in nfsd/.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: delay filling in write iovec array till after xdr decoding
J. Bruce Fields [Thu, 15 Nov 2012 19:52:19 +0000 (14:52 -0500)]
nfsd4: delay filling in write iovec array till after xdr decoding

Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK.  No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation.  In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op.  So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them.  So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.
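
To make the idea concrete, here is a small user-space sketch.  All names
and sizes (struct write_args, struct compound, MAX_VECS, CHUNK) are
hypothetical and chosen only for illustration: decoding records just a
pointer and a length per write, and the single shared iovec array is
filled immediately before each write is performed, so it can be reused
by the next one.

```c
#include <stddef.h>
#include <sys/uio.h>  /* struct iovec */

#define MAX_VECS 4    /* illustrative cap on array size */
#define CHUNK    8    /* illustrative stand-in for the page size */

/* What decoding records for one write: where the data is, and how long. */
struct write_args {
	char *data;
	size_t len;
};

/* One shared array per compound, too big to duplicate per op. */
struct compound {
	struct iovec vec[MAX_VECS];
};

/* Called at execution time, not decode time.  Returns the number of
 * iovecs filled in, or -1 if the data does not fit. */
static int fill_write_vec(struct compound *c, const struct write_args *w)
{
	size_t off = 0;
	int n = 0;

	while (off < w->len && n < MAX_VECS) {
		size_t chunk = w->len - off < CHUNK ? w->len - off : CHUNK;

		c->vec[n].iov_base = w->data + off;
		c->vec[n].iov_len = chunk;
		off += chunk;
		n++;
	}
	return off == w->len ? n : -1;
}
```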

Another option might be to switch to decoding compound ops one at a
time.  I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: move more write parameters into xdr argument
J. Bruce Fields [Fri, 16 Nov 2012 19:16:46 +0000 (14:16 -0500)]
nfsd4: move more write parameters into xdr argument

In preparation for moving some of this elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: reorganize write decoding
J. Bruce Fields [Fri, 16 Nov 2012 15:01:30 +0000 (10:01 -0500)]
nfsd4: reorganize write decoding

In preparation for moving some of it elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: simplify reading of opnum
J. Bruce Fields [Sat, 17 Nov 2012 03:28:38 +0000 (22:28 -0500)]
nfsd4: simplify reading of opnum

The comment here is totally bogus:
- OP_WRITE + 1 is RELEASE_LOCKOWNER.  Maybe there was some older
  version of the spec in which that served as a sort of
  OP_ILLEGAL?  No idea, but it's clearly wrong now.
- In any case, I can't see that the spec says anything about
  what to do if the client sends us fewer ops than promised.
  It's clearly nutty client behavior, and we should do
  whatever's easiest: returning an xdr error (even though it
  won't be consistent with the error on the last op returned)
  seems fine to me.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: no, we're not going to check tags for utf8
J. Bruce Fields [Sat, 17 Nov 2012 02:53:58 +0000 (21:53 -0500)]
nfsd4: no, we're not going to check tags for utf8

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: fix v4 reply caching
J. Bruce Fields [Fri, 16 Nov 2012 20:22:43 +0000 (15:22 -0500)]
nfsd: fix v4 reply caching

Very embarrassing: 1091006c5eb15cba56785bd5b498a8d0b9546903 "nfsd: turn
on reply cache for NFSv4" missed a line, effectively leaving the reply
cache off in the v4 case.  I thought I'd tested that, but I guess not.

This time, wrote a pynfs test to confirm it works.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make laundromat network namespace aware
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:22:17 +0000 (18:22 +0300)]
nfsd: make laundromat network namespace aware

This patch moves laundromat_work to nfsd per-net context, thus allowing to run
multiple laundries.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: pass nfsd_net instead of net to grace enders
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:22:12 +0000 (18:22 +0300)]
nfsd: pass nfsd_net instead of net to grace enders

Passing net context looks like overkill.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: use service net instead of hard-coded init_net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:22:07 +0000 (18:22 +0300)]
nfsd: use service net instead of hard-coded init_net

This patch replaces init_net by SVC_NET(), where possible and also passes
proper context to nested functions where required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make close_lru list per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:22:01 +0000 (18:22 +0300)]
nfsd: make close_lru list per net

This list holds the (open) stateowner queue of nfs4 clients for last close
replay, and those clients are network namespace aware. So let's make this list
per network namespace too.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make client_lru list per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:56 +0000 (18:21 +0300)]
nfsd: make client_lru list per net

This list holds nfs4 clients queue for lease renewal, which are network
namespace aware. So let's make this list per network namespace too.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make sessionid_hashtbl allocated per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:51 +0000 (18:21 +0300)]
nfsd: make sessionid_hashtbl allocated per net

This hash holds established sessions state and is closely associated with
nfs4_clients info, which are network namespace aware. So let's make it
allocated per network namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make lockowner_ino_hashtbl allocated per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:46 +0000 (18:21 +0300)]
nfsd: make lockowner_ino_hashtbl allocated per net

This hash holds file lock owners and is closely associated with nfs4_clients info,
which are network namespace aware. So let's make it allocated per network
namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make ownerstr_hashtbl allocated per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:41 +0000 (18:21 +0300)]
nfsd: make ownerstr_hashtbl allocated per net

This hash holds open owner state and is closely associated with nfs4_clients
info, which are network namespace aware. So let's make it allocated per
network namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make unconf_name_tree per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:36 +0000 (18:21 +0300)]
nfsd: make unconf_name_tree per net

This tree holds nfs4_clients info, which are network namespace aware.
So let's make it per network namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make unconf_id_hashtbl allocated per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:31 +0000 (18:21 +0300)]
nfsd: make unconf_id_hashtbl allocated per net

This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make conf_name_tree per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:26 +0000 (18:21 +0300)]
nfsd: make conf_name_tree per net

This tree holds nfs4_clients info, which are network namespace aware.
So let's make it per network namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make conf_id_hashtbl allocated per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:21 +0000 (18:21 +0300)]
nfsd: make conf_id_hashtbl allocated per net

This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make reclaim_str_hashtbl allocated per net
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:16 +0000 (18:21 +0300)]
nfsd: make reclaim_str_hashtbl allocated per net

This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash is used only by legacy tracker. So let's allocate hash in
tracker init.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make nfs4_client network namespace dependent
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:10 +0000 (18:21 +0300)]
nfsd: make nfs4_client network namespace dependent

And use its net where possible.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: use service net instead of hard-coded net where possible
Stanislav Kinsbursky [Wed, 14 Nov 2012 15:21:05 +0000 (18:21 +0300)]
nfsd: use service net instead of hard-coded net where possible

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agosvcrpc: Revert "sunrpc/cache.h: replace simple_strtoul"
J. Bruce Fields [Wed, 14 Nov 2012 15:48:05 +0000 (10:48 -0500)]
svcrpc: Revert "sunrpc/cache.h: replace simple_strtoul"

Commit bbf43dc888833ac0539e437dbaeb28bfd4fbab9f "sunrpc/cache.h: replace
simple_strtoul" introduced new range-checking which could cause get_int
to fail on unsigned integers too large to be represented as an int.

We could parse them as unsigned instead--but it turns out svcgssd is
actually passing down "-1" in some cases.  Which is perhaps stupid, but
there's nothing we can do about it now.

So just revert back to the previous "sloppy" behavior that accepts
either representation.
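
A user-space analogue of the "sloppy" parse, for illustration only.
parse_int_sloppy is a hypothetical name, strtoll stands in for the
kernel's parsing helper, and the final narrowing to int relies on the
usual two's-complement wraparound, which is implementation-defined in
standard C.

```c
#include <stdlib.h>

/* Parse a decimal, hex, or octal integer with no range check.
 * Returns 0 on success, -1 on empty input or trailing garbage. */
static int parse_int_sloppy(const char *buf, int *out)
{
	char *end;
	long long v = strtoll(buf, &end, 0);

	if (end == buf || *end != '\0')
		return -1;
	/* Deliberately truncate instead of rejecting out-of-range values,
	 * so "-1" and "4294967295" end up as the same int. */
	*out = (int)(unsigned int)v;
	return 0;
}
```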

Cc: stable@vger.kernel.org
Reported-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: get_backchannel_cred should be static
Fengguang Wu [Tue, 13 Nov 2012 20:41:27 +0000 (15:41 -0500)]
nfsd4: get_backchannel_cred should be static

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: init_session should be declared static
Fengguang Wu [Sat, 10 Nov 2012 12:20:25 +0000 (07:20 -0500)]
nfsd4: init_session should be declared static

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: release the legacy reclaimable clients list in grace_done
Jeff Layton [Mon, 12 Nov 2012 20:00:58 +0000 (15:00 -0500)]
nfsd: release the legacy reclaimable clients list in grace_done

The current code holds on to this list until nfsd is shut down, but it's
never touched once the grace period ends. Release that memory back into
the wild when the grace period ends.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: get rid of cl_recdir field
Jeff Layton [Mon, 12 Nov 2012 20:00:57 +0000 (15:00 -0500)]
nfsd: get rid of cl_recdir field

Remove the cl_recdir field from the nfs4_client struct. Instead, just
compute it on the fly when and if it's needed, which is now only when
the legacy client tracking code is in effect.

The error handling in the legacy client tracker is also changed to
handle the case where md5 is unavailable. In that case, we'll warn
the admin with a KERN_ERR message and disable the client tracking.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: move the confirmed and unconfirmed hlists to a rbtree
Jeff Layton [Mon, 12 Nov 2012 20:00:56 +0000 (15:00 -0500)]
nfsd: move the confirmed and unconfirmed hlists to a rbtree

The current code requires that we md5 hash the name in order to store
the client in the confirmed and unconfirmed trees. Change it instead
to store the clients in a pair of rbtrees, and simply compare the
cl_names directly instead of hashing them. This also necessitates that
we add a new flag to the clp->cl_flags field to indicate which tree
the client is currently in.
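
Comparing names directly needs only a total order on (length, bytes)
pairs.  The comparator below is a sketch of one way to do that, with a
hypothetical struct name_blob; it is not claimed to match the kernel's
own comparison helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type standing in for a client name. */
struct name_blob {
	const char *data;
	size_t len;
};

/* Order by the common prefix first, then by length.  Returns <0, 0 or >0,
 * which is what a binary search tree insert or lookup needs. */
static int compare_names(const struct name_blob *a, const struct name_blob *b)
{
	size_t n = a->len < b->len ? a->len : b->len;
	int res = memcmp(a->data, b->data, n);

	if (res)
		return res;
	if (a->len < b->len)
		return -1;
	return a->len > b->len;
}
```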

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: don't search for client by hash on legacy reboot recovery gracedone
Jeff Layton [Mon, 12 Nov 2012 20:00:55 +0000 (15:00 -0500)]
nfsd: don't search for client by hash on legacy reboot recovery gracedone

When nfsd starts, the legacy reboot recovery code creates a tracking
struct for each directory in the v4recoverydir. When the grace period
ends, it basically does a "readdir" on the directory again, and matches
each dentry in there to an existing client id to see if it should be
removed or not. If the matching client doesn't exist, or hasn't
reclaimed its state then it will remove that dentry.

This is pretty inefficient since it involves doing a lot of hash-bucket
searching. It also means that we have to keep relying on being able to
search for a nfs4_client by md5 hashed cl_recdir name.

Instead, add a pointer to the nfs4_client that indicates the association
between the nfs4_client_reclaim and nfs4_client. When a reclaim operation
comes in, we set the pointer to make that association. On gracedone, the
legacy client tracker will keep the recdir around iff:

1/ there is a reclaim record for the directory

...and...

2/ there's an association between the reclaim record and a client record
-- that is, a create or check operation was performed on the client that
matches that directory.
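
The two conditions reduce to a simple predicate.  A sketch, assuming a
reclaim record that carries a pointer back to its client (the struct and
field names here are illustrative):

```c
#include <stddef.h>

struct client;  /* opaque for this sketch */

/* Hypothetical reclaim record: cr_clp is set when a create or check
 * operation associates the record with a client. */
struct reclaim_record {
	struct client *cr_clp;
};

/* Keep the recdir iff a reclaim record exists for it (crp != NULL)
 * and that record has been associated with a client. */
static int keep_recdir(const struct reclaim_record *crp)
{
	return crp != NULL && crp->cr_clp != NULL;
}
```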

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: make nfs4_client_to_reclaim return a pointer to the reclaim record
Jeff Layton [Mon, 12 Nov 2012 20:00:54 +0000 (15:00 -0500)]
nfsd: make nfs4_client_to_reclaim return a pointer to the reclaim record

Later callers will need to make changes to the record.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: break out reclaim record removal into separate function
Jeff Layton [Mon, 12 Nov 2012 20:00:53 +0000 (15:00 -0500)]
nfsd: break out reclaim record removal into separate function

We'll need to be able to call this from nfs4recover.c eventually.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: have nfsd4_find_reclaim_client take a char * argument
Jeff Layton [Mon, 12 Nov 2012 20:00:52 +0000 (15:00 -0500)]
nfsd: have nfsd4_find_reclaim_client take a char * argument

Currently, it takes a client pointer, but later we're going to need to
search for these records without knowing whether a matching client even
exists.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: warn about impending removal of nfsdcld upcall
Jeff Layton [Mon, 12 Nov 2012 20:00:51 +0000 (15:00 -0500)]
nfsd: warn about impending removal of nfsdcld upcall

Let's shoot for removing the nfsdcld upcall in 3.10. Most likely,
no one is actually using it so I don't expect this warning to
fire often (except maybe on misconfigured systems).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: pass info about the legacy recoverydir in environment variables
Jeff Layton [Mon, 12 Nov 2012 20:00:50 +0000 (15:00 -0500)]
nfsd: pass info about the legacy recoverydir in environment variables

The usermodehelper upcall program can then decide to use this info as
a (one-way) transition mechanism to the new scheme. When a "check"
upcall occurs and the client doesn't exist in the database, we can
look to see whether the directory exists. If it does, then we'd add
the client to the database, remove the legacy recdir, and return
success to the kernel to allow the recovery to proceed.

For gracedone, we simply pass the v4recovery "topdir" so that the
upcall can clean it out prior to returning to the kernel.

A module parm is also added to disable the legacy conversion if
the admin chooses.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: change heuristic for selecting the client_tracking_ops
Jeff Layton [Mon, 12 Nov 2012 20:00:49 +0000 (15:00 -0500)]
nfsd: change heuristic for selecting the client_tracking_ops

First, try to use the new usermodehelper upcall. It should succeed or
fail quickly, so there's little cost to doing so.

If it fails, and the legacy tracking dir exists, use that. If it
doesn't exist then fall back to using nfsdcld.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
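
The selection order described above can be sketched as a small decision function. This is an illustration only: the enum and `pick_tracker()` are hypothetical names, and the two booleans stand in for "the usermodehelper upcall initialized successfully" and "the legacy tracking directory exists".

```c
#include <stdbool.h>

/* Hypothetical labels for the three client tracking back ends. */
enum tracker_sketch {
    TRACKER_UMH,      /* new usermodehelper upcall */
    TRACKER_LEGACY,   /* legacy recovery directory */
    TRACKER_NFSDCLD   /* nfsdcld upcall */
};

/*
 * Try the usermodehelper upcall first. If it fails, use the legacy
 * tracker when its directory exists, and fall back to nfsdcld otherwise.
 */
static enum tracker_sketch pick_tracker(bool umh_init_ok, bool legacy_dir_exists)
{
    if (umh_init_ok)
        return TRACKER_UMH;
    if (legacy_dir_exists)
        return TRACKER_LEGACY;
    return TRACKER_NFSDCLD;
}
```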
12 years agonfsd: add a usermodehelper upcall for NFSv4 client ID tracking
Jeff Layton [Mon, 12 Nov 2012 20:00:48 +0000 (15:00 -0500)]
nfsd: add a usermodehelper upcall for NFSv4 client ID tracking

Add a new client tracker upcall type that uses call_usermodehelper to
call out to a program. This seems to be the preferred method of
calling out to usermode these days for seldom-called upcalls. It's
simple and doesn't require a running daemon, so it should "just work"
as long as the binary is installed.

The client tracking exit operation is also changed to check for a
NULL pointer before running. The UMH upcall doesn't need to do anything
at module teardown time.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: remove unused argument to nfs4_has_reclaimed_state
Jeff Layton [Fri, 9 Nov 2012 20:06:38 +0000 (15:06 -0500)]
nfsd: remove unused argument to nfs4_has_reclaimed_state

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: fix error handling in nfsd4_remove_clid_dir
Jeff Layton [Fri, 9 Nov 2012 20:31:53 +0000 (15:31 -0500)]
nfsd: fix error handling in nfsd4_remove_clid_dir

If the credential save fails, then we'll leak our mnt_want_write_file
reference.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: update documentation on 4.1 progress
J. Bruce Fields [Thu, 8 Nov 2012 00:41:51 +0000 (19:41 -0500)]
nfsd4: update documentation on 4.1 progress

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: backchannel should use client-provided security flavor
J. Bruce Fields [Mon, 5 Nov 2012 21:01:48 +0000 (16:01 -0500)]
nfsd4: backchannel should use client-provided security flavor

For now this only adds support for AUTH_NULL.  (Previously we assumed
AUTH_UNIX.)  We'll also need AUTH_GSS, which is trickier.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: common helper to initialize callback work
J. Bruce Fields [Mon, 5 Nov 2012 20:10:26 +0000 (15:10 -0500)]
nfsd4: common helper to initialize callback work

I've found it confusing having the only references to
nfsd4_do_callback_rpc() in a different file.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: implement backchannel_ctl operation
J. Bruce Fields [Thu, 1 Nov 2012 22:09:48 +0000 (18:09 -0400)]
nfsd4: implement backchannel_ctl operation

This operation is mandatory for servers to implement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: use callback security parameters in create_session
J. Bruce Fields [Thu, 1 Nov 2012 20:31:02 +0000 (16:31 -0400)]
nfsd4: use callback security parameters in create_session

We're currently ignoring the callback security parameters specified in
create_session, and just assuming the client wants auth_sys, because
that's all the current linux client happens to care about.  But this
could cause our callbacks to fail for a client that wanted something
different.

For now, all we're doing is no longer ignoring the uid and gid passed in
the auth_sys case.  Further patches will add support for auth_null and
gss (and possibly use more of the auth_sys information; the spec wants
us to use exactly the credential we're passed, though it's hard to
imagine why a client would care).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: clean up callback security parsing
J. Bruce Fields [Tue, 27 Mar 2012 18:50:26 +0000 (14:50 -0400)]
nfsd4: clean up callback security parsing

Move the callback parsing into a separate function.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd: use vfs_fsync_range(), not O_SYNC, for stable writes
J. Bruce Fields [Fri, 26 Oct 2012 20:12:31 +0000 (16:12 -0400)]
nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes

NFSv4 shares the same struct file across multiple writes.  (And we'd
like NFSv2 and NFSv3 to do that as well some day.)

So setting O_SYNC on the struct file as a way to request a synchronous
write doesn't work.

Instead, do a vfs_fsync_range() in that case.

Reported-by: Peter Staubach <pstaubach@exagrid.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
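
A loose userspace analogy for the change above, assuming POSIX `pwrite()` and `fdatasync()`: when one open file is shared by many writers, stability is requested per write with an explicit sync rather than by setting O_SYNC on the shared descriptor. The helper name `write_at()` is hypothetical, and a whole-file `fdatasync()` stands in here for the ranged `vfs_fsync_range()` the kernel uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes at offset off. If the caller asked for a stable write,
 * flush the data afterwards instead of relying on an O_SYNC flag, which
 * would affect every other user of the same descriptor.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t write_at(int fd, const void *buf, size_t len, off_t off, bool stable)
{
    ssize_t n = pwrite(fd, buf, len, off);

    if (n < 0)
        return -1;
    if (stable && fdatasync(fd) != 0)
        return -1;
    return n;
}
```

Unstable writes on the same descriptor skip the sync entirely, which a per-file O_SYNC flag cannot express.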
12 years agonfsd: assume writeable exportabled filesystems have f_sync
J. Bruce Fields [Fri, 26 Oct 2012 20:04:08 +0000 (16:04 -0400)]
nfsd: assume writeable exportabled filesystems have f_sync

I don't really see how you could claim to support nfsd and not support
fsync somehow.

And in practice a quick look through the exportable filesystems suggests
the only ones without an ->fsync are read-only (efs, isofs, squashfs) or
in-memory (shmem).

Also, performing a write and then returning an error if the sync fails
(as we would do here in the wgather case) seems unhelpful to clients.

Also remove an incorrect comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: don't BUG in delegation break callback
J. Bruce Fields [Tue, 16 Oct 2012 16:39:33 +0000 (12:39 -0400)]
nfsd4: don't BUG in delegation break callback

These conditions would indeed indicate bugs in the code, but if we want
to hear about them we're likely better off warning and returning than
immediately dying while holding file_lock_lock.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agosvcrpc: demote some printks to a dprintk
J. Bruce Fields [Tue, 9 Oct 2012 22:33:38 +0000 (18:33 -0400)]
svcrpc: demote some printks to a dprintk

In general I'd rather random bad behavior on the network didn't trigger a
printk.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: remove unused init_session return
J. Bruce Fields [Thu, 1 Nov 2012 20:54:01 +0000 (16:54 -0400)]
nfsd4: remove unused init_session return

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: helper function for getting mounted_on ino
J. Bruce Fields [Mon, 1 Oct 2012 21:50:56 +0000 (17:50 -0400)]
nfsd4: helper function for getting mounted_on ino

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfs: fix wrong object type in lockowner_slab
Yanchuan Nian [Wed, 24 Oct 2012 06:44:19 +0000 (14:44 +0800)]
nfs: fix wrong object type in lockowner_slab

The object type in the cache of lockowner_slab is wrong, and it is
better to fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agonfsd4: remove unused variable in nfsd4_delegreturn()
Wei Yongjun [Thu, 18 Oct 2012 14:44:21 +0000 (22:44 +0800)]
nfsd4: remove unused variable in nfsd4_delegreturn()

The variable inode is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
12 years agoexportfs: add FILEID_INVALID to indicate invalid fid_type
Namjae Jeon [Wed, 29 Aug 2012 14:10:10 +0000 (10:10 -0400)]
exportfs: add FILEID_INVALID to indicate invalid fid_type

This commit adds FILEID_INVALID = 0xff to fid_type to indicate an invalid
fid_type.

It avoids using the magic number 255.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
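
A minimal sketch of what replacing the magic number looks like, under stated assumptions: the enum below is a hypothetical two-entry stand-in for fid_type, and `encode_fh_sketch()` is an invented helper that reports failure when the caller's buffer is too small.

```c
/* Hypothetical, cut-down stand-in for the kernel's enum fid_type. */
enum fid_type_sketch {
    FILEID_INO32_GEN_SKETCH = 1,    /* an ordinary, valid handle type */
    FILEID_INVALID_SKETCH   = 0xff  /* named constant instead of a bare 255 */
};

/*
 * Returns the handle type on success, or FILEID_INVALID_SKETCH when the
 * buffer (max_len words) cannot hold the handle (needed_len words).
 */
static int encode_fh_sketch(int max_len, int needed_len)
{
    if (max_len < needed_len)
        return FILEID_INVALID_SKETCH;
    return FILEID_INO32_GEN_SKETCH;
}
```

Callers can then compare against the named constant rather than a literal 255.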
12 years agoLinux 3.7-rc2
Linus Torvalds [Sat, 20 Oct 2012 19:11:32 +0000 (12:11 -0700)]
Linux 3.7-rc2

12 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas...
Linus Torvalds [Sat, 20 Oct 2012 16:48:10 +0000 (09:48 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull arm64 fixes from Catalin Marinas:
 "Main changes:
   - AArch64 Linux compilation fixes following 3.7-rc1 changes
     (MODULES_USE_ELF_RELA, update_vsyscall() prototype)
   - Unnecessary register setting in start_thread() (thanks to Al Viro)
   - ptrace fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: fix alignment padding in assembly code
  arm64: ptrace: use HW_BREAKPOINT_EMPTY type for disabled breakpoints
  arm64: ptrace: make structure padding explicit for debug registers
  arm64: No need to set the x0-x2 registers in start_thread()
  arm64: Ignore memory blocks below PHYS_OFFSET
  arm64: Fix the update_vsyscall() prototype
  arm64: Select MODULES_USE_ELF_RELA
  arm64: Remove duplicate inclusion of mmu_context.h in smp.c

12 years agoarm64: fix alignment padding in assembly code
Marc Zyngier [Fri, 19 Oct 2012 16:33:27 +0000 (17:33 +0100)]
arm64: fix alignment padding in assembly code

An interesting effect of using the generic version of linkage.h
is that the padding is defined in terms of x86 NOPs, which can have
even more interesting effects when the assembly code looks like this:

ENTRY(func1)
mov x0, xzr
ENDPROC(func1)
// fall through
ENTRY(func2)
mov x0, #1
ret
ENDPROC(func2)

Admittedly, the code is not very nice. But having code from another
architecture doesn't look completely sane either.

The fix is to add arm64's version of linkage.h, which causes the insertion
of proper AArch64 NOPs.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
12 years agouse clamp_t in UNAME26 fix
Kees Cook [Sat, 20 Oct 2012 01:45:53 +0000 (18:45 -0700)]
use clamp_t in UNAME26 fix

The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:

  kernel/sys.c: In function 'override_release':
  kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
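
To illustrate why an explicitly typed clamp avoids the warning above: the kernel's min()/max() compare the types of their arguments, so mixing, say, a size_t with an int trips the "distinct pointer types" check. The macro below is a simplified userspace sketch, not the kernel's clamp_t: it casts everything to one named type first, but unlike the kernel macro it evaluates its arguments more than once.

```c
#include <stddef.h>

/*
 * Simplified stand-in for clamp_t(type, val, lo, hi): cast all three
 * operands to `type`, then bound val to the range [lo, hi].
 * Arguments must be free of side effects in this sketch.
 */
#define clamp_t_sketch(type, val, lo, hi)              \
    ((type)(val) < (type)(lo) ? (type)(lo) :           \
     (type)(val) > (type)(hi) ? (type)(hi) : (type)(val))

/* Example: bound a size_t length using int limits without a type clash. */
static size_t bounded_copy_len(size_t want, int min_len, int max_len)
{
    return clamp_t_sketch(size_t, want, min_len, max_len);
}
```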
12 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 20 Oct 2012 01:39:36 +0000 (18:39 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Assorted small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf python: Properly link with libtraceevent
  perf hists browser: Add back callchain folding symbol
  perf tools: Fix build on sparc.
  perf python: Link with libtraceevent
  perf python: Initialize 'page_size' variable
  tools lib traceevent: Fix missed freeing of subargs in free_arg() in filter
  lib tools traceevent: Add back pevent assignment in __pevent_parse_format()
  perf hists browser: Fix off-by-two bug on the first column
  perf tools: Remove warnings on JIT samples for srcline sort key
  perf tools: Fix segfault when using srcline sort key
  perf: Require exclude_guest to use PEBS - kernel side enforcement
  perf tool: Precise mode requires exclude_guest

12 years agoperf python: Properly link with libtraceevent
Arnaldo Carvalho de Melo [Thu, 18 Oct 2012 14:38:35 +0000 (11:38 -0300)]
perf python: Properly link with libtraceevent

Namhyung Kim reported that the build fails with:

  GEN python/perf.so
  gcc: error: python_ext_build/tmp//../../libtraceevent.a: No such file or directory
  error: command 'gcc' failed with exit status 1
  cp: cannot stat `python_ext_build/lib/perf.so': No such file or directory
  make: *** [python/perf.so] Error 1

We need to propagate the TE_PATH variable to the setup.py file.

Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-8umiPbm4sxpknKivbjgykhut@git.kernel.org
[ Fixed superfluous variable build error. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agoMerge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Sat, 20 Oct 2012 00:32:56 +0000 (02:32 +0200)]
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

* The python binding needs to link with libtraceevent and to initialize
  the 'page_size' variable so that mmapping works again.

* The callchain folding character that appears on the TUI just before
  the overhead had disappeared due to recent changes, add it back.

* Intel PEBS in VT-x context uses the DS address as a guest linear address,
  even though it's programmed by the host as a host linear address. This
  results in guest memory corruption and/or the hardware faulting and
  'crashing' the virtual machine.  Therefore we have to disable PEBS on VT-x
  enter and re-enable on VT-x exit, enforcing a strict exclude_guest.

  Kernel side enforcement fix by Peter Zijlstra, tooling side fix by David Ahern.

* Fix build on sparc due to UAPI, fix from David Miller.

* Fixes for the srcline sort key for unresolved symbols and when processing
  samples in JITted code, where we don't have an ELF file, just a special
  symbol table, fixes from Namhyung Kim.

* Fix some leaks in libtraceevent, from Steven Rostedt.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sat, 20 Oct 2012 00:32:37 +0000 (17:32 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM soc fixes from Olof Johansson:
 "A set of fixes and some minor cleanups for -rc2:

   - A series from Arnd that fixes warnings in drivers and other code
     included by ARM defconfigs.  Most have been acked by corresponding
     maintainers (and seem quite hard to argue not picking up anyway in
     the few exception cases).
   - A few misc patches from the list for integrator/vt8500/i.MX
   - A batch of fixes to OMAP platforms, fixing:
     - boot problems on beaglebone,
     - regression fixes for local timers
     - clockdomain locking fixes
     - a few boot/sparse warnings
   - For Tegra:
     - Clock rate calculation overflow fix
     - Revert a change that removed timer clocks and a fix for symbol
       name clashes
   - For Renesas:
     - IO accessor / annotation cleanups to remove warnings
   - For Kirkwood/Dove/mvebu:
     - Fixes for device trees for Dove (some minor cleanups, some fixes)
     - Fixes for the mvebu gpio driver
     - Fix build problem for Feroceon due to missing ifdefs
     - Fix lsxl DTS files"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
  ARM: kirkwood: fix buttons on lsxl boards
  ARM: kirkwood: fix LEDs names for lsxl boards
  ARM: Kirkwood: fix disabling CACHE_FEROCEON_L2
  gpio: mvebu: Add missing breaks in mvebu_gpio_irq_set_type
  ARM: dove: Add crypto engine to DT
  ARM: dove: Remove watchdog from DT
  ARM: dove: Restructure SoC device tree descriptor
  ARM: dove: Fix clock names of sata and gbe
  ARM: dove: Fix tauros2 device tree init
  ARM: dove: Add pcie clock support
  ARM: OMAP2+: Allow kernel to boot even if GPMC fails to reserve memory
  ARM: OMAP: clockdomain: Fix locking on _clkdm_clk_hwmod_enable / disable
  ARM: s3c: mark s3c2440_clk_add as __init_refok
  spi/s3c64xx: use correct dma_transfer_direction type
  ARM: OMAP4: devices: fixup OMAP4 DMIC platform device error message
  ARM: OMAP2+: clock data: Add dev-id for the omap-gpmc dummy fck
  ARM: OMAP: resolve sparse warning concerning debug_card_init()
  ARM: OMAP4: Fix twd_local_timer_register regression
  ARM: tegra: add tegra_timer clock
  ARM: tegra: rename tegra system timer
  ...

12 years agoMODSIGN: Move the magic string to the end of a module and eliminate the search
David Howells [Sat, 20 Oct 2012 00:19:29 +0000 (01:19 +0100)]
MODSIGN: Move the magic string to the end of a module and eliminate the search

Emit the magic string that indicates a module has a signature after the
signature data instead of before it.  This allows module_sig_check() to
be made simpler and faster by the elimination of the search for the
magic string.  Instead we just need to do a single memcmp().

This works because at the end of the signature data there is the
fixed-length signature information block.  This block then falls
immediately prior to the magic number.

From the contents of the information block, it is trivial to calculate
the size of the signature data and thus the size of the actual module
data.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
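
The layout described above (module data, then signature data, then a fixed-length information block, then the magic string) can be sketched as follows. This is a simplified illustration, not module_sig_check(): the magic string value is assumed, and the 4-byte information block holding only a big-endian signature length is a hypothetical reduction of the real block, which carries more fields.

```c
#include <stddef.h>
#include <string.h>

/* Assumed value of the trailing magic string. */
#define SIG_MAGIC_SKETCH "~Module signature appended~\n"

/* Hypothetical info block: just a 4-byte big-endian signature length. */
#define INFO_LEN_SKETCH 4

/*
 * Returns the length of the actual module data, or -1 if the image does
 * not end with the magic string or the lengths are inconsistent.
 * No search is needed: one memcmp() against the tail decides.
 */
static long module_data_len(const unsigned char *img, size_t len)
{
    const size_t mlen = sizeof(SIG_MAGIC_SKETCH) - 1;
    const unsigned char *info;
    size_t sig_len;

    if (len < mlen + INFO_LEN_SKETCH)
        return -1;
    if (memcmp(img + len - mlen, SIG_MAGIC_SKETCH, mlen) != 0)
        return -1;
    len -= mlen;                       /* drop the magic string */

    info = img + len - INFO_LEN_SKETCH; /* block sits just before the magic */
    sig_len = ((size_t)info[0] << 24) | ((size_t)info[1] << 16) |
              ((size_t)info[2] << 8)  |  (size_t)info[3];
    len -= INFO_LEN_SKETCH;            /* drop the info block */

    if (sig_len > len)
        return -1;
    return (long)(len - sig_len);      /* what remains is module data */
}
```

Because the information block has a fixed size and sits at a known offset from the end, the signature and module sizes fall out of simple subtraction.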
12 years agoMerge tag 'kirkwood_fixes_for_v3.7' of git://git.infradead.org/users/jcooper/linux...
Olof Johansson [Fri, 19 Oct 2012 23:17:51 +0000 (16:17 -0700)]
Merge tag 'kirkwood_fixes_for_v3.7' of git://git.infradead.org/users/jcooper/linux into fixes

From Jason Cooper:
 - improve #ifdef logic to prevent linker errors with CACHE_FEROCEON_L2
 - lsxl board dts fixes

* tag 'kirkwood_fixes_for_v3.7' of git://git.infradead.org/users/jcooper/linux:
  ARM: kirkwood: fix buttons on lsxl boards
  ARM: kirkwood: fix LEDs names for lsxl boards
  ARM: Kirkwood: fix disabling CACHE_FEROCEON_L2

12 years agoMODSIGN: Cleanup .gitignore
David Howells [Fri, 19 Oct 2012 22:56:45 +0000 (23:56 +0100)]
MODSIGN: Cleanup .gitignore

The module build process no longer creates intermediate files for module
signing, so remove them from .gitignore.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMODSIGN: perlify sign-file and merge in x509keyid
David Howells [Fri, 19 Oct 2012 22:56:37 +0000 (23:56 +0100)]
MODSIGN: perlify sign-file and merge in x509keyid

Turn sign-file into perl and merge in x509keyid.  The latter doesn't
need to be a separate script as it doesn't actually need to work out the
SHA1 sum of the X.509 certificate itself, since it can get that from the
X.509 certificate.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel...
Olof Johansson [Fri, 19 Oct 2012 22:40:18 +0000 (15:40 -0700)]
Merge branch 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into fixes

A collection of warning fixes on non-ARM code from Arnd Bergmann:

* 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: s3c: mark s3c2440_clk_add as __init_refok
  spi/s3c64xx: use correct dma_transfer_direction type
  pcmcia: sharpsl: don't discard sharpsl_pcmcia_ops
  USB: EHCI: mark ehci_orion_conf_mbus_windows __devinit
  mm/slob: use min_t() to compare ARCH_SLAB_MINALIGN
  SCSI: ARM: make fas216_dumpinfo function conditional
  SCSI: ARM: ncr5380/oak uses no interrupts

12 years agohold task->mempolicy while numa_maps scans.
KAMEZAWA Hiroyuki [Fri, 19 Oct 2012 08:00:55 +0000 (17:00 +0900)]
hold task->mempolicy while numa_maps scans.

  /proc/<pid>/numa_maps scans VMAs and shows the mempolicy under
  mmap_sem. It sometimes accesses task->mempolicy, which can
  be freed without mmap_sem, so numa_maps can show
  garbage while scanning.

This patch takes a reference on task->mempolicy when numa_maps is read,
before calling get_vma_policy(). This way, task->mempolicy
will not be freed until numa_maps reaches its end.

V2->v3
  -  updated comments to be more verbose.
  -  removed task_lock() in numa_maps code.
V1->V2
  -  access task->mempolicy only once and remember it, because kernel/exit.c
     can overwrite it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
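
The reference-holding pattern described above can be sketched in a single-threaded toy model. All names here are hypothetical, and a plain int stands in for the kernel's atomic reference count: the scan reads the task's policy pointer once, takes a reference, and drops it only when the scan ends.

```c
#include <stddef.h>

/* Toy policy object; `freed` records that the last reference was dropped. */
struct policy_sketch {
    int refcnt;
    int freed;
};

static void policy_get(struct policy_sketch *p)
{
    if (p)
        p->refcnt++;
}

static void policy_put(struct policy_sketch *p)
{
    if (p && --p->refcnt == 0)
        p->freed = 1;
}

/* Per-scan state: the policy pointer is read once and remembered. */
struct scan_sketch {
    struct policy_sketch *task_policy;
};

static void scan_start(struct scan_sketch *s, struct policy_sketch *task_pol)
{
    policy_get(task_pol);          /* pin before any per-VMA lookup */
    s->task_policy = task_pol;
}

static void scan_stop(struct scan_sketch *s)
{
    policy_put(s->task_policy);    /* release once the scan is over */
    s->task_policy = NULL;
}
```

If the task drops its own reference mid-scan, the object survives until `scan_stop()` runs.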
12 years agoMerge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Fri, 19 Oct 2012 21:15:16 +0000 (14:15 -0700)]
Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull miscellaneous x86 fixes from Peter Anvin:
 "The biggest ones are fixing suspend/resume breakage on 32 bits, and an
  interim fix for mapping over holes that allows AMD kit with more than
  1 TB.

  A final solution for the latter is in the works, but involves some
  fairly invasive changes that will probably mean it will only be
  appropriate for 3.8."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, MCE: Remove bios_cmci_threshold sysfs attribute
  x86, amd, mce: Avoid NULL pointer reference on CPU northbridge lookup
  x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  x86/cache_info: Use ARRAY_SIZE() in amd_l3_attrs()
  x86/reboot: Remove quirk entry for SBC FITPC
  x86, suspend: Correct the restore of CR4, EFER; skip computing EFLAGS.ID

12 years agoMerge branch 'akpm' (Fixes from Andrew)
Linus Torvalds [Fri, 19 Oct 2012 21:07:55 +0000 (14:07 -0700)]
Merge branch 'akpm' (Fixes from Andrew)

Merge misc fixes from Andrew Morton:
 "Seven fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (7 patches)
  lib/dma-debug.c: fix __hash_bucket_find()
  mm: compaction: correct the nr_strict va isolated check for CMA
  firmware/memmap: avoid type conflicts with the generic memmap_init()
  pidns: remove recursion from free_pid_ns()
  drivers/video/backlight/lm3639_bl.c: return proper error in lm3639_bled_mode_store() error paths
  kernel/sys.c: fix stack memory content leak via UNAME26
  linux/coredump.h needs asm/siginfo.h

12 years agolib/dma-debug.c: fix __hash_bucket_find()
Ming Lei [Fri, 19 Oct 2012 20:57:01 +0000 (13:57 -0700)]
lib/dma-debug.c: fix __hash_bucket_find()

If there is only one match, the unique matched entry should be returned.

Without the fix, the upcoming dma debug interfaces ("dma-debug: new
interfaces to debug dma mapping errors") can't work reliably because
only device and dma_addr are passed to dma_mapping_error().

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: compaction: correct the nr_strict va isolated check for CMA
Mel Gorman [Fri, 19 Oct 2012 20:56:57 +0000 (13:56 -0700)]
mm: compaction: correct the nr_strict va isolated check for CMA

Thierry reported that the "iron out" patch for isolate_freepages_block()
had problems due to the strict check being too strict with "mm:
compaction: Iron out isolate_freepages_block() and
isolate_freepages_range() -fix1".  It's possible that more pages than
necessary are isolated but the check still fails and I missed that this
fix was not picked up before RC1.  This same problem has been identified
in 3.7-RC1 by Tony Prisk and should be addressed by the following patch.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Tony Prisk <linux@prisktech.co.nz>
Reported-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Richard Davies <richard@arachsys.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>