firefly-linux-kernel-4.4.55.git
14 years agoceph: make mds requests killable, not interruptible
Sage Weil [Mon, 24 May 2010 18:15:51 +0000 (11:15 -0700)]
ceph: make mds requests killable, not interruptible

The underlying problem is that many mds requests can't be restarted.  For
example, a restarted create() would return -EEXIST if the original request
succeeds.  However, we do not want a hung MDS to hang the client too.  So,
use the _killable wait_for_completion variants to abort on SIGKILL but
nothing else.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agosched: add wait_for_completion_killable_timeout
Sage Weil [Sat, 29 May 2010 16:12:30 +0000 (09:12 -0700)]
sched: add wait_for_completion_killable_timeout

Add missing _killable_timeout variant for wait_for_completion that will
return when a timeout expires or the task is killed.

CC: Ingo Molnar <mingo@elte.hu>
CC: Andreas Herrmann <andreas.herrmann3@amd.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: reuse mon subscribe message instead of allocated anew
Sage Weil [Fri, 21 May 2010 21:57:25 +0000 (14:57 -0700)]
ceph: reuse mon subscribe message instead of allocated anew

Use the same message, allocated during startup.  No need to reallocate a
new one each time around (and potentially ENOMEM).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: avoid resending queued message to monitor
Sage Weil [Fri, 21 May 2010 19:31:49 +0000 (12:31 -0700)]
ceph: avoid resending queued message to monitor

The auth_reply handler will (re)send any pending requests.  For the
initial mon authenticate phase, that's correct, but when a auth ticket
renewal races with an in-flight request, we may resend a request message
that is already in flight.  Avoid this by revoking the message before
sending it.

We should also avoid resending requests at all during ticket renewal; that
will come soon.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: Storage class should be before const qualifier
Tobias Klauser [Thu, 20 May 2010 08:40:19 +0000 (10:40 +0200)]
ceph: Storage class should be before const qualifier

The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: all allocation functions should get gfp_mask
Yehuda Sadeh [Tue, 6 Apr 2010 21:33:58 +0000 (14:33 -0700)]
ceph: all allocation functions should get gfp_mask

This is essential, as for the rados block device we'll need
to run in different contexts that would need flags that
are other than GFP_NOFS.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: specify max_bytes on readdir replies
Sage Weil [Fri, 14 May 2010 20:06:30 +0000 (13:06 -0700)]
ceph: specify max_bytes on readdir replies

Specify max bytes in request to bound size of reply.  Add associated
mount option with default value of 512 KB.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: cleanup pool op strings
Sage Weil [Fri, 14 May 2010 18:36:48 +0000 (11:36 -0700)]
ceph: cleanup pool op strings

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: Use kzalloc
Julia Lawall [Thu, 13 May 2010 20:07:29 +0000 (22:07 +0200)]
ceph: Use kzalloc

Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: use common helper for aborted dir request invalidation
Sage Weil [Fri, 14 May 2010 17:02:57 +0000 (10:02 -0700)]
ceph: use common helper for aborted dir request invalidation

We invalidate I_COMPLETE and dentry leases in two places: on aborted mds
request and on request replay.  Use common helper to avoid duplicate code.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: cope with out of order (unsafe after safe) mds reply
Sage Weil [Thu, 13 May 2010 16:06:02 +0000 (09:06 -0700)]
ceph: cope with out of order (unsafe after safe) mds reply

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: save peer feature bits in connection structure
Sage Weil [Wed, 12 May 2010 22:23:30 +0000 (15:23 -0700)]
ceph: save peer feature bits in connection structure

These are used for adjusting behavior, such as conditionally encoding a
newer message format.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: resync headers with userland
Sage Weil [Wed, 12 May 2010 21:48:20 +0000 (14:48 -0700)]
ceph: resync headers with userland

Notable changes include pool op defines and types, FLOCK feature bit, and
new CMPXATTR osd ops.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: use ceph. prefix for virtual xattrs
Sage Weil [Tue, 11 May 2010 18:40:25 +0000 (11:40 -0700)]
ceph: use ceph. prefix for virtual xattrs

Drop the 'user.' prefix and use just 'ceph.' for fs virtual xattrs.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: throw out dirty caps metadata, data on session teardown
Sage Weil [Mon, 10 May 2010 23:12:25 +0000 (16:12 -0700)]
ceph: throw out dirty caps metadata, data on session teardown

The remove_session_caps() helper is called when an MDS closes out our
session (either normally, or as a result of a failed reconnect), and when
we tear down state for umount.  If we remove the last cap, and there are
no cap migrations in progress, then there is little hope of us flushing
out that data to the mds (without heroic efforts to reconnect and flush).

So, to avoid leaving inodes pinned (due to dirty state) and crashing after
umount, throw out dirty caps state and unpin the inodes.  Print a warning
to the console so we know something was lost.

NOTE: Although we drop wrbuffer refs, we don't actually mark pages clean;
maybe a truncate should be queued?

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: attempt mds reconnect if mds closes our session
Sage Weil [Thu, 18 Mar 2010 20:59:12 +0000 (13:59 -0700)]
ceph: attempt mds reconnect if mds closes our session

Currently, if our session is closed (due to a timeout, or explicit close,
or whatever), we just sit there doing nothing unless/until the MDS
restarts, at which point we try to reconnect.

Change client to attempt an immediate reconnect if our session is closed.

Note that currently the MDS doesn't support this, and our attempt will
fail.  We'll get a session CLOSE, our caps and dirty cap state will be
dropped, and the client will be free to attempt to reconnect.  That's
clearly not as nice as a successful reconnect, but it at least allows us
to try to carry on, and in the future the MDS will support a reconnect
and we will fare better.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: clean up send_mds_reconnect interface
Sage Weil [Mon, 10 May 2010 23:31:25 +0000 (16:31 -0700)]
ceph: clean up send_mds_reconnect interface

Pass a ceph_mds_session, since the caller has it.

Remove the dead code for sending empty reconnects.  It used to be used
when the MDS contacted _us_ to solicit a reconnect, and we could reply
saying "go away, I have no session."  Now we only send reconnects based
on the mds map, and only when we do in fact have an open session.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: wait for mds OPEN reply to indicate reconnect success
Sage Weil [Thu, 18 Mar 2010 21:45:05 +0000 (14:45 -0700)]
ceph: wait for mds OPEN reply to indicate reconnect success

We used to infer reconnect success by watching the MDS state, essentially
assuming that hearing nothing meant things were ok.  That wasn't
particularly reliable.  Instead, the MDS replies with an explicit OPEN
message to indicate success.

Strictly speaking, this is a protocol change, but it is a backwards
compatible one that does not break new clients + old servers or old
clients + new servers.  At least not yet.

Drop unused @all argument from kick_requests while we're at it.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: only send cap releases when mds is OPEN|HUNG
Sage Weil [Wed, 17 Mar 2010 23:30:21 +0000 (16:30 -0700)]
ceph: only send cap releases when mds is OPEN|HUNG

On OPENING we shouldn't have any caps (or releases).
On CLOSING, we should wait until we succeed (and throw it all out), or
don't (and are OPEN again).
On RECONNECTING we can wait until we are OPEN.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: dicard cap releases on mds restart
Sage Weil [Mon, 10 May 2010 22:36:44 +0000 (15:36 -0700)]
ceph: dicard cap releases on mds restart

If the MDS restarts, the expire caps state is no longer shared, and can be
thrown out.  Caps state will be rebuilt on the MDS during the reconnect
process that follows.  Zero out any release messages and adjust the
release counter accordingly.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: make mon client statfs handling more generic
Yehuda Sadeh [Thu, 22 Apr 2010 22:40:37 +0000 (15:40 -0700)]
ceph: make mon client statfs handling more generic

This is being done so that we could reuse the statfs
infrastructure with other requests that return values.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: drop src address(es) from message header [new protocol feature]
Sage Weil [Thu, 25 Mar 2010 22:45:38 +0000 (15:45 -0700)]
ceph: drop src address(es) from message header [new protocol feature]

The CEPH_FEATURE_NOSRCADDR protocol feature avoids putting the full source
address in each message header (twice).  This patch switches the client to
the new scheme, and _requires_ this feature on the server.  The server
will support both the old and new schemes.  That means an old client will
work with a new server, but a new client will not work with an old server.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: cleanup: remove unused assignement
Dan Carpenter [Fri, 7 May 2010 08:27:14 +0000 (10:27 +0200)]
ceph: cleanup: remove unused assignement

We don't ever use "dirty" so we can remove it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: clean up cap release loop vs spinlock
Sage Weil [Wed, 5 May 2010 22:51:35 +0000 (15:51 -0700)]
ceph: clean up cap release loop vs spinlock

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: name bdi ceph-%d instead of major:minor
Sage Weil [Tue, 4 May 2010 23:39:35 +0000 (16:39 -0700)]
ceph: name bdi ceph-%d instead of major:minor

The bdi_setup_and_register() helper doesn't help us since we bdi_init() in
create_client() and bdi_register() only when sget() succeeds.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: skip mds sync on forced unmount
Sage Weil [Mon, 3 May 2010 22:22:00 +0000 (15:22 -0700)]
ceph: skip mds sync on forced unmount

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: adjust masked struct_v variable names
Sage Weil [Fri, 30 Apr 2010 19:45:02 +0000 (12:45 -0700)]
ceph: adjust masked struct_v variable names

Reported-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: clean up mount options, ->show_options()
Sage Weil [Thu, 29 Apr 2010 23:38:32 +0000 (16:38 -0700)]
ceph: clean up mount options, ->show_options()

Ensure all options are included in /proc/mounts.  Some cleanup.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: set dn offset when spliced
Sage Weil [Tue, 4 May 2010 05:08:02 +0000 (22:08 -0700)]
ceph: set dn offset when spliced

We want to assign an offset when the dentry goes from null to linked, which
is always done by splice_dentry().  Notably, we should NOT assign an
offset when a dentry is first created and is still null.

BUG if we try to splice a non-null dentry (we shouldn't).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: don't clobber i_max_offset on already complete dir
Sage Weil [Fri, 16 Apr 2010 19:58:02 +0000 (12:58 -0700)]
ceph: don't clobber i_max_offset on already complete dir

This can screw up offsets assigned to new dentries and break dcache
readdir results.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: skip set_dentry_offset work if directory not I_COMPLETE
Sage Weil [Thu, 15 Apr 2010 21:08:49 +0000 (14:08 -0700)]
ceph: skip set_dentry_offset work if directory not I_COMPLETE

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: set next_offset on readdir finish
Sage Weil [Tue, 4 May 2010 04:50:39 +0000 (21:50 -0700)]
ceph: set next_offset on readdir finish

Set next_offset to 2 (always 2!), not 0, on readdir finish.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: listxattr should compare version by >=
Henry C Chang [Thu, 29 Apr 2010 16:32:28 +0000 (09:32 -0700)]
ceph: listxattr should compare version by >=

If the version hasn't changed, don't rebuild the index.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: fix xattr dangling pointer / double free
Sage Weil [Thu, 29 Apr 2010 16:28:11 +0000 (09:28 -0700)]
ceph: fix xattr dangling pointer / double free

If we use the xattr_blob, clear the pointer so we don't release the memory
at the bottom of the fuction.

Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: close messenger race
Sage Weil [Wed, 28 Apr 2010 20:51:50 +0000 (13:51 -0700)]
ceph: close messenger race

Simplify messenger locking, and close race between ceph_con_close() setting
the CLOSED bit and con_work() checking the bit, then taking the mutex.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: name msgpools; useful error messages
Sage Weil [Sat, 24 Apr 2010 16:56:35 +0000 (09:56 -0700)]
ceph: name msgpools; useful error messages

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: fix memory leak due to possible dentry init race
Sage Weil [Fri, 23 Apr 2010 18:36:54 +0000 (11:36 -0700)]
ceph: fix memory leak due to possible dentry init race

Free dentry_info in error path.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: include auth method in error messages
Sage Weil [Fri, 14 May 2010 16:55:18 +0000 (09:55 -0700)]
ceph: include auth method in error messages

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: osdtimeout=0 for now timeout
Sage Weil [Wed, 21 Apr 2010 18:09:38 +0000 (11:09 -0700)]
ceph: osdtimeout=0 for now timeout

Allow the osd reset timeout to be disabled.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: d_obtain_alias() returns ERR_PTR()
Dan Carpenter [Wed, 21 Apr 2010 10:31:13 +0000 (12:31 +0200)]
ceph: d_obtain_alias() returns ERR_PTR()

d_obtain_alias() doesn't return NULL, it returns an ERR_PTR().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: wake up mount thread when getting osdmap
Yehuda Sadeh [Tue, 13 Apr 2010 18:34:26 +0000 (19:34 +0100)]
ceph: wake up mount thread when getting osdmap

Now that the mount thread waits for the osdmap, it needs
to be awaken.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
14 years agoceph: remove unused #includes
Huang Weiyi [Thu, 8 Apr 2010 11:48:57 +0000 (19:48 +0800)]
ceph: remove unused #includes

Remove unused #include's in
  fs/ceph/super.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: wait for both monmap and osdmap when opening session
Sage Weil [Wed, 7 Apr 2010 18:23:20 +0000 (11:23 -0700)]
ceph: wait for both monmap and osdmap when opening session

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
14 years agoceph: clean up connection reset
Sage Weil [Fri, 2 Apr 2010 23:16:34 +0000 (16:16 -0700)]
ceph: clean up connection reset

Reset out_keepalive_pending and peer_global_seq, and drop unused var.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: simplify ceph_msg_new
Sage Weil [Thu, 1 Apr 2010 23:07:23 +0000 (16:07 -0700)]
ceph: simplify ceph_msg_new

We only need to pass in front_len.  Callers can attach any other payload
pieces (middle, data) as they see fit.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: make ceph_msg_new return NULL on failure; clean up, fix callers
Sage Weil [Thu, 1 Apr 2010 23:06:19 +0000 (16:06 -0700)]
ceph: make ceph_msg_new return NULL on failure; clean up, fix callers

Returning ERR_PTR(-ENOMEM) is useless extra work.  Return NULL on failure
instead, and fix up the callers (about half of which were wrong anyway).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: rewrite msgpool using mempool_t
Sage Weil [Thu, 1 Apr 2010 22:23:14 +0000 (15:23 -0700)]
ceph: rewrite msgpool using mempool_t

Since we don't need to maintain large pools of messages, we can just
use the standard mempool_t.  We maintain a msgpool 'wrapper' because we
need the mempool_t* in the alloc function, and mempool gives us only
pool_data.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: use ceph_sb_to_client instead of ceph_client
Cheng Renquan [Fri, 26 Mar 2010 09:40:33 +0000 (17:40 +0800)]
ceph: use ceph_sb_to_client instead of ceph_client

ceph_sb_to_client and ceph_client are really identical, we need to dump
one; while function ceph_client is confusing with "struct ceph_client",
ceph_sb_to_client's definition is more clear; so we'd better switch all
call to ceph_sb_to_client.

  -static inline struct ceph_client *ceph_client(struct super_block *sb)
  -{
  - return sb->s_fs_info;
  -}

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: handle kzalloc() failure
Cheng Renquan [Fri, 26 Mar 2010 10:04:40 +0000 (18:04 +0800)]
ceph: handle kzalloc() failure

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: drop unnecessary msgpool for mon_client subscribe_ack
Sage Weil [Thu, 25 Mar 2010 04:52:30 +0000 (21:52 -0700)]
ceph: drop unnecessary msgpool for mon_client subscribe_ack

Preallocate a single message to reuse instead.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: drop unnecessary msgpool for mon_client auth_reply
Sage Weil [Thu, 25 Mar 2010 04:48:05 +0000 (21:48 -0700)]
ceph: drop unnecessary msgpool for mon_client auth_reply

Preallocate a single reply message that we can reuse instead.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: clean up statfs
Sage Weil [Thu, 25 Mar 2010 04:43:33 +0000 (21:43 -0700)]
ceph: clean up statfs

Avoid unnecessary msgpool.  Preallocate reply.  Fix use-after-free race.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: fix theoretically possible double-put on connection
Sage Weil [Thu, 25 Mar 2010 04:30:19 +0000 (21:30 -0700)]
ceph: fix theoretically possible double-put on connection

This would only trigger if we bailed out before resetting r_con_filling_msg
because the server reply was corrupt (oversized).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: cleanup: remove dead code
Dan Carpenter [Sat, 20 Mar 2010 13:01:27 +0000 (16:01 +0300)]
ceph: cleanup: remove dead code

"xattr" is never NULL here.  We took care of that in the previous
if statement block.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: reduce build_path debug output
Sage Weil [Thu, 18 Mar 2010 17:14:30 +0000 (10:14 -0700)]
ceph: reduce build_path debug output

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: use __page_cache_alloc and add_to_page_cache_lru
Yehuda Sadeh [Wed, 17 Mar 2010 20:54:02 +0000 (13:54 -0700)]
ceph: use __page_cache_alloc and add_to_page_cache_lru

Following Nick Piggin patches in btrfs, pagecache pages should be
allocated with __page_cache_alloc, so they obey pagecache memory
policies.

Also, using add_to_page_cache_lru instead of using a private
pagevec where applicable.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: update for removal of kref_set
Stephen Rothwell [Wed, 17 Mar 2010 15:53:04 +0000 (08:53 -0700)]
ceph: update for removal of kref_set

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: simplify page setup for incoming data
Sage Weil [Thu, 4 Mar 2010 18:22:59 +0000 (10:22 -0800)]
ceph: simplify page setup for incoming data

Drop largely useless helper __prepare_pages(), and simplify sanity checks.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: invalidate affected dentry leases on aborted requests
Sage Weil [Fri, 14 May 2010 16:35:38 +0000 (09:35 -0700)]
ceph: invalidate affected dentry leases on aborted requests

If we abort a request, we return to caller, but the request may still
complete.  And if we hold the dir FILE_EXCL bit, we may not release a
lease when sending a request.  A simple un-tar, control-c, un-tar again
will reproduce the bug (manifested as a 'Cannot open: File exists').

Ensure we invalidate affected dentry leases (as well dir I_COMPLETE) so
we don't have valid (but incorrect) leases.  Do the same, consistently, at
other sites where I_COMPLETE is similarly cleared.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: fix race between aborted requests and fill_trace
Sage Weil [Thu, 13 May 2010 19:01:13 +0000 (12:01 -0700)]
ceph: fix race between aborted requests and fill_trace

When we abort requests we need to prevent fill_trace et al from doing
anything that relies on locks held by the VFS caller.  This fixes a race
between the reply handler and the abort code, ensuring that continue
holding the dir mutex until the reply handler completes.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: clean up mds reply, error handling
Sage Weil [Thu, 13 May 2010 18:19:06 +0000 (11:19 -0700)]
ceph: clean up mds reply, error handling

We would occasionally BUG out in the reply handler because r_reply was
nonzero, due to a race with ceph_mdsc_do_request temporarily setting
r_reply to an ERR_PTR value.  This is unnecessary, messy, and also wrong
in the EIO case.

Clean up by consistently using r_err for errors and r_reply for messages.
Also fix the abort logic to trigger consistently for all errors that return
to the caller early (e.g., EIO from timeout case).  If an abort races with
a reply, use the result from the reply.

Also fix locking for r_err, r_reply update in the reply handler.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoLinus 2.6.34
Linus Torvalds [Sun, 16 May 2010 21:17:36 +0000 (14:17 -0700)]
Linus 2.6.34

14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Sun, 16 May 2010 18:11:53 +0000 (11:11 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  rtnetlink: make SR-IOV VF interface symmetric
  sctp: delete active ICMP proto unreachable timer when free transport
  tcp: fix MD5 (RFC2385) support

14 years agoMerge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Sun, 16 May 2010 18:11:31 +0000 (11:11 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Oprofile: Fix Loongson irq handler
  MIPS: N32: Use compat version for sys_ppoll.
  MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1

14 years agortnetlink: make SR-IOV VF interface symmetric
Chris Wright [Sun, 16 May 2010 08:05:45 +0000 (01:05 -0700)]
rtnetlink: make SR-IOV VF interface symmetric

Now we have a set of nested attributes:

  IFLA_VFINFO_LIST (NESTED)
    IFLA_VF_INFO (NESTED)
      IFLA_VF_MAC
      IFLA_VF_VLAN
      IFLA_VF_TX_RATE

This allows a single set to operate on multiple attributes if desired.
Among other things, it means a dump can be replayed to set state.

The current interface has yet to be released, so this seems like
something to consider for 2.6.34.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: delete active ICMP proto unreachable timer when free transport
Wei Yongjun [Sun, 9 May 2010 16:56:07 +0000 (16:56 +0000)]
sctp: delete active ICMP proto unreachable timer when free transport

transport may be free before ICMP proto unreachable timer expire, so
we should delete active ICMP proto unreachable timer when transport
is going away.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotcp: fix MD5 (RFC2385) support
Eric Dumazet [Sun, 16 May 2010 07:34:04 +0000 (00:34 -0700)]
tcp: fix MD5 (RFC2385) support

TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.

We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.

Fix is to disable preemption and BH.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago MIPS: Oprofile: Fix Loongson irq handler
Wu Zhangjin [Thu, 6 May 2010 16:59:46 +0000 (00:59 +0800)]
MIPS: Oprofile: Fix Loongson irq handler

    The interrupt enable bit for the performance counters is in the Control
    Register $24, not in the counter register.
    loongson2_perfcount_handler(), we need to use

Reported-by: Xu Hengyang <hengyang@mail.ustc.edu.cn>
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
Cc: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/1198/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---

14 years ago MIPS: N32: Use compat version for sys_ppoll.
Chandrakala Chavva [Tue, 11 May 2010 00:11:54 +0000 (17:11 -0700)]
MIPS: N32: Use compat version for sys_ppoll.

    The sys_ppoll() takes struct 'struct timespec'. This is different for the
    N32 and N64 ABIs. Use the compat version to do the proper conversions.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
    To: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/1210/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---

14 years ago MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1
Shane McDonald [Fri, 7 May 2010 05:26:57 +0000 (23:26 -0600)]
MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1

    In the FPU emulator code of the MIPS, the Cause bits of the FCSR register
    are not currently writeable by the ctc1 instruction.  In odd corner cases,
    this can cause problems.  For example, a case existed where a divide-by-zero
    exception was generated by the FPU, and the signal handler attempted to
    restore the FPU registers to their state before the exception occurred.  In
    this particular setup, writing the old value to the FCSR register would
    cause another divide-by-zero exception to occur immediately.  The solution
    is to change the ctc1 instruction emulator code to allow the Cause bits of
    the FCSR register to be writeable.  This is the behaviour of the hardware
    that the code is emulating.

    This problem was found by Shane McDonald, but the credit for the fix goes
    to Kevin Kissell.  In Kevin's words:

    I submit that the bug is indeed in that ctc_op:  case of the emulator.  The
    Cause bits (17:12) are supposed to be writable by that instruction, but the
    CTC1 emulation won't let them be updated by the instruction.  I think that
    actually if you just completely removed lines 387-388 [...] things would
    work a good deal better.  At least, it would be a more accurate emulation of
    the architecturally defined FPU.  If I wanted to be really, really pedantic
    (which I sometimes do), I'd also protect the reserved bits that aren't
    necessarily writable.

Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
    To: anemo@mba.ocn.ne.jp
    To: kevink@paralogos.com
    To: sshtylyov@mvista.com
    Patchwork: http://patchwork.linux-mips.org/patch/1205/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---

14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
Linus Torvalds [Sat, 15 May 2010 19:55:31 +0000 (12:55 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: check for read permission on src file in the clone ioctl

14 years agolib/btree: fix possible NULL pointer dereference
kirjanov@gmail.com [Sat, 15 May 2010 16:32:34 +0000 (12:32 -0400)]
lib/btree: fix possible NULL pointer dereference

mempool_alloc() can return null in atomic case.

Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agommc: at91_mci: modify cache flush routines
Nicolas Ferre [Sat, 15 May 2010 16:32:31 +0000 (12:32 -0400)]
mmc: at91_mci: modify cache flush routines

As we were using an internal dma flushing routine, this patch changes to
the DMA API flush_kernel_dcache_page().  Driver is able to compile now.

[akpm@linux-foundation.org: flush_kernel_dcache_page() comes before kunmap_atomic()]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoBtrfs: check for read permission on src file in the clone ioctl
Dan Rosenberg [Sat, 15 May 2010 15:27:37 +0000 (11:27 -0400)]
Btrfs: check for read permission on src file in the clone ioctl

The existing code would have allowed you to clone a file that was
only open for writing

Signed-off-by: Chris Mason <chris.mason@oracle.com>
14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Linus Torvalds [Sat, 15 May 2010 16:03:15 +0000 (09:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  JFS: Free sbi memory in error path
  fs/sysv: dereferencing ERR_PTR()
  Fix double-free in logfs
  Fix the regression created by "set S_DEAD on unlink()..." commit

14 years agoMerge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 15 May 2010 16:03:02 +0000 (09:03 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf record: Add a fallback to the reference relocation symbol

14 years agoJFS: Free sbi memory in error path
Jan Blunck [Mon, 12 Apr 2010 23:44:08 +0000 (16:44 -0700)]
JFS: Free sbi memory in error path

I spotted the missing kfree() while removing the BKL.

[akpm@linux-foundation.org: avoid multiple returns so it doesn't happen again]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs/sysv: dereferencing ERR_PTR()
Dan Carpenter [Wed, 21 Apr 2010 10:30:32 +0000 (12:30 +0200)]
fs/sysv: dereferencing ERR_PTR()

I moved the dir_put_page() inside the if condition so we don't dereference
"page", if it's an ERR_PTR().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoFix double-free in logfs
Al Viro [Thu, 29 Apr 2010 00:57:02 +0000 (20:57 -0400)]
Fix double-free in logfs

iput() is needed *until* we'd done successful d_alloc_root()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoFix the regression created by "set S_DEAD on unlink()..." commit
Al Viro [Fri, 30 Apr 2010 21:17:09 +0000 (17:17 -0400)]
Fix the regression created by "set S_DEAD on unlink()..." commit

1) i_flags simply doesn't work for mount/unlink race prevention;
we may have many links to file and rm on one of those obviously
shouldn't prevent bind on top of another later on.  To fix it
right way we need to mark _dentry_ as unsuitable for mounting
upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
i_mutex on the inode in question.  Set it (with dont_mount(dentry))
in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
in namespace.c that used to check for S_DEAD.  Setting S_DEAD
is still needed in places where we used to set it (for directories
getting killed), since we rely on it for readdir/rmdir race
prevention.

2) rename()/mount() protection has another bogosity - we unhash
the target before we'd checked that it's not a mountpoint.  Fixed.

3) ancient bogosity in pivot_root() - we locked i_mutex on the
right directory, but checked S_DEAD on the different (and wrong)
one.  Noticed and fixed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoMerge master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Sat, 15 May 2010 04:28:42 +0000 (21:28 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6126/1: ARM mpcore_wdt: fix build failure and other fixes
  ARM: 6125/1: ARM TWD: move TWD registers to common header
  ARM: 6110/1: Fix Thumb-2 kernel builds when UACCESS_WITH_MEMCPY is enabled
  ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
  ARM: 6111/1: Implement read/write for ownership in the ARMv6 DMA cache ops
  ARM: 6106/1: Implement copy_to_user_page() for noMMU
  ARM: 6105/1: Fix the __arm_ioremap_caller() definition in nommu.c

14 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 15 May 2010 04:28:23 +0000 (21:28 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mrst: Don't blindly access extended config space

14 years agoprofile: fix stats and data leakage
Hugh Dickins [Sat, 15 May 2010 02:44:10 +0000 (19:44 -0700)]
profile: fix stats and data leakage

If the kernel is large or the profiling step small, /proc/profile
leaks data and readprofile shows silly stats, until readprofile -r
has reset the buffer: clear the prof_buffer when it is vmalloc()ed.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agohughd: update email address
Hugh Dickins [Sat, 15 May 2010 02:40:35 +0000 (19:40 -0700)]
hughd: update email address

My old address will shut down in a couple of weeks: update the tree.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agox86, mrst: Don't blindly access extended config space
H. Peter Anvin [Fri, 14 May 2010 20:55:57 +0000 (13:55 -0700)]
x86, mrst: Don't blindly access extended config space

Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform.  The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.

This fixes booting certain VMware configurations with CONFIG_MRST=y.

Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.

Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>

14 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 May 2010 19:20:09 +0000 (12:20 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
  x86, k8: Fix build error when K8_NB is disabled
  x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
  x86: Fix fake apicid to node mapping for numa emulation

14 years agox86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
Frank Arnold [Thu, 22 Apr 2010 14:06:59 +0000 (16:06 +0200)]
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments

When running a quest kernel on xen we get:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
2ca/0x3df
RSP: 0018:ffff880002203e08  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
 ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
<0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
<0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
 <IRQ>
 [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
 [<ffffffff81059140>] ? mod_timer+0x23/0x25
 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
 [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
 [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
 [<ffffffff810114eb>] ? default_idle+0x36/0x53
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81423a9a>] rest_init+0x7e/0x80
 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
 ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
 RSP <ffff880002203e08>
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---

The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.

Check if northbridge devices are present and do not enable the feature
if there are none.

[ hpa: backported to 2.6.34 ]

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
14 years agox86, k8: Fix build error when K8_NB is disabled
Borislav Petkov [Sat, 24 Apr 2010 07:56:53 +0000 (09:56 +0200)]
x86, k8: Fix build error when K8_NB is disabled

K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100503183036.GJ26107@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
14 years agoMerge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
Linus Torvalds [Fri, 14 May 2010 18:49:42 +0000 (11:49 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: don't leak user struct on inotify release
  inotify: race use after free/double free in inotify inode marks
  inotify: clean up the inotify_add_watch out path
  Inotify: undefined reference to `anon_inode_getfd'

Manual merge to remove duplicate "select ANON_INODES" from Kconfig file

14 years agoMerge branch 'davinci-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 May 2010 18:43:52 +0000 (11:43 -0700)]
Merge branch 'davinci-fixes-for-linus-2' of git://git./linux/kernel/git/khilman/linux-davinci

* 'davinci-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci:
  DA830: fix USB 2.0 clock entry

14 years agoDA830: fix USB 2.0 clock entry
Sergei Shtylyov [Thu, 13 May 2010 18:51:51 +0000 (22:51 +0400)]
DA830: fix USB 2.0 clock entry

DA8xx OHCI driver fails to load due to failing clk_get() call for the USB 2.0
clock. Arrange matching USB 2.0 clock by the clock name instead of the device.
(Adding another CLK() entry for "ohci.0" device won't do -- in the future I'll
also have to enable USB 2.0 clock to configure CPPI 4.1 module, in which case
I won't have any device at all.)

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
14 years agoinotify: don't leak user struct on inotify release
Pavel Emelyanov [Wed, 12 May 2010 22:34:07 +0000 (15:34 -0700)]
inotify: don't leak user struct on inotify release

inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user.  The problem is that free_uid() is
never called on it.

Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify
using fsnotify) after 2.6.30.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
14 years agoinotify: race use after free/double free in inotify inode marks
Eric Paris [Tue, 11 May 2010 21:17:40 +0000 (17:17 -0400)]
inotify: race use after free/double free in inotify inode marks

There is a race in the inotify add/rm watch code.  A task can find and
remove a mark which doesn't have all of it's references.  This can
result in a use after free/double free situation.

Task A Task B
------------ -----------
inotify_new_watch()
 allocate a mark (refcnt == 1)
 add it to the idr
inotify_rm_watch()
 inotify_remove_from_idr()
  fsnotify_put_mark()
      refcnt hits 0, free
 take reference because we are on idr
 [at this point it is a use after free]
 [time goes on]
 refcnt may hit 0 again, double free

The fix is to take the reference BEFORE the object can be found in the
idr.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: <stable@kernel.org>
14 years agoinotify: clean up the inotify_add_watch out path
Eric Paris [Tue, 11 May 2010 21:16:23 +0000 (17:16 -0400)]
inotify: clean up the inotify_add_watch out path

inotify_add_watch explictly frees the unused inode mark, but it can just
use the generic code.  Just do that.

Signed-off-by: Eric Paris <eparis@redhat.com>
14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 14 May 2010 14:56:45 +0000 (07:56 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  vhost: fix barrier pairing

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Fri, 14 May 2010 14:55:42 +0000 (07:55 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  mmap_min_addr check CAP_SYS_RAWIO only for write

14 years agoMerge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds [Fri, 14 May 2010 14:29:29 +0000 (07:29 -0700)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze

* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Fix module loading on system with WB cache
  microblaze: export assembly functions used by modules
  microblaze: Remove powerpc code from Microblaze port
  microblaze: Remove compilation warnings in cache macro
  microblaze: export assembly functions used by modules
  microblaze: fix get_user/put_user side-effects
  microblaze: re-enable interrupts before calling schedule

14 years agoMerge branch 'net-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
David S. Miller [Fri, 14 May 2010 10:42:49 +0000 (03:42 -0700)]
Merge branch 'net-2.6' of git://git./linux/kernel/git/mst/vhost

14 years agommap_min_addr check CAP_SYS_RAWIO only for write
Kees Cook [Thu, 22 Apr 2010 19:19:17 +0000 (12:19 -0700)]
mmap_min_addr check CAP_SYS_RAWIO only for write

Redirecting directly to lsm, here's the patch discussed on lkml:
http://lkml.org/lkml/2010/4/22/219

The mmap_min_addr value is useful information for an admin to see without
being root ("is my system vulnerable to kernel NULL pointer attacks?") and
its setting is trivially easy for an attacker to determine by calling
mmap() in PAGE_SIZE increments starting at 0, so trying to keep it private
has no value.

Only require CAP_SYS_RAWIO if changing the value, not reading it.

Comment from Serge :

  Me, I like to write my passwords with light blue pen on dark blue
  paper, pasted on my window - if you're going to get my password, you're
  gonna get a headache.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
(cherry picked from commit 822cceec7248013821d655545ea45d1c6a9d15b3)

14 years agomicroblaze: Fix module loading on system with WB cache
Michal Simek [Fri, 14 May 2010 05:40:46 +0000 (07:40 +0200)]
microblaze: Fix module loading on system with WB cache

There is necessary to flush whole dcache. Icache work should be
done in kernel/module.c.

Signed-off-by: Michal Simek <monstr@monstr.eu>