firefly-linux-kernel-4.4.55.git
15 years agolockdep: generate the state bit definitions
Peter Zijlstra [Thu, 22 Jan 2009 13:38:38 +0000 (14:38 +0100)]
lockdep: generate the state bit definitions

Generate the state bit definitions from the lockdep_states.h file.

Also, move LOCK_USED to last, so that the

 USED_IN
 USED_IN_READ
 ENABLED
 ENABLED_READ

states are nicely bit aligned -- we're going to use that property

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
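
The generation step described above can be sketched with an X-macro. This is only an illustration under stated assumptions: the DEMO_* names are hypothetical, the state list is inlined instead of living in a separate header, and the helper functions show one way the bit alignment could be exploited, not necessarily how lockdep uses it.

```c
#include <assert.h>

/* Hypothetical stand-in for lockdep_states.h: one X() entry per context. */
#define DEMO_STATES(X) \
	X(HARDIRQ) \
	X(SOFTIRQ) \
	X(RECLAIM_FS)

/* Each state expands to four consecutive enumerators in a fixed order. */
#define DEMO_STATE_BITS(s) \
	DEMO_USED_IN_##s, \
	DEMO_USED_IN_##s##_READ, \
	DEMO_ENABLED_##s, \
	DEMO_ENABLED_##s##_READ,

enum demo_usage_bit {
	DEMO_STATES(DEMO_STATE_BITS)
	DEMO_USED,		/* kept last so every group starts on a multiple of 4 */
	DEMO_USAGE_STATES
};

/* Bit 0 of a state bit distinguishes the _READ variants. */
static inline int demo_bit_is_read(enum demo_usage_bit bit)
{
	assert(bit < DEMO_USED);
	return bit & 1;
}

/* Bit 1 of a state bit distinguishes ENABLED from USED_IN. */
static inline int demo_bit_is_enabled(enum demo_usage_bit bit)
{
	assert(bit < DEMO_USED);
	return (bit >> 1) & 1;
}

/* The remaining high bits select the state (0 = HARDIRQ, 1 = SOFTIRQ, ...). */
static inline int demo_bit_state(enum demo_usage_bit bit)
{
	assert(bit < DEMO_USED);
	return bit >> 2;
}
```

Adding a new state to the list then yields its four bits automatically, and because the catch-all bit comes last, the per-state groups stay aligned.
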
15 years agolockdep: move state bit definitions around
Peter Zijlstra [Thu, 22 Jan 2009 13:18:40 +0000 (14:18 +0100)]
lockdep: move state bit definitions around

For convenience later.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: simplify mark_lock()
Peter Zijlstra [Thu, 22 Jan 2009 13:15:53 +0000 (14:15 +0100)]
lockdep: simplify mark_lock()

remove the state iteration

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: simplify mark_held_locks
Peter Zijlstra [Thu, 22 Jan 2009 13:12:41 +0000 (14:12 +0100)]
lockdep: simplify mark_held_locks

remove the explicit state iteration

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: lockdep_states.h
Peter Zijlstra [Thu, 22 Jan 2009 13:09:46 +0000 (14:09 +0100)]
lockdep: lockdep_states.h

Introduce a header file to generate all the states from.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: sanitize reclaim bit names
Peter Zijlstra [Thu, 22 Jan 2009 12:13:11 +0000 (13:13 +0100)]
lockdep: sanitize reclaim bit names

s/HELD_OVER/ENABLED/g

so that it's similar to the hard and soft-irq names.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: sanitize bit names
Peter Zijlstra [Thu, 22 Jan 2009 12:10:52 +0000 (13:10 +0100)]
lockdep: sanitize bit names

s/\(LOCKF\?_ENABLED_[^ ]*\)S\(_READ\)\?\>/\1\2/g

So that the USED_IN and ENABLED have the same names.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: annotate reclaim context (__GFP_NOFS)
Nick Piggin [Wed, 21 Jan 2009 07:12:39 +0000 (08:12 +0100)]
lockdep: annotate reclaim context (__GFP_NOFS)

Here is another version, with the incremental patch rolled up, and
added reclaim context annotation to kswapd, and allocation tracing
to slab allocators (which may only ever reach the page allocator
in rare cases, so it is good to put annotations here too).

Haven't tested this version as such, but it should be getting closer
to merge worthy ;)

--
After noticing some code in mm/filemap.c accidentally performing a __GFP_FS
allocation when it should not have, I thought it might be a good idea to
try to catch this kind of thing with lockdep.

I coded up a little idea that seems to work. Unfortunately the system has to
actually be in __GFP_FS page reclaim, then take the lock, before it will mark
it. But at least that might still be some orders of magnitude more common
(and more debuggable) than an actual deadlock condition, so we have some
improvement I hope (the concept is no less complete than discovery of a lock's
interrupt contexts).

I guess we could even do the same thing with __GFP_IO (normal reclaim), and
even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
code paths, so let's start there and see how it goes.

It *seems* to work. I did a quick test.

=================================
[ INFO: inconsistent lock state ]
2.6.28-rc6-00007-ged31348-dirty #26
---------------------------------
inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
{in-reclaim-W} state was registered at:
  [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
  [<ffffffff80268f71>] lock_acquire+0x91/0xc0
  [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
  [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
  [<ffffffff8020903b>] _stext+0x3b/0x170
  [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
  [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
  [<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 3929
hardirqs last  enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
softirqs last  enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0

other info that might help us debug this:
1 lock held by modprobe/8526:
 #0:  (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]

stack backtrace:
Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26
Call Trace:
 [<ffffffff80265483>] print_usage_bug+0x193/0x1d0
 [<ffffffff80266530>] mark_lock+0xaf0/0xca0
 [<ffffffff80266735>] mark_held_locks+0x55/0xc0
 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
 [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
 [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
 [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
 [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
 [<ffffffff8020903b>] _stext+0x3b/0x170
 [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
 [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
 [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
 [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
 [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotimer: implement lockdep deadlock detection
Johannes Berg [Thu, 29 Jan 2009 15:03:20 +0000 (16:03 +0100)]
timer: implement lockdep deadlock detection

This modifies the timer code in a way to allow lockdep to detect
deadlocks resulting from a lock being taken in the timer function
as well as around the del_timer_sync() call.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
15 years agoMerge branch 'linus' into core/locking
Ingo Molnar [Sat, 7 Feb 2009 17:31:54 +0000 (18:31 +0100)]
Merge branch 'linus' into core/locking

Conflicts:
fs/btrfs/locking.c

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
Linus Torvalds [Sat, 7 Feb 2009 02:37:22 +0000 (18:37 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (37 commits)
  Btrfs: Make sure dir is non-null before doing S_ISGID checks
  Btrfs: Fix memory leak in cache_drop_leaf_ref
  Btrfs: don't return congestion in write_cache_pages as often
  Btrfs: Only prep for btree deletion balances when nodes are mostly empty
  Btrfs: fix btrfs_unlock_up_safe to walk the entire path
  Btrfs: change btrfs_del_leaf to drop locks earlier
  Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode
  Btrfs: Don't try to compress pages past i_size
  Btrfs: join the transaction in __btrfs_setxattr
  Btrfs: Handle SGID bit when creating inodes
  Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks
  Btrfs: Change btree locking to use explicit blocking points
  Btrfs: hash_lock is no longer needed
  Btrfs: disable leak debugging checks in extent_io.c
  Btrfs: sort references by byte number during btrfs_inc_ref
  Btrfs: async threads should try harder to find work
  Btrfs: selinux support
  Btrfs: make btrfs acls selectable
  Btrfs: Catch missed bios in the async bio submission thread
  Btrfs: fix readdir on 32 bit machines
  ...

15 years agoeCryptfs: Regression in unencrypted filename symlinks
Tyler Hicks [Sat, 7 Feb 2009 00:06:51 +0000 (18:06 -0600)]
eCryptfs: Regression in unencrypted filename symlinks

The addition of filename encryption caused a regression in unencrypted
filename symlink support.  ecryptfs_copy_filename() is used when dealing
with unencrypted filenames and it reported that the new, copied filename
was a character longer than it should have been.

This caused the return value of readlink() to count the NULL byte of the
symlink target.  Most applications don't care about the extra NULL byte,
but a version control system (bzr) helped in discovering the bug.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
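
The off-by-one described above can be illustrated with a small userspace sketch. It is not the eCryptfs code: demo_copy_filename() and its parameters are hypothetical, and the only point is that the extra byte is allocated for storage but not counted in the reported length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate `name` (name_size bytes, not necessarily
 * NUL-terminated) and report the length of the copy.
 * Returns 0 on success and -1 if the allocation fails.
 */
static int demo_copy_filename(char **copied, size_t *copied_size,
			      const char *name, size_t name_size)
{
	assert(copied && copied_size && name);

	*copied = malloc(name_size + 1);	/* +1 is for storage only */
	if (!*copied)
		return -1;
	memcpy(*copied, name, name_size);
	(*copied)[name_size] = '\0';
	*copied_size = name_size;		/* the terminator is not counted */
	return 0;
}
```

A caller that returns *copied_size as the result of a readlink()-style call then reports exactly the number of bytes in the target, without the terminator.
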
15 years agoMerge branch 'x86/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux...
Linus Torvalds [Sat, 7 Feb 2009 02:36:02 +0000 (18:36 -0800)]
Merge branch 'x86/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland

* 'x86/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  x86-64: fix int $0x80 -ENOSYS return

15 years agox86-64: fix int $0x80 -ENOSYS return
Roland McGrath [Sat, 7 Feb 2009 02:15:18 +0000 (18:15 -0800)]
x86-64: fix int $0x80 -ENOSYS return

One of my past fixes to this code introduced a different new bug.
When using 32-bit "int $0x80" entry for a bogus syscall number,
the return value is not correctly set to -ENOSYS.  This only happens
when neither syscall-audit nor syscall tracing is enabled (i.e., never
seen if auditd ever started).  Test program:

/* gcc -o int80-badsys -m32 -g int80-badsys.c
   Run on x86-64 kernel.
   Note to reproduce the bug you need auditd never to have started.  */

#include <errno.h>
#include <stdio.h>

int
main (void)
{
  long res;
  asm ("int $0x80" : "=a" (res) : "0" (99999));
  printf ("bad syscall returns %ld\n", res);
  return res != -ENOSYS;
}

The fix makes the int $0x80 path match the sysenter and syscall paths.

Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Roland McGrath <roland@redhat.com>
15 years agoMerge branch 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux...
Linus Torvalds [Sat, 7 Feb 2009 02:10:04 +0000 (18:10 -0800)]
Merge branch 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland

* 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  elf core dump: fix get_user use

15 years agoelf core dump: fix get_user use
Roland McGrath [Sat, 7 Feb 2009 01:34:07 +0000 (17:34 -0800)]
elf core dump: fix get_user use

The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force,
so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely
use get_user() for a normal user-space address.

Checking for VM_READ optimizes out the case where get_user() would fail
anyway.  The vm_file check here was already superfluous given the control
flow earlier in the function, so that is a cleanup/optimization unrelated
to other changes but an obvious and trivial one.

Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
15 years agovfs: Don't call attach_nobh_buffers() with an empty list
Dave Kleikamp [Fri, 6 Feb 2009 20:59:26 +0000 (14:59 -0600)]
vfs: Don't call attach_nobh_buffers() with an empty list

This is a modification of a patch by Bill Pemberton <wfp5p@virginia.edu>

nobh_write_end() could call attach_nobh_buffers() with head == NULL.
This would result in a trap when attach_nobh_buffers() attempted to
access bh->b_this_page.

This can be illustrated by running the writev01 testcase from LTP on jfs.

This error was introduced by commit 5b41e74a "vfs: fix data leak in
nobh_write_end()".  That patch did not take into account that if
PageMappedToDisk() is true upon entry to nobh_write_begin(), then no
buffers will be allocated for the page.  In that case, we won't have to
worry about a failed write leaving uninitialized data in the page.

Of course, head != NULL implies !page_has_buffers(page), so no need to
test both.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Dmitri Monakhov <dmonakhov@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Fri, 6 Feb 2009 19:14:23 +0000 (11:14 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Add missing COEF initialization for ALC887
  ALSA: hda - Add missing initialization for ALC272
  sound: usb-audio: handle wMaxPacketSize for FIXED_ENDPOINT devices
  ALSA: hda - Fix misc workqueue issues
  ALSA: hda - Add quirk for FSC Amilo Xi2550

15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Fri, 6 Feb 2009 16:48:16 +0000 (08:48 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  ieee1394: dv1394: move deprecation message from module init to file open
  firewire: core: Remove card from list of cards when enable fails

15 years agoAdd Sascha Hauer to .mailmap
Uwe Kleine-König [Fri, 6 Feb 2009 13:53:18 +0000 (14:53 +0100)]
Add Sascha Hauer to .mailmap

This fixes the shortlog attribution e.g. for 106757b38fff

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoadd another mailmap entry for Uwe Kleine-König
Uwe Kleine-König [Fri, 6 Feb 2009 13:53:19 +0000 (14:53 +0100)]
add another mailmap entry for Uwe Kleine-König

I created commit 7971db5a4b4176ad5df590fce07a962c643a2740 on a machine
where I forgot to set user.name and user.email before.  The default
values were not optimal.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agofork.c: fix NULL pointer dereference when nr_threads == threads-max
Li Zefan [Fri, 6 Feb 2009 08:17:19 +0000 (08:17 +0000)]
fork.c: fix NULL pointer dereference when nr_threads == threads-max

I happened to fork lots of processes and hit a NULL pointer dereference.
This is because copy_process() returns 0 rather than -EAGAIN when the
max_threads check fails.

The bug is introduced by "CRED: Detach the credentials from task_struct"
(commit f1752eec6145c97163dbce62d17cf5d928e28a27).

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
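
The class of bug described above can be sketched in userspace as follows. The demo_* helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention and demo_create() is a hypothetical stand-in, not copy_process() itself; the sketch assumes the problem was an error code left at 0 before a jump to a shared exit label.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct demo_task {
	int id;
};

/*
 * Hypothetical creation routine with a shared error exit. The error code
 * must be set before every jump to the exit labels: if it were still 0,
 * the function would return demo_err_ptr(0), which is NULL, and a caller
 * that only tests demo_is_err() would go on to dereference it.
 */
static struct demo_task *demo_create(int nr_threads, int max_threads)
{
	struct demo_task *t;
	long retval;

	retval = -ENOMEM;
	t = malloc(sizeof(*t));
	if (!t)
		goto fail;

	retval = -EAGAIN;	/* set before the limit check, not after */
	if (nr_threads >= max_threads)
		goto fail_free;

	t->id = nr_threads;
	return t;

fail_free:
	free(t);
fail:
	return demo_err_ptr(retval);
}
```

With the assignment in place, hitting the thread limit yields an error pointer carrying -EAGAIN instead of a NULL that looks like success.
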
15 years agoBtrfs: Make sure dir is non-null before doing S_ISGID checks
Chris Mason [Fri, 6 Feb 2009 16:35:57 +0000 (11:35 -0500)]
Btrfs: Make sure dir is non-null before doing S_ISGID checks

The S_ISGID check in btrfs_new_inode caused an oops during subvol creation
because sometimes the dir is null.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoMerge branch 'for-linus' of git://neil.brown.name/md
Linus Torvalds [Fri, 6 Feb 2009 15:41:10 +0000 (07:41 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: Ensure an md array never has too many devices.
  md: Fix a bug in linear.c causing which_dev() to return the wrong device.
  md: Allow read error in a single drive raid1 to be passed up.

15 years agoieee1394: dv1394: move deprecation message from module init to file open
Stefan Richter [Tue, 3 Feb 2009 16:54:31 +0000 (17:54 +0100)]
ieee1394: dv1394: move deprecation message from module init to file open

On many Linux installations, the dv1394 driver will be auto-loaded
whenever an AV/C device (e.g. camcorder or audio device) is plugged in.
An irritating message would then appear in the kernel log.

Defer this message until a dv1394 character device file is actually
used by a program.  Also include the program name in the message and
update the message slightly.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
15 years agoMerge branch 'fix/usb-audio' into for-linus
Takashi Iwai [Fri, 6 Feb 2009 13:25:13 +0000 (14:25 +0100)]
Merge branch 'fix/usb-audio' into for-linus

15 years agoMerge branch 'fix/hda' into for-linus
Takashi Iwai [Fri, 6 Feb 2009 13:25:04 +0000 (14:25 +0100)]
Merge branch 'fix/hda' into for-linus

15 years agoALSA: hda - Add missing COEF initialization for ALC887
Takashi Iwai [Fri, 6 Feb 2009 11:46:59 +0000 (12:46 +0100)]
ALSA: hda - Add missing COEF initialization for ALC887

Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agoALSA: hda - Add missing initialization for ALC272
Takashi Iwai [Fri, 6 Feb 2009 11:45:52 +0000 (12:45 +0100)]
ALSA: hda - Add missing initialization for ALC272

ALC272 needs EAPD for speaker outputs as well as other similar ALC
codecs.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agosound: usb-audio: handle wMaxPacketSize for FIXED_ENDPOINT devices
Clemens Ladisch [Fri, 6 Feb 2009 07:13:07 +0000 (08:13 +0100)]
sound: usb-audio: handle wMaxPacketSize for FIXED_ENDPOINT devices

For audio devices that do not have proper audio descriptors (e.g.,
Edirol UA-20), we use hardcoded parameters from our quirks list.
However, we must still read the maximum packet size from the standard
endpoint descriptor; otherwise, we might use packets that are too big
and therefore rejected by the USB core.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agomd: Ensure an md array never has too many devices.
NeilBrown [Fri, 6 Feb 2009 07:02:46 +0000 (18:02 +1100)]
md: Ensure an md array never has too many devices.

Each different metadata format supported by md supports a
different maximum number of devices.
We really should be enforcing this maximum in the kernel, but
we aren't quite doing that properly.

We currently only enforce it at the 'hot_add' point, which is an
older interface which is not used by current userspace.

We need to also enforce it at 'add_new_disk' time for active arrays
and at 'do_md_run' time when starting a new array.

So move the test from 'hot_add' into 'bind_rdev_to_array' which is
called from both 'hot_add' and 'add_new_disk', and add a new
test in 'analyse_sbs' which is called from 'do_md_run'.

This bug (or missing feature) has been around "forever" and so
the patch is suitable for any -stable that is currently maintained.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
15 years agomd: Fix a bug in linear.c causing which_dev() to return the wrong device.
Andre Noll [Fri, 6 Feb 2009 04:10:52 +0000 (15:10 +1100)]
md: Fix a bug in linear.c causing which_dev() to return the wrong device.

ab5bd5cbc8d4b868378d062eed3d4240930fbb86 introduced the following
bug in linear software raid for large arrays on 32 bit machines:

which_dev() computes the device holding a given sector by shifting
down the sector number to a 32 bit range, dividing by the array
spacing and looking up the resulting index in the hash table of
the array.

Because the computed index might be slightly too small, a loop at
the end of which_dev() increases the index until the given sector
actually falls into the range of the device associated with that index.

The changes of the above mentioned commit caused this loop to check
whether the _index_ rather than the sector number is small enough,
effectively bypassing the loop and thus possibly returning the wrong
device.

As reported by Simon Kirby, this leads to errors such as

linear_make_request: Sector 2340486136 out of bounds on dev sdi: 156301312 sectors, offset 2109870464

Fix this bug by introducing a local variable for the index so that
the variable containing the passed sector is left unchanged.

Cc: stable@kernel.org
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
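
The fix described above, keeping the index in its own variable so the sector stays intact for the range check, can be sketched like this. The structure and function names are hypothetical, and the lookup is simplified: it divides by the spacing directly and omits the shift to a 32 bit range mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dev {
	uint64_t start;		/* first sector held by this device */
	uint64_t sectors;	/* number of sectors on this device */
};

/*
 * Hypothetical lookup: a coarse table maps sector / spacing to a device
 * index that may be slightly too small, and the loop advances the index
 * until the sector falls inside the device's range. The bucket number
 * lives in its own variable, so `sector` still holds the caller's value
 * when the loop compares it against the device bounds.
 */
static size_t demo_which_dev(const struct demo_dev *devs, size_t ndevs,
			     const size_t *table, uint64_t spacing,
			     uint64_t sector)
{
	uint64_t bucket;
	size_t idx;

	assert(ndevs > 0 && spacing != 0);

	bucket = sector / spacing;	/* do not overwrite sector here */
	idx = table[bucket];

	while (idx + 1 < ndevs &&
	       sector >= devs[idx].start + devs[idx].sectors)
		idx++;
	return idx;
}
```

If the division result were written back into `sector`, the loop condition would compare a small bucket number against device bounds, almost never advance, and return the table's first guess.
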
15 years agomd: Allow read error in a single drive raid1 to be passed up.
NeilBrown [Fri, 6 Feb 2009 04:06:47 +0000 (15:06 +1100)]
md: Allow read error in a single drive raid1 to be passed up.

If a raid1 only has a single working device and gets a read error,
we choose to simply return that error up to the filesystem (or whatever)
rather than failing the whole array.

However the code doesn't quite do that.  We attempt a readbalance
which allocates the same drive, so we retry the read - indefinitely.

Instead:  If read_balance in the error case chooses the same drive that just
failed, treat it as a failure and don't retry.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoprevent kprobes from catching spurious page faults
Masami Hiramatsu [Thu, 5 Feb 2009 22:12:39 +0000 (17:12 -0500)]
prevent kprobes from catching spurious page faults

Prevent kprobes from catching spurious faults, which would cause infinitely
recursive page faults and memory corruption through stack overflow.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agobraino in sg_ioctl_trans()
Al Viro [Fri, 6 Feb 2009 00:32:27 +0000 (00:32 +0000)]
braino in sg_ioctl_trans()

... and yes, gcc is insane enough to eat that without complaint.
We probably want sparse to scream on those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Fri, 6 Feb 2009 00:12:38 +0000 (16:12 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()"

15 years agoMerge branch 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal...
Linus Torvalds [Fri, 6 Feb 2009 00:11:54 +0000 (16:11 -0800)]
Merge branch 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6

* 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Fix up T-bit error handling in SH-4A mutex fastpath.
  sh: Fix up spurious syscall restarting.
  sh: fcnvds fix with denormalized numbers on SH-4 FPU.
  sh: Only reserve memory under CONFIG_ZERO_PAGE_OFFSET when it != 0.
  sh: Handle calling csum_partial with misaligned data
  sh: ap325rxa: Enable ov772x in defconfig.
  sh: ap325rxa: Add ov772x support.
  sh: ap325rxa: control camera power toggling.
  sh: mach-migor: Enable ov772x and tw9910 in defconfig.

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 6 Feb 2009 00:11:32 +0000 (16:11 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  Revert "tcp: Always set urgent pointer if it's beyond snd_nxt"
  ipv6: Copy cork options in ip6_append_data
  udp: Fix UDP short packet false positive
  gianfar: Fix potential soft reset race
  gianfar: Fix BD_LENGTH_MASK definition
  cxgb3: Fix lro switch
  iwlwifi: save PCI state before suspend, restore after resume
  iwlwifi: clean key table in iwl_clear_stations_table

15 years agoRevert "tcp: Always set urgent pointer if it's beyond snd_nxt"
David S. Miller [Thu, 5 Feb 2009 23:38:31 +0000 (15:38 -0800)]
Revert "tcp: Always set urgent pointer if it's beyond snd_nxt"

This reverts commit 64ff3b938ec6782e6585a83d5459b98b0c3f6eb8.

Jeff Chua reports that it breaks rlogin for him.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv6: Copy cork options in ip6_append_data
Herbert Xu [Thu, 5 Feb 2009 23:15:50 +0000 (15:15 -0800)]
ipv6: Copy cork options in ip6_append_data

As the options passed to ip6_append_data may be ephemeral, we need
to duplicate it for corking.  This patch applies the simplest fix
which is to memdup all the relevant bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Thu, 5 Feb 2009 23:08:11 +0000 (15:08 -0800)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6

15 years agoudp: Fix UDP short packet false positive
Jesper Dangaard Brouer [Thu, 5 Feb 2009 23:05:45 +0000 (15:05 -0800)]
udp: Fix UDP short packet false positive

The UDP header pointer assignment must happen after calling
pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB
buffer.

This was exposed by running multicast traffic through the NIU driver,
as it won't prepull the protocol headers into the linear area on
receive.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
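
The ordering rule described above can be shown with a userspace analogy. None of this is the networking code: struct demo_buf and demo_may_pull() are hypothetical, and the pull is modelled as an allocation that always moves the linear area so that any pointer taken earlier would be stale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf {
	unsigned char *data;		/* linear area, may be replaced */
	size_t len;			/* bytes in the linear area */
	const unsigned char *rest;	/* bytes not yet in the linear area */
	size_t rest_len;
};

/*
 * Make sure at least `need` bytes are in the linear area. On success the
 * linear area may have moved, so pointers into the old one are invalid.
 * Returns 1 on success and 0 if the packet is too short or memory runs out.
 */
static int demo_may_pull(struct demo_buf *b, size_t need)
{
	unsigned char *p;
	size_t extra;

	assert(b && b->data);

	if (need <= b->len)
		return 1;
	extra = need - b->len;
	if (extra > b->rest_len)
		return 0;
	p = malloc(need);
	if (!p)
		return 0;
	memcpy(p, b->data, b->len);
	memcpy(p + b->len, b->rest, extra);
	free(b->data);
	b->data = p;
	b->len = need;
	b->rest += extra;
	b->rest_len -= extra;
	return 1;
}

/* Read the 16-bit big-endian length field at offset 4 of an 8-byte header. */
static int demo_udp_len(struct demo_buf *b)
{
	const unsigned char *uh;

	if (!demo_may_pull(b, 8))
		return -1;
	uh = b->data;		/* assign only after the pull */
	return (uh[4] << 8) | uh[5];
}
```

Taking `uh` before the call would leave it pointing at the freed area whenever the pull had to move the data, which is the kind of stale pointer the commit avoids.
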
15 years agoseq_file: fix big-enough lseek() + read()
Alexey Dobriyan [Thu, 5 Feb 2009 21:30:05 +0000 (00:30 +0300)]
seq_file: fix big-enough lseek() + read()

lseek() further than length of the file will leave stale ->index
(second-to-last during iteration). Next seq_read() will not notice
that ->f_pos is big enough to return 0, but will print last item
as if ->f_pos is pointing to it.

Introduced in commit cb510b8172602a66467f3551b4be1911f5a7c8c2
aka "seq_file: more atomicity in traverse()".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoseq_file: move traverse so it can be used from seq_read
Eric Biederman [Wed, 4 Feb 2009 23:12:25 +0000 (15:12 -0800)]
seq_file: move traverse so it can be used from seq_read

In 2.6.25 some /proc files were converted to use the seq_file
infrastructure.  But seq_files do not correctly support pread(), which
broke some userspace applications.

To handle pread correctly we can't assume that f_pos is where we left it
in seq_read.  So move traverse() so that we can eventually use it in
seq_read and thus some day support pread().

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Cc: Paul Turner <pjt@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agosgi-xp: fix writing past the end of kzalloc()'d space
Dean Nelson [Wed, 4 Feb 2009 23:12:24 +0000 (15:12 -0800)]
sgi-xp: fix writing past the end of kzalloc()'d space

A missing type cast results in writing way beyond the end of a kzalloc()'d
memory segment resulting in slab corruption. But it seems like the better
solution is to define ->recv_msg_slots as a 'void *' rather than a
'struct xpc_notify_mq_msg_uv *' and add the type cast.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
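
The message does not show the faulty expression, so the following is only a guess at the mechanism: in C, adding a byte offset to a typed pointer scales the offset by the element size. The names are hypothetical, and the sketch uses a char pointer where the kernel code would rely on GNU C arithmetic on 'void *'.

```c
#include <assert.h>
#include <stddef.h>

struct demo_msg {
	char payload[32];
};

/* Intended addressing: cast to a byte pointer, then add the byte offset. */
static void *demo_slot(struct demo_msg *slots, size_t index, size_t entry_size)
{
	assert(slots);
	return (char *)slots + index * entry_size;
}

/*
 * Byte distance that the uncast expression `slots + index * entry_size`
 * would cover: the offset is multiplied by sizeof(struct demo_msg) again.
 * Computed arithmetically here so that no out-of-bounds pointer is formed.
 */
static size_t demo_uncast_distance(size_t index, size_t entry_size)
{
	return index * entry_size * sizeof(struct demo_msg);
}
```

For index 2 and 32-byte entries the intended offset is 64 bytes, while the uncast form would reach 2048 bytes past the base, far beyond a buffer sized for a few slots.
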
15 years agoalpha: fixup BUG macro
Alexey Dobriyan [Wed, 4 Feb 2009 23:12:21 +0000 (15:12 -0800)]
alpha: fixup BUG macro

Do the usual do {} while (0) dance, otherwise

fs/gfs2/util.c:99: error: expected expression before 'else'
drivers/scsi/lpfc/lpfc_sli.c:363: error: expected expression before 'else'

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
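
For readers unfamiliar with the idiom, here is a minimal sketch of why statement-like macros are wrapped in do {} while (0). DEMO_BUG() and demo_check() are hypothetical, and the old form of the alpha macro is not shown in the message, so the sketch only demonstrates the wrapped form.

```c
static int demo_bug_hits;	/* counts how often the macro fired */

/*
 * Wrapped form: the expansion is a single statement that still needs the
 * caller's trailing semicolon, so it can sit between `if` and `else`.
 * A bare `{ ... }` body followed by the caller's `;` would instead end
 * the if statement early and leave the `else` without a matching `if`.
 */
#define DEMO_BUG() do { demo_bug_hits++; } while (0)

/* Unbraced if/else use, as in the call sites quoted in the errors above. */
static int demo_check(int bad)
{
	if (bad)
		DEMO_BUG();
	else
		return 0;
	return 1;
}
```

The wrapper costs nothing at run time, since the loop body executes exactly once and the constant condition is optimized away.
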
15 years agosx.c: fix missed unlock_kernel() on error path in sx_fw_ioctl()
Dan Carpenter [Wed, 4 Feb 2009 23:12:20 +0000 (15:12 -0800)]
sx.c: fix missed unlock_kernel() on error path in sx_fw_ioctl()

If we return directly with -EPERM then lock_kernel() is still held.

This was found with a code checker (http://repo.or.cz/w/smatch.git/).

[akpm@linux-foundation.org: fix another such path - missed func_exit()]
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: <R.E.Wolff@BitWizard.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
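
The usual remedy for a missed unlock on an error path is a single exit label. The sketch below is hypothetical: demo_lock()/demo_unlock() are counters standing in for lock_kernel()/unlock_kernel(), and demo_fw_ioctl() is not the driver function.

```c
#include <assert.h>
#include <errno.h>

static int demo_lock_depth;	/* stand-in for the state of a held lock */

static void demo_lock(void)
{
	demo_lock_depth++;
}

static void demo_unlock(void)
{
	assert(demo_lock_depth > 0);
	demo_lock_depth--;
}

/* Every return goes through `out`, so the lock is always released. */
static int demo_fw_ioctl(int privileged, int cmd)
{
	int rc = 0;

	demo_lock();

	if (!privileged) {
		rc = -EPERM;	/* no direct return while the lock is held */
		goto out;
	}
	if (cmd < 0) {
		rc = -EINVAL;
		goto out;
	}
	/* ... the real work would happen here ... */
out:
	demo_unlock();
	return rc;
}
```

Checking the lock depth after each call confirms that the failing paths leave nothing held.
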
15 years agoatyfb: fix CONFIG_ namespace violations
Randy Dunlap [Wed, 4 Feb 2009 23:12:20 +0000 (15:12 -0800)]
atyfb: fix CONFIG_ namespace violations

Fix namespace violations by changing non-kconfig CONFIG_ names to CNFG_*.

Fixes breakage in staging/, which adds a real CONFIG_PANEL.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agortc-ds1390: fix compilation warnings in drivers/rtc/rtc-ds1390.c
Manish Katiyar [Wed, 4 Feb 2009 23:12:19 +0000 (15:12 -0800)]
rtc-ds1390: fix compilation warnings in drivers/rtc/rtc-ds1390.c

drivers/rtc/rtc-ds1390.c:125: warning: unused variable 'rtc'

Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agodrivers/video/backlight: rename da903x to da903x_bl
Mike Rapoport [Wed, 4 Feb 2009 23:12:18 +0000 (15:12 -0800)]
drivers/video/backlight: rename da903x to da903x_bl

Currently both da903x backlight and voltage regulator drivers have the
same name. Rename the backlight driver to allow use of both drivers as
modules.

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Eric Miao <eric.miao@marvell.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoatmel-ssc: fix misuse of dev_dbg when requested ssc instance is not found
Hans-Christian Egtvedt [Wed, 4 Feb 2009 23:12:17 +0000 (15:12 -0800)]
atmel-ssc: fix misuse of dev_dbg when requested ssc instance is not found

The ssc pointer is not valid when the id is not found in the list.
Convert the message from a debug one into an error message and avoid
dereferencing the bad pointer.

Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Huang Weiyi <weiyi.huang@gmail.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agodo_wp_page: fix regression with execute in place
Carsten Otte [Wed, 4 Feb 2009 23:12:16 +0000 (15:12 -0800)]
do_wp_page: fix regression with execute in place

Fix do_wp_page for VM_MIXEDMAP mappings.

In the case where pfn_valid returns 0 for a pfn at the beginning of
do_wp_page and the mapping is not shared writable, the code branches to
label `gotten:' with old_page == NULL.

In case the vma is locked (vma->vm_flags & VM_LOCKED), lock_page,
clear_page_mlock, and unlock_page try to access the old_page.

This patch checks whether old_page is valid before it is dereferenced.

The regression was introduced by "mlock: mlocked pages are unevictable"
(commit b291f000393f5a0b679012b39d79fbc85c018233).

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agowait: prevent exclusive waiter starvation
Johannes Weiner [Wed, 4 Feb 2009 23:12:14 +0000 (15:12 -0800)]
wait: prevent exclusive waiter starvation

With exclusive waiters, every process woken up through the wait queue must
ensure that the next waiter down the line is woken when it has finished.

Interruptible waiters don't do that when aborting due to a signal.  And if
an aborting waiter is concurrently woken up through the waitqueue, no one
will ever wake up the next waiter.

This has been observed with __wait_on_bit_lock() used by
lock_page_killable(): the first contender on the queue was aborting when
the actual lock holder woke it up concurrently.  The aborted contender
didn't acquire the lock and therefore never did an unlock followed by
waking up the next waiter.

Add abort_exclusive_wait() which removes the process' wait descriptor from
the waitqueue, iff still queued, or wakes up the next waiter otherwise.
It does so under the waitqueue lock.  Racing with a wake up means the
aborting process is either already woken (removed from the queue) and will
wake up the next waiter, or it will remove itself from the queue and the
concurrent wake up will apply to the next waiter after it.

Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
__wait_on_bit_lock() when they were interrupted by other means than a wake
up through the queue.
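
The rule "remove yourself if still queued, otherwise pass the wake-up on" can
be modelled in a few lines of single-threaded C.  This is only a sketch: all
toy_* names are hypothetical, the queue is a fixed-size array, and the
waitqueue lock that the real function holds is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

struct toy_waiter {
    bool queued;   /* still on the wait queue */
    bool woken;    /* a wake-up was delivered to this waiter */
};

struct toy_waitqueue {
    struct toy_waiter *slots[TOY_MAX_WAITERS];  /* FIFO order */
    size_t count;
};

void toy_enqueue(struct toy_waitqueue *q, struct toy_waiter *w)
{
    assert(q->count < TOY_MAX_WAITERS);
    w->queued = true;
    w->woken = false;
    q->slots[q->count++] = w;
}

/* Unlink w from the queue if it is there, keeping FIFO order. */
void toy_dequeue(struct toy_waitqueue *q, struct toy_waiter *w)
{
    size_t i = 0;

    while (i < q->count && q->slots[i] != w)
        i++;
    if (i == q->count)
        return;                 /* not queued */
    for (; i + 1 < q->count; i++)
        q->slots[i] = q->slots[i + 1];
    q->count--;
    w->queued = false;
}

/* Exclusive wake-up: only the first waiter in line is woken and removed. */
void toy_wake_one(struct toy_waitqueue *q)
{
    struct toy_waiter *w;

    if (q->count == 0)
        return;
    w = q->slots[0];
    toy_dequeue(q, w);
    w->woken = true;
}

/* Give up waiting without losing a wake-up that was already delivered. */
void toy_abort_exclusive_wait(struct toy_waitqueue *q, struct toy_waiter *w)
{
    if (w->queued)
        toy_dequeue(q, w);      /* a later wake-up goes to the next waiter */
    else if (w->woken)
        toy_wake_one(q);        /* we consumed a wake-up: hand it on */
}
```

If waiters a, b and c are queued, a is woken and then aborts, the model wakes
b rather than leaving b and c asleep.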

[akpm@linux-foundation.org: coding-style fixes]
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Mentored-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org> ["after some testing"]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agomaintainers: general@lists.openfabrics.org is moderated
Randy Dunlap [Wed, 4 Feb 2009 23:12:13 +0000 (15:12 -0800)]
maintainers: general@lists.openfabrics.org is moderated

I got the "list is moderated" message, so add it here.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agolis3lv02d: add axes knowledge for HP 6710
Martin Kebert [Wed, 4 Feb 2009 23:12:12 +0000 (15:12 -0800)]
lis3lv02d: add axes knowledge for HP 6710

Add support for HP 6710x laptops so that their axes are set up correctly.

Signed-off-by: Martin Kebert <gkmarty@gmail.com>
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agolis3lv02d: add axes knowledge for HP 6730
Pavel Herrmann [Wed, 4 Feb 2009 23:12:11 +0000 (15:12 -0800)]
lis3lv02d: add axes knowledge for HP 6730

Add support for HP 6730x laptops so that their axes are set up correctly.

Signed-off-by: Pavel Herrmann <morpheus.ibis@gmail.com>
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agolis3lv02d: add axes knowledge for HP 6530
Eric Piel [Wed, 4 Feb 2009 23:12:11 +0000 (15:12 -0800)]
lis3lv02d: add axes knowledge for HP 6530

Add support for HP 6530x laptops so that their axes are set up correctly.

Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agolis3lv02d: add axes knowledge for HP 6510b
Jiri Tersel [Wed, 4 Feb 2009 23:12:09 +0000 (15:12 -0800)]
lis3lv02d: add axes knowledge for HP 6510b

According to dmesg, my laptop model HP 6510b is not being recognized by this
driver.  After I modified "lis3lv02d.c", the axes in Neverball are OK.

Signed-off-by: Jiri Tersel <tersel@mail.muni.cz>
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agohp-wmi: fix error path in hp_wmi_bios_setup()
Andrew Morton [Wed, 4 Feb 2009 23:12:07 +0000 (15:12 -0800)]
hp-wmi: fix error path in hp_wmi_bios_setup()

The error-path code can call rfkill_unregister() with a pointer which does
not contain the result of a call to rfkill_register().  It goes BUG().

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12560.

Cc: Frans Pop <elendil@planet.nl>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Matthew Garrett <mjg@redhat.com>
Reported-by: Helge Deller <deller@gmx.de>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agorevert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY"
Andrew Morton [Wed, 4 Feb 2009 23:12:06 +0000 (15:12 -0800)]
revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY"

Revert commit 0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f because it causes
(arguably poorly designed) existing userspace to spend interminable
periods closing billions of not-open file descriptors.

We could bring this back, with some sort of opt-in tunable in /proc, which
defaults to "off".

Peter's analysis follows:

: I spent several hours trying to get to the bottom of a serious
: performance issue that appeared on one of our servers after upgrading to
: 2.6.28.  In the end it's what could be considered a userspace bug that
: was triggered by a change in 2.6.28.  Since this might also affect other
: people I figured I'd at least document what I found here, and maybe we
: can even do something about it:
:
:
: So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
: the team maintaining our ftp archive complained that one of their
: scripts that previously ran in a few minutes still hadn't even come
: close to being done after an hour or so.  Downgrading to 2.6.27 fixed
: that.
:
: Turns out that script is forking a lot and something in it or python or
: wherever closes all the file descriptors it doesn't want to pass on.
: That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
: closes them all with a few exceptions.
:
: Turns out that takes a long time when your ulimit -n is now 2^20 (1048576).
:
: With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
: now a thousand times that.
:
: 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
: RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
: allows, as the title implies, to set the limit for number of files to
: infinity.
:
: Closer investigation showed that the broken default ulimit did not apply
: to "system" processes (like stuff started from init).  In the end I
: could establish that all processes that passed through pam_limit at one
: point had the bad resource limit.
:
: Apparently the pam library in Debian etch (4.0) initializes the limits
: to some default values when it doesn't have any settings in limit.conf
: to override them.  Turns out that for nofiles this is RLIM_INFINITY.
: Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
: package version 0.79-5 fixes that - tho I'm not sure what side effects
: that has.
:
: Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
: uses a different pam (version).
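
The userspace pattern in the analysis (loop from zero up to RLIMIT_NOFILE and
close everything) is only as cheap as the limit is small.  A minimal sketch of
a defensive upper bound follows; fd_close_upper_bound() and its cap parameter
are hypothetical names chosen for illustration, not part of any library.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Pick an upper bound for a "close every inherited descriptor" loop.
 * soft_limit is the RLIMIT_NOFILE soft limit as returned by getrlimit();
 * cap is a caller-chosen ceiling (an assumption of this sketch).
 */
long fd_close_upper_bound(rlim_t soft_limit, long cap)
{
    assert(cap > 0);
    if (soft_limit == RLIM_INFINITY || soft_limit > (rlim_t)cap)
        return cap;             /* huge or unlimited: do not trust it */
    return (long)soft_limit;
}
```

A caller would fetch the limit with getrlimit(RLIMIT_NOFILE, &rl) and pass
rl.rlim_cur.  With the old default of 1024 the bound is unchanged; with 2^20
or RLIM_INFINITY the loop is capped instead of running for a million or more
iterations.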

Reported-by: Peter Palfrader <weasel@debian.org>
Cc: Adam Tkac <vonsch@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoshm: fix shmctl(SHM_INFO) lockup with !CONFIG_SHMEM
Tony Battersby [Wed, 4 Feb 2009 23:12:04 +0000 (15:12 -0800)]
shm: fix shmctl(SHM_INFO) lockup with !CONFIG_SHMEM

shm_get_stat() assumes that the inode is a "struct shmem_inode_info",
which is incorrect for !CONFIG_SHMEM (see fs/ramfs/inode.c:
ramfs_get_inode() vs.  mm/shmem.c: shmem_get_inode()).

This bad assumption can cause shmctl(SHM_INFO) to lockup when
shm_get_stat() tries to spin_lock(&info->lock).  Users of !CONFIG_SHMEM
may encounter this lockup simply by invoking the 'ipcs' command.

Reported by Jiri Olsa back in February 2008:
http://lkml.org/lkml/2008/2/29/74

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Reported-by: Jiri Olsa <olsajiri@gmail.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: <stable@kernel.org> [2.6.everything]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agofbmem: don't call copy_from/to_user() with mutex held
Andrea Righi [Wed, 4 Feb 2009 23:12:03 +0000 (15:12 -0800)]
fbmem: don't call copy_from/to_user() with mutex held

Avoid calling copy_from/to_user() with fb_info->lock mutex held in fbmem
ioctl().

fb_mmap() is called under mm->mmap_sem (A) held, that also acquires
fb_info->lock (B); fb_ioctl() takes fb_info->lock (B) and does
copy_from/to_user() that might acquire mm->mmap_sem (A), causing a
deadlock.

NOTE: it doesn't push down the fb_info->lock in each own driver's
fb_ioctl(), so there are still potential deadlocks elsewhere.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agortc: rtc-dm355evm driver
David Brownell [Wed, 4 Feb 2009 23:12:01 +0000 (15:12 -0800)]
rtc: rtc-dm355evm driver

Simple RTC driver for the MSP430 firmware on the DM355 EVM board.  Other
than not supporting atomic reads/writes of all four bytes, this is
reasonable as a basic no-alarm RTC.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agomisc: dell-laptop should depend on POWER_SUPPLY
Matthew Garrett [Wed, 4 Feb 2009 23:12:00 +0000 (15:12 -0800)]
misc: dell-laptop should depend on POWER_SUPPLY

dell-laptop makes use of the power supply class information to choose
which backlight interface to change.  Add a dependency on it.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agogeneric swap(): don't return a value from swap()
Peter Zijlstra [Wed, 4 Feb 2009 23:11:59 +0000 (15:11 -0800)]
generic swap(): don't return a value from swap()

The swap() macro is accidentally returning the value of its first argument.
Change it into a doesn't-return-anything macro before someone goes and
relies upon this behaviour.
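
A sketch of what a doesn't-return-anything swap() can look like is given
here.  It is an illustration, not necessarily the exact kernel definition;
__typeof__ is a GCC/Clang extension, and sort_pair() is a hypothetical helper
added only to show a call site.

```c
/*
 * Statement-style swap: a do { } while (0) block is a statement, not an
 * expression, so "x = swap(a, b);" does not compile.
 */
#define swap(a, b) \
    do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Put the smaller of two ints first. */
void sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        swap(*lo, *hi);
}
```

The do/while wrapper also keeps the macro safe inside an unbraced if/else.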

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agohpilo: open/close fix
David Altobelli [Wed, 4 Feb 2009 23:11:58 +0000 (15:11 -0800)]
hpilo: open/close fix

The device can take a while to respond to an open/close request, so
increase the time the kernel will wait for a response (from 1 ms to 10 ms).

Also, properly clean up a channel on a failed open, by calling the channel
close routine.  Just freeing the memory isn't sufficient, the device needs
to be informed that the channel is no longer open, and the device memory
cleared of references to freed dma buffer.

Signed-off-by: David Altobelli <david.altobelli@hp.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agokernel/async.c: fix printk warnings
Andrew Morton [Wed, 4 Feb 2009 23:11:58 +0000 (15:11 -0800)]
kernel/async.c: fix printk warnings

alpha:

kernel/async.c: In function 'run_one_entry':
kernel/async.c:141: warning: format '%lli' expects type 'long long int', but argument 2 has type 'async_cookie_t'
kernel/async.c:149: warning: format '%lli' expects type 'long long int', but argument 2 has type 'async_cookie_t'
kernel/async.c:149: warning: format '%lld' expects type 'long long int', but argument 4 has type 's64'
kernel/async.c: In function 'async_synchronize_cookie_special':
kernel/async.c:250: warning: format '%lli' expects type 'long long int', but argument 3 has type 's64'
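
Warnings of this kind appear when a 64-bit typedef is 'long' rather than
'long long' on a given architecture.  One common way to silence them is an
explicit cast at the call site, sketched here in userspace C with a
hypothetical toy_cookie_t type and snprintf() standing in for printk().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in: a 64-bit type that may be 'long' on some ABIs. */
typedef int64_t toy_cookie_t;

/* Format a cookie; returns the number of characters written. */
int format_cookie(char *buf, size_t len, toy_cookie_t cookie)
{
    /* The cast makes the argument match %lli regardless of the typedef. */
    return snprintf(buf, len, "cookie %lli", (long long)cookie);
}
```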

Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoBtrfs: Fix memory leak in cache_drop_leaf_ref
Chris Mason [Thu, 5 Feb 2009 14:08:14 +0000 (09:08 -0500)]
Btrfs: Fix memory leak in cache_drop_leaf_ref

The code wasn't doing a kfree on the sorted array

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoALSA: hda - Fix misc workqueue issues
Takashi Iwai [Thu, 5 Feb 2009 06:34:28 +0000 (07:34 +0100)]
ALSA: hda - Fix misc workqueue issues

Some fixes regarding snd-hda-intel workqueue:
- Use create_singlethread_workqueue() instead of create_workqueue()
  as per-CPU work isn't required.
- Allocate workq name string properly
- Renamed the workq name to "hd-audio*" to be more obvious.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agogianfar: Fix potential soft reset race
Andy Fleming [Thu, 5 Feb 2009 00:38:05 +0000 (16:38 -0800)]
gianfar: Fix potential soft reset race

SOFT_RESET must be asserted for at least 3 TX clocks in order for it to work
properly.  The syncs in the gfar_write() commands have been hiding this, but
we need to guarantee it.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agogianfar: Fix BD_LENGTH_MASK definition
Andy Fleming [Thu, 5 Feb 2009 00:37:40 +0000 (16:37 -0800)]
gianfar: Fix BD_LENGTH_MASK definition

BD_LENGTH_MASK is supposed to catch the low 16 bits of the status field, not
the low byte.  The old way, we would never be able to clean up tx packets with
sizes divisible by 256.
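
The arithmetic behind this is easy to check in isolation.  In the sketch
below the TOY_* macro names and bd_length() are hypothetical; only the two
mask widths (low byte versus low 16 bits) come from the description above.

```c
#include <stdint.h>

#define TOY_BD_LENGTH_MASK_OLD 0x00ffu  /* low byte only: the bug */
#define TOY_BD_LENGTH_MASK     0xffffu  /* low 16 bits: the fix */

/* Extract the length part of a combined status/length word. */
uint32_t bd_length(uint32_t lstatus, uint32_t mask)
{
    return lstatus & mask;
}
```

A 512-byte packet has a zero low byte, so the old mask reports length 0 and
the buffer looks empty; the 16-bit mask reports 512.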

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agocxgb3: Fix lro switch
Divy Le Ray [Thu, 5 Feb 2009 00:31:39 +0000 (16:31 -0800)]
cxgb3: Fix lro switch

The LRO switch is always set to 1 in the rx processing loop.
It breaks the accelerated iSCSI receive traffic.
Fix its computation.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoALSA: hda - Add quirk for FSC Amilo Xi2550
Takashi Iwai [Wed, 4 Feb 2009 22:30:19 +0000 (23:30 +0100)]
ALSA: hda - Add quirk for FSC Amilo Xi2550

Added model=fujitsu-pi2515 for FSC Amilo Xi2550 with ALC883 codec.

Reference: Novell bnc#450979
https://bugzilla.novell.com/show_bug.cgi?id=450979

Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 4 Feb 2009 21:58:50 +0000 (13:58 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: APIC: enable workaround on AMD Fam10h CPUs
  xen: disable interrupts before saving in percpu
  x86: add x86@kernel.org to MAINTAINERS
  x86: push old stack address on irqstack for unwinder
  irq, x86: fix lock status with numa_migrate_irq_desc
  x86: add cache descriptors for Intel Core i7
  x86/Voyager: make it build and boot

15 years agoMerge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 4 Feb 2009 21:58:37 +0000 (13:58 -0800)]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: add missing kernel-doc in sched.h

15 years agoMerge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 4 Feb 2009 21:58:24 +0000 (13:58 -0800)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ftrace: do_each_pid_task() needs rcu lock

15 years agoiwlwifi: save PCI state before suspend, restore after resume
Reinette Chatre [Tue, 3 Feb 2009 18:20:03 +0000 (10:20 -0800)]
iwlwifi: save PCI state before suspend, restore after resume

This is the right thing to do and fixes the following warning:

[  115.012278] ------------[ cut here ]------------
[  115.012281] WARNING: at drivers/pci/pci-driver.c:370
pci_legacy_suspend+0x85/0xc2()
[  115.012285] Hardware name: Latitude D630
[  115.012301] PCI PM: Device state not saved by
iwl3945_pci_suspend+0x0/0x4c [iwl3945]
[  115.012304] Modules linked in: fuse nfsd lockd nfs_acl auth_rpcgss
exportfs sunrpc ipv6 acpi_cpufreq kvm_intel kvm snd_hda_codec_idt
snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_seq_device snd_pcm_oss
snd_mixer_oss ecb snd_pcm cryptomgr aead snd_timer crypto_blkcipher
snd snd_page_alloc ohci1394 crypto_hash crypto_algapi ch341 ieee1394
usbserial thermal iwl3945 mac80211 led_class lib80211 tg3 processor
i2c_i801 i2c_core sg cfg80211 libphy usbhid battery ac button sr_mod
cdrom evdev dcdbas ata_generic ata_piix libata sd_mod scsi_mod ext3
jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded:
microcode]
[  115.012374] Pid: 4163, comm: pm-suspend Not tainted
2.6.29-rc3-00227-gf1dd849-dirty #67
[  115.012377] Call Trace:
[  115.012382]  [<ffffffff8023d04d>] warn_slowpath+0xb1/0xed
[  115.012387]  [<ffffffff80450b5e>] ? _spin_unlock_irqrestore+0x5c/0x78
[  115.012390]  [<ffffffff80254f08>] ? up+0x34/0x39
[  115.012394]  [<ffffffff80362319>] ? acpi_ut_release_mutex+0x5d/0x61
[  115.012397]  [<ffffffff803584b2>] ? acpi_get_data+0x5e/0x70
[  115.012400]  [<ffffffff80363dd9>] ? acpi_bus_get_device+0x25/0x39
[  115.012403]  [<ffffffff80363e98>] ? acpi_bus_power_manageable+0x11/0x29
[  115.012406]  [<ffffffff803462f7>] ? acpi_pci_power_manageable+0x17/0x19
[  115.012410]  [<ffffffff8033ddfd>] ? pci_set_power_state+0xcc/0x101
[  115.012418]  [<ffffffffa01f28e9>] ? iwl3945_pci_suspend+0x0/0x4c [iwl3945]
[  115.012422]  [<ffffffff803401e6>] pci_legacy_suspend+0x85/0xc2
[  115.012425]  [<ffffffff80340316>] pci_pm_suspend+0x34/0x86
[  115.012429]  [<ffffffff8039d7ce>] pm_op+0x52/0xe5
[  115.012432]  [<ffffffff8039dd78>] device_suspend+0x32a/0x451
[  115.012436]  [<ffffffff80269ec2>] suspend_devices_and_enter+0x3e/0x13a
[  115.012439]  [<ffffffff8026a128>] enter_state+0x110/0x164
[  115.012442]  [<ffffffff8026a233>] state_store+0xb7/0xd7
[  115.012446]  [<ffffffff8032f95f>] kobj_attr_store+0x17/0x19
[  115.012449]  [<ffffffff80307d64>] sysfs_write_file+0xe4/0x119
[  115.012453]  [<ffffffff802baa7a>] vfs_write+0xae/0x137
[  115.012456]  [<ffffffff802babc7>] sys_write+0x47/0x70
[  115.012459]  [<ffffffff8020b73a>] system_call_fastpath+0x16/0x1b
[  115.012467] ---[ end trace 829828966f6f24dc ]---

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Ming Lei <tom.leiming@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
15 years agoiwlwifi: clean key table in iwl_clear_stations_table
Reinette Chatre [Wed, 28 Jan 2009 17:38:30 +0000 (09:38 -0800)]
iwlwifi: clean key table in iwl_clear_stations_table

Clean the uCode key table bitmap in iwl_clear_stations_table(): since all
stations are cleared, the key table must be cleared as well.

Since the keys are not removed properly on suspend by mac80211,
this may result in exhausting the key table on resume, leading
to memory corruption during removal.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
15 years agoRevert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()"
Mark Fasheh [Wed, 4 Feb 2009 07:12:34 +0000 (23:12 -0800)]
Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()"

This reverts commit 0e0333429a6280e6eb3c98845e4eed90d5f8078a.

I committed this by accident - Joel and Louis are working with the lockdep
maintainer to provide a better solution than just turning lockdep off.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Wed, 4 Feb 2009 17:39:12 +0000 (09:39 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: pcm_oss: AFMT_S24_LE is set twice in return value
  ALSA: ASoC: email - update email addresses.
  OMAP: ASoC: Fix spinlock misuse in omap-pcm.c
  ALSA: hda - No widget selection for volume knob widgets in proc output
  ALSA: hda - Add support of iMac 24 Aluminium
  ALSA: alsa: time reaches -1, tested 0
  ALSA: hda - Add quirk for another HP dv5 model

15 years agoMerge branch 'fix/asoc' into for-linus
Takashi Iwai [Wed, 4 Feb 2009 17:19:11 +0000 (18:19 +0100)]
Merge branch 'fix/asoc' into for-linus

15 years agoMerge branch 'fix/hda' into for-linus
Takashi Iwai [Wed, 4 Feb 2009 17:19:07 +0000 (18:19 +0100)]
Merge branch 'fix/hda' into for-linus

15 years agoALSA: pcm_oss: AFMT_S24_LE is set twice in return value
Roel Kluin [Wed, 4 Feb 2009 17:14:55 +0000 (18:14 +0100)]
ALSA: pcm_oss: AFMT_S24_LE is set twice in return value

AFMT_S24_LE is set twice in return value

vi sound/core/oss/pcm_oss.c +640
#define AFMT_S24_LE      0x00008000
#define AFMT_S24_BE      0x00010000
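
Why the duplicate matters: OR-ing the same flag twice is a no-op, so one
format silently goes missing from the mask.  The sketch below uses the two
values quoted above; the function names are hypothetical, and it is an
assumption of this sketch that the second flag was meant to be AFMT_S24_BE.

```c
#define AFMT_S24_LE 0x00008000
#define AFMT_S24_BE 0x00010000

/* Mask as written with the duplicate: the big-endian bit is never set. */
unsigned int s24_formats_duplicated(void)
{
    return AFMT_S24_LE | AFMT_S24_LE;
}

/* Mask with both byte orders present. */
unsigned int s24_formats_both(void)
{
    return AFMT_S24_LE | AFMT_S24_BE;
}
```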

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney...
Linus Torvalds [Wed, 4 Feb 2009 15:56:25 +0000 (07:56 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/cooloney/blackfin-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (40 commits)
  Blackfin arch: Remove outdated code
  Blackfin arch: Fix udelay implementation
  Blackfin arch: Update Copyright information
  Blackfin arch: Add BF561 PPI POLS, POLC Masks
  Blackfin arch: Update CM-BF527 kernel config
  Blackfin arch: define bfin_memmap as static since it is only used here
  Blackfin arch: cplb manager: use a do...while loop rather than a for loop
  Blackfin arch: fix bug - traps test case 19 for exception 0x2d fails
  Blackfin arch: add platform device bfin_mii-bus and KSZ8893M switch driver platform resources to board files
  Blackfin arch: build jtag tty driver as a module by default
  Blackfin arch: fix 2 bugs related to debug
  Blackfin arch: Add ANOMALY_05000380 to BF54x to kill the compile warning
  Blackfin arch: Fix bug - 561 SMP kernel can't boot from jffs2
  Blackfin arch: base SIC_IWR# programming on whether the MMR exists
  Blackfin arch: read SYSCR on newer parts that mirror the bits of SWRST in it
  Blackfin arch: fixup board init function name
  Blackfin arch: drop CONFIG_I2C_BOARDINFO ifdefs
  Blackfin arch: bfin_reset->_bfin_reset redirection no longer needed
  Blackfin arch: sync reboot handler with version in u-boot
  Blackfin arch: Faster Implementation of csum_tcpudp_nofold()
  ...

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Wed, 4 Feb 2009 15:54:00 +0000 (07:54 -0800)]
Merge git://git./linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Kill bogus TPC/address truncation during 32-bit faults.
  sparc: fixup for sparseirq changes
  sparc64: Validate kernel generated fault addresses on sparc64.
  sparc64: On non-Niagara, need to touch NMI watchdog in NOHZ mode.
  sparc64: Implement NMI watchdog on capable cpus.
  sparc: Probe PMU type and record in sparc_pmu_type.
  sparc64: Move generic PCR support code to separate file.

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Wed, 4 Feb 2009 15:52:21 +0000 (07:52 -0800)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  sunrpc: fix rdma dependencies
  e1000: Fix PCI enable to honor the need_ioport flag
  sgi-xp: link XPNET's net_device_ops to its net_device structure
  pcnet_cs: Fix misuse of the equality operator.
  hso: add new device id's
  dca: redesign locks to fix deadlocks
  cassini/sungem: limit reaches -1, but 0 tested
  net: variables reach -1, but 0 tested
  qlge: bugfix: Add missing netif_napi_del call.
  qlge: bugfix: Add flash offset for second port.
  qlge: bugfix: Fix endian issue when reading flash.
  udp: increments sk_drops in __udp_queue_rcv_skb()
  net: Fix userland breakage wrt. linux/if_tunnel.h
  net: packet socket packet_lookup_frame fix

15 years agoMerge branch 'for-linus' of git://git.o-hand.com/linux-mfd
Linus Torvalds [Wed, 4 Feb 2009 15:40:54 +0000 (07:40 -0800)]
Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd

* 'for-linus' of git://git.o-hand.com/linux-mfd:
  mfd: Remove non exported references from pcf50633

15 years agoBtrfs: don't return congestion in write_cache_pages as often
Chris Mason [Wed, 4 Feb 2009 14:33:00 +0000 (09:33 -0500)]
Btrfs: don't return congestion in write_cache_pages as often

On fast devices that go from congested to uncongested very quickly, pdflush
is waiting too often in congestion_wait, and the FS is backing off too
easily in write_cache_pages.

For now, fix this on the btrfs side by only checking congestion after
some bios have already gone down.  Longer term a real fix is needed
for pdflush, but that is a larger project.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: Only prep for btree deletion balances when nodes are mostly empty
Chris Mason [Wed, 4 Feb 2009 14:12:46 +0000 (09:12 -0500)]
Btrfs: Only prep for btree deletion balances when nodes are mostly empty

Whenever an item deletion is done, we need to balance all the nodes
in the tree to make sure we don't end up with an empty node if a pointer
is deleted.  This balance prep happens from the root of the tree down
so we can drop our locks as we go.

reada_for_balance was triggering read-ahead on neighboring nodes even
when no balancing was required.  This adds an extra check to avoid
calling balance_level() and avoid reada_for_balance() when a balance
won't be required.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: fix btrfs_unlock_up_safe to walk the entire path
Chris Mason [Wed, 4 Feb 2009 14:31:42 +0000 (09:31 -0500)]
Btrfs: fix btrfs_unlock_up_safe to walk the entire path

btrfs_unlock_up_safe would break out at the first NULL node entry or
unlocked node it found in the path.

Some of the callers have missing nodes at the lower levels of the path, so this
commit fixes things to check all the nodes in the path before returning.
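
The difference between stopping at the first gap and walking the whole path
comes down to "continue" versus "break".  A toy model follows; struct
toy_node, struct toy_path, TOY_MAX_LEVEL and toy_unlock_up_safe() are
hypothetical names, not the btrfs definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8

struct toy_node {
    bool locked;
};

struct toy_path {
    struct toy_node *nodes[TOY_MAX_LEVEL];
    bool locks[TOY_MAX_LEVEL];  /* true if this path holds the node's lock */
};

/* Unlock every locked node at or above 'level'; returns how many. */
int toy_unlock_up_safe(struct toy_path *path, int level)
{
    int unlocked = 0;

    for (int i = level; i < TOY_MAX_LEVEL; i++) {
        if (path->nodes[i] == NULL)
            continue;           /* a gap: keep walking, do not stop */
        if (!path->locks[i])
            continue;           /* not locked by us: skip it */
        path->nodes[i]->locked = false;
        path->locks[i] = false;
        unlocked++;
    }
    return unlocked;
}
```

With nodes missing at levels 0 and 1 and locked nodes at levels 2 and 3, a
loop that breaks on the first NULL would unlock nothing; this one unlocks
both.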

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: change btrfs_del_leaf to drop locks earlier
Chris Mason [Wed, 4 Feb 2009 14:31:28 +0000 (09:31 -0500)]
Btrfs: change btrfs_del_leaf to drop locks earlier

btrfs_del_leaf does two things.  First it removes the pointer in the
parent, and then it frees the block that has the leaf.  It has the
parent node locked for both operations.

But, it only needs the parent locked while it is deleting the pointer.
After that it can safely free the block without the parent locked.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: Change btrfs_truncate_inode_items to stop when it hits the inode
Chris Mason [Wed, 4 Feb 2009 14:30:58 +0000 (09:30 -0500)]
Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode

btrfs_truncate_inode_items is set up to stop doing btree searches when
it has finished removing the items for the inode.  It used to detect the
end of the inode by looking for an objectid that didn't match the
one we were searching for.

But, this would result in an extra search through the btree, which
adds extra balancing and cow costs to the operation.

This commit adds a check to see if we found the inode item, which means
we can stop searching early.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: Don't try to compress pages past i_size
Chris Mason [Wed, 4 Feb 2009 14:31:06 +0000 (09:31 -0500)]
Btrfs: Don't try to compress pages past i_size

The compression code had some checks to make sure we were only
compressing bytes inside of i_size, but it wasn't catching every
case.  To make things worse, some incorrect math about the number
of bytes remaining would make it try to compress more pages than the
file really had.

The fix used here is to fall back to the non-compression code in this
case, which does all the proper cleanup of delalloc and other accounting.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: join the transaction in __btrfs_setxattr
Josef Bacik [Wed, 4 Feb 2009 14:18:33 +0000 (09:18 -0500)]
Btrfs: join the transaction in __btrfs_setxattr

With selinux on we end up calling __btrfs_setxattr when we create an inode,
which calls btrfs_start_transaction().  The problem is we've already called
that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a
wait_current_trans().  If btrfs-transaction has started committing it will wait
for all handles to finish, while the other process is waiting for the
transaction to commit.  This is fixed by using btrfs_join_transaction, which
won't wait for the transaction to commit.  Thanks,

Signed-off-by: Josef Bacik <jbacik@redhat.com>
15 years agoBtrfs: Handle SGID bit when creating inodes
Chris Ball [Wed, 4 Feb 2009 14:29:54 +0000 (09:29 -0500)]
Btrfs: Handle SGID bit when creating inodes

Before this patch, new files/dirs would ignore the SGID bit on their
parent directory and always be owned by the creating user's uid/gid.

Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks
Chris Mason [Wed, 4 Feb 2009 14:27:02 +0000 (09:27 -0500)]
Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks

Every transaction in btrfs creates a new snapshot, and then schedules the
snapshot from the last transaction for deletion.  Snapshot deletion
works by walking down the btree and dropping the reference counts
on each btree block during the walk.

If a given leaf or node has a reference count greater than one,
the reference count is decremented and the subtree pointed to by that
node is ignored.

If the reference count is one, walking continues down into that node
or leaf, and the references of everything it points to are decremented.

The old code would try to work in small pieces, walking down the tree
until it found the lowest leaf or node to free and then returning.  This
was very friendly to the rest of the FS because it didn't have a huge
impact on other operations.

But it wouldn't always keep up with the rate that new commits added new
snapshots for deletion, and it wasn't very optimal for the extent
allocation tree because it wasn't finding leaves that were close together
on disk and processing them at the same time.

This changes things to walk down to a level 1 node and then process it
in bulk.  All the leaf pointers are sorted and the leaves are dropped
in order based on their extent number.

The extent allocation tree and commit code are now fast enough for
this kind of bulk processing to work without slowing the rest of the FS
down.  Overall it does less IO and is better able to keep up with
snapshot deletions under high load.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: Change btree locking to use explicit blocking points
Chris Mason [Wed, 4 Feb 2009 14:25:08 +0000 (09:25 -0500)]
Btrfs: Change btree locking to use explicit blocking points

Most of the btrfs metadata operations can be protected by a spinlock,
but some operations still need to schedule.

So far, btrfs has been using a mutex along with a trylock loop.
Most of the time it is able to avoid going for the full mutex, so
the trylock loop is a big performance gain.

This commit is step one for getting rid of the blocking locks entirely.
btrfs_tree_lock takes a spinlock, and the code explicitly switches
to a blocking lock when it starts an operation that can schedule.

We'll be able to get rid of the blocking locks in smaller pieces over time.
Tracing allows us to find the most common cause of blocking, so we
can start with the hot spots first.

The basic idea is:

btrfs_tree_lock() returns with the spin lock held

btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
the extent buffer flags, and then drops the spin lock.  The buffer is
still considered locked by all of the btrfs code.

If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
the spin lock and waits on a wait queue for the blocking bit to go away.

Much of the code that needs to set the blocking bit finishes without actually
blocking a good percentage of the time.  So, an adaptive spin is still
used against the blocking bit to avoid very high context switch rates.

btrfs_clear_lock_blocking() clears the blocking bit and returns
with the spinlock held again.

btrfs_tree_unlock() can be called on either blocking or spinning locks;
it does the right thing based on the blocking bit.

ctree.c has a helper function to set/clear all the locked buffers in a
path as blocking.
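
The protocol described above can be modelled in user space with a pthread
mutex standing in for the spinlock and a condition variable standing in for
the wait queue.  This is only a sketch of the state machine under those
assumptions: struct demo_lock and the demo_* functions are hypothetical
names, the adaptive spin is left out, and the kernel code does not use
pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the lock state of one buffer. */
struct demo_lock {
    pthread_mutex_t spin;   /* plays the role of the spinlock */
    pthread_cond_t wq;      /* plays the role of the wait queue */
    bool blocking;          /* plays the role of EXTENT_BUFFER_BLOCKING */
};

static void demo_lock_init(struct demo_lock *l)
{
    pthread_mutex_init(&l->spin, NULL);
    pthread_cond_init(&l->wq, NULL);
    l->blocking = false;
}

/* Returns with 'spin' held and the blocking flag clear. */
static void demo_tree_lock(struct demo_lock *l)
{
    pthread_mutex_lock(&l->spin);
    while (l->blocking)
        pthread_cond_wait(&l->wq, &l->spin);  /* releases 'spin' while asleep */
}

/* Caller holds 'spin'.  Keeps logical ownership but releases 'spin'. */
static void demo_set_lock_blocking(struct demo_lock *l)
{
    assert(!l->blocking);
    l->blocking = true;
    pthread_mutex_unlock(&l->spin);
}

/* Caller owns the lock in blocking mode.  Returns with 'spin' held again. */
static void demo_clear_lock_blocking(struct demo_lock *l)
{
    pthread_mutex_lock(&l->spin);
    assert(l->blocking);
    l->blocking = false;
    /* Waiters wake up and queue on 'spin' until the owner releases it. */
    pthread_cond_broadcast(&l->wq);
}

/* Works for both modes: the flag tells us which one we are in. */
static void demo_tree_unlock(struct demo_lock *l)
{
    if (l->blocking)
        demo_clear_lock_blocking(l);
    pthread_mutex_unlock(&l->spin);
}
```

A caller would take demo_tree_lock(), call demo_set_lock_blocking() before
anything that may sleep, and finish with demo_tree_unlock() regardless of
which mode it ended up in.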

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: hash_lock is no longer needed
Chris Mason [Wed, 4 Feb 2009 14:24:25 +0000 (09:24 -0500)]
Btrfs: hash_lock is no longer needed

Before metadata is written to disk, it is updated to reflect that writeout
has begun.  Once this update is done, the block must be cow'd before it
can be modified again.

This update was originally synchronized by using a per-fs spinlock.  Today
the buffers for the metadata blocks are locked before writeout begins,
and everyone that tests the flag has the buffer locked as well.

So, the per-fs spinlock (called hash_lock for no good reason) is no
longer required.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: disable leak debugging checks in extent_io.c
Chris Mason [Wed, 4 Feb 2009 14:24:05 +0000 (09:24 -0500)]
Btrfs: disable leak debugging checks in extent_io.c

extent_io.c has debugging code to report and free leaked extent_state
and extent_buffer objects at rmmod time.  This helps track down
leaks and it saves you from rebooting just to properly remove the
kmem_cache object.

But, the code runs under a fairly expensive spinlock and the checks to
see if it is currently enabled are not entirely consistent.  Some use
#ifdef and some #if.

This changes everything to #if and disables the leak checking.
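
The inconsistency matters because #ifdef and #if disagree once a switch is
defined as 0.  The sketch below uses a made-up macro name, LEAK_DEBUG_DEMO,
purely for illustration; it is not taken from extent_io.c.

```c
#include <assert.h>

/* Hypothetical switch: defined, but set to "off". */
#define LEAK_DEBUG_DEMO 0

static int ifdef_sees_enabled(void)
{
#ifdef LEAK_DEBUG_DEMO
    return 1;   /* taken: the macro is defined, its value is ignored */
#else
    return 0;
#endif
}

static int if_sees_enabled(void)
{
#if LEAK_DEBUG_DEMO
    return 1;
#else
    return 0;   /* taken: the macro evaluates to 0 */
#endif
}

/* Documents the mismatch: the same switch reads as both on and off. */
void check_preprocessor_demo(void)
{
    assert(ifdef_sees_enabled() == 1);
    assert(if_sees_enabled() == 0);
}
```

Using #if everywhere means a single "0" turns every check off at once.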

Signed-off-by: Chris Mason <chris.mason@oracle.com>
15 years agoBtrfs: sort references by byte number during btrfs_inc_ref
Chris Mason [Wed, 4 Feb 2009 14:23:45 +0000 (09:23 -0500)]
Btrfs: sort references by byte number during btrfs_inc_ref

When a block goes through cow, we update the reference counts of
everything that block points to.  The internal pointers of the block
can be in just about any order, and it is likely to have clusters of
things that are close together and clusters of things that are not.

To help reduce the seeks that come with updating all of these reference
counts, sort them by byte number before actual updates are done.
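
A minimal sketch of the idea, assuming a small array that pairs each
pointer's byte number with its slot in the block.  struct ref_sort and
sort_refs_by_bytenr() are hypothetical names for this example, and the
standard library qsort() is used here where kernel code would use its own
sort helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: where a pointer lives and what it points at. */
struct ref_sort {
    uint64_t bytenr;    /* on-disk byte number of the referenced extent */
    uint32_t slot;      /* position of the pointer inside the block */
};

/* Order by byte number without risking overflow from a subtraction. */
static int cmp_bytenr(const void *a, const void *b)
{
    const struct ref_sort *x = a;
    const struct ref_sort *y = b;

    return (x->bytenr > y->bytenr) - (x->bytenr < y->bytenr);
}

/* Sort so that reference updates are issued in ascending disk order. */
static void sort_refs_by_bytenr(struct ref_sort *refs, size_t nr)
{
    assert(refs != NULL || nr == 0);
    if (nr > 1)
        qsort(refs, nr, sizeof(*refs), cmp_bytenr);
}
```

After sorting, the caller walks the array in order and uses the slot field
to find the original pointer, so updates touch neighbouring extents
together.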

Signed-off-by: Chris Mason <chris.mason@oracle.com>