firefly-linux-kernel-4.4.55.git
14 years ago Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck...
Ingo Molnar [Mon, 23 Aug 2010 09:32:34 +0000 (11:32 +0200)]
Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu

14 years ago vhost: add __rcu annotations
Arnd Bergmann [Tue, 9 Mar 2010 18:24:45 +0000 (19:24 +0100)]
vhost: add __rcu annotations

Also add rcu_dereference_protected() for code paths where locks are held.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
14 years ago workqueue: Add basic tracepoints to track workqueue execution
Arjan van de Ven [Sat, 21 Aug 2010 20:07:26 +0000 (13:07 -0700)]
workqueue: Add basic tracepoints to track workqueue execution

With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.

This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.

With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):

Interrupt (43)            i915      time   3.51ms    wakeups 141
Work      ieee80211_iface_work      time   0.81ms    wakeups  29
Work              do_dbs_timer      time   0.55ms    wakeups  24
Process                   Xorg      time  21.36ms    wakeups   4
Timer    sched_rt_period_timer      time   0.01ms    wakeups   1

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago Merge git://git.infradead.org/mtd-2.6
Linus Torvalds [Sat, 21 Aug 2010 19:47:05 +0000 (12:47 -0700)]
Merge git://git.infradead.org/mtd-2.6

* git://git.infradead.org/mtd-2.6:
  mtd: nand: Fix probe of Samsung NAND chips
  mtd: nand: Fix regression in BBM detection
  pxa3xx: fix ns2cycle equation

14 years ago Replace Configure with Enable in description of MAXSMP
Samuel Thibault [Sat, 21 Aug 2010 19:32:41 +0000 (21:32 +0200)]
Replace Configure with Enable in description of MAXSMP

The word "Configure" tends to make users believe they have to say 'yes'
to be able to choose the number of procs/nodes.  "Enable" should be
unambiguous enough.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago mm: make stack guard page logic use vm_prev pointer
Linus Torvalds [Fri, 20 Aug 2010 23:49:40 +0000 (16:49 -0700)]
mm: make stack guard page logic use vm_prev pointer

Like the mlock() change previously, this makes the stack guard check
code use vma->vm_prev to see what the mapping below the current stack
is, rather than have to look it up with find_vma().

Also, accept an abutting stack segment, since that happens naturally if
you split the stack with mlock or mprotect.

Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago mm: make the mlock() stack guard page checks stricter
Linus Torvalds [Fri, 20 Aug 2010 23:39:25 +0000 (16:39 -0700)]
mm: make the mlock() stack guard page checks stricter

If we've split the stack vma, only the lowest one has the guard page.
Now that we have a doubly linked list of vma's, checking this is trivial.
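
A minimal userspace sketch of this check, assuming a cut-down `struct vma`
that stands in for the kernel's `vm_area_struct`.  The struct, the flag value,
and both helper names are assumptions made for illustration, not the kernel's
definitions.  The point is that with a `vm_prev` pointer, testing whether a
vma is the lowest piece of a split stack takes a couple of comparisons:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_GROWSDOWN 0x0100UL   /* illustrative flag value, assumed for this sketch */

/* Hypothetical stand-in for vm_area_struct: only the fields used below. */
struct vma {
    unsigned long vm_start, vm_end, vm_flags;
    struct vma *vm_prev;
};

/* True if 'prev' is a stack segment that ends exactly where 'addr' begins. */
static bool stack_continues_below(const struct vma *prev, unsigned long addr)
{
    return prev && prev->vm_end == addr && (prev->vm_flags & VM_GROWSDOWN);
}

/* True if 'addr' is the guard page: the first page of the lowest stack segment. */
static bool is_stack_guard_page(const struct vma *vma, unsigned long addr)
{
    assert(vma != NULL);
    return (vma->vm_flags & VM_GROWSDOWN) &&
           vma->vm_start == addr &&
           !stack_continues_below(vma->vm_prev, addr);
}
```

With a stack split into two abutting segments, only the start of the lower
segment is reported as a guard page.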

Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago mm: make the vma list be doubly linked
Linus Torvalds [Fri, 20 Aug 2010 23:24:55 +0000 (16:24 -0700)]
mm: make the vma list be doubly linked

It's a really simple list, and several of the users want to go backwards
in it to find the previous vma.  So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
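
The change can be pictured with a small userspace sketch.  `struct vma` and
`vma_insert_after()` are hypothetical names used for illustration; the
kernel's real insertion code is more involved.  What matters is that every
insertion maintains `vm_prev` as well as `vm_next`, so the previous mapping is
available without a lookup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vm_area_struct: only the fields needed here. */
struct vma {
    unsigned long vm_start, vm_end;
    struct vma *vm_next, *vm_prev;
};

/*
 * Link 'vma' after 'prev', or at the head of the list if 'prev' is NULL,
 * keeping the forward and backward pointers consistent.
 */
static void vma_insert_after(struct vma **head, struct vma *prev, struct vma *vma)
{
    struct vma *next;

    assert(head != NULL && vma != NULL);
    if (prev) {
        next = prev->vm_next;
        prev->vm_next = vma;
    } else {
        next = *head;
        *head = vma;
    }
    vma->vm_prev = prev;
    vma->vm_next = next;
    if (next)
        next->vm_prev = vma;
}
```

After insertion, a caller that holds a vma reaches its lower neighbour
through `vma->vm_prev` in constant time.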

Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago mtd: nand: Fix probe of Samsung NAND chips
Tilman Sauerbeck [Fri, 20 Aug 2010 21:01:47 +0000 (14:01 -0700)]
mtd: nand: Fix probe of Samsung NAND chips

Apparently, the check for a 6-byte ID string introduced by commit
426c457a3216fac74e3d44dd39729b0689f4c7ab ("mtd: nand: extend NAND flash
detection to new MLC chips") is NOT sufficient to determine whether or
not a Samsung chip uses their new MLC detection scheme or the old,
standard scheme. This adds a condition to check cell type.

Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
14 years ago Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 20 Aug 2010 21:25:08 +0000 (14:25 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, apic: Fix apic=debug boot crash
  x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
  x86-32: Fix dummy trampoline-related inline stubs
  x86-32: Separate 1:1 pagetables from swapper_pg_dir
  x86, cpu: Fix regression in AMD errata checking code

14 years ago Documentation: fix ozlabs.org mailing list address
Stephen Rothwell [Fri, 20 Aug 2010 09:56:31 +0000 (19:56 +1000)]
Documentation: fix ozlabs.org mailing list address

This list moved to lists.ozlabs.org quite some time ago.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago MAINTAINERS: Fix ozlabs.org mailing list addresses
Stephen Rothwell [Fri, 20 Aug 2010 09:52:45 +0000 (19:52 +1000)]
MAINTAINERS: Fix ozlabs.org mailing list addresses

All these lists moved to lists.ozlabs.org quite a while ago.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago Documentation: kernel-locking: mutex_trylock cannot be used in interrupt context
Stefan Richter [Thu, 19 Aug 2010 21:13:43 +0000 (14:13 -0700)]
Documentation: kernel-locking: mutex_trylock cannot be used in interrupt context

Chapter 6 is right about mutex_trylock, but chapter 10 wasn't.  This error
was introduced during semaphore-to-mutex conversion of the Unreliable
guide.  :-)

If user context which performs mutex_lock() or mutex_trylock() is
preempted by interrupt context which performs mutex_trylock() on the same
mutex instance, a deadlock occurs.  This is because these functions do not
disable local IRQs when they operate on mutex->wait_lock.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago drivers/scsi/qla4xxx: fix build
Andrew Morton [Thu, 19 Aug 2010 21:13:42 +0000 (14:13 -0700)]
drivers/scsi/qla4xxx: fix build

gcc-4.0.2:

  drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4_8xxx_error_recovery':
  drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
  drivers/scsi/qla4xxx/ql4_os.c:2377: sorry, unimplemented: called from here
  drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
  drivers/scsi/qla4xxx/ql4_os.c:2393: sorry, unimplemented: called from here

Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago uml: fix compile error in dma_get_cache_alignment()
Miklos Szeredi [Thu, 19 Aug 2010 21:13:40 +0000 (14:13 -0700)]
uml: fix compile error in dma_get_cache_alignment()

Fix uml compile error:

  include/linux/dma-mapping.h:145: error: redefinition of 'dma_get_cache_alignment'
  arch/um/include/asm/dma-mapping.h:99: note: previous definition of 'dma_get_cache_alignment' was here

Introduced by commit 4565f0170dfc ("dma-mapping: unify
dma_get_cache_alignment implementations").

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago oom: __task_cred() need rcu_read_lock()
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: __task_cred() need rcu_read_lock()

dump_tasks() needs to hold the RCU read lock around its access of the
target task's UID.  To this end it should use task_uid() as it only needs
that one thing from the creds.

The fact that dump_tasks() holds tasklist_lock is insufficient to prevent the
target process replacing its credentials on another CPU.

This patch therefore changes the code to call rcu_read_lock() explicitly.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
mm/oom_kill.c:410 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
4 locks held by kworker/1:2/651:
 #0:  (events){+.+.+.}, at: [<ffffffff8106aae7>] process_one_work+0x137/0x4a0
 #1:  (moom_work){+.+...}, at: [<ffffffff8106aae7>] process_one_work+0x137/0x4a0
 #2:  (tasklist_lock){.+.+..}, at: [<ffffffff810fafd4>] out_of_memory+0x164/0x3f0
 #3:  (&(&p->alloc_lock)->rlock){+.+...}, at: [<ffffffff810fa48e>] find_lock_task_mm+0x2e/0x70

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago oom: fix tasklist_lock leak
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: fix tasklist_lock leak

Commit 0aad4b3124 ("oom: fold __out_of_memory into out_of_memory")
introduced a tasklist_lock leak, which caused the following warnings
and a panic.

    ================================================
    [ BUG: lock held when returning to user space! ]
    ------------------------------------------------
    rsyslogd/1422 is leaving the kernel with locks still held!
    1 lock held by rsyslogd/1422:
     #0:  (tasklist_lock){.+.+.+}, at: [<ffffffff810faf64>] out_of_memory+0x164/0x3f0
    BUG: scheduling while atomic: rsyslogd/1422/0x00000002
    INFO: lockdep is turned off.

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago oom: fix NULL pointer dereference
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:38 +0000 (14:13 -0700)]
oom: fix NULL pointer dereference

Commit b940fd7035 ("oom: remove unnecessary code and cleanup") added an
unnecessary NULL pointer dereference.  Remove it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago drivers/mmc/host/sdhci-s3c.c: use the correct mutex and card detect function
Kyungmin Park [Thu, 19 Aug 2010 21:13:37 +0000 (14:13 -0700)]
drivers/mmc/host/sdhci-s3c.c: use the correct mutex and card detect function

There was a merge problem between the sdhci core and the sdhci-s3c host.
Now that the mutex has been changed to a spinlock, the driver needs to use
the spinlock functions and the correct card detection function.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago sdhci: add no hi-speed bit quirk support
Kyungmin Park [Thu, 19 Aug 2010 21:13:35 +0000 (14:13 -0700)]
sdhci: add no hi-speed bit quirk support

Some SDHCI controllers like s5pc110 don't have an HISPD bit in the HOSTCTL
register.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago s5pc110: SDHCI-s3c support on s5pc110
Kyungmin Park [Thu, 19 Aug 2010 21:13:35 +0000 (14:13 -0700)]
s5pc110: SDHCI-s3c support on s5pc110

s5pc110 (aka s5pv210) uses the same SDHCI IP.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago s5pc110: SDHCI-s3c can override host capabilities
Kyungmin Park [Thu, 19 Aug 2010 21:13:34 +0000 (14:13 -0700)]
s5pc110: SDHCI-s3c can override host capabilities

Each board can override the default sdhci host capabilities.
Some boards have features that are broken in hardware, and some support an
8-bit bus width.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago lib/radix-tree.c: fix overflow in radix_tree_range_tag_if_tagged()
Jan Kara [Thu, 19 Aug 2010 21:13:33 +0000 (14:13 -0700)]
lib/radix-tree.c: fix overflow in radix_tree_range_tag_if_tagged()

When radix_tree_maxindex() is ~0UL, it can happen that scanning overflows
index and tree traversal code goes astray reading memory until it hits
unreadable memory.  Check for overflow and exit in that case.
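
The failure mode is ordinary unsigned wraparound.  The sketch below is not the
radix-tree code; `count_strides()` is a hypothetical helper that walks a range
of indices in fixed strides and shows the kind of overflow check the message
describes.  Without the `next < index` test, a walk over a range ending at
`~0UL` would not terminate, because no index compares greater than `ULONG_MAX`:

```c
#include <assert.h>
#include <limits.h>

/* Count the positions visited when walking [first, last] in strides of 'step'. */
static unsigned long count_strides(unsigned long first, unsigned long last,
                                   unsigned long step)
{
    unsigned long index = first, visited = 0;

    assert(step > 0 && first <= last);
    for (;;) {
        unsigned long next;

        visited++;
        next = index + step;              /* unsigned arithmetic wraps on overflow */
        if (next < index || next > last)  /* wrapped past ULONG_MAX, or past the end */
            break;
        index = next;
    }
    return visited;
}
```

The wraparound test costs one comparison per step and makes the walk terminate
for every valid range, including one that ends at the maximum index.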

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago revert "hwmon: f71882fg: add support for the Fintek F71808E"
Andrew Morton [Thu, 19 Aug 2010 21:13:31 +0000 (14:13 -0700)]
revert "hwmon: f71882fg: add support for the Fintek F71808E"

Revert commit 7721fea3d0fd93fb4d000eb737b444369358d6d3 ("hwmon:
f71882fg: add support for the Fintek F71808E").

Hans said:

: A second review after I've received a data sheet for this device from
: Fintek has turned up a few bugs.
:
: Unfortunately, neither Giel nor I have time to fix this in time for the 2.6.36
: cycle.  Therefore I would like to see this patch reverted, as not having any
: support for the hwmon function of this superio chip is better than having
: unreliable support.

Cc: Giel van Schijndel <me@mortis.eu>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago kfifo: add explicit error checking in all the examples
Andrea Righi [Thu, 19 Aug 2010 21:13:30 +0000 (14:13 -0700)]
kfifo: add explicit error checking in all the examples

Provide a check in all the kfifo examples to validate the correct
execution of each testcase.

Signed-off-by: Andrea Righi <arighi@develer.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago kfifo: fix a memory leak in dma example
Andrea Righi [Thu, 19 Aug 2010 21:13:30 +0000 (14:13 -0700)]
kfifo: fix a memory leak in dma example

We use a dynamically allocated kfifo in the dma example, so we need to
free it when unloading the module.

Signed-off-by: Andrea Righi <arighi@develer.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago kfifo: fix kernel BUG in dma example
Andrea Righi [Thu, 19 Aug 2010 21:13:29 +0000 (14:13 -0700)]
kfifo: fix kernel BUG in dma example

The scatterlist is used uninitialized in kfifo_dma_in_prepare().  This
triggers the following bug if CONFIG_DEBUG_SG=y:

  ------------[ cut here ]------------
  kernel BUG at include/linux/scatterlist.h:65!
  invalid opcode: 0000 [#1] PREEMPT SMP
  ...
  Call Trace:
   [<ffffffff810a1eab>] setup_sgl+0x6b/0xe0
   [<ffffffffa03d7000>] ? example_init+0x0/0x265 [dma_example]
   [<ffffffff810a2021>] __kfifo_dma_in_prepare+0x21/0x30
   [<ffffffffa03d7124>] example_init+0x124/0x265 [dma_example]
   [<ffffffff810f9c55>] ? trace_module_notify+0x25/0x370
   [<ffffffff81110c6e>] ? free_pages_prepare+0x11e/0x1e0
   [<ffffffff8106f2b1>] ? get_parent_ip+0x11/0x50
   [<ffffffff810f9c55>] ? trace_module_notify+0x25/0x370
   [<ffffffff810b65fd>] ? trace_hardirqs_on+0xd/0x10
   [<ffffffff814beade>] ? mutex_unlock+0xe/0x10
   [<ffffffff810f9c71>] ? trace_module_notify+0x41/0x370
   [<ffffffff810a77d5>] ? __blocking_notifier_call_chain+0x45/0x80
   [<ffffffff81137b7a>] ? vfree+0x2a/0x30
   [<ffffffff810a6ac3>] ? up_read+0x23/0x40
   [<ffffffff810a77f5>] ? __blocking_notifier_call_chain+0x65/0x80
   [<ffffffff810001e3>] do_one_initcall+0x43/0x180
   [<ffffffff810c577a>] sys_init_module+0xba/0x200
   [<ffffffff8103819b>] system_call_fastpath+0x16/0x1b
  RIP  [<ffffffff810a1e31>] setup_sgl_buf+0x1a1/0x1b0
   RSP <ffff88006720dc98>
  ---[ end trace a72b979fd3c1d3a5 ]---

Add the proper initialization to avoid the bug.

Signed-off-by: Andrea Righi <arighi@develer.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago kfifo: add explicit error checking in byte stream example
Andrea Righi [Thu, 19 Aug 2010 21:13:29 +0000 (14:13 -0700)]
kfifo: add explicit error checking in byte stream example

Provide a static array of expected items that kfifo should contain at the
end of the test to validate it.

Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago kfifo: add kfifo_skip() testcase
Andrea Righi [Thu, 19 Aug 2010 21:13:28 +0000 (14:13 -0700)]
kfifo: add kfifo_skip() testcase

Add a testcase for kfifo_skip() to the byte stream fifo example.

Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago kfifo: implement missing __kfifo_skip_r()
Andrea Righi [Thu, 19 Aug 2010 21:13:27 +0000 (14:13 -0700)]
kfifo: implement missing __kfifo_skip_r()

kfifo_skip() is currently broken because the internal helper function
is missing.  Add it.

Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago matroxfb: fix incorrect use of memcpy_toio()
Ondrej Zary [Thu, 19 Aug 2010 21:13:25 +0000 (14:13 -0700)]
matroxfb: fix incorrect use of memcpy_toio()

The screen has been completely corrupted since 2.6.34.  Bisection revealed
that it's caused by commit 6175ddf06b61720 ("x86: Clean up mem*io functions.").

H. Peter Anvin explained that memcpy_toio() does not copy data in 32bit
chunks anymore on x86.
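
As a rough userspace illustration of what copying "in 32bit chunks" means
(this is not the matroxfb fix itself, and `copy_words32()` is a hypothetical
helper name), the loop below writes the destination strictly one 32-bit word
at a time through a `volatile` pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy 'nwords' 32-bit words, issuing one 32-bit store
 * per word.  'volatile' keeps the compiler from merging or eliding the
 * stores; real MMIO code would use the kernel's I/O accessors instead.
 */
static void copy_words32(volatile uint32_t *dst, const uint32_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```

A device that only accepts full-width accesses can misbehave when a generic
copy routine is free to pick its own access size, which is why the access
width is spelled out explicitly here.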

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: <stable@kernel.org> [2.6.34.x, 2.6.35.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago rcu: add comment stating that list_empty() applies to RCU-protected lists
Paul E. McKenney [Fri, 20 Aug 2010 04:43:09 +0000 (21:43 -0700)]
rcu: add comment stating that list_empty() applies to RCU-protected lists

Because list_empty() does not dereference any RCU-protected pointers, and
further does not pass such pointers to the caller (so that the caller
does not dereference them either), it is safe to use list_empty() on
RCU-protected lists.  There is no need for a list_empty_rcu().  This
commit adds a comment stating this explicitly.
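
The argument can be seen in a userspace sketch of a circular list head.  The
definitions below mirror the familiar shape of the kernel's list helpers but
are written here for illustration only; `init_list_head()` and
`list_add_front()` are hypothetical names, and no RCU primitives are involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Emptiness is decided by comparing pointers; no list element is dereferenced. */
static bool list_empty(const struct list_head *head)
{
    assert(head != NULL);
    return head->next == head;
}

/* Plain (non-RCU) insertion at the front, included only to exercise the sketch. */
static void list_add_front(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}
```

Because `list_empty()` only loads `head->next` and compares it with `head`,
it never follows a pointer into an element that a concurrent updater might be
freeing.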

Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCU
Paul E. McKenney [Thu, 19 Aug 2010 23:57:45 +0000 (16:57 -0700)]
rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCU

Replace one of the ACCESS_ONCE() calls in each of __rcu_read_lock()
and __rcu_read_unlock() with barrier() as suggested by Steve Rostedt in
order to avoid the potential compiler-optimization-induced bug noted by
Mathieu Desnoyers.
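
A simplified, single-threaded sketch of the pattern, assuming GCC or Clang
for the inline-asm compiler barrier.  This is not the kernel's
__rcu_read_lock()/__rcu_read_unlock(); the global counter stands in for the
per-task nesting count, and the `sketch_` function names are hypothetical:

```c
#include <assert.h>

/* Compiler barrier: the compiler may not move memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the per-task read-side nesting counter. */
static int read_lock_nesting;

static void sketch_read_lock(void)
{
    read_lock_nesting++;
    barrier();      /* emit the increment before the critical section's accesses */
}

static void sketch_read_unlock(void)
{
    assert(read_lock_nesting > 0);
    barrier();      /* emit the critical section's accesses before the decrement */
    --read_lock_nesting;
}
```

Note that barrier() constrains only the compiler, not the CPU; it is enough
here because the counter is only ever touched by the task that owns it.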

Located-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: combine duplicate code, courtesy of CONFIG_PREEMPT_RCU
Paul E. McKenney [Tue, 17 Aug 2010 21:18:46 +0000 (14:18 -0700)]
rcu: combine duplicate code, courtesy of CONFIG_PREEMPT_RCU

The CONFIG_PREEMPT_RCU kernel configuration parameter was recently
re-introduced, but as an indication of the type of RCU (preemptible
vs. non-preemptible) instead of as selecting a given implementation.
This commit uses CONFIG_PREEMPT_RCU to combine duplicate code
from include/linux/rcutiny.h and include/linux/rcutree.h into
include/linux/rcupdate.h.  This commit also combines a few other pieces
of duplicate code that have accumulated.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: Upgrade srcu_read_lock() docbook about SRCU grace periods
Paul E. McKenney [Mon, 16 Aug 2010 17:50:54 +0000 (10:50 -0700)]
rcu: Upgrade srcu_read_lock() docbook about SRCU grace periods

It is illegal to wait for an SRCU grace period while within the
corresponding flavor of SRCU read-side critical section.  Therefore,
this commit updates the srcu_read_lock() docbook accordingly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: document ways of stalling updates in low-memory situations
Paul E. McKenney [Fri, 13 Aug 2010 23:34:22 +0000 (16:34 -0700)]
rcu: document ways of stalling updates in low-memory situations

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: repair code-duplication FIXMEs
Paul E. McKenney [Fri, 13 Aug 2010 23:16:25 +0000 (16:16 -0700)]
rcu: repair code-duplication FIXMEs

Combine the duplicate definitions of ULONG_CMP_GE(), ULONG_CMP_LT(),
and rcu_preempt_depth() into include/linux/rcupdate.h.
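
For readers unfamiliar with these macros, the sketch below shows
wraparound-safe comparison of free-running unsigned long counters.  The two
macro bodies follow the form commonly seen in include/linux/rcupdate.h, but
the commit text does not quote them, so treat them as an assumption;
`counter_reached()` and `ulong_cmp_selfcheck()` are hypothetical helpers:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" and "a < b" for unsigned long counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Has the counter 'now' reached 'target', even if it wrapped on the way? */
static bool counter_reached(unsigned long now, unsigned long target)
{
    return ULONG_CMP_GE(now, target);
}

/* Illustrative self-check covering the ordinary and the wrapped case. */
static void ulong_cmp_selfcheck(void)
{
    unsigned long before_wrap = ULONG_MAX - 1, after_wrap = 2;

    assert(counter_reached(5, 3));
    assert(!counter_reached(3, 5));
    assert(counter_reached(after_wrap, before_wrap));  /* 2 is "later" than ULONG_MAX - 1 */
    assert(ULONG_CMP_LT(before_wrap, after_wrap));
}
```

The comparison is only meaningful while the two counters are less than half
the counter range apart.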

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: permit suppressing current grace period's CPU stall warnings
Paul E. McKenney [Tue, 10 Aug 2010 21:28:53 +0000 (14:28 -0700)]
rcu: permit suppressing current grace period's CPU stall warnings

When using a kernel debugger, a long sojourn in the debugger can get
you lots of RCU CPU stall warnings once you resume.  This might not be
helpful, especially if you are using the system console.  This patch
therefore allows RCU CPU stall warnings to be suppressed, but only for
the duration of the current set of grace periods.

This differs from Jason's original patch in that it adds support for
tiny RCU and preemptible RCU, and uses a slightly different method for
suppressing the RCU CPU stall warning messages.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Jason Wessel <jason.wessel@windriver.com>
14 years ago rcu: refer RCU CPU stall-warning victims to stallwarn.txt
Paul E. McKenney [Mon, 9 Aug 2010 21:23:03 +0000 (14:23 -0700)]
rcu: refer RCU CPU stall-warning victims to stallwarn.txt

There is some documentation on RCU CPU stall warnings contained in
Documentation/RCU/stallwarn.txt, but it will not be apparent to someone
who runs into such a warning while under time pressure.  This commit
therefore adds comments preceding the printk()s pointing out the
location of this documentation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: update obsolete rcu_read_lock() comment.
Paul E. McKenney [Sun, 8 Aug 2010 04:59:54 +0000 (21:59 -0700)]
rcu: update obsolete rcu_read_lock() comment.

The comment says that blocking is illegal in rcu_read_lock()-style
RCU read-side critical sections, which is no longer entirely true
given preemptible RCU.  This commit provides a fix.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: Add a TINY_PREEMPT_RCU
Paul E. McKenney [Tue, 29 Jun 2010 23:49:16 +0000 (16:49 -0700)]
rcu: Add a TINY_PREEMPT_RCU

Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU.  This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing.  This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.

The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.

This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.

Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
suggested by Steve Rostedt in order to avoid the compiler-reordering
issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).

As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
savings compared to CONFIG_TREE_PREEMPT_RCU.  Of course, for non-real-time
workloads, CONFIG_TINY_RCU is even better.

CONFIG_TREE_PREEMPT_RCU

   text    data     bss     dec    filename
     13       0       0      13    kernel/rcupdate.o
   6170     825      28    7023    kernel/rcutree.o
                           ----
                           7036    Total

CONFIG_TINY_PREEMPT_RCU

   text    data     bss     dec    filename
     13       0       0      13    kernel/rcupdate.o
   2081      81       8    2170    kernel/rcutiny.o
                           ----
                           2183    Total

CONFIG_TINY_RCU (non-preemptible)

   text    data     bss     dec    filename
     13       0       0      13    kernel/rcupdate.o
    719      25       0     744    kernel/rcutiny.o
                            ---
                            757    Total

Requested-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago x86, apic: Fix apic=debug boot crash
Daniel Kiper [Thu, 19 Aug 2010 22:46:16 +0000 (00:46 +0200)]
x86, apic: Fix apic=debug boot crash

Fix a boot crash when apic=debug is used and the APIC is
not properly initialized.

This issue appears during Xen Dom0 kernel boot but the
fix is generic and the crash could occur on real hardware
as well.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: jeremy@goop.org
Cc: <stable@kernel.org> # .35.x, .34.x, .33.x, .32.x
LKML-Reference: <20100819224616.GB9967@router-fw-old.local.net-space.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
14 years ago rcu: Fix RCU_FANOUT help message
Paul E. McKenney [Thu, 5 Aug 2010 00:31:12 +0000 (17:31 -0700)]
rcu: Fix RCU_FANOUT help message

Commit cf244dc01bf68 added a fourth level to the TREE_RCU hierarchy,
but the RCU_FANOUT help message still said "cube root".  This commit
fixes this to "fourth root" and also emphasizes that production
systems are well-served by the default.  (Stress-testing RCU itself
uses small RCU_FANOUT values in order to test large-system code paths
on small(er) systems.)
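
As a rough illustration of the "fourth root" wording: if a tree with fanout f
and four levels is taken to cover up to f^4 CPUs (a simplification assumed
here), the smallest workable fanout is the fourth root of the CPU count,
rounded up.  `min_fanout()` below is a hypothetical helper, not kernel code:

```c
#include <assert.h>

/*
 * Smallest fanout f such that f^levels >= nr_cpus, using integer arithmetic
 * only.  Intended for realistic CPU counts; it does not guard against
 * overflow for values of nr_cpus near the top of the unsigned long range.
 */
static unsigned long min_fanout(unsigned long nr_cpus, unsigned int levels)
{
    assert(nr_cpus > 0 && levels > 0);
    for (unsigned long f = 1; ; f++) {
        unsigned long capacity = 1;

        for (unsigned int i = 0; i < levels && capacity < nr_cpus; i++)
            capacity *= f;
        if (capacity >= nr_cpus)
            return f;
    }
}
```

For example, `min_fanout(4096, 4)` yields 8, while the older three-level
"cube root" rule, `min_fanout(4096, 3)`, yields 16.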

Located-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: Allow RCU CPU stall warnings to be off at boot, but manually enablable
Paul E. McKenney [Wed, 21 Jul 2010 15:05:56 +0000 (08:05 -0700)]
rcu: Allow RCU CPU stall warnings to be off at boot, but manually enablable

Currently, if RCU CPU stall warnings are enabled, they are enabled
immediately upon boot.  They can be manually disabled via /sys (and
also re-enabled via /sys), and are automatically disabled upon panic.
However, some users need RCU CPU stalls to be disabled at boot time,
but to be enabled without rebuilding/rebooting.  For example, someone
running a real-time application in production might not want the
additional latency of RCU CPU stall detection in normal operation, but
might need to enable it at any point for fault isolation purposes.

This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
kernel configuration parameter that maintains the current behavior
(enable at boot) by default, but allows a kernel to be configured
with RCU CPU stall detection built into the kernel, but disabled at
boot time.

Requested-by: Clark Williams <williams@redhat.com>
Requested-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: restrict TREE_RCU to SMP builds with !PREEMPT
Paul E. McKenney [Wed, 21 Jul 2010 13:52:40 +0000 (06:52 -0700)]
rcu: restrict TREE_RCU to SMP builds with !PREEMPT

Because both TINY_RCU and TREE_PREEMPT_RCU have been in mainline for
several releases, it is time to restrict the use of TREE_RCU to SMP
non-preemptible systems.  This reduces testing/validation effort.  This
commit is a first step towards driving the selection of RCU implementation
directly off of the SMP and PREEMPT configuration parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago rcu: allow RCU CPU stall warning messages to be controlled in /sys
Paul E. McKenney [Wed, 14 Jul 2010 21:38:30 +0000 (14:38 -0700)]
rcu: allow RCU CPU stall warning messages to be controlled in /sys

Set the permissions of the rcu_cpu_stall_suppress parameter to 644 so that
RCU CPU stall warnings can be enabled and disabled at runtime via sysfs.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years ago Update call_rcu() usage, add synchronize_rcu()
Paul E. McKenney [Wed, 19 May 2010 17:46:55 +0000 (10:46 -0700)]
Update call_rcu() usage, add synchronize_rcu()

Reported-by: Kyle Hubert <khubert@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years ago Update documentation to note the passage of INIT_RCU_HEAD()
Paul E. McKenney [Wed, 19 May 2010 17:42:16 +0000 (10:42 -0700)]
Update documentation to note the passage of INIT_RCU_HEAD()

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years ago rcu head remove init
Mathieu Desnoyers [Sat, 17 Apr 2010 12:48:41 +0000 (08:48 -0400)]
rcu head remove init

RCU heads really don't need to be initialized. Their state before call_rcu()
really does not matter.

We need to keep init/destroy_rcu_head_on_stack() though, since we want
debugobjects to be able to keep track of these objects.
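
To see why the prior state does not matter, consider the toy user-space
model below. It is only a sketch: struct rcu_head mirrors the kernel's
field names, but toy_call_rcu(), struct cb_list and the other helpers are
made up for illustration. The enqueue path overwrites both fields of the
head, so whatever they held beforehand is never read.

```c
#include <stddef.h>

/* Toy stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Toy callback queue: singly linked list with a tail pointer. */
struct cb_list {
	struct rcu_head *head;
	struct rcu_head **tail;
};

void cb_list_init(struct cb_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Toy call_rcu(): both fields are overwritten before they are used. */
void toy_call_rcu(struct cb_list *l, struct rcu_head *head,
		  void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = NULL;
	*l->tail = head;
	l->tail = &head->next;
}

/* Toy "grace period ended": run every queued callback, return the count. */
int toy_invoke_callbacks(struct cb_list *l)
{
	struct rcu_head *h = l->head;
	int n = 0;

	cb_list_init(l);
	while (h) {
		struct rcu_head *next = h->next; /* read before func may free h */

		h->func(h);
		h = next;
		n++;
	}
	return n;
}

/* Example callback that just counts invocations. */
int toy_cb_count;

void toy_count_cb(struct rcu_head *head)
{
	(void)head;
	toy_cb_count++;
}
```

Queuing two heads whose fields contain stale garbage and then invoking the
callbacks runs both exactly once in this model.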

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: David S. Miller <davem@davemloft.net>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: laijs@cn.fujitsu.com
CC: dipankar@in.ibm.com
CC: josh@joshtriplett.org
CC: dvhltc@us.ibm.com
CC: niv@us.ibm.com
CC: tglx@linutronix.de
CC: peterz@infradead.org
CC: rostedt@goodmis.org
CC: Valdis.Kletnieks@vt.edu
CC: dhowells@redhat.com
CC: eric.dumazet@gmail.com
CC: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agokernel: __rcu annotations
Arnd Bergmann [Wed, 24 Feb 2010 19:01:56 +0000 (20:01 +0100)]
kernel: __rcu annotations

This adds annotations for RCU operations in core kernel components

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agoidr: __rcu annotations
Arnd Bergmann [Fri, 26 Feb 2010 13:53:26 +0000 (14:53 +0100)]
idr: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agoradix-tree: __rcu annotations
Arnd Bergmann [Thu, 25 Feb 2010 22:43:52 +0000 (23:43 +0100)]
radix-tree: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agonotifiers: __rcu annotations
Arnd Bergmann [Wed, 24 Feb 2010 19:00:13 +0000 (20:00 +0100)]
notifiers: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agorcu: improve kerneldoc for rcu_read_lock(), call_rcu(), and synchronize_rcu()
Paul E. McKenney [Fri, 9 Jul 2010 00:38:59 +0000 (17:38 -0700)]
rcu: improve kerneldoc for rcu_read_lock(), call_rcu(), and synchronize_rcu()

Make it explicit that new RCU read-side critical sections that start
after call_rcu() and synchronize_rcu() start might still be running
after the end of the relevant grace period.
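
The rule can be modeled with plain timestamps. The sketch below is a toy,
not kernel code: struct reader and toy_gp_end() are hypothetical names, and
the function computes only the earliest moment a grace period could end. A
grace period waits only for readers that were already running when it
started, so a reader that begins later may outlive it.

```c
#include <stddef.h>

/* One read-side critical section, as a [start, end) time interval. */
struct reader {
	unsigned long start;
	unsigned long end;
};

/*
 * Earliest possible end of a grace period that starts at gp_start:
 * it must outlast every reader that began before gp_start, and it
 * ignores readers that began at or after gp_start.
 */
unsigned long toy_gp_end(const struct reader *r, size_t n,
			 unsigned long gp_start)
{
	unsigned long end = gp_start;
	size_t i;

	for (i = 0; i < n; i++) {
		if (r[i].start < gp_start && r[i].end > end)
			end = r[i].end;
	}
	return end;
}
```

With readers spanning 1..5, 4..9 and 7..30 and a grace period starting at
6, the model waits for the second reader only; the third started after the
grace period began and is still running when it ends.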

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agorcu: add boot parameter to suppress RCU CPU stall warning messages
Paul E. McKenney [Wed, 30 Jun 2010 18:43:52 +0000 (11:43 -0700)]
rcu: add boot parameter to suppress RCU CPU stall warning messages

Although the RCU CPU stall warning messages are a very good way to alert
people to a problem, once alerted, it is sometimes helpful to shut them
off in order to avoid obscuring other messages that might be being used
to track down the problem.  Although you can rebuild the kernel with
CONFIG_RCU_CPU_STALL_DETECTOR=n, this is sometimes inconvenient.  This
commit therefore adds a boot parameter named "rcu_cpu_stall_suppress"
that shuts these messages off without requiring a rebuild (though a
reboot might be needed for those not brave enough to patch their kernel
while it is running).

This message-suppression was already in place for the panic case, so this
commit need only rename the variable and export it via module_param().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
14 years agorcu: make CPU stall warning timeout configurable
Paul E. McKenney [Wed, 2 Jun 2010 23:21:38 +0000 (16:21 -0700)]
rcu: make CPU stall warning timeout configurable

Also set the default to 60 seconds, up from the previous hard-coded timeout
of 10 seconds.  This lets people who care set short timeouts, while sparing
people with unusual configurations (make randconfig!!!) from being
bothered with spurious CPU stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agoAdd RCU check for find_task_by_vpid().
Tetsuo Handa [Fri, 25 Jun 2010 16:08:19 +0000 (01:08 +0900)]
Add RCU check for find_task_by_vpid().

find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().

  ===================================================
  [ INFO: suspicious rcu_dereference_check() usage. ]
  ---------------------------------------------------
  kernel/pid.c:386 invoked rcu_dereference_check() without protection!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by rc.sysinit/1102:
   #0:  (tasklist_lock){.+.+..}, at: [<c1048340>] sys_setpgid+0x40/0x160

  stack backtrace:
  Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
  Call Trace:
   [<c105e714>] lockdep_rcu_dereference+0x94/0xb0
   [<c104b4cd>] find_task_by_pid_ns+0x6d/0x70
   [<c104b4e8>] find_task_by_vpid+0x18/0x20
   [<c1048347>] sys_setpgid+0x47/0x160
   [<c1002b50>] sysenter_do_call+0x12/0x36

Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().
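
The idea of an explicit check can be sketched in user space as follows.
This is a toy model under stated assumptions: every toy_* name is
hypothetical, the nesting counter stands in for the kernel's lockdep state,
and the "warning" is just a counter rather than a lockdep splat.

```c
#include <assert.h>

int toy_rcu_nesting;	/* depth of nested read-side critical sections */
int toy_warnings;	/* number of failed checks seen so far */

void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

int toy_rcu_read_lock_held(void)
{
	return toy_rcu_nesting > 0;
}

/* Toy assertion: record a warning instead of printing a splat. */
#define toy_rcu_lockdep_assert(cond) \
	do { if (!(cond)) toy_warnings++; } while (0)

/* Toy lookup: complains unless called inside a read-side section. */
int toy_find_task(int nr)
{
	toy_rcu_lockdep_assert(toy_rcu_read_lock_held());
	return nr;	/* placeholder for the real lookup */
}
```

Calling toy_find_task() without toy_rcu_read_lock() bumps the warning
counter even if some other lock is held, which is the class of error the
check is meant to expose.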

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agorcu: simplify the usage of percpu data
Lai Jiangshan [Mon, 28 Jun 2010 08:25:04 +0000 (16:25 +0800)]
rcu: simplify the usage of percpu data

&percpu_data is compatible with allocated percpu data.

We therefore use it and remove the "->rda[NR_CPUS]" array, saving significant
storage on systems with large numbers of CPUs.  This does add an additional
level of indirection and thus an additional cache line referenced, but
because ->rda is not used on the read side, this is OK.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agorcutorture: add random preemption
Lai Jiangshan [Mon, 21 Jun 2010 08:57:42 +0000 (16:57 +0800)]
rcutorture: add random preemption

Add random preemption to help us torture preemptible RCU.

srcu_read_delay() also calls rcu_read_delay() for shorter delays.

Added comment to preempt_schedule() call indicating that no quiescent
states happen if preemption is disabled.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agorcu: add shiny new debug assists to Documentation/RCU/checklist.txt
Paul E. McKenney [Wed, 16 Jun 2010 23:48:13 +0000 (16:48 -0700)]
rcu: add shiny new debug assists to Documentation/RCU/checklist.txt

Add a section describing PROVE_RCU, DEBUG_OBJECTS_RCU_HEAD, and
the __rcu sparse checking to the RCU checklist.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agokvm: add __rcu annotations
Arnd Bergmann [Thu, 4 Mar 2010 14:59:23 +0000 (15:59 +0100)]
kvm: add __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agonet/netfilter: __rcu annotations
Arnd Bergmann [Tue, 9 Mar 2010 19:59:15 +0000 (20:59 +0100)]
net/netfilter: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agoinput: __rcu annotations
Arnd Bergmann [Thu, 4 Mar 2010 14:50:28 +0000 (15:50 +0100)]
input: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agonfs: __rcu annotations
Arnd Bergmann [Wed, 3 Mar 2010 09:20:10 +0000 (10:20 +0100)]
nfs: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
14 years agokeys: __rcu annotations
Arnd Bergmann [Fri, 26 Feb 2010 17:01:20 +0000 (18:01 +0100)]
keys: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agocredentials: rcu annotation
Arnd Bergmann [Wed, 24 Feb 2010 18:45:09 +0000 (19:45 +0100)]
credentials: rcu annotation

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agocgroups: __rcu annotations
Arnd Bergmann [Wed, 24 Feb 2010 18:41:39 +0000 (19:41 +0100)]
cgroups: __rcu annotations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agorculist: avoid __rcu annotations
Arnd Bergmann [Thu, 25 Feb 2010 15:55:13 +0000 (16:55 +0100)]
rculist: avoid __rcu annotations

This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.

We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agorcu: define __rcu address space modifier for sparse
Paul E. McKenney [Wed, 28 Apr 2010 21:39:09 +0000 (14:39 -0700)]
rcu: define __rcu address space modifier for sparse

This commit provides definitions for the __rcu annotation defined earlier.
This annotation permits sparse to check for correct use of RCU-protected
pointers.  If a pointer that is annotated with __rcu is accessed
directly (as opposed to via rcu_dereference(), rcu_assign_pointer(),
or one of their variants), sparse can be made to complain.  To enable
such complaints, use the new default-disabled CONFIG_SPARSE_RCU_POINTER
kernel configuration option.  Please note that these sparse complaints are
intended to be a debugging aid, -not- a code-style-enforcement mechanism.

There are special rcu_dereference_protected() and rcu_access_pointer()
accessors for use when RCU read-side protection is not required, for
example, when no other CPU has access to the data structure in question
or while the current CPU holds the update-side lock.

This patch also updates a number of docbook comments that were showing
their age.
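
A minimal user-space sketch of the annotation pattern follows. It makes
several assumptions: the attribute is guarded by __CHECKER__ (which sparse
defines), the address-space number is only illustrative, and the toy_*
accessors are simplified stand-ins built on GCC/Clang atomic builtins, not
the kernel's rcu_dereference() and rcu_assign_pointer(). The toy accessors
also omit the casts a real implementation needs to keep sparse quiet.

```c
#include <stddef.h>

#ifdef __CHECKER__	/* only sparse sees the attribute */
# define __rcu __attribute__((noderef, address_space(4)))
#else
# define __rcu		/* ordinary compilers see nothing */
#endif

/* Simplified stand-ins for the RCU accessors (assumption, see above). */
#define toy_rcu_assign_pointer(p, v) \
	__atomic_store_n(&(p), (v), __ATOMIC_RELEASE)
#define toy_rcu_dereference(p) \
	__atomic_load_n(&(p), __ATOMIC_ACQUIRE)

struct config {
	int threshold;
};

/* Annotated pointer: direct dereference is what a checker would flag. */
struct config __rcu *cur_config;

void publish_config(struct config *c)
{
	toy_rcu_assign_pointer(cur_config, c);
}

int read_threshold(void)
{
	struct config *c = toy_rcu_dereference(cur_config);

	return c ? c->threshold : -1;
}
```

Under a normal compiler the annotation expands to nothing, so the code
builds unchanged; only the checker pass sees the distinct address space.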

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christopher Li <sparse@chrisli.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
14 years agonet: convert to rcu_dereference_index_check()
Paul E. McKenney [Tue, 15 Jun 2010 00:06:21 +0000 (17:06 -0700)]
net: convert to rcu_dereference_index_check()

The task_cls_classid() function applies rcu_dereference() to integers,
which does not work with the shiny new sparse-based checking in
rcu_dereference().  This commit therefore moves to the new RCU API
rcu_dereference_index_check().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
14 years agox86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
Borislav Petkov [Thu, 19 Aug 2010 18:10:29 +0000 (20:10 +0200)]
x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues

When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

These codepaths are not protected from concurrent access, because
there is no sane reason to make already complex code unnecessarily
more complex: we hit the issue only when insanely switching cores
off- and online. So serialize hotplugging cores on the sysfs level
and be done with it.

[ v2.1: fix !HOTPLUG_CPU build ]

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
14 years agoMerge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 19 Aug 2010 16:06:49 +0000 (09:06 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kprobes/x86: Fix the return address of multiple kretprobes
  perf tools: Fix build error on read only source.
  perf, x86: Fix Intel-nhm PMU programming errata workaround

14 years agomtd: nand: Fix regression in BBM detection
Brian Norris [Wed, 18 Aug 2010 18:25:04 +0000 (11:25 -0700)]
mtd: nand: Fix regression in BBM detection

Commit c7b28e25cb9beb943aead770ff14551b55fa8c79 ("mtd: nand: refactor BB
marker detection") caused a regression in detection of factory-set bad
block markers, especially for certain small-page NAND. This fix removes
some unneeded constraints on using NAND_SMALL_BADBLOCK_POS, making the
detection code more correct.

This regression can be seen, for example, in Hynix HY27US081G1M and
similar.

Signed-off-by: Brian Norris <norris@broadcom.com>
Tested-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
14 years agokprobes/x86: Fix the return address of multiple kretprobes
KUMANO Syuhei [Sun, 15 Aug 2010 06:18:04 +0000 (15:18 +0900)]
kprobes/x86: Fix the return address of multiple kretprobes

Fix the return address of subsequent kretprobes when multiple
kretprobes are set on the same function.

For example:

 # cd /sys/kernel/debug/tracing
 # echo "r:event1 sys_symlink" > kprobe_events
 # echo "r:event2 sys_symlink" >> kprobe_events
 # echo 1 > events/kprobes/enable
 # ln -s /tmp/foo /tmp/bar

(without this patch)

 # cat trace
              ln-897   [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink)
              ln-897   [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)

(with this patch)

 # cat trace
              ln-740   [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink)
              ln-740   [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)

Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
LKML-Reference: <1281853084.3254.11.camel@camp10-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
14 years agoMerge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/acme...
Ingo Molnar [Thu, 19 Aug 2010 10:25:29 +0000 (12:25 +0200)]
Merge branch 'perf/urgent' of git://git./linux/kernel/git/acme/linux-2.6 into perf/urgent

14 years agoMerge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
Linus Torvalds [Wed, 18 Aug 2010 22:45:23 +0000 (15:45 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix an Oops in the NFSv4 atomic open code
  NFS: Fix the selection of security flavours in Kconfig
  NFS: fix the return value of nfs_file_fsync()
  rpcrdma: Fix SQ size calculation when memreg is FRMR
  xprtrdma: Do not truncate iova_start values in frmr registrations.
  nfs: Remove redundant NULL check upon kfree()
  nfs: Add "lookupcache" to displayed mount options
  NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
  SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Linus Torvalds [Wed, 18 Aug 2010 22:29:38 +0000 (15:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  USB HID: Add ID for eGalax Multitouch used in JooJoo tablet
  HID: hiddev: fix memory corruption due to invalid intfdata
  HID: hiddev: protect against disconnect/NULL-dereference race
  HID: picolcd: correct ordering of framebuffer freeing
  HID: picolcd: testing the wrong variable

14 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Wed, 18 Aug 2010 20:27:41 +0000 (13:27 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Fix build error: conflicting types for ‘sys_execve’

14 years agox86-32: Fix dummy trampoline-related inline stubs
H. Peter Anvin [Wed, 18 Aug 2010 18:42:23 +0000 (11:42 -0700)]
x86-32: Fix dummy trampoline-related inline stubs

Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>

14 years agoFix the declaration of sys_execve() in asm-generic/syscalls.h
David Howells [Wed, 18 Aug 2010 17:55:33 +0000 (18:55 +0100)]
Fix the declaration of sys_execve() in asm-generic/syscalls.h

Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years ago[IA64] Fix build error: conflicting types for ‘sys_execve’
Tony Luck [Wed, 18 Aug 2010 17:17:44 +0000 (10:17 -0700)]
[IA64] Fix build error: conflicting types for ‘sys_execve’

arch/ia64/kernel/process.c:636: error: conflicting types for ‘sys_execve’

commit d7627467b7a8dd6944885290a03a07ceb28c10eb
Make do_execve() take a const filename pointer

Missed the declaration of sys_execve in the ia64 asm/unistd.h (perhaps
because there is no reason for it to be there ... it might be a left over
from the COMPAT code?). Just delete the conflicting version.

Signed-off-by: Tony Luck <tony.luck@intel.com>
14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Linus Torvalds [Wed, 18 Aug 2010 16:35:08 +0000 (09:35 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: brlock vfsmount_lock
  fs: scale files_lock
  lglock: introduce special lglock and brlock spin locks
  tty: fix fu_list abuse
  fs: cleanup files_lock locking
  fs: remove extra lookup in __lookup_hash
  fs: fs_struct rwlock to spinlock
  apparmor: use task path helpers
  fs: dentry allocation consolidation
  fs: fix do_lookup false negative
  mbcache: Limit the maximum number of cache entries
  hostfs ->follow_link() braino
  hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
  remove SWRITE* I/O types
  kill BH_Ordered flag
  vfs: update ctime when changing the file's permission by setfacl
  cramfs: only unlock new inodes
  fix reiserfs_evict_inode end_writeback second call

14 years agommc: build fix: mmc_pm_notify is only available with CONFIG_PM=y
Uwe Kleine-König [Wed, 18 Aug 2010 16:25:38 +0000 (09:25 -0700)]
mmc: build fix: mmc_pm_notify is only available with CONFIG_PM=y

This fixes a build breakage introduced by commit 4c2ef25fe0b8 ("mmc: fix
all hangs related to mmc/sd card insert/removal during suspend/resume").

Cc: David Brownell <david-b@pacbell.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-mmc@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Acked-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoperf tools: Fix build error on read only source.
Kusanagi Kouichi [Wed, 18 Aug 2010 16:32:37 +0000 (13:32 -0300)]
perf tools: Fix build error on read only source.

Parts of the build process were generating files outside the specified
O= directory, causing the build to fail on systems where the sources are
in a read only file system.

Fix it by using $(OUTPUT) on these locations.

Also check that $(OUTPUT) actually exists, just like the top level
kernel Makefile does. Otherwise the failure message emitted is
completely misleading.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20100817140841.0859362C03A@msa106.auone-net.jp>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
14 years agoMerge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 18 Aug 2010 16:32:13 +0000 (09:32 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Fix build on POSIX shells
  latencytop: Fix kconfig dependency warnings
  perf annotate tui: Fix exit and RIGHT keys handling
  tracing: Sanitize value returned from write(trace_marker, "...", len)
  tracing/events: Convert format output to seq_file
  tracing: Extend recordmcount to better support Blackfin mcount
  tracing: Fix ring_buffer_read_page reading out of page boundary
  tracing: Fix an unallocated memory access in function_graph

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Wed, 18 Aug 2010 16:30:08 +0000 (09:30 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
  ALSA: hda - Fix ALC680 base model capture
  ASoC: Remove DSP mode support for WM8776
  ALSA: hda - Add quirk for Dell Vostro 1220
  ALSA: riptide - Fix detection / load of firmware files

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Linus Torvalds [Wed, 18 Aug 2010 16:27:10 +0000 (09:27 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/gerg/m68knommu

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: include sched.h in ColdFire/SPI driver
  m68knommu: formatting of pointers in printk()
  m68knommu: arch/m68k/include/asm/ide.h fix for nommu

14 years agoMerge branch 'for-linus' of git://neil.brown.name/md
Linus Torvalds [Wed, 18 Aug 2010 16:26:42 +0000 (09:26 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md raid-1/10 Fix bio_rw bit manipulations again
  md: provide appropriate return value for spare_active functions.
  md: Notify sysfs when RAID1/5/10 disk is In_sync.
  Update recovery_offset even when external metadata is used.

14 years agoMerge branch 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6
Linus Torvalds [Wed, 18 Aug 2010 16:26:17 +0000 (09:26 -0700)]
Merge branch 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6

* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
  spi.h: missing kernel-doc notation, please fix
  of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
  of: Fix missing includes
  ata: update for of_device to platform_device replacement
  microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
  microblaze: Fix of/address: Merge all of the bus translation code
  booting-without-of: Remove nonexistent chapters from TOC, fix numbering

14 years agox86-32: Separate 1:1 pagetables from swapper_pg_dir
Joerg Roedel [Mon, 16 Aug 2010 12:38:33 +0000 (14:38 +0200)]
x86-32: Separate 1:1 pagetables from swapper_pg_dir

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET being established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4KB user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack that
        is hard to understand and has performance implications.

        2. Do not use swapper_pg_dir to boot secondary CPUs, just as
        64-bit does not.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
14 years agox86, cpu: Fix regression in AMD errata checking code
Hans Rosenfeld [Wed, 18 Aug 2010 14:19:50 +0000 (16:19 +0200)]
x86, cpu: Fix regression in AMD errata checking code

A bug in the family-model-stepping matching code caused the presence of
errata to go undetected when OSVW was not used. This causes hangs on
some K8 systems because the E400 workaround is not enabled.
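
For readers unfamiliar with this kind of check, here is a hedged sketch of
family-model-stepping range matching. The bit layout and all TOY_* names
are assumptions chosen for illustration, not taken from this commit, and
the message does not say exactly what the bug was. The sketch shows one way
such a check can silently fail: the running CPU's model and stepping must
be packed with the same shifts as the range bounds, or no CPU ever matches.

```c
#include <stdint.h>

/*
 * Illustrative packing (an assumption): family in bits 24-31, start
 * model/stepping in bits 12-23, end model/stepping in bits 0-11.
 */
#define TOY_RANGE(f, m_start, s_start, m_end, s_end) \
	(((uint32_t)(f) << 24) | ((uint32_t)(m_start) << 16) | \
	 ((uint32_t)(s_start) << 12) | ((uint32_t)(m_end) << 4) | \
	 (uint32_t)(s_end))
#define TOY_RANGE_FAMILY(r)	(((r) >> 24) & 0xff)
#define TOY_RANGE_START(r)	(((r) >> 12) & 0xfff)
#define TOY_RANGE_END(r)	((r) & 0xfff)

/* Returns 1 if the CPU falls inside the packed range, else 0. */
int toy_cpu_in_range(uint32_t range, unsigned int family,
		     unsigned int model, unsigned int stepping)
{
	/* Must use the same model/stepping packing as TOY_RANGE(). */
	uint32_t ms = ((uint32_t)model << 4) | stepping;

	return family == TOY_RANGE_FAMILY(range) &&
	       ms >= TOY_RANGE_START(range) &&
	       ms <= TOY_RANGE_END(range);
}
```

In this model, a range covering family 0xf from model 0x41 stepping 2
upward matches model 0x48 stepping 1 but not model 0x41 stepping 1.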

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
14 years agoNFS: Fix an Oops in the NFSv4 atomic open code
Trond Myklebust [Wed, 18 Aug 2010 13:25:42 +0000 (09:25 -0400)]
NFS: Fix an Oops in the NFSv4 atomic open code

Adam Lackorzynski reports:

with 2.6.35.2 I'm getting this reproducible Oops:

[  110.825396] BUG: unable to handle kernel NULL pointer dereference at
(null)
[  110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638] PGD be89f067 PUD bf18f067 PMD 0
[  110.828638] Oops: 0000 [#1] SMP
[  110.828638] last sysfs file: /sys/class/net/lo/operstate
[  110.828638] CPU 2
[  110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod
i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot
sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base
[last unloaded: scsi_wait_scan]
[  110.828638]
[  110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1
[  110.828638] RIP: 0010:[<ffffffff811247b7>]  [<ffffffff811247b7>]
encode_attrs+0x1a/0x2a4
[  110.828638] RSP: 0000:ffff88003bf5b878  EFLAGS: 00010296
[  110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX:
0000000000000000
[  110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI:
ffff88003bf5b9f8
[  110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09:
0000000000000004
[  110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12:
ffff8800be258800
[  110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15:
ffff8800be258800
[  110.828638] FS:  0000000000000000(0000) GS:ffff880041e00000(0063)
knlGS:00000000556bd6b0
[  110.828638] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4:
00000000000006e0
[  110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  110.828638] Process setchecksum (pid: 11264, threadinfo
ffff88003bf5a000, task ffff88003f232210)
[  110.828638] Stack:
[  110.828638]  0000000000000000 ffff8800bfbcf920 0000000000000000
0000000000000ffe
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] Call Trace:
[  110.828638]  [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4
[  110.828638]  [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a
[  110.828638]  [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a
[  110.828638]  [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b
[  110.828638]  [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d
[  110.828638]  [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147
[  110.828638]  [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32
[  110.828638]  [<ffffffff810ac521>] ? ifind+0x4e/0x90
[  110.828638]  [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e
[  110.828638]  [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6
[  110.828638]  [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a
[  110.828638]  [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161
[  110.828638]  [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201
[  110.828638]  [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5
[  110.828638]  [<ffffffff810a42b6>] ? do_last+0x17b/0x58e
[  110.828638]  [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e
[  110.828638]  [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48
[  110.828638]  [<ffffffff810a9b1b>] ? dput+0x37/0x152
[  110.828638]  [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a
[  110.828638]  [<ffffffff81099f39>] ? do_sys_open+0x56/0x100
[  110.828638]  [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5
[  110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41
57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00
<8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d
[  110.828638] RIP  [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638]  RSP <ffff88003bf5b878>
[  110.828638] CR2: 0000000000000000
[  112.840396] ---[ end trace 95282e83fd77358f ]---

We need to ensure that the O_EXCL flag is turned off if the user doesn't
set O_CREAT.
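
As a rough illustration of the rule above (not the actual patch), the flag
fix-up can be sketched in userspace C. The helper name sanitize_open_flags()
is hypothetical; only O_CREAT and O_EXCL come from the text.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Hypothetical helper: O_EXCL is only meaningful together with O_CREAT,
 * so drop it when the caller did not ask for file creation.
 */
static int sanitize_open_flags(int flags)
{
	if (!(flags & O_CREAT))
		flags &= ~O_EXCL;
	return flags;
}
```

With this helper, O_RDWR | O_EXCL collapses to O_RDWR, while
O_RDWR | O_CREAT | O_EXCL is passed through unchanged.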

Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
14 years agoMerge branch 'fix/asoc' into for-linus
Takashi Iwai [Wed, 18 Aug 2010 13:22:18 +0000 (15:22 +0200)]
Merge branch 'fix/asoc' into for-linus

14 years agoMerge branch 'fix/hda' into for-linus
Takashi Iwai [Wed, 18 Aug 2010 13:22:15 +0000 (15:22 +0200)]
Merge branch 'fix/hda' into for-linus

14 years agoALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
Jaroslav Kysela [Wed, 18 Aug 2010 12:08:17 +0000 (14:08 +0200)]
ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)

With some hardware combinations, the PCM interrupts are acknowledged
before the period boundary from the emu10k1 chip. The midlevel PCM code
gets confused and the playback stream is interrupted.

It seems that shifting the interrupt processing by 2 samples is enough
to fix this issue. This default value does not harm other,
non-affected hardware.

More information: Kernel bugzilla bug#16300

[A compile warning fixed by tiwai]

Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
14 years agofs: brlock vfsmount_lock
Nick Piggin [Tue, 17 Aug 2010 18:37:39 +0000 (04:37 +1000)]
fs: brlock vfsmount_lock

Use a brlock for the vfsmount lock. It must be taken for write whenever
modifying the mount hash or associated fields, and may be taken for read when
performing mount hash lookups.

A new lock is added for the mnt-id allocator, so it doesn't need to take
the heavy vfsmount write-lock.

The number of atomics should remain the same for fastpath rlock cases, though
code would be slightly slower due to per-cpu access. Scalability will not be
much improved in common cases yet, due to other locks (e.g. dcache_lock) getting
in the way. However, path lookups crossing mountpoints should be one case where
scalability is improved (currently requiring the global lock).

The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
Altix system (high latency to remote nodes), a simple umount microbenchmark
(mount --bind mnt mnt2 ; umount mnt2, looped 1000 times) took 6.8s before this
patch and 7.1s afterwards, about 5% slower.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: scale files_lock
Nick Piggin [Tue, 17 Aug 2010 18:37:38 +0000 (04:37 +1000)]
fs: scale files_lock

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from a different CPU's list.
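
The bookkeeping described above can be sketched in userspace C. This is only
an illustration under assumptions: all names (file_sketch, list_cpu,
NR_CPUS_SKETCH, ...) are hypothetical, the "CPU" is passed in explicitly, and
a C11 atomic_flag spinlock stands in for the per-cpu part of the lglock.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH 4	/* assumed CPU count for the sketch */

struct file_sketch {
	struct file_sketch *prev, *next;
	int list_cpu;		/* which per-cpu list this file is on */
};

struct cpu_files {
	atomic_flag lock;		/* protects this CPU's list only */
	struct file_sketch head;	/* circular list sentinel */
};

static struct cpu_files files[NR_CPUS_SKETCH];

static void files_init(void)
{
	for (int i = 0; i < NR_CPUS_SKETCH; i++) {
		atomic_flag_clear(&files[i].lock);
		files[i].head.prev = files[i].head.next = &files[i].head;
	}
}

static void cpu_lock(int cpu)
{
	while (atomic_flag_test_and_set_explicit(&files[cpu].lock,
						 memory_order_acquire))
		;	/* spin until the flag was clear */
}

static void cpu_unlock(int cpu)
{
	atomic_flag_clear_explicit(&files[cpu].lock, memory_order_release);
}

/* Add to the list of the CPU doing the open and remember which list it was. */
static void file_list_add(struct file_sketch *f, int this_cpu)
{
	struct file_sketch *h = &files[this_cpu].head;

	cpu_lock(this_cpu);
	f->list_cpu = this_cpu;
	f->next = h->next;
	f->prev = h;
	h->next->prev = f;
	h->next = f;
	cpu_unlock(this_cpu);
}

/* Remove from the recorded list; the caller may be running on another CPU. */
static void file_list_del(struct file_sketch *f)
{
	int cpu = f->list_cpu;

	cpu_lock(cpu);
	f->prev->next = f->next;
	f->next->prev = f->prev;
	f->prev = f->next = f;
	cpu_unlock(cpu);
}

/* Count the files on one CPU's list. */
static int file_list_count(int cpu)
{
	int n = 0;

	cpu_lock(cpu);
	for (struct file_sketch *p = files[cpu].head.next;
	     p != &files[cpu].head; p = p->next)
		n++;
	cpu_unlock(cpu);
	return n;
}
```

Removal always locks the list named by list_cpu, so a cross-CPU removal
contends only with the owning CPU's list rather than with every list.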

However, loads with frequent removal of files imply a short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test in which 1 CPU allocates files that are subsequently freed by
N CPUs degenerates to contending on a single lock, which is no worse than
before. When more than one CPU is allocating files, even if they are always
freed by different CPUs, there will be more parallelism than the single-lock
case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed by the same CPU that added it over 90% of the time, and
it stays within the same node over 95% of the time.
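
The percentages in the table follow directly from the raw counts. A small C
sketch of that arithmetic is below; the counts are copied from the table,
while the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

struct sample {
	const char *load;
	unsigned long locks, cpu_hits, node_hits;
};

/* Raw counts from the table above. */
static const struct sample samples[] = {
	{ "Booting",     25049,   23174,   23945 },
	{ "kbuild -j16", 2281913, 2208126, 2252674 },
	{ "dbench 64",   4306582, 4287247, 4299527 },
};

/* Share of lock acquisitions that were hits, in percent. */
static double hit_percent(unsigned long hits, unsigned long locks)
{
	return 100.0 * (double)hits / (double)locks;
}

/* Print cpu-hit and node-hit rates for every workload. */
static void print_hit_rates(void)
{
	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%-12s cpu-hits %.1f%% node-hits %.1f%%\n",
		       samples[i].load,
		       hit_percent(samples[i].cpu_hits, samples[i].locks),
		       hit_percent(samples[i].node_hits, samples[i].locks));
}
```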

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks.
That is not to say no penalty exists: the code is larger and requires more
memory accesses, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agolglock: introduce special lglock and brlock spin locks
Nick Piggin [Tue, 17 Aug 2010 18:37:37 +0000 (04:37 +1000)]
lglock: introduce special lglock and brlock spin locks

This patch introduces "local-global" locks (lglocks). These can be used to:

- Provide fast exclusive access to per-CPU data, with exclusive access to
  another CPU's data allowed but possibly subject to contention, and to provide
  very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
  very slow exclusive serialisation of data (not necessarily per-CPU data).

Brlocks are also implemented as a short-hand notation for the latter use
case.
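
A minimal userspace sketch of the idea, assuming one spinlock per CPU: the
local operation takes a single per-CPU lock, the global operation takes all
of them. The names below (lglock_sketch, lgs_local_lock, ...) are
hypothetical and are not the kernel API; the CPU number is passed in
explicitly, whereas kernel code would use the current CPU.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH 4	/* assumed CPU count for the sketch */

struct lglock_sketch {
	atomic_flag cpu_lock[NR_CPUS_SKETCH];
};

static long counter[NR_CPUS_SKETCH];	/* example per-CPU data */

static void lgs_init(struct lglock_sketch *l)
{
	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		atomic_flag_clear(&l->cpu_lock[i]);
}

/* Fast path: exclusive access to one CPU's data. */
static void lgs_local_lock(struct lglock_sketch *l, int cpu)
{
	while (atomic_flag_test_and_set_explicit(&l->cpu_lock[cpu],
						 memory_order_acquire))
		;	/* spin */
}

static void lgs_local_unlock(struct lglock_sketch *l, int cpu)
{
	atomic_flag_clear_explicit(&l->cpu_lock[cpu], memory_order_release);
}

/* Very slow path: exclusive access to all per-CPU data. */
static void lgs_global_lock(struct lglock_sketch *l)
{
	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		lgs_local_lock(l, i);
}

static void lgs_global_unlock(struct lglock_sketch *l)
{
	for (int i = NR_CPUS_SKETCH - 1; i >= 0; i--)
		lgs_local_unlock(l, i);
}

/* Local user: bump this CPU's counter. */
static void counter_inc(struct lglock_sketch *l, int cpu)
{
	lgs_local_lock(l, cpu);
	counter[cpu]++;
	lgs_local_unlock(l, cpu);
}

/* Global user: consistent snapshot across all CPUs. */
static long counter_sum(struct lglock_sketch *l)
{
	long sum = 0;

	lgs_global_lock(l);
	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		sum += counter[i];
	lgs_global_unlock(l);
	return sum;
}
```

In this picture a brlock is the same structure with the local lock used as
the read lock and the global lock used as the write lock.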

Thanks to Paul for the local/global naming convention.

Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agotty: fix fu_list abuse
Nick Piggin [Tue, 17 Aug 2010 18:37:36 +0000 (04:37 +1000)]
tty: fix fu_list abuse

tty code abuses fu_list, which causes a bug in remount,ro handling.

If a tty device node is opened on a filesystem and the last link to the inode
is then removed, the filesystem will be allowed to be remounted readonly. This
is because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".

Taking the idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes,
avoids meddling with the vfs, and avoids this bug.
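
The shape of that private struct can be sketched in userspace C as follows.
All type and function names here are hypothetical stand-ins, a singly linked
list replaces the kernel list fields, and malloc() failure is reported to the
caller rather than being excluded with __GFP_NOFAIL.

```c
#include <assert.h>
#include <stdlib.h>

struct tty_sketch;

/* Stand-in for struct file: only the field the sketch needs. */
struct file_sketch {
	void *private_data;
};

/* Per-open private data linking one file to one tty. */
struct tty_file_priv_sketch {
	struct tty_sketch *tty;
	struct file_sketch *file;
	struct tty_file_priv_sketch *next;	/* tty's own list of opens */
};

struct tty_sketch {
	struct tty_file_priv_sketch *files;
};

/* On open: allocate the link and hang it off file->private_data. */
static int tty_add_file_sketch(struct tty_sketch *tty, struct file_sketch *file)
{
	struct tty_file_priv_sketch *priv = malloc(sizeof(*priv));

	if (!priv)
		return -1;
	priv->tty = tty;
	priv->file = file;
	priv->next = tty->files;
	tty->files = priv;
	file->private_data = priv;
	return 0;
}

/* On release: unlink from the tty's list and free the link. */
static void tty_del_file_sketch(struct file_sketch *file)
{
	struct tty_file_priv_sketch *priv = file->private_data;
	struct tty_file_priv_sketch **pp = &priv->tty->files;

	while (*pp != priv)
		pp = &(*pp)->next;
	*pp = priv->next;
	file->private_data = NULL;
	free(priv);
}
```

Because the tty keeps its own list of links, the file itself never has to
leave the per-sb files list.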

The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.

[ Arguably the filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: cleanup files_lock locking
Nick Piggin [Tue, 17 Aug 2010 18:37:35 +0000 (04:37 +1000)]
fs: cleanup files_lock locking

Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>