Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

sched: Fix idle_cpu()

On -rt we observed hackbench waking all 400 tasks to a single cpu.
This is because of select_idle_sibling()'s interaction with the new
ipi based wakeup scheme.

The existing idle_cpu() test only checks to see if the current task on
that cpu is the idle task, it does not take already queued tasks into
account, nor does it take queued to be woken tasks into account.

If the remote wakeup IPIs come hard enough, there won't be time to
schedule away from the idle task, and would thus keep thinking the cpu
was in fact idle, regardless of the fact that there were already
several hundred tasks runnable.

We couldn't reproduce on mainline, but there's no reason it couldn't
happen.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-3o30p18b2paswpc9ohy2gltp@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

llist: Remove cpu_relax() usage in cmpxchg loops

Initial benchmarks show they're a net loss:

$ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
$ echo 4096 32000 64 128 > /proc/sys/kernel/sem
$ ./sembench -t 2048 -w 1900 -o 0

Pre:

run time 30 seconds 778936 worker burns per second
run time 30 seconds 912190 worker burns per second
run time 30 seconds 817506 worker burns per second
run time 30 seconds 830870 worker burns per second
run time 30 seconds 845056 worker burns per second

Post:

run time 30 seconds 905920 worker burns per second
run time 30 seconds 849046 worker burns per second
run time 30 seconds 886286 worker burns per second
run time 30 seconds 822320 worker burns per second
run time 30 seconds 900283 worker burns per second

So about 4% faster. (!)

cpu_relax() stalls the pipeline, therefore, when used in a tight loop
it has the following benefits:

- allows SMT siblings to have a go;
- reduces pressure on the CPU interconnect.

However, cmpxchg loops are unfair and thus have unbounded completion
time, therefore we should avoid getting in such heavily contended
situations where the above benefits make any difference.

A typical cmpxchg loop should not go round more than a handfull of
times at worst, therefore adding extra delays just slows things down.

Since the llist primitives are new, there aren't any bad users yet,
and we should avoid growing them. Heavily contended sites should
generally be better off using the ticket locks for serialization since
they provide bounded completion times (fifo-fair over the cpus).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836358.26517.43.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Convert to struct llist

Use the generic llist primitives.

We had a private lockless list implementation in the scheduler in the wake-list
code, now that we have a generic llist implementation that provides all required
operations, switch to it.

This patch is not expected to change any behavior.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836353.26517.42.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>

llist: Add llist_next()

So we don't have to expose the struct list_node member.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315836348.26517.41.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>

irq_work: Use llist in the struct irq_work logic

Use llist in irq_work instead of the lock-less linked list
implementation in irq_work to avoid the code duplication.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-6-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

llist: Return whether list is empty before adding in llist_add()

Extend the llist_add*() functions to return a success indicator, this
allows us in the scheduler code to send an IPI if the queue was empty.

( There's no effect on existing users, because the list_add_xxx() functions
are inline, thus this will be optimized out by the compiler if not used
by callers. )

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-5-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

llist: Move cpu_relax() to after the cmpxchg()

If in llist_add()/etc. functions the first cmpxchg() call succeeds, it is
not necessary to use cpu_relax() before the cmpxchg(). So cpu_relax() in
a busy loop involving cmpxchg() should go after cmpxchg() instead of before
that.

This patch fixes this for all involved llist functions.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-4-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

llist: Remove the platform-dependent NMI checks

Remove the nmi() checks spread around the code. in_nmi() is not available
on every architecture and it's a pretty obscure and ugly check in any case.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-3-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

llist: Make some llist functions inline

Because llist code will be used in performance critical scheduler
code path, make llist_add() and llist_del_all() inline to avoid
function calling overhead and related 'glue' overhead.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-2-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'linus' into sched/core

Merge reason: pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'hwmon-for-linus' of git://github.com/groeck/linux

* 'hwmon-for-linus' of git://github.com/groeck/linux:
hwmon: (coretemp) Avoid leaving around dangling pointer
hwmon: (coretemp) Fixup platform device ID change

Merge git://github.com/davem330/ide

* git://github.com/davem330/ide:
ide-disk: Fix request requeuing

Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux

* 'btrfs-3.0' of git://github.com/chrismason/linux:
Btrfs: force a page fault if we have a shorty copy on a page boundary

ide-disk: Fix request requeuing

Simon Kirby reported that on his RAID setup with idedisk underneath
the box OOMs after a couple of days of runtime. Running with
CONFIG_DEBUG_KMEMLEAK pointed to idedisk_prep_fn() which unconditionally
allocates an ide_cmd struct. However, ide_requeue_and_plug() can be
called more than once per request, either from the request issue or the
IRQ handler path and do blk_peek_request() ends up in idedisk_prep_fn()
repeatedly, allocating a struct ide_cmd everytime and "forgetting" the
previous pointer.

Make sure the code reuses the old allocated chunk.

Reported-and-tested-by: Simon Kirby <sim@hostway.ca>
Cc: <stable@kernel.org> [ 39.x, 3.0.x ]
Link: http://marc.info/?l=linux-kernel&m=131667641517919
Link: http://lkml.kernel.org/r/20110922072643.GA27232@hostway.ca
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6

* 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6:
mfd: Fix generic irq chip ack function name for jz4740-adc

Merge branch 'for-linus' of git://github.com/tiwai/sound

* 'for-linus' of git://github.com/tiwai/sound:
ALSA: hda - Fix a regression of the position-buffer check

Merge branch 'perf-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip

* 'perf-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
perf tools: Fix raw sample reading

Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip

* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  irq: Fix check for already initialized irq_domain in irq_domain_add
  irq: Add declaration of irq_domain_simple_ops to irqdomain.h

* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86/rtc: Don't recursively acquire rtc_lock

* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  posix-cpu-timers: Cure SMP wobbles
  sched: Fix up wchan borkage
  sched/rt: Migrate equal priority tasks to available CPUs

Btrfs: force a page fault if we have a shorty copy on a page boundary

A user reported a problem where ceph was getting into 100% cpu usage while doing
some writing.  It turns out it's because we were doing a short write on a not
uptodate page, which means we'd fall back at one page at a time and fault the
page in.  The problem is our position is on the page boundary, so our fault in
logic wasn't actually reading the page, so we'd just spin forever or until the
page got read in by somebody else.  This will force a readpage if we end up
doing a short copy.  Alexandre could reproduce this easily with ceph and reports
it fixes his problem.  I also wrote a reproducer that no longer hangs my box
with this patch.  Thanks,

Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Merge branch 'perf/urgent' of git://github.com/acmel/linux into perf/urgent

posix-cpu-timers: Cure SMP wobbles

David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$

  The diff of 'process' should always be >= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <fcntl.h>
  #include <string.h>
  #include <errno.h>
  #include <pthread.h>

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
  pthread_barrier_wait(&barrier);
  while (1)
  __asm__ __volatile__("" : : : "memory");
  return NULL;
  }

  int main(void)
  {
  clockid_t process_clock, my_thread_clock, th_clock;
  struct timespec process_before, process_after;
  struct timespec me_before, me_after;
  struct timespec th_before, th_after;
  struct timespec sleeptime;
  unsigned long diff;
  pthread_t th;
  int err;

  err = clock_getcpuclockid(0, &process_clock);
  if (err)
  return 1;

  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
  if (err)
  return 1;

  pthread_barrier_init(&barrier, NULL, 2);
  err = pthread_create(&th, NULL, chew_cpu, NULL);
  if (err)
  return 1;

  err = pthread_getcpuclockid(th, &th_clock);
  if (err)
  return 1;

  pthread_barrier_wait(&barrier);

  err = clock_gettime(process_clock, &process_before);
  if (err)
  return 1;

  err = clock_gettime(my_thread_clock, &me_before);
  if (err)
  return 1;

  err = clock_gettime(th_clock, &th_before);
  if (err)
  return 1;

  sleeptime.tv_sec = 0;
  sleeptime.tv_nsec = 500000000;
  nanosleep(&sleeptime, NULL);

  err = clock_gettime(th_clock, &th_after);
  if (err)
  return 1;

  err = clock_gettime(my_thread_clock, &me_after);
  if (err)
  return 1;

  err = clock_gettime(process_clock, &process_after);
  if (err)
  return 1;

  diff = process_after.tv_nsec - process_before.tv_nsec;
  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
process_before.tv_sec, process_before.tv_nsec,
process_after.tv_sec, process_after.tv_nsec, diff);
  diff = th_after.tv_nsec - th_before.tv_nsec;
  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
th_before.tv_sec, th_before.tv_nsec,
th_after.tv_sec, th_after.tv_nsec, diff);
  diff = me_after.tv_nsec - me_before.tv_nsec;
  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
me_before.tv_sec, me_before.tv_nsec,
me_after.tv_sec, me_after.tv_nsec, diff);

  return 0;
  }

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

ALSA: hda - Fix a regression of the position-buffer check

The commit a810364a0424c297242c6c66071a42f7675a5568
ALSA: hda - Handle -1 as invalid position, too
caused a regression on some machines that require the position-buffer
instead of LPIB, e.g. resulting in noises with mic recording with
PulseAudio.

This patch fixes the detection by delaying the test at the timing as
same as 3.0, i.e. doing the position check only when requested in
azx_position_ok().

Reported-and-tested-by: Rocko Requin <rockorequin@hotmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Resource: fix wrong resource window calculation

__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window.  This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.

__find_resource() looks for gaps in resource allocation among the
children resource windows.  When it encounters the last child window it
blindly tries the range next to one allocated to the last child.  Since
the last child's window ends at 0xffffffff the calculation overflows,
leading the algorithm to believe that any window in the range 0x0000000
to 0xfffffff is available for allocation.  This leads to a conflicting
window allocation.

Michal Ludvig reported this issue seen on his platform.  The following
patch fixes the problem and has been verified by Michal.  I believe this
bug has been there for ages.  It got exposed by git commit 2bbc6942273b
("PCI : ability to relocate assigned pci-resources")

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michal Ludvig <mludvig@logix.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://github.com/NewDreamNetwork/ceph-client

* 'for-linus' of git://github.com/NewDreamNetwork/ceph-client:
  libceph: fix pg_temp mapping update
  libceph: fix pg_temp mapping calculation
  libceph: fix linger request requeuing
  libceph: fix parse options memory leak
  libceph: initialize ack_stamp to avoid unnecessary connection reset

Merge branch 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus

* 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus:
  [media] omap3isp: Fix build error in ispccdc.c
  [media] uvcvideo: Fix crash when linking entities
  [media] v4l: Make sure we hold a reference to the v4l2_device before using it
  [media] v4l: Fix use-after-free case in v4l2_device_release
  [media] uvcvideo: Set alternate setting 0 on resume if the bus has been reset
  [media] OMAP_VOUT: Fix build break caused by update_mode removal in DSS2

Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] cio: fix cio_tpi ignoring adapter interrupts
  [S390] gmap: always up mmap_sem properly
  [S390] Do not clobber personality flags on exec

Merge git://github.com/davem330/sparc

* git://github.com/davem330/sparc:
  sparc64: Force the execute bit in OpenFirmware's translation entries.
  sparc: Make '-p' boot option meaningful again.
  sparc, exec: remove redundant addr_limit assignment
  sparc64: Future proof Niagara cpu detection.

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux

* 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux:
  drm/i915: FBC off for ironlake and older, otherwise on by default
  drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI
  drm/i915: Enable dither whenever display bpc < frame buffer bpc

powerpc: Fix device-tree matching for Apple U4 bridge

Apple Quad G5 has some oddity in it's device-tree which causes the new
generic matching code to fail to relate nodes for PCI-E devices below U4
with their respective struct pci_dev. This breaks graphics on those
machines among others.

This fixes it using a quirk which copies the node pointer from the host
bridge for the root complex, which makes the generic code work for the
children afterward.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bootup: move 'usermodehelper_enable()' a little earlier

Commit d5767c53535a ("bootup: move 'usermodehelper_enable()' to the end
of do_basic_setup()") moved 'usermodehelper_enable()' to end of
do_basic_setup() to after the initcalls.  But then I get failed to let
uvesafb work on my computer, and lose the splash boot.

So maybe we could start usermodehelper_enable a little early to make
some task work that need eary init with the help of user mode.

[ I would *really* prefer that initcalls not call into user space - even
  the real 'init' hasn't been execve'd yet, after all! But for uvesafb
  it really does look like we don't have much choice.

  I considered doing this when we mount the root filesystem, but
  depending on config options that is in multiple places.  We could do
  the usermode helper enable as a rootfs_initcall()..

  So I'm just using wang yanqing's trivial patch.  It's not wonderful,
  but it's simple and should work.  We should revisit this some day,
  though.      - Linus ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

perf tools: Fix raw sample reading

Wrong pointer is being passed for raw data sanity checking, when parsing
sample event.

This ends up with invalid event and perf record being stuck in
__perf_session__process_events function during processing build IDs
(process_buildids function).

Following command hangs up in my setup:
./perf record -e raw_syscalls:sys_enter ls

The fix is to use proper pointer to the raw data instead of the 'u'
union.

Reviewed-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1317308709-9474-2-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

sparc64: Force the execute bit in OpenFirmware's translation entries.

In the OF 'translations' property, the template TTEs in the mappings
never specify the executable bit.  This is the case even though some
of these mappings are for OF's code segment.

Therefore, we need to force the execute bit on in every mapping.

This problem can only really trigger on Niagara/sun4v machines and the
history behind this is a little complicated.

Previous to sun4v, the sun4u TTE entries lacked a hardware execute
permission bit.  So OF didn't have to ever worry about setting
anything to handle executable pages.  Any valid TTE loaded into the
I-TLB would be respected by the chip.

But sun4v Niagara chips have a real hardware enforced executable bit
in their TTEs.  So it has to be set or else the I-TLB throws an
instruction access exception with type code 6 (protection violation).

We've been extremely fortunate to not get bitten by this in the past.

The best I can tell is that the OF's mappings for it's executable code
were mapped using permanent locked mappings on sun4v in the past.
Therefore, the fact that we didn't have the exec bit set in the OF
translations we would use did not matter in practice.

Thanks to Greg Onufer for helping me track this down.

Signed-off-by: David S. Miller <davem@davemloft.net>

bootup: move 'usermodehelper_enable()' to the end of do_basic_setup()

Doing it just before starting to call into cpu_idle() made a sick kind
of sense only because the original bug we fixed (see commit
288d5abec831: "Boot up with usermodehelper disabled") was about problems
with some scheduler data structures not being initialized, and they had
better be initialized at that point.

But it really didn't make any other conceptual sense, and doing it after
the initial "schedule()" call for the idle thread actually opened up a
race: what if the main initialization thread did everything without
needing to sleep, and got all the way into user land too? Without
actually having scheduled back to the idle thread?

Now, in normal circumstances that doesn't ever happen, but it looks like
Richard Cochran triggered exactly that on his ARM IXP4xx machines:

  "I have some ARM IXP4xx based machines that use the two on chip MAC
   ports (aka NPEs).  The NPE needs a firmware in order to function.
   Ever since the following commit [that 288d5abec831 one], it is no
   longer possible to bring up the interfaces during the init scripts."

with a call trace showing an ioctl coming from user space. Richard says:

  "The init is busybox, and the startup script does mount, syslogd, and
   then ifup, so that all can go by quickly."

The fix is to move the usermodehelper_enable() into the main 'init'
thread, and just put it after we've done all our initcalls.  By then,
everything really should be up, but we've obviously not actually started
the user-mode portion of init yet.

Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

libceph: fix pg_temp mapping update

The incremental map updates have a record for each pg_temp mapping that is
to be add/updated (len > 0) or removed (len == 0). The old code was
written as if the updates were a complete enumeration; that was just wrong.
Update the code to remove 0-length entries and drop the rbtree traversal.

This avoids misdirected (and hung) requests that manifest as server
errors like

[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

Signed-off-by: Sage Weil <sage@newdream.net>

libceph: fix pg_temp mapping calculation

We need to apply the modulo pg_num calculation before looking up a pgid in
the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes
(some) misdirected requests that result in messages like

[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

on the server and stall make the client block without getting a reply (at
least until the pg_temp mapping goes way, but that can take a long long
time).

Reorder calc_pg_raw() a bit to make more sense.

Signed-off-by: Sage Weil <sage@newdream.net>

Merge git://github.com/davem330/net

* git://github.com/davem330/net:
  ipv6-multicast: Fix memory leak in IPv6 multicast.
  ipv6: check return value for dst_alloc
  net: check return value for dst_alloc
  ipv6-multicast: Fix memory leak in input path.
  bnx2x: add missing break in bnx2x_dcbnl_get_cap
  bnx2x: fix WOL by enablement PME in config space
  bnx2x: fix hw attention handling
  net: fix a typo in Documentation/networking/scaling.txt
  ath9k: Fix a dma warning/memory leak
  rtlwifi: rtl8192cu: Fix unitialized struct
  iwlagn: fix dangling scan request
  batman-adv: do_bcast has to be true for broadcast packets only
  cfg80211: Fix validation of AKM suites
  iwlegacy: do not use interruptible waits
  iwlegacy: fix command queue timeout
  ath9k_hw: Fix Rx DMA stuck for AR9003 chips

Merge git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6

* git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6:
  [SCSI] 3w-9xxx: fix iommu_iova leak
  [SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference
  [SCSI] scsi: qla4xxx needs libiscsi.o
  [SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.
  [SCSI] aacraid: reset should disable MSI interrupt

hwmon: (coretemp) Avoid leaving around dangling pointer

Storing the struct temp_data pointer allocated from create_core_data()
when returning an error has the potential of leaving around a pointer
to freed memory. Reset it to NULL for error returns.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>

hwmon: (coretemp) Fixup platform device ID change

With recent change "hwmon: (coretemp) don't use kernel assigned CPU
number as platform device ID", the microcode check is now running on
random CPU. Fix that by checking the microcode before creating the
platform device rather than at probe time.

Also avoid calling TO_PHYS_ID(cpu) twice in the same function, it's
expensive.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

* 'for-linus' of git://git.kernel.dk/linux-block:
block: Free queue resources at blk_release_queue()

Merge branch 'writeback-for-linus' of git://github.com/fengguang/linux

* 'writeback-for-linus' of git://github.com/fengguang/linux:
writeback: show raw dirtied_when in trace writeback_single_inode

block: Free queue resources at blk_release_queue()

A kernel crash is observed when a mounted ext3/ext4 filesystem is
physically removed. The problem is that blk_cleanup_queue() frees up
some resources eg by calling elevator_exit(), which are not checked for
in normal operation. So we should rather move these calls to the
destructor function blk_release_queue() as at that point all remaining
references are gone. However, in doing so we have to ensure that any
externally supplied queue_lock is disconnected as the driver might free
up the lock after the call of blk_cleanup_queue(),

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-davem' of git://git.infradead.org/users/linville/wireless

Linux 3.1-rc8

Merge branch 'for-linus' of git://github.com/tiwai/sound

* 'for-linus' of git://github.com/tiwai/sound:
  ASoC: ssm2602: Re-enable oscillator after suspend
  ALSA: usb-audio: Check for possible chip NULL pointer before clearing probing flag
  ALSA: hda/realtek - Don't detect LO jack when identical with HP
  ALSA: hda/realtek - Avoid bogus HP-pin assignment
  ALSA: HDA: No power nids on 92HD93
  ASoC: omap-mcbsp: Do not attempt to change DAI sysclk if stream is active

Merge branch 'pm-fixes' of git://github.com/rjwysocki/linux-pm

* 'pm-fixes' of git://github.com/rjwysocki/linux-pm:
PM / Clocks: Do not acquire a mutex under a spinlock

Merge branch 'master' of git://git.infradead.org/users/linville/wireless into for-davem

ipv6-multicast: Fix memory leak in IPv6 multicast.

If reg_vif_xmit cannot find a routing entry, be sure to
free the skb before returning the error.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: check return value for dst_alloc

return value of dst_alloc must be checked before use

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: check return value for dst_alloc

return value of dst_alloc must be checked before use

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6-multicast: Fix memory leak in input path.

Have to free the skb before returning if we fail
the fib lookup.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'batman-adv/maint' of git://git.open-mesh.org/linux-merge

bnx2x: add missing break in bnx2x_dcbnl_get_cap

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix WOL by enablement PME in config space

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix hw attention handling

Use register name to initialize attention mask

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix a typo in Documentation/networking/scaling.txt

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fix/asoc' into for-linus

vfs: remove LOOKUP_NO_AUTOMOUNT flag

That flag no longer makes sense, since we don't look up automount points
as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT
handling was buggy to begin with: it would avoid automounting even for
cases where we really *needed* to do the automount handling, and could
return ENOENT for autofs entries that hadn't been instantiated yet.

With our new non-eager automount semantics, one discussion has been
about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
newfstatat() and fstatat64() system calls), but it's probably not worth
it: you can always force at least directory automounting by simply
adding the final '/' to the filename, which works for *all* of the stat
family system calls, old and new.

So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
result of our bad default behavior.

Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ASoC: ssm2602: Re-enable oscillator after suspend

Currently the the internal oscillator is powered down when entering BIAS_OFF
state, but not re-enabled when going back to BIAS_STANDBY. As a result the
CODEC will stop working after suspend if the internal oscillator is used to
generate the sysclock signal. This patch fixes it by clearing the appropriate
bit in the power down register when the CODEC is re-enabled.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org

VFS: Fix the remaining automounter semantics regressions

The concensus seems to be that system calls such as stat() etc should
not trigger an automount.  Neither should the l* versions.

This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups
that _should_ trigger an automount on the last path element.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Edited to leave out the cases that are already covered by LOOKUP_OPEN,
  LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally
  force automounting for their own reasons   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag

Since we've now turned around and made LOOKUP_FOLLOW *not* force an
automount, we want to add the ability to force an automount event on
lookup even if we don't happen to have one of the other flags that force
it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)

Most cases will never want to use this, since you'd normally want to
delay automounting as long as possible, which usually implies
LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
the automount any more).

But Trond argued sufficiently forcefully that at a minimum bind mounting
a file and quotactl will want to force the automount lookup. Some other
cases (like nfs_follow_remote_path()) could use it too, although
LOOKUP_DIRECTORY would work there as well.

This commit just adds the flag and logic, no users yet, though. It also
doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
was made irrelevant by the same change that made us not follow on
LOOKUP_FOLLOW.

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'samsung-fixes-3' of git://github.com/kgene/linux-samsung

* 'samsung-fixes-3' of git://github.com/kgene/linux-samsung:
  ARM: EXYNOS4: Rename sclk_cam clocks for FIMC driver
  ARM: S5PV210: Rename sclk_cam clocks for FIMC media driver
  ARM: S5P: fix incorrect loop iterator usage on gpio-interrupt
  ARM: S3C2443: Fix bit-reset in setrate of clk_armdiv

ARM: EXYNOS4: Rename sclk_cam clocks for FIMC driver

The sclk_cam clocks are now controlled by the top level FIMC media
device driver bound to "s5p-fimc-md" platform device.
Rename sclk_cam clocks so they accessible by the corresponding
driver.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ARM: S5PV210: Rename sclk_cam clocks for FIMC media driver

The sclk_cam clocks are now controlled by the top level FIMC media
device driver bound to "s5p-fimc-md" platform device.
Rename sclk_cam clocks so they accessible by the corresponding
driver.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

Merge branch 'hwmon-for-linus' of git://github.com/groeck/linux

* 'hwmon-for-linus' of git://github.com/groeck/linux:
  hwmon: (coretemp) remove struct platform_data * parameter from create_core_data()
  hwmon: (coretemp) constify static data
  hwmon: (coretemp) don't use kernel assigned CPU number as platform device ID
  hwmon: (ds620) Fix handling of negative temperatures
  hwmon: (w83791d) rename prototype parameter from 'register' to 'reg'
  hwmon: (coretemp) Don't use threshold registers for tempX_max
  hwmon: (coretemp) Let the user force TjMax
  hwmon: (coretemp) Drop duplicate function get_pkg_tjmax

Merge branch 'kvm-updates/3.1' of git://github.com/avikivity/kvm

* 'kvm-updates/3.1' of git://github.com/avikivity/kvm:
KVM: x86 emulator: fix Src2CL decode
KVM: MMU: fix incorrect return of spte

Merge branch 'fixes' of ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: 7099/1: futex: preserve oldval in SMP __futex_atomic_op
  ARM: dma-mapping: free allocated page if unable to map
  ARM: fix vmlinux.lds.S discarding sections
  ARM: nommu: fix warning with checksyscalls.sh
  ARM: 7091/1: errata: D-cache line maintenance operation by MVA may not succeed

ath9k: Fix a dma warning/memory leak

proper dma_unmapping and freeing of skb's has to be done in the rx
cleanup for EDMA chipsets when the device is unloaded and this also
seems to address the following warning which shows up occasionally when
the device is unloaded

Call Trace:
[<c0148cd2>] warn_slowpath_common+0x72/0xa0
[<c03b669c>] ? dma_debug_device_change+0x19c/0x200
[<c03b669c>] ? dma_debug_device_change+0x19c/0x200
[<c0148da3>] warn_slowpath_fmt+0x33/0x40
[<c03b669c>] dma_debug_device_change+0x19c/0x200
[<c0657f12>] notifier_call_chain+0x82/0xb0
[<c0171370>] __blocking_notifier_call_chain+0x60/0x90
[<c01713bf>] blocking_notifier_call_chain+0x1f/0x30
[<c044f594>] __device_release_driver+0xa4/0xc0
[<c044f647>] driver_detach+0x97/0xa0
[<c044e65c>] bus_remove_driver+0x6c/0xe0
[<c029af0b>] ? sysfs_addrm_finish+0x4b/0x60
[<c0450109>] driver_unregister+0x49/0x80
[<c0299f54>] ? sysfs_remove_file+0x14/0x20
[<c03c3ab2>] pci_unregister_driver+0x32/0x80
[<f92c2162>] ath_pci_exit+0x12/0x20 [ath9k]
[<f92c8467>] ath9k_exit+0x17/0x36 [ath9k]
[<c06523cd>] ? mutex_unlock+0xd/0x10
[<c018e27f>] sys_delete_module+0x13f/0x200
[<c02139bb>] ? sys_munmap+0x4b/0x60
[<c06547c5>] ? restore_all+0xf/0xf
[<c0657a20>] ? spurious_fault+0xe0/0xe0
[<c01832f4>] ? trace_hardirqs_on_caller+0xf4/0x180
[<c065b863>] sysenter_do_call+0x12/0x38
---[ end trace 16e1c1521c06bcf9 ]---
Mapped at:
[<c03b7938>] debug_dma_map_page+0x48/0x120
[<f92ba3e8>] ath_rx_init+0x3f8/0x4b0 [ath9k]
[<f92b5ae4>] ath9k_init_device+0x4c4/0x7b0 [ath9k]
[<f92c2813>] ath_pci_probe+0x263/0x330 [ath9k]

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rtlwifi: rtl8192cu: Fix unitialized struct

Driver rtl8192cu assigns a new struct rtl_tcb_desc object, but fails to
clear it.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@kernel.org> [2.6.39+]
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlagn: fix dangling scan request

If iwl_scan_initiate() fails for any reason,
priv->scan_request and priv->scan_vif are left
dangling. This can lead to a crash later when
iwl_bg_scan_completed() tries to run a pending
scan request.

In practice, this seems to be very rare due to
the STATUS_SCANNING check earlier. That check,
however, is wrong -- it should allow a scan to
be queued when a reset/roc scan is going on.
When a normal scan is already going on, a new
one can't be issued by mac80211, so that code
can be removed completely. I introduced this
bug when adding off-channel support in commit
266af4c745952e9bebf687dd68af58df553cb59d.

Cc: stable@kernel.org [3.0]
Reported-by: Peng Yan <peng.yan@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

PM / Clocks: Do not acquire a mutex under a spinlock

Commit b7ab83e (PM: Use spinlock instead of mutex in clock
management functions) introduced a regression causing clocks_mutex
to be acquired under a spinlock. This happens because
pm_clk_suspend() and pm_clk_resume() call pm_clk_acquire() under
pcd->lock, but pm_clk_acquire() executes clk_get() which causes
clocks_mutex to be acquired. Similarly, __pm_clk_remove(),
executed under pcd->lock, calls clk_put(), which also causes
clocks_mutex to be acquired.

To fix those problems make pm_clk_add() call pm_clk_acquire(), so
that pm_clk_suspend() and pm_clk_resume() don't have to do that.
Change pm_clk_remove() and pm_clk_destroy() to separate
modifications of the pcd->clock_list list from the actual removal of
PM clock entry objects done by __pm_clk_remove().

Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>

[S390] cio: fix cio_tpi ignoring adapter interrupts

Ensure that adapter interrupts are correctly processed when they are
retrieved using TEST PENDING INTERRUPTION.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] gmap: always up mmap_sem properly

If gmap_unmap_segment figures that the segment was not mapped in the
first place, it need to up mmap_sem on exit.

Cc: <stable@kernel.org>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] Do not clobber personality flags on exec

Analog to git commit 59e4c3a2fe9cb1681bb2cff508ff79466f7585ba
do not clear the additional personality flags on exec. We
need to inherit the personality bits in PER_MASK across exec.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[SCSI] 3w-9xxx: fix iommu_iova leak

Following reports on the list, it looks like the 3e-9xxx driver will leak dma
mappings every time we get a transient queueing error back from the card.
This is because it maps the sg list in the routine that sends the command, but
doesn't unmap again in the transient failure path (even though the command is
sent back to the block layer). Fix by unmapping before returning the status.

Reported-by: Chris Boot <bootc@bootc.net>
Tested-by: Chris Boot <bootc@bootc.net>
Acked-by: Adam Radford <aradford@gmail.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference

This oops was reported recently:
d:mon> e
cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
    pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
    lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
    sp: c0000000fd4c73a0
   msr: 8000000000009032
   dar: 0
dsisr: 40000000
  current = 0xc0000000fd640d40
  paca    = 0xc00000000054ff80
    pid   = 5085, comm = iscsid
d:mon> t
[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
[scsi_transport_iscsi2]
[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000080da560cfc

The root cause was an EEH error, which sent us down the offload_close path in
the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for
upper layer driver (like the cxgbi drivers) which might have execution contexts
in the middle of its use. The result is the oops above, when t3_l2t_get attempts
to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL.

The fix is to prevent the setting of the NULL pointer until after there are no
further users of it.  The t3cdev->l2opt pointer is now converted to be an rcu
pointer and the L2DATA macro is now called under the protection of the
rcu_read_lock().  When the EEH error path:
t3_adapter_error->offload_close->cxgb3_offload_deactivate
Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu
quiescence point, preventing, allowing L2DATA callers to safely check for a NULL
pointer without concern that the underlying data will be freeded before the
pointer is dereferenced.

This has been tested by the reporter and shown to fix the reproted oops

[nhorman: fix up unitinialised variable reported by Dan Carpenter]
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Karen Xie <kxie@chelsio.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

ALSA: usb-audio: Check for possible chip NULL pointer before clearing probing flag

Before clearing the probing flag in the error exit path, check that the
chip pointer is not NULL.

Signed-off-by: Thomas Pfaff <tpfaff@gmx.net>
Cc: <stable@kernel.org> [2.6.39+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda/realtek - Don't detect LO jack when identical with HP

The spec->autocfg.line_out_pins[] may contain the same pins as hp_pins[]
depending on the configuration. When they are identical, detecting the
line_jack_present flag screws up the auto-mute because alc_line_automute()
is called unconditionally at initialization while it won't be triggered
by unsol events, thus the old line_jack_present flag is kept for the
whole run.

For fixing this buggy behavior, the driver needs to check whether the
line-outs are really individual, and skip if same as headphone jacks.

Reference: https://bugzilla.novell.com/show_bug.cgi?id=716104

Signed-off-by: Takashi Iwai <tiwai@suse.de>

ARM: 7099/1: futex: preserve oldval in SMP __futex_atomic_op

The SMP implementation of __futex_atomic_op clobbers oldval with the
status flag from the exclusive store. This causes it to always read as
zero when performing the FUTEX_OP_CMP_* operation.

This patch updates the ARM __futex_atomic_op implementations to take a
tmp argument, allowing us to store the strex status flag without
overwriting the register containing oldval.

Cc: stable@kernel.org
Reported-by: Minho Ban <mhban@samsung.com>
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch

We had need to see the difference between scheduling a runnable task and
a runnable task being involuntarily preempted.

No app should rely on the old string output (the binary
trace event record format is not changed).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1316164603.10174.11.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Remove redundant test in check_preempt_tick()

The caller already checks for nr_running > 1, therefore we don't have
to do so again.

Signed-off-by: Wang Xingchao <xingchao.wang@intel.com>
Reviewed-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1316194552-12019-1-git-send-email-xingchao.wang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Fix up wchan borkage

Commit c259e01a1ec ("sched: Separate the scheduler entry for
preemption") contained a boo-boo wrecking wchan output. It forgot to
put the new schedule() function in the __sched section and thereby
doesn't get properly ignored for things like wchan.

Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ALSA: hda/realtek - Avoid bogus HP-pin assignment

When the headphone pin is assigned as primary output to line_out_pins[],
the automatic HP-pin assignment by ASSID must be suppressed. Otherwise
a wrong pin might be assigned to the headphone and breaks the auto-mute.

Reference: https://bugzilla.novell.com/show_bug.cgi?id=716104

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>

ARM: dma-mapping: free allocated page if unable to map

If the attempt to map a page for DMA fails (eg, because we're out of
mapping space) then we must not hold on to the page we allocated for
DMA - doing so will result in a memory leak.

Cc: <stable@kernel.org>
Reported-by: Bryan Phillippe <bp@darkforest.org>
Tested-by: Bryan Phillippe <bp@darkforest.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: S5P: fix incorrect loop iterator usage on gpio-interrupt

Loop iterator value after terminating list_for_each_entry()
is not NULL. This patch fixes incorrect iterator usage in
GPIO interrupt code for SAMSUNG S5P platforms.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ARM: S3C2443: Fix bit-reset in setrate of clk_armdiv

The changed statement should set the old armdiv bits to 0
and not everything else, before setting the new value.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ptrace: PTRACE_LISTEN forgets to unlock ->siglock

If PTRACE_LISTEN fails after lock_task_sighand() it doesn't drop ->siglock.

Reported-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KVM: x86 emulator: fix Src2CL decode

Src2CL decode (used for double width shifts) erronously decodes only bit 3
of %rcx, instead of bits 7:0.

Fix by decoding %cl in its entirety.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix incorrect return of spte

__update_clear_spte_slow should return original spte while the
current code returns low half of original spte combined with high
half of new spte.

Signed-off-by: Zhao Jin <cronozhj@gmail.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ALSA: HDA: No power nids on 92HD93

This patch is necessary to make internal speakers work on this chip.

Cc: stable@kernel.org
BugLink: http://bugs.launchpad.net/bugs/854468
Tested-by: Alex Wolfson <alex.wolfson@canonical.com>
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6

* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
spi: Fix WARN when removing spi-fsl-spi module
spi/imx: Fix spi-imx when the hardware SPI chipselects are used

spi: Fix WARN when removing spi-fsl-spi module

If CPM mode is not used, the fsl_dummy_rx variable is never allocated.  When
the cleanup attempts to free it, the reference count is zero and a WARN is
generated.  The same CPM mode check used in the initialize is applied to the
free as well.

Tested on 2.6.33 with the previous spi_mpc8xxx driver.  The renamed
spi-fsl-spi driver looks to have the same problem.

Signed-off-by: Jeff Harris <jeff_harris@kentrox.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

scsi: fix qla2xxx printk format warning

sector_t can be different types, so cast it to its largest possible
type.

drivers/scsi/qla2xxx/qla_isr.c:1509:5: warning: format '%lx' expects type 'long unsigned int', but argument 5 has type 'sector_t'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

scsi: SCSI_ISCI needs to select SCSI_SAS_HOST_SMP, fixes build error

SCSI_ISCI needs to select SCSI_SAS_HOST_SMP to ensure that all
needed symbols are available to it.

Fixes this build error:

ERROR: "try_test_sas_gpio_gp_bit" [drivers/scsi/isci/isci.ko] undefined!

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'perf-tools-for-linus' of git://github.com/acmel/linux

* 'perf-tools-for-linus' of git://github.com/acmel/linux:
perf python: Add missing perf_event__parse_sample 'swapped' parm

Merge branch 'perf-tools-for-linus' of git://github.com/acmel/linux

* 'perf-tools-for-linus' of git://github.com/acmel/linux:
  perf tools: Add support for disabling -Werror via WERROR=0
  perf top: Fix userspace sample addr map offset
  perf symbols: Fix issue with binaries using 16-bytes buildids (v2)
  perf tool: Fix endianness handling of u32 data in samples
  perf sort: Fix symbol sort output by separating unresolved samples by type
  perf symbols: Synthesize anonymous mmap events
  perf record: Create events initially disabled and enable after init
  perf symbols: Add some heuristics for choosing the best duplicate symbol
  perf symbols: Preserve symbol scope when parsing /proc/kallsyms
  perf symbols: /proc/kallsyms does not sort module symbols
  perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files
  perf probe: Fix regression of variable finder

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/kms: fix DDIA enable on some rs690 systems
Revert "drm/radeon/kms: fix typo in r100_blit_copy"

Merge branch 'for-linus' of git://github.com/tiwai/sound

* 'for-linus' of git://github.com/tiwai/sound:
  ALSA: usb-audio - clear chip->probing on error exit
  ALSA: fm801: Gracefully handle failure of tuner auto-detect
  ALSA: fm801: Fix double free in case of error in tuner detection
  ASoC: Ensure we generate a driver name
  ASoC: Remove bitrotted wm8962_resume()
  ASoC: bf5xx-ad73311: Fix prototype for bf5xx_probe

perf python: Add missing perf_event__parse_sample 'swapped' parm

Problem introduced in 936be50, that missed one perf_event__parse_sample
user, the python binding.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ja4phms9618ggi657plyuch2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

hwmon: (coretemp) remove struct platform_data * parameter from create_core_data()

The only caller of the function obtained the pointer solely for the
purpose of passing it to this function, while it can be easily
determined from the struct platform_device * parameter also passed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>