Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

workqueue: reimplement CPU online rebinding to handle idle workers

Currently, if there are left workers when a CPU is being brough back
online, the trustee kills all idle workers and scheduled rebind_work
so that they re-bind to the CPU after the currently executing work is
finished.  This works for busy workers because concurrency management
doesn't try to wake up them from scheduler callbacks, which require
the target task to be on the local run queue.  The busy worker bumps
concurrency counter appropriately as it clears WORKER_UNBOUND from the
rebind work item and it's bound to the CPU before returning to the
idle state.

To reduce CPU on/offlining overhead (as many embedded systems use it
for powersaving) and simplify the code path, workqueue is planned to
be modified to retain idle workers across CPU on/offlining.  This
patch reimplements CPU online rebinding such that it can also handle
idle workers.

As noted earlier, due to the local wakeup requirement, rebinding idle
workers is tricky.  All idle workers must be re-bound before scheduler
callbacks are enabled.  This is achieved by interlocking idle
re-binding.  Idle workers are requested to re-bind and then hold until
all idle re-binding is complete so that no bound worker starts
executing work item.  Only after all idle workers are re-bound and
parked, CPU_ONLINE proceeds to release them and queue rebind work item
to busy workers thus guaranteeing scheduler callbacks aren't invoked
until all idle workers are ready.

worker_rebind_fn() is renamed to busy_worker_rebind_fn() and
idle_worker_rebind() for idle workers is added.  Rebinding logic is
moved to rebind_workers() and now called from CPU_ONLINE after
flushing trustee.  While at it, add CPU sanity check in
worker_thread().

Note that now a worker may become idle or the manager between trustee
release and rebinding during CPU_ONLINE.  As the previous patch
updated create_worker() so that it can be used by regular manager
while unbound and this patch implements idle re-binding, this is safe.

This prepares for removal of trustee and keeping idle workers across
CPU hotplugs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>

workqueue: drop @bind from create_worker()

Currently, create_worker()'s callers are responsible for deciding
whether the newly created worker should be bound to the associated CPU
and create_worker() sets WORKER_UNBOUND only for the workers for the
unbound global_cwq.  Creation during normal operation is always via
maybe_create_worker() and @bind is true.  For workers created during
hotplug, @bind is false.

Normal operation path is planned to be used even while the CPU is
going through hotplug operations or offline and this static decision
won't work.

Drop @bind from create_worker() and decide whether to bind by looking
at GCWQ_DISASSOCIATED.  create_worker() will also set WORKER_UNBOUND
autmatically if disassociated.  To avoid flipping GCWQ_DISASSOCIATED
while create_worker() is in progress, the flag is now allowed to be
changed only while holding all manager_mutexes on the global_cwq.

This requires that GCWQ_DISASSOCIATED is not cleared behind trustee's
back.  CPU_ONLINE no longer clears DISASSOCIATED before flushing
trustee, which clears DISASSOCIATED before rebinding remaining workers
if asked to release.  For cases where trustee isn't around, CPU_ONLINE
clears DISASSOCIATED after flushing trustee.  Also, now, first_idle
has UNBOUND set on creation which is explicitly cleared by CPU_ONLINE
while binding it.  These convolutions will soon be removed by further
simplification of CPU hotplug path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>

workqueue: use mutex for global_cwq manager exclusion

POOL_MANAGING_WORKERS is used to ensure that at most one worker takes
the manager role at any given time on a given global_cwq.  Trustee
later hitched on it to assume manager adding blocking wait for the
bit.  As trustee already needed a custom wait mechanism, waiting for
MANAGING_WORKERS was rolled into the same mechanism.

Trustee is scheduled to be removed.  This patch separates out
MANAGING_WORKERS wait into per-pool mutex.  Workers use
mutex_trylock() to test for manager role and trustee uses mutex_lock()
to claim manager roles.

gcwq_claim/release_management() helpers are added to grab and release
manager roles of all pools on a global_cwq.  gcwq_claim_management()
always grabs pool manager mutexes in ascending pool index order and
uses pool index as lockdep subclass.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>

workqueue: ROGUE workers are UNBOUND workers

Currently, WORKER_UNBOUND is used to mark workers for the unbound
global_cwq and WORKER_ROGUE is used to mark workers for disassociated
per-cpu global_cwqs. Both are used to make the marked worker skip
concurrency management and the only place they make any difference is
in worker_enter_idle() where WORKER_ROGUE is used to skip scheduling
idle timer, which can easily be replaced with trustee state testing.

This patch replaces WORKER_ROGUE with WORKER_UNBOUND and drops
WORKER_ROGUE. This is to prepare for removing trustee and handling
disassociated global_cwqs as unbound.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>

workqueue: drop CPU_DYING notifier operation

Workqueue used CPU_DYING notification to mark GCWQ_DISASSOCIATED.
This was necessary because workqueue's CPU_DOWN_PREPARE happened
before other DOWN_PREPARE notifiers and workqueue needed to stay
associated across the rest of DOWN_PREPARE.

After the previous patch, workqueue's DOWN_PREPARE happens after
others and can set GCWQ_DISASSOCIATED directly. Drop CPU_DYING and
let the trustee set GCWQ_DISASSOCIATED after disabling concurrency
management.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>

workqueue: perform cpu down operations from low priority cpu_notifier()

Currently, all workqueue cpu hotplug operations run off
CPU_PRI_WORKQUEUE which is higher than normal notifiers.  This is to
ensure that workqueue is up and running while bringing up a CPU before
other notifiers try to use workqueue on the CPU.

Per-cpu workqueues are supposed to remain working and bound to the CPU
for normal CPU_DOWN_PREPARE notifiers.  This holds mostly true even
with workqueue offlining running with higher priority because
workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
runs the per-cpu workqueue without concurrency management without
explicitly detaching the existing workers.

However, if the trustee needs to create new workers, it creates
unbound workers which may wander off to other CPUs while
CPU_DOWN_PREPARE notifiers are in progress.  Furthermore, if the CPU
down is cancelled, the per-CPU workqueue may end up with workers which
aren't bound to the CPU.

While reliably reproducible with a convoluted artificial test-case
involving scheduling and flushing CPU burning work items from CPU down
notifiers, this isn't very likely to happen in the wild, and, even
when it happens, the effects are likely to be hidden by the following
successful CPU down.

Fix it by using different priorities for up and down notifiers - high
priority for up operations and low priority for down operations.

Workqueue cpu hotplug operations will soon go through further cleanup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>

workqueue: reimplement WQ_HIGHPRI using a separate worker_pool

WQ_HIGHPRI was implemented by queueing highpri work items at the head
of the global worklist.  Other than queueing at the head, they weren't
handled differently; unfortunately, this could lead to execution
latency of a few seconds on heavily loaded systems.

Now that workqueue code has been updated to deal with multiple
worker_pools per global_cwq, this patch reimplements WQ_HIGHPRI using
a separate worker_pool.  NR_WORKER_POOLS is bumped to two and
gcwq->pools[0] is used for normal pri work items and ->pools[1] for
highpri.  Highpri workers get -20 nice level and has 'H' suffix in
their names.  Note that this change increases the number of kworkers
per cpu.

POOL_HIGHPRI_PENDING, pool_determine_ins_pos() and highpri chain
wakeup code in process_one_work() are no longer used and removed.

This allows proper prioritization of highpri work items and removes
high execution latency of highpri work items.

v2: nr_running indexing bug in get_pool_nr_running() fixed.

v3: Refreshed for the get_pool_nr_running() update in the previous
    patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josh Hunt <joshhunt00@gmail.com>
LKML-Reference: <CAKA=qzaHqwZ8eqpLNFjxnO2fX-tgAOjmpvxgBFjv6dJeQaOW1w@mail.gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>

workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool()

Introduce NR_WORKER_POOLS and for_each_worker_pool() and convert code
paths which need to manipulate all pools in a gcwq to use them.
NR_WORKER_POOLS is currently one and for_each_worker_pool() iterates
over only @gcwq->pool.

Note that nr_running is per-pool property and converted to an array
with NR_WORKER_POOLS elements and renamed to pool_nr_running.  Note
that get_pool_nr_running() currently assumes 0 index.  The next patch
will make use of non-zero index.

The changes in this patch are mechanical and don't caues any
functional difference.  This is to prepare for multiple pools per
gcwq.

v2: nr_running indexing bug in get_pool_nr_running() fixed.

v3: Pointer to array is stupid.  Don't use it in get_pool_nr_running()
    as suggested by Linus.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>

workqueue: separate out worker_pool flags

GCWQ_MANAGE_WORKERS, GCWQ_MANAGING_WORKERS and GCWQ_HIGHPRI_PENDING
are per-pool properties. Add worker_pool->flags and make the above
three flags per-pool flags.

The changes in this patch are mechanical and don't caues any
functional difference. This is to prepare for multiple pools per
gcwq.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: use @pool instead of @gcwq or @cpu where applicable

Modify all functions which deal with per-pool properties to pass
around @pool instead of @gcwq or @cpu.

The changes in this patch are mechanical and don't caues any
functional difference. This is to prepare for multiple pools per
gcwq.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: factor out worker_pool from global_cwq

Move worklist and all worker management fields from global_cwq into
the new struct worker_pool.  worker_pool points back to the containing
gcwq.  worker and cpu_workqueue_struct are updated to point to
worker_pool instead of gcwq too.

This change is mechanical and doesn't introduce any functional
difference other than rearranging of fields and an added level of
indirection in some places.  This is to prepare for multiple pools per
gcwq.

v2: Comment typo fixes as suggested by Namhyung.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>

workqueue: don't use WQ_HIGHPRI for unbound workqueues

Unbound wqs aren't concurrency-managed and try to execute work items
as soon as possible. This is currently achieved by implicitly setting
%WQ_HIGHPRI on all unbound workqueues; however, WQ_HIGHPRI
implementation is about to be restructured and this usage won't be
valid anymore.

Add an explicit chain-wakeup path for unbound workqueues in
process_one_work() instead of piggy backing on %WQ_HIGHPRI.

Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6

Pull fbdev fixes from Florian Tobias Schandinat:
"Two fixes for OMAPDSS by Tomi Valkeinen:
   - one to avoid warnings when runtime PM is not enabled
   - one workaround to dependancy issues during suspend/resume"

* tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6:
  OMAPDSS: fix warnings if CONFIG_PM_RUNTIME=n
  OMAPDSS: Use PM notifiers for system suspend

Merge branch 'akpm' (Andrew's patch-bomb)

Merge random patches from Andrew Morton.

* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (32 commits)
  memblock: free allocated memblock_reserved_regions later
  mm: sparse: fix usemap allocation above node descriptor section
  mm: sparse: fix section usemap placement calculation
  xtensa: fix incorrect memset
  shmem: cleanup shmem_add_to_page_cache
  shmem: fix negative rss in memcg memory.stat
  tmpfs: revert SEEK_DATA and SEEK_HOLE
  drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT
  fat: fix non-atomic NFS i_pos read
  MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section
  sgi-xp: nested calls to spin_lock_irqsave()
  fs: ramfs: file-nommu: add SetPageUptodate()
  drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning
  mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails
  h8300/uaccess: add mising __clear_user()
  h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user()
  h8300/time: add missing #include <asm/irq_regs.h>
  h8300/signal: fix typo "statis"
  h8300/pgtable: add missing #include <asm-generic/pgtable.h>
  drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled
  ...

memblock: free allocated memblock_reserved_regions later

memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() would double reserved.regions too, so we could free the
old range for reserved.regions.

Also tj said there is another bug which could be related to this.

| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing.  We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.

in that case, when DEBUG_PAGEALLOC, will get panic:

     memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
  BUG: unable to handle kernel paging request at ffff88102febd948
  IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
  PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU 0
  Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
  RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155

See the discussion at https://lkml.org/lkml/2012/6/13/469

So try to allocate with PAGE_SIZE alignment and free it later.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: sparse: fix usemap allocation above node descriptor section

After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
in alloc_bootmem_section"), usemap allocations may easily be placed
outside the optimal section that holds the node descriptor, even if
there is space available in that section. This results in unnecessary
hotplug dependencies that need to have the node unplugged before the
section holding the usemap.

The reason is that the bootmem allocator doesn't guarantee a linear
search starting from the passed allocation goal but may start out at a
much higher address absent an upper limit.

Fix this by trying the allocation with the limit at the section end,
then retry without if that fails. This keeps the fix from f5bf18fa22f8
of not panicking if the allocation does not fit in the section, but
still makes sure to try to stay within the section at first.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: sparse: fix section usemap placement calculation

Commit 238305bb4d41 ("mm: remove sparsemem allocation details from the
bootmem allocator") introduced a bug in the allocation goal calculation
that put section usemaps not in the same section as the node
descriptors, creating unnecessary hotplug dependencies between them:

  node 0 must be removed before remove section 16399
  node 1 must be removed before remove section 16399
  node 2 must be removed before remove section 16399
  node 3 must be removed before remove section 16399
  node 4 must be removed before remove section 16399
  node 5 must be removed before remove section 16399
  node 6 must be removed before remove section 16399

The reason is that it applies PAGE_SECTION_MASK to the physical address
of the node descriptor when finding a suitable place to put the usemap,
when this mask is actually intended to be used with PFNs.  Because the
PFN mask is wider, the target address will point beyond the wanted
section holding the node descriptor and the node must be offlined before
the section holding the usemap can go.

Fix this by extending the mask to address width before use.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xtensa: fix incorrect memset

Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=43871

Reported-by: <dcb314@hotmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shmem: cleanup shmem_add_to_page_cache

shmem_add_to_page_cache() has three callsites, but only one of them wants
the radix_tree_preload() (an exceptional entry guarantees that the radix
tree node is present in the other cases), and only that site can achieve
mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the
other cases). We did it this way originally to reflect
add_to_page_cache_locked(); but it's confusing now, so move the radix_tree
preloading and mem_cgroup uncharging to that one caller.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shmem: fix negative rss in memcg memory.stat

When adding the page_private checks before calling shmem_replace_page(), I
did realize that there is a further race, but thought it too unlikely to
need a hurried fix.

But independently I've been chasing why a mem cgroup's memory.stat
sometimes shows negative rss after all tasks have gone: I expected it to
be a stats gathering bug, but actually it's shmem swapping's fault.

It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
the page may have been removed from swapcache before getting the lock; or
it may have been freed and reused and be back in swapcache; and it can
even be using the same swap location as before (page_private same).

The swapoff case is already secure against this (swap cannot be reused
until the whole area has been swapped off, and a new swapped on); and
shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
the expected radix_tree entry - but a little too late.

By that time, we might have already decided to shmem_replace_page(): I
don't know of a problem from that, but I'd feel more at ease not to do so
spuriously.  And we have already done mem_cgroup_cache_charge(), on
perhaps the wrong mem cgroup: and this charge is not then undone on the
error path, because PageSwapCache ends up preventing that.

It's this last case which causes the occasional negative rss in
memory.stat: the page is charged here as cache, but (sometimes) found to
be anon when eventually it's uncharged - and in between, it's an
undeserved charge on the wrong memcg.

Fix this by adding an earlier check on the radix_tree entry: it's
inelegant to descend the tree twice, but swapping is not the fast path,
and a better solution would need a pair (try+commit) of memcg calls, and a
rework of shmem_replace_page() to keep out of the swapcache.

We can use the added shmem_confirm_swap() function to replace the
find_get_page+page_cache_release we were already doing on the error path.
And add a comment on that -EEXIST: it seems a peculiar errno to be using,
but originates from its use in radix_tree_insert().

[It can be surprising to see positive rss left in a memcg's memory.stat
after all tasks have gone, since it is supposed to count anonymous but not
shmem.  Aside from sharing anon pages via fork with a task in some other
memcg, it often happens after swapping: because a swap page can't be freed
while under writeback, nor while locked.  So it's not an error, and these
residual pages are easily freed once pressure demands.]

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tmpfs: revert SEEK_DATA and SEEK_HOLE

Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE").  I believe
it's correct, and it's been nice to have from rc1 to rc6; but as the
original commit said:

I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
would be of any use to them on tmpfs.  This code adds 92 lines and 752
bytes on x86_64 - is that bloat or worthwhile?

Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to
the dumb generic support for v3.5.  We can always reinstate it later if
useful, and anyone needing it in a hurry can just get it out of git.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT

Requesting a threaded interrupt without a primary handler and without
IRQF_ONESHOT is dangerous, and after commit 1c6c6952 ("genirq: Reject
bogus threaded irq requests"), these requests are rejected. This causes
->probe() to fail, and the RTC driver not to be availble.

To fix, add IRQF_ONESHOT to the IRQ flags.

Tested on OMAP3730/OveroSTORM and OMAP4430/Panda board using rtcwake to
wake from system suspend multiple times.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fat: fix non-atomic NFS i_pos read

fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit
accesses are not atomic. Make it use the same accessor as the rest of the
FAT code.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section

Add the OMAP CPUFreq driver to the list of files in the OMAP Power
Management section.

I've already been maintaining this driver, this just makes it official.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sgi-xp: nested calls to spin_lock_irqsave()

The code here has a nested spin_lock_irqsave(). It's not needed since
IRQs are already disabled and it causes a problem because it means that
IRQs won't be enabled again at the end. The second call to
spin_lock_irqsave() will overwrite the value of irq_flags and we can't
restore the proper settings.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs: ramfs: file-nommu: add SetPageUptodate()

There is a bug in the below scenario for !CONFIG_MMU:

1. create a new file
2. mmap the file and write to it
3. read the file can't get the correct value

Because

sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()

which causes the page to be zeroed.

Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() do not call simple_readpage().

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning

Fixes

  WARNING: at irq/handle.c:146 handle_irq_event_percpu+0x19c/0x1b8()
  irq 25 handler mxc_rtc_interrupt+0x0/0xac enabled interrupts
  Modules linked in:
   (unwind_backtrace+0x0/0xf0) from (warn_slowpath_common+0x4c/0x64)
   (warn_slowpath_common+0x4c/0x64) from (warn_slowpath_fmt+0x30/0x40)
   (warn_slowpath_fmt+0x30/0x40) from (handle_irq_event_percpu+0x19c/0x1b8)
   (handle_irq_event_percpu+0x19c/0x1b8) from (handle_irq_event+0x28/0x38)
   (handle_irq_event+0x28/0x38) from (handle_level_irq+0x80/0xc4)
   (handle_level_irq+0x80/0xc4) from (generic_handle_irq+0x24/0x38)
   (generic_handle_irq+0x24/0x38) from (handle_IRQ+0x30/0x84)
   (handle_IRQ+0x30/0x84) from (avic_handle_irq+0x2c/0x4c)
   (avic_handle_irq+0x2c/0x4c) from (__irq_svc+0x40/0x60)
  Exception stack(0xc050bf60 to 0xc050bfa8)
  bf60: 00000001 00000000 003c4208 c0018e20 c050a000 c050a000 c054a4c8 c050a000
  bf80: c05157a8 4117b363 80503bb4 00000000 01000000 c050bfa8 c0018e2c c000e808
  bfa0: 60000013 ffffffff
   (__irq_svc+0x40/0x60) from (default_idle+0x1c/0x30)
   (default_idle+0x1c/0x30) from (cpu_idle+0x68/0xa8)
   (cpu_idle+0x68/0xa8) from (start_kernel+0x22c/0x26c)

Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Sascha Hauer <kernel@pengutronix.de>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails

We should goto error to release memory resource if hotadd_new_pgdat()
failed.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300/uaccess: add mising __clear_user()

Fix the build error:

include/linux/regset.h: In function 'user_regset_copyout_zero':
include/linux/regset.h:289:3: error: implicit declaration of function '__clear_user' [-Werror=implicit-function-declaration]

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user()

__gu_val is const if the passed ptr is const, giving:

  include/linux/pagemap.h: In function 'fault_in_pages_readable':
  include/linux/pagemap.h:442:2: error: assignment of read-only variable '__gu_val'
  include/linux/pagemap.h:448:4: error: assignment of read-only variable '__gu_val'
  include/linux/pagemap.h: In function 'fault_in_multipages_readable':
  include/linux/pagemap.h:499:3: error: assignment of read-only variable '__gu_val'
  include/linux/pagemap.h:508:3: error: assignment of read-only variable '__gu_val'
  make[4]: *** [init/main.o] Error 1

As we don't care about the actual value of __gu_val in the unhandled
case (it will cause a link error anyway), just remove the assignment.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300/time: add missing #include <asm/irq_regs.h>

Fix the build error:

  arch/h8300/kernel/time.c: In function 'h8300_timer_tick':
  arch/h8300/kernel/time.c:39:2: error: implicit declaration of function 'get_irq_regs' [-Werror=implicit-function-declaration]
  arch/h8300/kernel/time.c:39:42: error: invalid type argument of '->' (have 'int')

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300/signal: fix typo "statis"

The keyword is "static", not "statis":

  arch/h8300/kernel/signal.c:455:8: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
  arch/h8300/kernel/signal.c: In function 'do_notify_resume':
  arch/h8300/kernel/signal.c:511:3: error: implicit declaration of function 'do_signal' [-Werror=implicit-function-declaration]
  arch/h8300/kernel/signal.c: At top level:
  arch/h8300/kernel/signal.c:414:1: warning: 'handle_signal' defined but not used [-Wunused-function]

Introduced in commit 7ae4e32a6514 ("h8300: switch to saved_sigmask-based
sigsuspend/rt_sigsuspend")

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300/pgtable: add missing #include <asm-generic/pgtable.h>

Fix the h8300 build error:

kernel/sched/core.c: In function 'context_switch':
kernel/sched/core.c:2061:2: error: implicit declaration of function 'arch_start_context_switch' [-Werror=implicit-function-declaration]

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled

Without this patch, if Device Tree is enabled the AB8500 RTC wouldn't get
probed at all, as there is no reference to it from platform code. This
patch ensures the driver is probed during normal DT start-up.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-ab8500.c: use IRQF_ONESHOT when requesting a threaded IRQ

This driver's IRQ registration is failing because the kernel now forces
IRQs to be ONESHOT if no IRQ handler is passed.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, thp: abort compaction if migration page cannot be charged to memcg

If page migration cannot charge the temporary page to the memcg,
migrate_pages() will return -ENOMEM.  This isn't considered in memory
compaction however, and the loop continues to iterate over all
pageblocks trying to isolate and migrate pages.  If a small number of
very large memcgs happen to be oom, however, these attempts will mostly
be futile leading to an enormous amout of cpu consumption due to the
page migration failures.

This patch will short circuit and fail memory compaction if
migrate_pages() returns -ENOMEM.  COMPACT_PARTIAL is returned in case
some migrations were successful so that the page allocator will retry.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c/r: prctl: less paranoid prctl_set_mm_exe_file()

"no other files mapped" requirement from my previous patch (c/r: prctl:
update prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal) is too
paranoid, it forbids operation even if there mapped one shared-anon vma.

Let's check that current mm->exe_file already unmapped, in this case
exe_file symlink already outdated and its changing is reasonable.

Plus, this patch fixes exit code in case operation success.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space()

As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL
as the first parameter (file), it may trigger a NULL pointer dereferrence
due to a missing check.

Addresses http://bugs.launchpad.net/bugs/1006012

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reported-by: Bret Towe <magnade@gmail.com>
Tested-by: Bret Towe <magnade@gmail.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mn10300: use "#elif defined(CONFIG_*)" instead of "#elif CONFIG_*"

Fix the warnings:

arch/mn10300/kernel/irq.c:173:7: warning: "CONFIG_MN10300_TTYSM1_TIMER9" is not defined [-Wundef]
arch/mn10300/kernel/irq.c:175:7: warning: "CONFIG_MN10300_TTYSM1_TIMER3" is not defined [-Wundef]

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mn10300: mm/dma-alloc.c needs <linux/export.h>

Fix the warnings:

  arch/mn10300/mm/dma-alloc.c: At top level:
  arch/mn10300/mm/dma-alloc.c:63:1: warning: data definition has no type or storage class [enabled by default]
  arch/mn10300/mm/dma-alloc.c:63:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' [-Wimplicit-int]
  arch/mn10300/mm/dma-alloc.c:63:1: warning: parameter names (without types) in function declaration [enabled by default]
  arch/mn10300/mm/dma-alloc.c:75:1: warning: data definition has no type or storage class [enabled by default]
  arch/mn10300/mm/dma-alloc.c:75:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' [-Wimplicit-int]
  arch/mn10300/mm/dma-alloc.c:75:1: warning: parameter names (without types) in function declaration [enabled by default]

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mn10300: kernel/traps.c needs <linux/export.h>

Fix the warning:

  arch/mn10300/kernel/traps.c:304:1: warning: data definition has no type or storage class [enabled by default]
  arch/mn10300/kernel/traps.c:304:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' [-Wimplicit-int]
  arch/mn10300/kernel/traps.c:304:1: warning: parameter names (without types) in function declaration [enabled by default]

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mn10300: kernel/internal.h needs <linux/irqreturn.h>

Fix the nm10300 build failure:

In file included from arch/mn10300/kernel/csrc-mn10300.c:14:0:
arch/mn10300/kernel/internal.h:42:1: error: unknown type name 'irqreturn_t'

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mn10300: remove duplicate definition of PTRACE_O_TRACESYSGOOD

Fix the warning:

include/linux/ptrace.h:66:0: warning: "PTRACE_O_TRACESYSGOOD" redefined [enabled by default]
arch/mn10300/include/asm/ptrace.h:85:0: note: this is the location of the previous definition

We already have it in <linux/ptrace.h>, so remove it from <asm/ptrace.h>

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mn10300: move setup_jiffies_interrupt() to cevt-mn10300.c

Move the static inline function setup_jiffies_interrupt() from
<asm/timex.h> to arch/mn10300/kernel/cevt-mn10300.c, which is its only
callsite.

This allows to remove the inclusion of <asm/hardirq.h> and <linux/irq.h>
from <asm/timex.h> and <unit/timex.h>, fixing include hell like:

  include/linux/jiffies.h:260:31: warning: "CLOCK_TICK_RATE" is not defined [-Wundef]
  include/linux/jiffies.h:260:31: warning: "CLOCK_TICK_RATE" is not defined [-Wundef]
  include/linux/jiffies.h:46:42: error: division by zero in #if
  ...
  make[4]: *** [arch/mn10300/kernel/asm-offsets.s] Error 1

and (after a quick hack for the above by defining CLOCK_TICK_RATE in
<linux/jiffies.h>):

  In file included from include/linux/notifier.h:15:0,
                 from include/linux/memory_hotplug.h:6,
                 from include/linux/mmzone.h:718,
                 from include/linux/gfp.h:4,
                 from include/linux/irq.h:20,
                 from arch/mn10300/unit-asb2303/include/unit/timex.h:15,
                 from arch/mn10300/include/asm/timex.h:15,
                 from include/linux/timex.h:174,
                 from include/linux/jiffies.h:8,
                 from include/linux/ktime.h:25,
                 from include/linux/timer.h:5,
                 from include/linux/workqueue.h:8,
  include/linux/srcu.h:55:22: error: field 'work' has incomplete type

As a consequence, we do need a few more inclusions of <asm/irq.h>, namely
in arch/mn10300/unit-asb2303/smc91111.c and
arch/mn10300/unit-asb2305/unit-init.c.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-spear.c: fix use-after-free in spear_rtc_remove()

`config' is freed and is then used in the rtc_device_unregister() call,
causing a kernel panic.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Reviewed-by: Viresh Kumar <viresh.linux@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memory hotplug: fix invalid memory access caused by stale kswapd pointer

kswapd_stop() is called to destroy the kswapd work thread when all memory
of a NUMA node has been offlined.  But kswapd_stop() only terminates the
work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
pointer will prevent kswapd_run() from creating a new work thread when
adding memory to the memory-less NUMA node again.  Eventually the stale
pointer may cause invalid memory access.

An example stack dump as below. It's reproduced with 2.6.32, but latest
kernel has the same issue.

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
  PGD 0
  Oops: 0000 [#1] SMP
  last sysfs file: /sys/devices/system/memory/memory391/state
  CPU 11
  Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
  RIP: 0010:exit_creds+0x12/0x78
  RSP: 0018:ffff8806044f1d78  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
  RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
  RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
  R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
  R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
  FS:  00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
  Stack:
   ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
   ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
   0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
  Call Trace:
    __put_task_struct+0x5d/0x97
    kthread_stop+0x50/0x58
    offline_pages+0x324/0x3da
    memory_block_change_state+0x179/0x1db
    store_mem_state+0x9e/0xbb
    sysfs_write_file+0xd0/0x107
    vfs_write+0xad/0x169
    sys_write+0x45/0x6e
    system_call_fastpath+0x16/0x1b
  Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
  RIP  exit_creds+0x12/0x78
   RSP <ffff8806044f1d78>
  CR2: 0000000000000000

[akpm@linux-foundation.org: add pglist_data.kswapd locking comments]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'stable' of git://git./linux/kernel/git/cmetcalf/linux-tile

Pull arch/tile fix from Chris Metcalf:
"This is a single change to fix backtracing in big-endian mode."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: big-endian: properly bswap instruction bundles when backtracing

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of three fixes for data corruption (libsas task file),
  oops causing (NULL in scsi_cmd_to_driver) and driver failure (bnx2i).
  The oops caused by the NULL in scsi_cmd_to_driver() manifests in
  scsi_eh_send_cmd() and has been seen by several people now.

Signed-off-by: James Bottomley <JBottomley@Parallels.com>"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] bnx2i: Removed the reference to the netdev->base_addr
  [SCSI] libsas: fix taskfile corruption in sas_ata_qc_fill_rtf
  [SCSI] Fix NULL dereferences in scsi_cmd_to_driver

media: mx2_camera: Fix mbus format handling

Do not use MX2_CAMERA_SWAP16 and MX2_CAMERA_PACK_DIR_MSB flags. The driver
must negotiate with the attached sensor whether the mbus format is UYUV or
YUYV and set CSICR1 configuration accordingly.

This is needed for the video function on mach-imx27_visstrim_m10.c to
perform properly, since an earlier version of this patch has been proven
wrong and has been reverted and a commit, depending on it: "[media]
i.MX27: visstrim_m10: Remove use of MX2_CAMERA_SWAP16" is in the mainline.

Signed-off-by: Javier Martin <javier.martin@vista-silicon.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
[ g.liakhovetski@gmx.de: move a macro definition to a more logical place ]
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
[ Applying directly because Mauro is on vacation - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
-  multiple omap2+ bug fixes
- a regression on ux500 dt support
- a build failure on shmobile

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: OMAP2+: omap2plus_defconfig: EHCI driver is not stable, disable it
  ARM: shmobile: fix platsmp.c build when ARCH_SH73A0=n
  ARM: ux500: Over-ride the DT device naming scheme for pinctrl
  ARM: ux500: Fix build errors/warnings when MACH_UX500_DT is not set
  of: address: Don't fail a lookup just because a node has no reg property
  ARM: OMAP2+: hwmod code/clockdomain data: fix 32K sync timer

Merge tag 'pm-for-3.5-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"This removes ACPICA code that had already been removed once from the
  kernel already by commit 2780cc4660e1 ("[ACPI] Fix suspend/resume
  lockup issue by leaving Bus Master Arbitration enabled"), because it
  was known to cause systems to lock up during resume from suspend, but
  was re-introduced by mistake during the v3.4 merge window."

* tag 'pm-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / PM: Leave Bus Master Arbitration enabled for suspend/resume

Merge tag 'driver-core-3.5-rc6' of git://git./linux/kernel/git/gregkh/driver-core

Pull printk fixes from Greg Kroah-Hartman:
"Here are some more printk fixes for 3.5-rc6.  They resolve all known
  outstanding issues with the printk changes that have been happening.
  They have been tested by the people reporting the problems.

  This hopefully should be it for the printk stuff for 3.5-final.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kmsg: merge continuation records while printing
  kmsg: /proc/kmsg - support reading of partial log records
  kmsg: make sure all messages reach a newly registered boot console
  kmsg: properly handle concurrent non-blocking read() from /proc/kmsg
  kmsg: add the facility number to the syslog prefix
  kmsg: escape the backslash character while exporting data
  printk: replacing the raw_spin_lock/unlock with raw_spin_lock/unlock_irq

Merge tag 'usb-3.5-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
"Here are a few fixes and new device ids for the 3.5-rc6 tree.

  The PCI changes resolve a long-standing issue with resuming some EHCI
  controllers.  It has been acked by the PCI maintainer, and he asked
  for it to go through my USB tree instead of his.

  The xhci patches also resolve a number of reported issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'usb-3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  PCI: EHCI: fix crash during suspend on ASUS computers
  USB: cdc-wdm: fix lockup on error in wdm_read
  USB: metro-usb: fix tty_flip_buffer_push use
  USB: option: Add MEDIATEK product ids
  USB: option: add ZTE MF60
  xhci: Fix hang on back-to-back Set TR Deq Ptr commands.
  usb: Add support for root hub port status CAS

Merge tag 'char-misc-3.5-rc6' of git://git./linux/kernel/git/gregkh/char-misc

Pull misc fix from Greg Kroah-Hartman:
"Here's a single MEI driver fix that resolves a regression from 3.4
  that a number of people have reported (and sent to me in different
  patches.)

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'char-misc-3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mei: pci_resume: set IRQF_ONESHOT for msi request_threaded_irq

Merge tag 'fixes-for-v3.5-v2' of git://git./linux/kernel/git/linusw/linux-gpio

Pull a late GPIO fix from Linus Walleij:
"Grr! So typically -next washed out a bug in the bug fixes.  This v2 of
  the pull request fixes another OF/DT related issue caused by fixing
  another OF/DT related issue, courtesy of Gerard Sintselaar.

  So please pull the v2.  Or pull it on top of the other one, whatever.
  Sorry for the panic mode, I'm in the middle of the Swedish woods,
  supposedly on vacation."

* tag 'fixes-for-v3.5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio/gpio-tps65910: gpio_chip.of_node referenced without CONFIG_OF_GPIO defined

MN10300: Fix a missing semicolon

The declaration of arch_release_thread_info() needs a semicolon.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gpio/gpio-tps65910: gpio_chip.of_node referenced without CONFIG_OF_GPIO defined

commit 626f9914 added code to initialize gpio_chip.of_node, but if
CONFIG_OF_GPIO is not defined gps-tps65910 fails to build with an
error complaining gpio_chip has no member of_node. I ran into this
while doing a allyesconfig build on linux-next.

Signed-off-by: Gerard Snitselaar <dev@snitselaar.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Merge tag 'fixes-for-v3.5' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Yes, this is a *LATE* GPIO pull request with fixes for v3.5.

  Grant moved across the planet and accidentally fell off the grid, so
  he asked me to take over the GPIO merges for a while 10 days ago.

  Since then I went over the archives and collected this pile of fixes,
  and pulled two of them from the TI maintainer Kevin Hilman.  Then
  waited for them to at least hit linux-next once or twice."

GPIO fixes for v3.5:
- Invalid context restore on bank 0 for OMAP driver in runtime
   suspend/resume cycle
- Check for NULL platform data in sta-2x11 driver
- Constrain selection of the V1 MSM GPIO driver to applicable platforms
   (Kconfig issue)
- Make sure the correct output value is set in the wm8994 driver
- Export devm_gpio_request_one() so it can be used in modules.
   Apparently some in-kernel modules can be configured to use this
   leading to breakage.
- Check that the GPIO is valid in the lantiq driver
- Fix the flag bits introduced for v3.5, so they don't overlap
- Fix a device tree intialization bug for imx21-compatible devices
- Carry over the OF node to the TPS65910 GPIO chip struct

* tag 'fixes-for-v3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: tps65910: initialize of_node of gpio_chip
  gpio/mxc: make irqs work for fsl,imx21-gpio devices
  gpio: fix bits conflict for gpio flags
  mips: pci-lantiq: Fix check for valid gpio
  gpio: export devm_gpio_request_one
  gpiolib: wm8994: Pay attention to the value set when enabling as output
  gpio/msm_v1: CONFIG_GPIO_MSM_V1 is only available on three SoCs
  gpio-sta2x11: don't use pdata if null
  gpio/omap: fix invalid context restore of gpio bank-0
  gpio/omap: fix irq loss while in idle with debounce on

Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull powerpc fixes from Benjamin Herrenschmidt:
"It looks like my rewrite of our lazy irq scheme is still exposing
  "interesting" issues left and right.  The previous fixes are now
  causing an occasional BUG_ON to trigger (which this patch turns into a
  WARN_ON while at it), due to another issue of disconnect of the lazy
  irq state vs the processor state in the idle loop on pseries and
  cell.

  This should fix it properly once for all moving the nasty code to a
  common helper function.

  There's also couple more fixes for some debug stuff that didn't build
  (and helped resolving those problems so it's worth having), along with
  a compile fix for newer gcc's."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  tty/hvc_opal: Fix debug function name
  powerpc/numa: Avoid stupid uninitialized warning from gcc
  powerpc: Fix build of some debug irq code
  powerpc: More fixes for lazy IRQ vs. idle

ACPI / PM: Leave Bus Master Arbitration enabled for suspend/resume

This is an old suspend/resume lockup fix:

commit 2780cc4660e1
Author: Len Brown <len.brown@intel.com>
Date:   Thu Dec 23 13:43:30 2004 -0500

    [ACPI] Fix suspend/resume lockup issue
    by leaving Bus Master Arbitration enabled.
    The ACPI spec mandates it be disabled only for C3.

    http://bugzilla.kernel.org/show_bug.cgi?id=3599

Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The bug snuck back in in commit 2feec47d4c5f (ACPICA: ACPI 5: Support
for new FADT SleepStatus, SleepControl registers, 2012-02-14),
presumably by copy/pasting a copy of the code without that fix for the
legacy case.

On affected machines, after that commit, the machine locks up hard on
resume from suspend.  The same fix as seven years ago still works.

Addresses <https://bugzilla.kernel.org/show_bug.cgi?id=43641>.

Reported-bisected-and-tested-by: Octavio Alvarez <alvarezp@alvarezp.com>
Reported-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Revert "of: match by compatible property first"

This reverts commit 107a84e61cdd3406c842a0e4be7efffd3a05dba6.

Meelis Roos reports a regression since 3.5-rc5 that stops Sun Fire V100
and Sun Netra X1 sparc64 machines from booting, hanging after enabling
serial console.  He bisected it to commit 107a84e61cdd.

Rob Herring explains:
"The problem is match combinations of compatible plus name and/or type
  fail to match correctly.  I have a fix for this, but given how late it
  is for 3.5 I think it is best to revert this for now.  There could be
  other cases that rely on the current although wrong behavior.  I will
  post an updated version for 3.6."

Bisected-and-reported-by: Meelis Roos <mroos@linux.ee>
Requested-by: Rob Herring <rob.herring@calxeda.com>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mei: pci_resume: set IRQF_ONESHOT for msi request_threaded_irq

when the default irq quick handler is used then IRQF_ONESHOT must be set
otherwise the request fails and following error is displayed:

mei 0000:00:16.0: irq 48 for MSI/MSI-X
genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 48
mei 0000:00:16.0: request_threaded_irq failed: irq = 48.
dpm_run_callback(): pci_pm_resume+0x0/0x140 returns -22
PM: Device 0000:00:16.0 failed to resume async: error -22

Reported-by: Peter Wu <lekensteyn@gmail.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Tested-by: Peter Wu <lekensteyn@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: stable <stable@vger.kernel.org> # 3.5
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI: EHCI: fix crash during suspend on ASUS computers

Quite a few ASUS computers experience a nasty problem, related to the
EHCI controllers, when going into system suspend.  It was observed
that the problem didn't occur if the controllers were not put into the
D3 power state before starting the suspend, and commit
151b61284776be2d6f02d48c23c3625678960b97 (USB: EHCI: fix crash during
suspend on ASUS computers) was created to do this.

It turned out this approach messed up other computers that didn't have
the problem -- it prevented USB wakeup from working.  Consequently
commit c2fb8a3fa25513de8fedb38509b1f15a5bbee47b (USB: add
NO_D3_DURING_SLEEP flag and revert 151b61284776be2) was merged; it
reverted the earlier commit and added a whitelist of known good board
names.

Now we know the actual cause of the problem.  Thanks to AceLan Kao for
tracking it down.

According to him, an engineer at ASUS explained that some of their
BIOSes contain a bug that was added in an attempt to work around a
problem in early versions of Windows.  When the computer goes into S3
suspend, the BIOS tries to verify that the EHCI controllers were first
quiesced by the OS.  Nothing's wrong with this, but the BIOS does it
by checking that the PCI COMMAND registers contain 0 without checking
the controllers' power state.  If the register isn't 0, the BIOS
assumes the controller needs to be quiesced and tries to do so.  This
involves making various MMIO accesses to the controller, which don't
work very well if the controller is already in D3.  The end result is
a system hang or memory corruption.

Since the value in the PCI COMMAND register doesn't matter once the
controller has been suspended, and since the value will be restored
anyway when the controller is resumed, we can work around the BIOS bug
simply by setting the register to 0 during system suspend.  This patch
(as1590) does so and also reverts the second commit mentioned above,
which is now unnecessary.

In theory we could do this for every PCI device.  However to avoid
introducing new problems, the patch restricts itself to EHCI host
controllers.

Finally the affected systems can suspend with USB wakeup working
properly.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728
Based-on-patch-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Dâniel Fraga <fragabr@gmail.com>
Tested-by: Javier Marcet <jmarcet@gmail.com>
Tested-by: Andrey Rahmatullin <wrar@wrar.name>
Tested-by: Oleksij Rempel <bug-track@fisher-privat.net>
Tested-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: stable <stable@vger.kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'omap-fixes-for-v3.5-rc6' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

From Tony Lindgren <tony@atomide.com>:
Here is one PM regression fix and a defconfig change to disable
echi-omap because the driver currently causes issues with PM.
This annoys Kevin as it makes it harder for him to validate that
PM is working. The proper fixes for the echi-omap are being
discussed, but looks like it will not be properly working with PM
until in v3.7.

* tag 'omap-fixes-for-v3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: omap2plus_defconfig: EHCI driver is not stable, disable it
ARM: OMAP2+: hwmod code/clockdomain data: fix 32K sync timer

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

tty/hvc_opal: Fix debug function name

udbg_init_debug_opal() should be udbg_init_debug_opal_raw() as
the caller in arch/powerpc/kernel/udbg.c expects

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/numa: Avoid stupid uninitialized warning from gcc

Newer gcc are being a bit blind here (it's pretty obvious we don't
reach the code path using the array if we haven't initialized the
pointer) but none of that is performance critical so let's just
silence it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Fix build of some debug irq code

There was a typo, checking for CONFIG_TRACE_IRQFLAG instead of
CONFIG_TRACE_IRQFLAGS causing some useful debug code to not be
built

This in turns causes a build error on BookE 64-bit due to incorrect
semicolons at the end of a couple of macros, so let's fix that too

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.4]

powerpc: More fixes for lazy IRQ vs. idle

Looks like we still have issues with pSeries and Cell idle code
vs. the lazy irq state. In fact, the reset fixes that went upstream
are exposing the problem more by causing BUG_ON() to trigger (which
this patch turns into a WARN_ON instead).

We need to be careful when using a variant of low power state that
has the side effect of turning interrupts back on, to properly set
all the SW & lazy state to look as if everything is enabled before
we enter the low power state with MSR:EE off as we will return with
MSR:EE on. If not, we have a discrepancy of state which can cause
things to go very wrong later on.

This patch moves the logic into a helper and uses it from the
pseries and cell idle code. The power4/970 idle code already got
things right (in assembly even !) so I'm not touching it. The power7
"bare metal" idle code is subtly different and correct. Remains PA6T
and some hypervisor based Cell platforms which have questionable
code in there, but they are mostly dead platforms so I'll fix them
when I manage to get final answers from the respective maintainers
about how the low power state actually works on them.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.4]

Merge tag 'regulator-3.5' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"A smallish fix for a lock dependency issue which affects a bunch of
  Qualcomm boards that do unusually complicated things with their
  regulators, the API is unlikely to be called by any other system."

* tag 'regulator-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Fix recursive mutex lockdep warning

gspca_sn9c20x: Fix NULL pointer dereference

Don't call v4l2_ctrl_g_ctrl on ctrls which the model cam in question
does not have.

Reported-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[ Taken directly, since Mauro is on vacation ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'virtio-for-linus' of git://git./linux/kernel/git/rusty/linux-2.6-for-linus

Pull minor virtio-balloon fix from Rusty Russell:
"Theoretical fix, which greatly simplifies upcoming balloon patches
which will go in via some vm tree."

* tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
virtio-balloon: fix add/get API use

Merge tag 'rpmsg-3.5-fixes' of git://git./linux/kernel/git/ohad/rpmsg

Pull rpmsg fixes from Ohad Ben-Cohen:
"Fixing two (somewhat rare) endpoint-related race issues, both of which
  were reported by Fernando Guzman Lugo."

* tag 'rpmsg-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
  rpmsg: make sure inflight messages don't invoke just-removed callbacks
  rpmsg: avoid premature deallocation of endpoints

Merge tag 'remoteproc-3.5-fixes' of git://git./linux/kernel/git/ohad/remoteproc

Pull remoteproc fixes from Ohad Ben-Cohen:
"Two build-related remoteproc fixes for 3.5."

* tag 'remoteproc-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
remoteproc: fix missing CONFIG_FW_LOADER configurations
remoteproc/omap: fix randconfig unmet direct dependencies

Merge tag 'hwspinlock-3.5-fixes' of git://git./linux/kernel/git/ohad/hwspinlock

Pull hwspinlock fix from Ohad Ben-Cohen:
"A single hwspinlock core fix for multiple hwspinlock devices
scenarios, from Shinya Kuribayashi."

* tag 'hwspinlock-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock:
hwspinlock/core: use global ID to register hwspinlocks on multiple devices

kmsg: merge continuation records while printing

In (the unlikely) case our continuation merge buffer is busy, we unfortunately
can not merge further continuation printk()s into a single record and have to
store them separately, which leads to split-up output of these lines when they
are printed.

Add some flags about newlines and prefix existence to these records and try to
reconstruct the full line again, when the separated records are printed.

Reported-By: Michael Neuling <mikey@neuling.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge branch 'kevin' into fixes

Merge tag 'iommu-fixes-v3.5-rc5' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
"The patches fix several issues in the AMD IOMMU driver, the NVidia
  SMMU driver, and the DMA debug code.

  The most important fix for the AMD IOMMU solves a problem with SR-IOV
  devices where virtual functions did not work with IOMMU enabled.  The
  NVidia SMMU patch fixes a possible sleep while spin-lock situation
  (queued the small fix for v3.5, a better but more intrusive fix is
  coming for v3.6).  The DMA debug patches fix a possible data
  corruption issue due to bool vs u32 usage."

* tag 'iommu-fixes-v3.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: fix type bug in flush code
  dma-debug: debugfs_create_bool() takes a u32 pointer
  iommu/tegra: smmu: Fix unsleepable memory allocation
  iommu/amd: Initialize dma_ops for hotplug and sriov devices
  iommu/amd: Fix missing iommu_shutdown initialization in passthrough mode

kmsg: /proc/kmsg - support reading of partial log records

Restore support for partial reads of any size on /proc/kmsg, in case the
supplied read buffer is smaller than the record size.

Some people seem to think is is ia good idea to run:
$ dd if=/proc/kmsg bs=1 of=...
as a klog bridge.

Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44211
Reported-by: Jukka Ollila <jiiksteri@gmail.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: OMAP2+: omap2plus_defconfig: EHCI driver is not stable, disable it

The EHCI driver is not stable enough to be enabled by default. In v3.5,
it has at least the following problems:

- warning dump during bootup
- hang during suspend
- prevents CORE powerdomain from entering retention during idle (even
when no USB devices connected.)

This demonstrates that this driver has not been thoroughly tested and
therfore should not be enabled in the default defconfig.

In addition, the problems above cause new PM regressions which need be
addressed before this driver should be enabled in the default
defconfig.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

virtio-balloon: fix add/get API use

Since ee7cd8981e15bcb365fc762afe3fc47b8242f630 'virtio: expose added
descriptors immediately.', in virtio balloon virtqueue_get_buf might
now run concurrently with virtqueue_kick. I audited both and this
seems safe in practice but this is not guaranteed by the API.
Additionally, a spurious interrupt might in theory make
virtqueue_get_buf run in parallel with virtqueue_add_buf, which is
racy.

While we might try to protect against spurious callbacks it's
easier to fix the driver: balloon seems to be the only one
(mis)using the API like this, so let's just fix balloon.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed unused var)

Merge branch 'for-3.5-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"The previous cgroup pull request contained a patch to fix a race
  condition during cgroup hierarchy umount.  Unfortunately, while the
  patch reduced the race window such that the test case I and Sasha were
  using didn't trigger it anymore, it wasn't complete - Shyju and Li
  could reliably trigger the race condition using a different test case.

  The problem wasn't the gap between dentry deletion and release which
  the previous patch tried to fix.  The window was between the last
  dput() of a root's child and the resulting dput() of the root.  For
  cgroup dentries, the deletion and release always happen synchronously.
  As this releases the s_active ref, the refcnt of the root dentry,
  which doesn't hold s_active, stays above zero without the
  corresponding s_active.  If umount was in progress, the last
  deactivate_super() proceeds to destory the superblock and triggers
  BUG() on the non-zero root dentry refcnt after shrinking.

  This issue surfaced because cgroup dentries are now allowed to linger
  after rmdir(2) since 3.5-rc1.  Before, rmdir synchronously drained the
  dentry refcnt and the s_active acquired by rmdir from vfs layer
  protected the whole thing.  After 3.5-rc1, cgroup may internally hold
  and put dentry refs after rmdir finishes and the delayed dput()
  doesn't have surrounding s_active ref exposing this issue.

  This pull request contains two patches - one reverting the previous
  incorrect fix and the other adding the surrounding s_active ref around
  the delayed dput().

  This is quite late in the release cycle but the change is on the safer
  side and fixes the test cases reliably, so I don't think it's too
  crazy."

* 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix cgroup hierarchy umount race
  Revert "cgroup: superblock can't be released with active dentries"

OMAPDSS: fix warnings if CONFIG_PM_RUNTIME=n

If runtime PM is not enabled in the kernel config, pm_runtime_get_sync()
will always return 1 and pm_runtime_put_sync() will always return
-ENOSYS. pm_runtime_get_sync() returning 1 presents no problem to the
driver, but -ENOSYS from pm_runtime_put_sync() causes the driver to
print a warning.

One option would be to ignore errors returned by pm_runtime_put_sync()
totally, as they only say that the call was unable to put the hardware
into suspend mode.

However, I chose to ignore the returned -ENOSYS explicitly, and print a
warning for other errors, as I think we should get notified if the HW
failed to go to suspend properly.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>

OMAPDSS: Use PM notifiers for system suspend

The current way how omapdss handles system suspend and resume is that
omapdss device (a platform device, which is not part of the device
hierarchy of the DSS HW devices, like DISPC and DSI, or panels.) uses
the suspend and resume callbacks from platform_driver to handle system
suspend. It does this by disabling all enabled panels on suspend, and
resuming the previously disabled panels on resume.

This presents a few problems.

One is that as omapdss device is not related to the panel devices or the
DSS HW devices, there's no ordering in the suspend process. This means
that suspend could be first ran for DSS HW devices and panels, and only
then for omapdss device. Currently this is not a problem, as DSS HW
devices and panels do not handle suspend.

Another, more pressing problem, is that when suspending or resuming, the
runtime PM functions return -EACCES as runtime PM is disabled during
system suspend. This causes the driver to print warnings, and operations
to fail as they think that they failed to bring up the HW.

This patch changes the omapdss suspend handling to use PM notifiers,
which are called before suspend and after resume. This way we have a
normally functioning system when we are suspending and resuming the
panels.

This patch, I believe, creates a problem that somebody could enable or
disable a panel between PM_SUSPEND_PREPARE and the system suspend, and
similarly the other way around in resume. I choose to ignore the problem
for now, as it sounds rather unlikely, and if it happens, it's not
fatal.

In the long run the system suspend handling of omapdss and panels should
be thought out properly. The current approach feels rather hacky.
Perhaps the panel drivers should handle system suspend, or the users of
omapdss (omapfb, omapdrm) should handle system suspend.

Note that after this patch we could probably revert
0eaf9f52e94f756147dbfe1faf1f77a02378dbf9 (OMAPDSS: use sync versions of
pm_runtime_put). But as I said, this patch may be temporary, so let's
leave the sync version still in place.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Reported-by: Jassi Brar <jaswinder.singh@linaro.org>
Tested-by: Jassi Brar <jaswinder.singh@linaro.org>
Tested-by: Joe Woodward <jw@terrafix.co.uk>
Signed-off-by: Archit Taneja <archit@ti.com>
[fts: fixed 2 brace coding style issues]
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>

[SCSI] bnx2i: Removed the reference to the netdev->base_addr

The netdev->base_addr parameter has been deprecated in the L2 bnx2
driver. This is used by bnx2i for the BARn iomapping.

This patch will directly reference the pci_resource_start instead
of using the deprecated netdev->base_addr.

This patch is actually a critical bug fix as the 1G bnx2 driver no
longer supports the netdev->base_addr in the current kernel of the scsi
tree. This means that Broadcom's 1G Linux iSCSI offload solution would
not work at all without this patch.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

[SCSI] libsas: fix taskfile corruption in sas_ata_qc_fill_rtf

fill_result_tf() grabs the taskfile flags from the originating qc which
sas_ata_qc_fill_rtf() promptly overwrites.  The presence of an
ata_taskfile in the sata_device makes it tempting to just copy the full
contents in sas_ata_qc_fill_rtf().  However, libata really only wants
the fis contents and expects the other portions of the taskfile to not
be touched by ->qc_fill_rtf.  To that end store a fis buffer in the
sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf()
implementation.

Cc: <stable@vger.kernel.org>
Reported-by: Praveen Murali <pmurali@logicube.com>
Tested-by: Praveen Murali <pmurali@logicube.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

[SCSI] Fix NULL dereferences in scsi_cmd_to_driver

Avoid crashing if the private_data pointer happens to be NULL. This has
been seen sometimes when a host reset happens, notably when there are
many LUNs:

host3: Assigned Port ID 0c1601
scsi host3: libfc: Host reset succeeded on port (0c1601)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000350
IP: [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0
<snip>
Process scsi_eh_3 (pid: 4144, threadinfo ffff88030920c000, task ffff880326b160c0)
Stack:
000000010372e6ba 0000000000000282 000027100920dca0 ffffffffa0038ee0
0000000000000000 0000000000030003 ffff88030920dc80 ffff88030920dc80
00000002000e0000 0000000a00004000 ffff8803242f7760 ffff88031326ed80
Call Trace:
[<ffffffff8105b590>] ? lock_timer_base+0x70/0x70
[<ffffffff81352fbe>] scsi_eh_tur+0x3e/0xc0
[<ffffffff81353a36>] scsi_eh_test_devices+0x76/0x170
[<ffffffff81354125>] scsi_eh_host_reset+0x85/0x160
[<ffffffff81354291>] scsi_eh_ready_devs+0x91/0x110
[<ffffffff813543fd>] scsi_unjam_host+0xed/0x1f0
[<ffffffff813546a8>] scsi_error_handler+0x1a8/0x200
[<ffffffff81354500>] ? scsi_unjam_host+0x1f0/0x1f0
[<ffffffff8106ec3e>] kthread+0x9e/0xb0
[<ffffffff81509264>] kernel_thread_helper+0x4/0x10
[<ffffffff8106eba0>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff81509260>] ? gs_change+0x13/0x13
Code: 25 28 00 00 00 48 89 45 c8 31 c0 48 8b 87 80 00 00 00 48 8d b5 60 ff ff ff 89 d1 48 89 fb 41 89 d6 4c 89 fa 48 8b 80 b8 00 00 00
<48> 8b 80 50 03 00 00 48 8b 00 48 89 85 38 ff ff ff 48 8b 07 4c
RIP [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0
RSP <ffff88030920dc50>
CR2: 0000000000000350

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Linux 3.5-rc6

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull security docs update from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: Minor improvements to no_new_privs documentation

vfs: make O_PATH file descriptors usable for 'fchdir()'

We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors. This should make it comparable
to the O_SEARCH of Solaris. In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read persmission to, only
execute permission.

Noticed during development of multithread support for ksh93.

Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: fix cgroup hierarchy umount race

48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.

Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code.  As each cgroup dentry
holds an s_active reference, any lingering cgroup has both its dentry
and the superblock pinned and thus preventing premature release of
superblock.

Unfortunately, after 48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further, so when a cgroup directly below root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and then the dentry for the root cgroup is dput().

This creates a window where the root dentry's refcnt isn't zero but
superblock's s_active is.  If umount happens before or during this
window, vfs will see the root dentry with non-zero refcnt and trigger
BUG().

Before 48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from rmdir(2)
invocation which holds s_active around the whole process.

Fix it by holding an extra superblock->s_active reference across
dput() from css release, which is the dput() path added by 48ddbe1946
and the only one which doesn't hold an extra s_active ref across the
final cgroup dput().

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4FEEA5CB.8070809@huawei.com>
Reported-by: shyju pv <shyju.pv@huawei.com>
Tested-by: shyju pv <shyju.pv@huawei.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>

Revert "cgroup: superblock can't be released with active dentries"

This reverts commit fa980ca87d15bb8a1317853f257a505990f3ffde.  The
commit was an attempt to fix a race condition where a cgroup hierarchy
may be unmounted with positive dentry reference on root cgroup.  While
the commit made the race condition slightly more difficult to trigger,
the race was still there and could be reliably triggered using a
different test case.

Revert the incorrect fix.  The next commit will describe the race and
fix it correctly.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4FEEA5CB.8070809@huawei.com>
Reported-by: shyju pv <shyju.pv@huawei.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>

hwspinlock/core: use global ID to register hwspinlocks on multiple devices

Commit 300bab9770 (hwspinlock/core: register a bank of hwspinlocks in a
single API call, 2011-09-06) introduced 'hwspin_lock_register_single()'
to register numerous (a bank of) hwspinlock instances in a single API,
'hwspin_lock_register()'.

At which time, 'hwspin_lock_register()' accidentally passes 'local IDs'
to 'hwspin_lock_register_single()', despite that ..._single() requires
'global IDs' to register hwspinlocks.

We have to convert into global IDs by supplying the missing 'base_id'.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi.px@renesas.com>
[ohad: fix error path of hwspin_lock_register, too]
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"Last merge window, we had some updates from Al cleaning up the signal
  restart handling.  These have caused some problems on ARM, and while
  Al has some fixes, we have some concerns with Al's patches but we've
  been unsuccesful with discussing this.

  We have got to the point where we need to do something, and we've
  decided that the best solution is to revert the appropriate commits
  until Al is able to reply to us.

  Also included here are four patches to fix warnings that I've noticed
  in my build system, and one fix for kprobes test code."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: fix warning caused by wrongly typed arm_dma_limit
  ARM: fix warnings about atomic64_read
  ARM: 7440/1: kprobes: only test 'sub pc, pc, #1b-2b+8-2' on ARMv6
  ARM: 7441/1: perf: return -EOPNOTSUPP if requested mode exclusion is unavailable
  ARM: 7443/1: Revert "new way of handling ERESTART_RESTARTBLOCK"
  ARM: 7442/1: Revert "remove unused restart trampoline"
  ARM: fix set_domain() macro
  ARM: fix mach-versatile/pci.c warning

security: Minor improvements to no_new_privs documentation

The documentation didn't actually mention how to enable no_new_privs.
This also adds a note about possible interactions between
no_new_privs and LSMs (i.e. why teaching systemd to set no_new_privs
is not necessarily a good idea), and it references the new docs
from include/linux/prctl.h.

Suggested-by: Rob Landley <rob@landley.net>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>

Merge tag 'ecryptfs-3.5-rc6-fixes' of git://git./linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
"Fixes an incorrect access mode check when preparing to open a file in
  the lower filesystem.  This isn't an urgent fix, but it is simple and
  the check was obviously incorrect.

  Also fixes a couple important bugs in the eCryptfs miscdev interface.
  These changes are low risk due to the small number of users that use
  the miscdev interface.  I was able to keep the changes minimal and I
  have some cleaner, more complete changes queued up for the next merge
  window that will build on these patches."

* tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
  eCryptfs: Fix lockdep warning in miscdev operations
  eCryptfs: Properly check for O_RDONLY flag before doing privileged open

Merge git://git./linux/kernel/git/nab/target-pending

Pull target fixes from Nicholas Bellinger:
"Two minor target fixes.  There is really nothing exciting and/or
  controversial this time around.

  There's one fix from MDR for a RCU debug warning message within tcm_fc
  code (CC'ed to stable), and a small AC fix for qla_target.c based upon
  a recent Coverity static report.

  Also, there is one other outstanding virtio-scsi LUN scanning bugfix
  that has been uncovered with the in-flight tcm_vhost driver over the
  last days, and that needs to make it into 3.5 final too.  This patch
  has been posted to linux-scsi again here:

    http://marc.info/?l=linux-scsi&m=134160609212542&w=2

  and I've asked James to include it in his next PULL request."

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  qla2xxx: print the right array elements in qlt_async_event
  tcm_fc: Resolve suspicious RCU usage warnings

eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files

File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.

In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.

https://launchpad.net/bugs/994247

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>

qla2xxx: print the right array elements in qlt_async_event

Based upon Alan's patch from Coverity scan id 793583, these debug
messages in qlt_async_event() should be starting from byte 0, which is
always the Asynchronous Event Status Code from the parent switch statement.

Also, rename reason_code -> login_code following the language used in
2500 FW spec for Port Database Changed (0x8014) -> Port Database Changed
Event Mailbox Register for mailbox[2].

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

ARM: shmobile: fix platsmp.c build when ARCH_SH73A0=n

Fix build error in the case of SMP=y but ARCH_SH73A0=n
introduced by:

9601e87 ARM: shmobile: fix smp build

The use of of_machine_is_compatible() will link in the
the SoC-specific symbols:
"sh73a0_get_core_count", "sh73a0_smp_prepare_cpus",
"sh73a0_secondary_init" and "sh73a0_boot_secondary".

This patch adds an ugly #ifdef wrapper as a stop-gap
solution.

Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Tested-by: Simon Horman <horms@verge.net.au>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

tcm_fc: Resolve suspicious RCU usage warnings

Use rcu_dereference_protected to tell rcu that the ft_lport_lock
is held during ft_lport_create. This resolved "suspicious RCU usage"
warnings when debugging options are turned on.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>