Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

aio: kill the misleading rcu read locks in ioctx_add_table() and kill_ioctx()

ioctx_add_table() is the writer, it does not need rcu_read_lock() to
protect ->ioctx_table. It relies on mm->ioctx_lock and rcu locks just
add the confusion.

And it doesn't need rcu_dereference() by the same reason, it must see
any updates previously done under the same ->ioctx_lock. We could use
rcu_dereference_protected() but the patch uses rcu_dereference_raw(),
the function is simple enough.

The same for kill_ioctx(), although it does not update the pointer.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>

aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On 04/30, Benjamin LaHaise wrote:
>
> > - ctx->mmap_size = 0;
> > -
> > - kill_ioctx(mm, ctx, NULL);
> > + if (ctx) {
> > + ctx->mmap_size = 0;
> > + kill_ioctx(mm, ctx, NULL);
> > + }
>
> Rather than indenting and moving the two lines changing mmap_size and the
> kill_ioctx() call, why not just do "if (!ctx) ... continue;"?  That reduces
> the number of lines changed and avoid excessive indentation.

OK. To me the code looks better/simpler with "if (ctx)", but this is subjective
of course, I won't argue.

The patch still removes the empty line between mmap_size = 0 and kill_ioctx(),
we reset mmap_size only for kill_ioctx(). But feel free to remove this change.

-------------------------------------------------------------------------------
Subject: [PATCH v3 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
   or even rcu_dereference().

   This mm has no users, nobody else can play with ->ioctx_table. Otherwise
   the code is buggy anyway, if we need rcu_read_lock() in a loop because
   ->ioctx_table can be updated then kfree(table) is obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
   munmap(), but another reason is that we simply can't do vm_munmap() unless
   current->mm == mm and this is not true in general, the caller is mmput().

3. We do not really need to nullify mm->ioctx_table before return, probably
   the current code does this to catch the potential problems. But in this
   case RCU_INIT_POINTER(NULL) looks better.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>

aio: fix kernel memory disclosure in io_getevents() introduced in v3.10

A kernel memory disclosure was introduced in aio_read_events_ring() in v3.10
by commit a31ad380bed817aa25f8830ad23e1a0480fef797.  The changes made to
aio_read_events_ring() failed to correctly limit the index into
ctx->ring_pages[], allowing an attacked to cause the subsequent kmap() of
an arbitrary page with a copy_to_user() to copy the contents into userspace.
This vulnerability has been assigned CVE-2014-0206.  Thanks to Mateusz and
Petr for disclosing this issue.

This patch applies to v3.12+.  A separate backport is needed for 3.10/3.11.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: stable@vger.kernel.org

aio: fix aio request leak when events are reaped by userspace

The aio cleanups and optimizations by kmo that were merged into the 3.10
tree added a regression for userspace event reaping. Specifically, the
reference counts are not decremented if the event is reaped in userspace,
leading to the application being unable to submit further aio requests.
This patch applies to 3.12+. A separate backport is required for 3.10/3.11.
This issue was uncovered as part of CVE-2014-0206.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>

Merge tag 'compress-3.16-rc3' of git://git./linux/kernel/git/gregkh/driver-core

Pull compress bugfixes from Greg KH:
"Here are two bugfixes for some compression functions that resolve some
  errors when uncompressing some pathalogical data.  Both were found by
  Don A  Bailey"

* tag 'compress-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  lz4: ensure length does not wrap
  lzo: properly check for overruns

Merge branch 'akpm' (patches from Andrew Morton)

Merge fixes from Andrew Morton:
"The nmi patch and watchdog patch aren't actually fixes - they're
  features which needed a few last-minutes touchups.

  Otherwise, a rather large batch of fixes - ocfs2 review takes a while
  and I got distracted and missed last week's batch"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  ocfs2/dlm: do not purge lockres that is queued for assert master
  ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
  ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod
  ocfs2: fix a tiny race when running dirop_fileop_racer
  ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()
  ocfs2: refcount: take rw_lock in ocfs2_reflink
  ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously"
  ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn
  ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()
  mm: fix crashes from mbind() merging vmas
  checkpatch: reduce false positives when checking void function return statements
  ia64: arch/ia64/include/uapi/asm/fcntl.h needs personality.h
  DMA, CMA: fix possible memory leak
  slab: fix oops when reading /proc/slab_allocators
  shmem: fix faulting into a hole while it's punched
  mm: let mm_find_pmd fix buggy race with THP fault
  mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()
  kernel/watchdog.c: print traces for all cpus on lockup detection
  nmi: provide the option to issue an NMI back trace to every cpu but current
  Documentation/accounting/getdelays.c: add missing null-terminate after strncpy call
  ...

ocfs2/dlm: do not purge lockres that is queued for assert master

When workqueue is delayed, it may occur that a lockres is purged while it
is still queued for master assert.  it may trigger BUG() as follows.

N1                                         N2
dlm_get_lockres()
->dlm_do_master_requery
                                  is the master of lockres,
                                  so queue assert_master work

                                  dlm_thread() start running
                                  and purge the lockres

                                  dlm_assert_master_worker()
                                  send assert master message
                                  to other nodes
receiving the assert_master
message, set master to N2

dlmlock_remote() send create_lock message to N2, but receive DLM_IVLOCKID,
if it is RECOVERY lockres, it triggers the BUG().

Another BUG() is triggered when N3 become the new master and send
assert_master to N1, N1 will trigger the BUG() because owner doesn't
match.  So we should not purge lockres when it is queued for assert
master.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount

The following case may lead to endless loop during umount.

node A         node B               node C       node D
umount volume,
migrate lockres1
to B
                                                 want to lock lockres1,
                                                 send
                                                 MASTER_REQUEST_MSG
                                                 to C
                                    init block mle
               send
               MIGRATE_REQUEST_MSG
               to C
                                    find a block
                                    mle, and then
                                    return
                                    DLM_MIGRATE_RESPONSE_MASTERY_REF
                                    to B
               set C in refmap
                                    umount successfully
               try to umount, endless
               loop occurs when migrate
               lockres1 since C is in
               refmap

So we can fix this endless loop case by only returning
DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
MIGRATE_REQUEST_MSG.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Xue jiufei <xuejiufei@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod

When the call to ocfs2_add_entry() failed in ocfs2_symlink() and
ocfs2_mknod(), iput() will not be called during dput(dentry) because no
d_instantiate(), and this will lead to umount hung.

Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: fix a tiny race when running dirop_fileop_racer

When running dirop_fileop_racer we found a dead lock case.

2 nodes, say Node A and Node B, mount the same ocfs2 volume.  Create
/race/16/1 in the filesystem, and let the inode number of dir 16 is less
than the inode number of dir race.

Node A                            Node B
mv /race/16/1 /race/
                                  right after Node A has got the
                                  EX mode of /race/16/, and tries to
                                  get EX mode of /race
                                  ls /race/16/

In this case, Node A has got the EX mode of /race/16/, and wants to get EX
mode of /race/.  Node B has got the PR mode of /race/, and wants to get
the PR mode of /race/16/.  Since EX and PR are mutually exclusive, dead
lock happens.

This patch fixes this case by locking in ancestor order before trying
inode number order.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()

When a lockres in purge list but is still in use, it should be moved to
the tail of purge list.  dlm_thread will continue to check next lockres in
purge list.  However, code list_move_tail(&dlm->purge_list,
&lockres->purge) will do *no* movements, so dlm_thread will purge the same
lockres in this loop again and again.  If it is in use for a long time,
other lockres will not be processed.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: refcount: take rw_lock in ocfs2_reflink

This patch tries to fix this crash:

#5 [ffff88003c1cd690] do_invalid_op at ffffffff810166d5
#6 [ffff88003c1cd730] invalid_op at ffffffff8159b2de
    [exception RIP: ocfs2_direct_IO_get_blocks+359]
    RIP: ffffffffa05dfa27  RSP: ffff88003c1cd7e8  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: ffff88003c1cdaa8  RCX: 0000000000000000
    RDX: 000000000000000c  RSI: ffff880027a95000  RDI: ffff88003c79b540
    RBP: ffff88003c1cd858   R8: 0000000000000000   R9: ffffffff815f6ba0
    R10: 00000000000001c9  R11: 00000000000001c9  R12: ffff88002d271500
    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#7 [ffff88003c1cd860] do_direct_IO at ffffffff811cd31b
#8 [ffff88003c1cd950] direct_IO_iovec at ffffffff811cde9c
#9 [ffff88003c1cd9b0] do_blockdev_direct_IO at ffffffff811ce764
#10 [ffff88003c1cdb80] __blockdev_direct_IO at ffffffff811ce7cc
#11 [ffff88003c1cdbb0] ocfs2_direct_IO at ffffffffa05df756 [ocfs2]
#12 [ffff88003c1cdbe0] generic_file_direct_write_iter at ffffffff8112f935
#13 [ffff88003c1cdc40] ocfs2_file_write_iter at ffffffffa0600ccc [ocfs2]
#14 [ffff88003c1cdd50] do_aio_write at ffffffff8119126c
#15 [ffff88003c1cddc0] aio_rw_vect_retry at ffffffff811d9bb4
#16 [ffff88003c1cddf0] aio_run_iocb at ffffffff811db880
#17 [ffff88003c1cde30] io_submit_one at ffffffff811dc238
#18 [ffff88003c1cde80] do_io_submit at ffffffff811dc437
#19 [ffff88003c1cdf70] sys_io_submit at ffffffff811dc530
#20 [ffff88003c1cdf80] system_call_fastpath at ffffffff8159a159

It crashes at
        BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
in ocfs2_direct_IO_get_blocks.

ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in
ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken
during the time before (or inside) ocfs2_prepare_inode_for_write() and after
ocfs2_direct_IO_get_blocks().

It can happen in this case:

Node A(which crashes) Node B
------------------------                 ---------------------------
ocfs2_file_aio_write
  ocfs2_prepare_inode_for_write
    ocfs2_inode_lock
    ...
    ocfs2_inode_unlock
  #no refcount found
.... ocfs2_reflink
                                          ocfs2_inode_lock
                                          ...
                                          ocfs2_inode_unlock
                                          #now, refcount flag set on extent

                                        ...
                                        flush change to disk

ocfs2_direct_IO_get_blocks
  ocfs2_get_clusters
    #extent map miss
    #buffer_head miss
    read extents from disk
  found refcount flag on extent
  crash..

Fix:
Take rw_lock in ocfs2_reflink path

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously"

75f82eaa502c ("ocfs2: fix NULL pointer dereference when dismount and
ocfs2rec simultaneously") may cause umount hang while shutting down
truncate log.

The situation is as followes:
ocfs2_dismout_volume
-> ocfs2_recovery_exit
  -> free osb->recovery_map
-> ocfs2_truncate_shutdown
  -> lock global bitmap inode
    -> ocfs2_wait_for_recovery
          -> check whether osb->recovery_map->rm_used is zero

Because osb->recovery_map is already freed, rm_used can be any other
values, so it may yield umount hang.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn

Orabug: 18639535

Two node cluster and both nodes hold a lock at PR level and both want to
convert to EX at the same time.  Master node 1 has sent BAST and then
closes the connection due to idletime out.  Node 0 receives BAST, sends
unlock req with cancel flag but gets error -ENOTCONN.  The problem is
this error is ignored in dlm_send_remote_unlock_request() on the
**incorrect** assumption that the master is dead.  See NOTE in comment
why it returns DLM_NORMAL.  Upon getting DLM_NORMAL, node 0 proceeds to
sends convert (without cancel flg) which fails with -ENOTCONN.  waits 5
sec and resends.

This time gets DLM_IVLOCKID from the master since lock not found in
grant, it had been moved to converting queue in response to conv PR->EX
req.  No way out.

Node 1 (master) Node 0
============== ======

  lock mode PR PR

  convert PR -> EX
  mv grant -> convert and que BAST
  ...
                     <-------- convert PR -> EX
  convert que looks like this: ((node 1, PR -> EX) (node 0, PR -> EX))
  ...
                        BAST (want PR -> NL)
                     ------------------>
  ...
  idle timout, conn closed
                                ...
                                In response to BAST,
                                sends unlock with cancel convert flag
                                gets -ENOTCONN. Ignores and
                                sends remote convert request
                                gets -ENOTCONN, waits 5 Sec, retries
  ...
  reconnects
                   <----------------- convert req goes through on next try
  does not find lock on grant que
                   status DLM_IVLOCKID
                   ------------------>
  ...

No way out.  Fix is to keep retrying unlock with cancel flag until it
succeeds or the master dies.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()

There are two files a and b in dir /mnt/ocfs2.

    node A                           node B

  mv a b
  In ocfs2_rename(), after calling
  ocfs2_orphan_add(), the inode of
  file b will be added into orphan
  dir.

  If ocfs2_update_entry() fails,
  ocfs2_rename return error and mv
  operation fails. But file b still
  exists in the parent dir.

  ocfs2_queue_orphan_scan
   -> ocfs2_queue_recovery_completion
   -> ocfs2_complete_recovery
   -> ocfs2_recover_orphans
  The inode of the file b will be
  put with iput().

  ocfs2_evict_inode
   -> ocfs2_delete_inode
   -> ocfs2_wipe_inode
   -> ocfs2_remove_inode
  OCFS2_VALID_FL in the inode
  i_flags will be cleared.

                                   The file b still can be accessed
                                   on node B.
                                   ls /mnt/ocfs2
                                   When first read the file b with
                                   ocfs2_read_inode_block(). It will
                                   validate the inode using
                                   ocfs2_validate_inode_block().
                                   Because OCFS2_VALID_FL not set in
                                   the inode i_flags, so the file
                                   system will be readonly.

So we should add inode into orphan dir after updating entry in
ocfs2_rename().

Signed-off-by: alex.chen <alex.chen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix crashes from mbind() merging vmas

In v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
introduced vma merging to mbind(), but it should have also changed the
convention of passing start vma from queue_pages_range() (formerly
check_range()) to new_vma_page(): vma merging may have already freed
that structure, resulting in BUG at mm/mempolicy.c:1738 and probably
worse crashes.

Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: <stable@vger.kernel.org> [2.6.34+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

checkpatch: reduce false positives when checking void function return statements

The previous patch had a few too many false positives on styles that
should be acceptable.

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ia64: arch/ia64/include/uapi/asm/fcntl.h needs personality.h

fs/notify/fanotify/fanotify_user.c: In function 'SYSC_fanotify_init':
fs/notify/fanotify/fanotify_user.c:726: error: implicit declaration of function 'personality'
fs/notify/fanotify/fanotify_user.c:726: error: 'PER_LINUX32' undeclared (first use in this function)
fs/notify/fanotify/fanotify_user.c:726: error: (Each undeclared identifier is reported only once
fs/notify/fanotify/fanotify_user.c:726: error: for each function it appears in.)

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Will Woods <wwoods@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@vger.kernel.org> [3.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

DMA, CMA: fix possible memory leak

We should free memory for bitmap when we find zone mismatch, otherwise
this memory will leak.

Additionally, I copy code comment from PPC KVM's CMA code to inform why
we need to check zone mis-match.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Reviewed-by: Michal Nazarewicz <mina86@mina86.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

slab: fix oops when reading /proc/slab_allocators

Commit b1cb0982bdd6 ("change the management method of free objects of
the slab") introduced a bug on slab leak detector
('/proc/slab_allocators').  This detector works like as following
decription.

1. traverse all objects on all the slabs.
2. determine whether it is active or not.
3. if active, print who allocate this object.

but that commit changed the way how to manage free objects, so the logic
determining whether it is active or not is also changed.  In before, we
regard object in cpu caches as inactive one, but, with this commit, we
mistakenly regard object in cpu caches as active one.

This intoduces kernel oops if DEBUG_PAGEALLOC is enabled.  If
DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who
corrupt free memory in the slab.  It unmaps page table mapping if object
is free and map it if object is active.  When slab leak detector check
object in cpu caches, it mistakenly think this object active so try to
access object memory to retrieve caller of allocation.  At this point,
page table mapping to this object doesn't exist, so oops occurs.

Following is oops message reported from Dave.

It blew up when something tried to read /proc/slab_allocators
(Just cat it, and you should see the oops below)

  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  [snip...]
  CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131
  task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000
  RIP: 0010:[<ffffffffaa1a8f4a>]  [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180
  RSP: 0018:ffff880076925de0  EFLAGS: 00010002
  RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7
  RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000
  RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000
  R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0
  FS:  00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0
  DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
  Call Trace:
    leaks_show+0xce/0x240
    seq_read+0x28e/0x490
    proc_reg_read+0x3d/0x80
    vfs_read+0x9b/0x160
    SyS_read+0x58/0xb0
    tracesys+0xd4/0xd9
  Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46
  RIP   handle_slab+0x8a/0x180

To fix the problem, I introduce an object status buffer on each slab.
With this, we can track object status precisely, so slab leak detector
would not access active object and no kernel oops would occur.  Memory
overhead caused by this fix is only imposed to CONFIG_DEBUG_SLAB_LEAK
which is mainly used for debugging, so memory overhead isn't big
problem.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shmem: fix faulting into a hole while it's punched

Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.

It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.

Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: let mm_find_pmd fix buggy race with THP fault

Trinity has reported:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
    CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G        W
                            3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
    lock_acquire (arch/x86/include/asm/current.h:14
                  kernel/locking/lockdep.c:3602)
    _raw_spin_lock (include/linux/spinlock_api_smp.h:143
                    kernel/locking/spinlock.c:151)
    remove_migration_pte (mm/migrate.c:137)
    rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
    remove_migration_ptes (mm/migrate.c:224)
    migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
    migrate_misplaced_page (mm/migrate.c:1733)
    __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
    handle_mm_fault (mm/memory.c:3948)
    __get_user_pages (mm/memory.c:1851)
    __mlock_vma_pages_range (mm/mlock.c:255)
    __mm_populate (mm/mlock.c:711)
    SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)

I believe this comes about because, whereas collapsing and splitting THP
functions take anon_vma lock in write mode (which excludes concurrent
rmap walks), faulting THP functions (write protection and misplaced
NUMA) do not - and mostly they do not need to.

But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
an instant (indeed, for a long instant, given the inter-CPU TLB flush in
there), leaves *pmd neither present not trans_huge.

Which can confuse a concurrent rmap walk, as when removing migration
ptes, seen in the dumped trace.  Although that rmap walk has a 4k page
to insert, anon_vmas containing THPs are in no way segregated from
4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
that instant when a trans_huge pmd is temporarily absent.

I don't think we need strengthen the locking at the THP end: it's easily
handled with an ACCESS_ONCE() before testing both conditions.

And since mm_find_pmd() had only one caller who wanted a THP rather than
a pmd, let's slightly repurpose it to fail when it hits a THP or
non-present pmd, and open code split_huge_page_address() again.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Dave Jones <davej@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()

Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC oops
in copy_page_rep() called from copy_user_huge_page() called from
do_huge_pmd_wp_page().

I believe this is a DEBUG_PAGEALLOC false positive, due to the source
page being split, and a tail page freed, while copy is in progress; and
not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will
prevent a miscopy from being made visible.

Fix by adding get_user_huge_page() and put_user_huge_page(): reducing to
the usual get_page() and put_page() on head page in the usual config;
but get and put references to all of the tail pages when
DEBUG_PAGEALLOC.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/watchdog.c: print traces for all cpus on lockup detection

A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.

Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.

On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e.  the owner/holder of the contended resource) is
reported to the user.  Often this information has proven to be
insufficient to assist debugging efforts.

To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting.  This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.

If NMI is not supported we just revert back to the old method.  A sysctl
and boot-time parameter is available to toggle this feature.

[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nmi: provide the option to issue an NMI back trace to every cpu but current

Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current. For
instance if one was previously captured recently.

This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.

Patch x86 and sparc to support new routine.

[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation/accounting/getdelays.c: add missing null-terminate after strncpy call

Added a guaranteed null-terminate after call to strncpy.

This was partly found using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/memstick/host/rtsx_pci_ms.c: add cancel_work when remove driver

Add cancel_work_sync() in rtsx_pci_ms_drv_remove() to cancel pending
request work when removing the driver.

Signed-off-by: Micky Ching <micky_ching@realsil.com.cn>
Cc: Samuel Ortiz <sameo@linux.intel.com> says:
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Roger Tseng <rogerable@realtek.com>
Cc: Wei WANG <wei_wang@realsil.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, pcp: allow restoring percpu_pagelist_fraction default

Oleg reports a division by zero error on zero-length write() to the
percpu_pagelist_fraction sysctl:

    divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
    RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
    RSP: 0018:ffff8800d87a3e78  EFLAGS: 00010246
    RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
    RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
    R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
    R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
    FS:  00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
    Call Trace:
      proc_sys_call_handler+0xb3/0xc0
      proc_sys_write+0x14/0x20
      vfs_write+0xba/0x1e0
      SyS_write+0x46/0xb0
      tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity.  If a value in the range [1, 7]
is written, the sysctl will return EINVAL.

This successfully solves the divide by zero issue at the same time.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Oleg Drokin <green@linuxhacker.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib/Kconfig.debug: let FRAME_POINTER exclude SCORE, just like exclude most of other architectures

The related warning:

scripts/kconfig/conf --allmodconfig Kconfig
warning: (FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP && KMEMCHECK && LOCKDEP) selects FRAME_POINTER which has unmet direct dependencies (DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 || SUPERH || BLACKFIN || MN10300 || METAG) || ARCH_WANT_FRAME_POINTERS)

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/watchdog.c: remove preemption restrictions when restarting lockup detector

Peter Wu noticed the following splat on his machine when updating
/proc/sys/kernel/watchdog_thresh:

  BUG: sleeping function called from invalid context at mm/slub.c:965
  in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
  3 locks held by init/1:
   #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff8117b663>] vfs_write+0x143/0x180
   #1:  (watchdog_proc_mutex){+.+.+.}, at: [<ffffffff810e02d3>] proc_dowatchdog+0x33/0x110
   #2:  (cpu_hotplug.lock){.+.+.+}, at: [<ffffffff810589c2>] get_online_cpus+0x32/0x80
  Preemption disabled at:[<ffffffff810e0384>] proc_dowatchdog+0xe4/0x110

  CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x4e/0x7a
    __might_sleep+0x11d/0x190
    kmem_cache_alloc_trace+0x4e/0x1e0
    perf_event_alloc+0x55/0x440
    perf_event_create_kernel_counter+0x26/0xe0
    watchdog_nmi_enable+0x75/0x140
    update_timers_all_cpus+0x53/0xa0
    proc_dowatchdog+0xe4/0x110
    proc_sys_call_handler+0xb3/0xc0
    proc_sys_write+0x14/0x20
    vfs_write+0xad/0x180
    SyS_write+0x49/0xb0
    system_call_fastpath+0x16/0x1b
  NMI watchdog: disabled (cpu0): hardware events not enabled

What happened is after updating the watchdog_thresh, the lockup detector
is restarted to utilize the new value.  Part of this process involved
disabling preemption.  Once preemption was disabled, perf tried to
allocate a new event (as part of the restart).  This caused the above
BUG_ON as you can't sleep with preemption disabled.

The preemption restriction seemed agressive as we are not doing anything
on that particular cpu, but with all the online cpus (which are
protected by the get_online_cpus lock).  Remove the restriction and the
BUG_ON goes away.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Peter Wu <peter@lekensteyn.nl>
Tested-by: Peter Wu <peter@lekensteyn.nl>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: SLAB maintainer update

As discussed in various threads on the side:

Remove one inactive maintainer, add two new ones and update my email
address. Plus add Andrew. And fix the glob to include files like
mm/slab_common.c

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry

There's a race between fork() and hugepage migration, as a result we try
to "dereference" a swap entry as a normal pte, causing kernel panic.
The cause of the problem is that copy_hugetlb_page_range() can't handle
"swap entry" family (migration entry and hwpoisoned entry) so let's fix
it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: <stable@vger.kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tmpfs: ZERO_RANGE and COLLAPSE_RANGE not currently supported

I was well aware of FALLOC_FL_ZERO_RANGE and FALLOC_FL_COLLAPSE_RANGE
support being added to fallocate(); but didn't realize until now that I
had been too stupid to future-proof shmem_fallocate() against new
additions. -EOPNOTSUPP instead of going on to ordinary fallocation.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org> [3.15]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, hotplug: probe interface is available on several platforms

Documentation/memory-hotplug.txt incorrectly states that the memory
driver "probe" interface is only supported on powerpc and is vague about
its application on x86. Clarify the platforms that make this interface
available if memory hotplug is enabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kexec: save PG_head_mask in VMCOREINFO

To allow filtering of huge pages, makedumpfile must be able to identify
them in the dump.  This can be done by checking the appropriate page
flag, so communicate its value to makedumpfile through the VMCOREINFO
interface.

There's only one small catch.  Depending on how many page flags are
available on a given architecture, this bit can be called PG_head or
PG_compound.

I sent a similar patch back in 2012, but Eric Biederman did not like
using an #ifdef.  So, this time I'm adding a common symbol
(PG_head_mask) instead.

See https://lkml.org/lkml/2012/11/28/91 for the previous version.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CPU hotplug, smp: flush any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and
the smp-call-function code, which can lead to getting IPIs on the
outgoing CPU, *after* it has gone offline.

Specifically, this can happen when using
smp_call_function_single_async() to send the IPI, since this API allows
sending asynchronous IPIs from IRQ disabled contexts.  The exact race
condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
the other CPUs disable their local interrupts.  Due to this, we can
encounter a situation in which an IPI is sent by one of the other CPUs
to the outgoing CPU (while it is *still* online), but the outgoing CPU
ends up noticing it only *after* it has gone offline.

              CPU 1                                         CPU 2
          (Online CPU)                               (CPU going offline)

       Enter _PREPARE stage                          Enter _PREPARE stage

                                                     Enter _DISABLE_IRQ stage

                                                   =
       Got a device interrupt, and                 | Didn't notice the IPI
       the interrupt handler sent an               | since interrupts were
       IPI to CPU 2 using                          | disabled on this CPU.
       smp_call_function_single_async()            |
                                                   =

       Enter _DISABLE_IRQ stage

       Enter _RUN stage                              Enter _RUN stage

                                  =
       Busy loop with interrupts  |                  Invoke take_cpu_down()
       disabled.                  |                  and take CPU 2 offline
                                  =

       Enter _EXIT stage                             Enter _EXIT stage

       Re-enable interrupts                          Re-enable interrupts

                                                     The pending IPI is noted
                                                     immediately, but alas,
                                                     the CPU is offline at
                                                     this point.

This of course, makes the smp-call-function IPI handler code running on
CPU 2 unhappy and it complains about "receiving an IPI on an offline
CPU".

One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:

__blk_complete_request() [interrupt-handler]
    raise_blk_irq()
        smp_call_function_single_async()

However, if we look closely, the block layer does check that the target
CPU is online before firing the IPI.  So in this case, it is actually
the unfortunate ordering/timing of events in the stop-machine phase that
leads to receiving IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by
itself (this can happen even due to hardware latencies in IPI
send-receive).  It is a bug only if the target CPU really went offline
without executing all the callbacks queued on its list.  (Note that a
CPU is free to execute its pending smp-call-function callbacks in a
batch, without waiting for the corresponding IPIs to arrive for each one
of those callbacks).

So, fixing this issue can be broken up into two parts:

1. Ensure that a CPU goes offline only after executing all the
   callbacks queued on it.

2. Modify the warning condition in the smp-call-function IPI handler
   code such that it warns only if an offline CPU got an IPI *and* that
   CPU had gone offline with callbacks still pending in its queue.

Achieving part 1 is straight-forward - just flush (execute) all the
queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
including those callbacks for which the source CPU's IPIs might not have
been received on the outgoing CPU yet.  Once we do this, an IPI that
arrives late on the CPU going offline (either due to the race mentioned
above, or due to hardware latencies) will be completely harmless, since
the outgoing CPU would have executed all the queued callbacks before
going offline.

Overall, this fix (parts 1 and 2 put together) additionally guarantees
that we will see a warning only when the *IPI-sender code* is buggy -
that is, if it queues the callback _after_ the target CPU has gone
offline.

[1].  The CPU_DYING part needs a little more explanation: by the time we
execute the CPU_DYING notifier callbacks, the CPU would have already
been marked offline.  But we want to flush out the pending callbacks at
this stage, ignoring the fact that the CPU is offline.  So restructure
the IPI handler code so that we can by-pass the "is-cpu-offline?" check
in this particular case.  (Of course, the right solution here is to fix
CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
notifiers, but this requires a lot of audit to ensure that this change
doesn't break any existing code; hence lets go with the solution
proposed above until that is done).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sachin Kamat <sachin.kamat@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: nommu: per-thread vma cache fix

mm could be removed from current task struct, using previous vma->vm_mm

It will crash on blackfin after updated to Linux 3.15.  The commit "mm:
per-thread vma caching" caused the crash.  mm could be removed from
current task struct before

  mmput()->
    exit_mmap()->
      delete_vma_from_mm()

the detailed fault information:

    NULL pointer access
    Kernel OOPS in progress
    Deferred Exception context
    CURRENT PROCESS:
    COMM=modprobe PID=278  CPU=0
    invalid mm
    return address: [0x000531de]; contents of:
    0x000531b0:  c727  acea  0c42  181d  0000  0000  0000  a0a8
    0x000531c0:  b090  acaa  0c42  1806  0000  0000  0000  a0e8
    0x000531d0:  b0d0  e801  0000  05b3  0010  e522  0046 [a090]
    0x000531e0:  6408  b090  0c00  17cc  3042  e3ff  f37b  2fc8

    CPU: 0 PID: 278 Comm: modprobe Not tainted 3.15.0-ADI-2014R1-pre-00345-gea9f446 #25
    task: 0572b720 ti: 0569e000 task.ti: 0569e000
    Compiled for cpu family 0x27fe (Rev 0), but running on:0x0000 (Rev 0)
    ADSP-BF609-0.0 500(MHz CCLK) 125(MHz SCLK) (mpu off)
    Linux version 3.15.0-ADI-2014R1-pre-00345-gea9f446 (steven@steven-OptiPlex-390) (gcc version 4.3.5 (ADI-trunk/svn-5962) ) #25 Tue Jun 10 17:47:46 CST 2014

    SEQUENCER STATUS: Not tainted
     SEQSTAT: 00000027  IPEND: 8008  IMASK: ffff  SYSCFG: 2806
      EXCAUSE   : 0x27
      physical IVG3 asserted : <0xffa00744> { _trap + 0x0 }
      physical IVG15 asserted : <0xffa00d68> { _evt_system_call + 0x0 }
      logical irq   6 mapped  : <0xffa003bc> { _bfin_coretmr_interrupt + 0x0 }
      logical irq   7 mapped  : <0x00008828> { _bfin_fault_routine + 0x0 }
      logical irq  11 mapped  : <0x00007724> { _l2_ecc_err + 0x0 }
      logical irq  13 mapped  : <0x00008828> { _bfin_fault_routine + 0x0 }
      logical irq  39 mapped  : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 }
      logical irq  40 mapped  : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 }
     RETE: <0x00000000> /* Maybe null pointer? */
     RETN: <0x0569fe50> /* kernel dynamic memory (maybe user-space) */
     RETX: <0x00000480> /* Maybe fixed code section */
     RETS: <0x00053384> { _exit_mmap + 0x28 }
     PC  : <0x000531de> { _delete_vma_from_mm + 0x92 }
    DCPLB_FAULT_ADDR: <0x00000008> /* Maybe null pointer? */
    ICPLB_FAULT_ADDR: <0x000531de> { _delete_vma_from_mm + 0x92 }
    PROCESSOR STATE:
     R0 : 00000004    R1 : 0569e000    R2 : 00bf3db4    R3 : 00000000
     R4 : 057f9800    R5 : 00000001    R6 : 0569ddd0    R7 : 0572b720
     P0 : 0572b854    P1 : 00000004    P2 : 00000000    P3 : 0569dda0
     P4 : 0572b720    P5 : 0566c368    FP : 0569fe5c    SP : 0569fd74
     LB0: 057f523f    LT0: 057f523e    LC0: 00000000
     LB1: 0005317c    LT1: 00053172    LC1: 00000002
     B0 : 00000000    L0 : 00000000    M0 : 0566f5bc    I0 : 00000000
     B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : ffffffff
     B2 : 00000001    L2 : 00000000    M2 : 00000000    I2 : 00000000
     B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 057f8000
    A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000
    USP : 056ffcf8  ASTAT: 02003024

    Hardware Trace:
       0 Target : <0x00003fb8> { _trap_c + 0x0 }
         Source : <0xffa006d8> { _exception_to_level5 + 0xa0 } JUMP.L
       1 Target : <0xffa00638> { _exception_to_level5 + 0x0 }
         Source : <0xffa004f2> { _bfin_return_from_exception + 0x6 } RTX
       2 Target : <0xffa004ec> { _bfin_return_from_exception + 0x0 }
         Source : <0xffa00590> { _ex_trap_c + 0x70 } JUMP.S
       3 Target : <0xffa00520> { _ex_trap_c + 0x0 }
         Source : <0xffa0076e> { _trap + 0x2a } JUMP (P4)
       4 Target : <0xffa00744> { _trap + 0x0 }
          FAULT : <0x000531de> { _delete_vma_from_mm + 0x92 } P0 = W[P2 + 2]
         Source : <0x000531da> { _delete_vma_from_mm + 0x8e } P2 = [P4 + 0x18]
       5 Target : <0x000531da> { _delete_vma_from_mm + 0x8e }
         Source : <0x00053176> { _delete_vma_from_mm + 0x2a } IF CC JUMP pcrel
       6 Target : <0x0005314c> { _delete_vma_from_mm + 0x0 }
         Source : <0x00053380> { _exit_mmap + 0x24 } JUMP.L
       7 Target : <0x00053378> { _exit_mmap + 0x1c }
         Source : <0x00053394> { _exit_mmap + 0x38 } IF !CC JUMP pcrel (BP)
       8 Target : <0x00053390> { _exit_mmap + 0x34 }
         Source : <0xffa020e0> { __cond_resched + 0x20 } RTS
       9 Target : <0xffa020c0> { __cond_resched + 0x0 }
         Source : <0x0005338c> { _exit_mmap + 0x30 } JUMP.L
      10 Target : <0x0005338c> { _exit_mmap + 0x30 }
         Source : <0x0005333a> { _delete_vma + 0xb2 } RTS
      11 Target : <0x00053334> { _delete_vma + 0xac }
         Source : <0x0005507a> { _kmem_cache_free + 0xba } RTS
      12 Target : <0x00055068> { _kmem_cache_free + 0xa8 }
         Source : <0x0005505e> { _kmem_cache_free + 0x9e } IF !CC JUMP pcrel (BP)
      13 Target : <0x00055052> { _kmem_cache_free + 0x92 }
         Source : <0x0005501a> { _kmem_cache_free + 0x5a } IF CC JUMP pcrel
      14 Target : <0x00054ff4> { _kmem_cache_free + 0x34 }
         Source : <0x00054fce> { _kmem_cache_free + 0xe } IF CC JUMP pcrel (BP)
      15 Target : <0x00054fc0> { _kmem_cache_free + 0x0 }
         Source : <0x00053330> { _delete_vma + 0xa8 } JUMP.L
    Kernel Stack
    Stack info:
     SP: [0x0569ff24] <0x0569ff24> /* kernel dynamic memory (maybe user-space) */
     Memory from 0x0569ff20 to 056a0000
    0569ff20: 00000001 [04e8da5a] 00008000  00000000  00000000  056a0000  04e8da5a  04e8da5a
    0569ff40: 04eb9eea  ffa00dce  02003025  04ea09c5  057f523f  04ea09c4  057f523e  00000000
    0569ff60: 00000000  00000000  00000000  00000000  00000000  00000000  00000001  00000000
    0569ff80: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000
    0569ffa0: 0566f5bc  057f8000  057f8000  00000001  04ec0170  056ffcf8  056ffd04  057f9800
    0569ffc0: 04d1d498  057f9800  057f8fe4  057f8ef0  00000001  057f928c  00000001  00000001
    0569ffe0: 057f9800  00000000  00000008  00000007  00000001  00000001  00000001 <00002806>
    Return addresses in stack:
        address : <0x00002806> { _show_cpuinfo + 0x2d2 }
    Modules linked in:
    Kernel panic - not syncing: Kernel exception
    [ end Kernel panic - not syncing: Kernel exception

Signed-off-by: Steven Miao <realmz6@gmail.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lz4: ensure length does not wrap

Given some pathologically compressed data, lz4 could possibly decide to
wrap a few internal variables, causing unknown things to happen. Catch
this before the wrapping happens and abort the decompression.

Reported-by: "Don A. Bailey" <donb@securitymouse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lzo: properly check for overruns

The lzo decompressor can, if given some really crazy data, possibly
overrun some variable types. Modify the checking logic to properly
detect overruns before they happen.

Reported-by: "Don A. Bailey" <donb@securitymouse.com>
Tested-by: "Don A. Bailey" <donb@securitymouse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 3.16-rc2

Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux

Pull i2c new drivers from Wolfram Sang:
"Here is a pull request from i2c hoping for the "new driver" rule.

  Originally, I wanted to send this request during the merge window, but
  code checkers with very recent additions complained, so a few fixups
  were needed.  So, some more time went by and I merged rc1 to get a
  stable base"

So the "new driver" rule is really about drivers that people absolutely
need for the kernel to work on new hardware, which is not so much the
case for i2c.  So I considered not pulling this, but eventually
relented.

Just for FYI: the whole (and only) point of "new drivers" is not that
new drivers cannot regress things (they can, and they have - by
triggering badly tested code on machines that never triggered that code
before), but because they can bring to life machines that otherwise
wouldn't be useful at all without the drivers.

So the new driver rule is for essential things that actual consumers
would care about, ie devices like networking or disk drivers that matter
to normal people (not server people - they run old kernels anyway, so
mainlining new drivers is irrelevant for them).

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: sun6-p2wi: fix call to snprintf
  i2c: rk3x: add NULL entry to the end of_device_id array
  i2c: sun6i-p2wi: use proper return value in probe
  i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller support
  i2c: sunxi: add P2WI DT bindings documentation
  i2c: rk3x: add driver for Rockchip RK3xxx SoC I2C adapter

Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux

Pull file locking fixes from Jeff Layton:
"File locking related bugfixes

  Nothing too earth-shattering here.  A fix for a potential regression
  due to a patch in pile #1, and the addition of a memory barrier to
  prevent a race condition between break_deleg and generic_add_lease"

* tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux:
  locks: set fl_owner for leases back to current->files
  locks: add missing memory barrier in break_deleg

Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild

Pull kbuild fixes from Michal Marek:
"There are three fixes for regressions caused by the relative paths
  series: deb-pkg, tar-pkg and *docs did not work with O=.

  Plus, there is a fix for the linux-headers deb package and a fixed
  typo.  These are not regression fixes but are safe enough"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: fix a typo in a kbuild document
  builddeb: fix missing headers in linux-headers package
  Documentation: Fix DocBook build with relative $(srctree)
  kbuild: Fix tar-pkg with relative $(objtree)
  deb-pkg: Fix for relative paths

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This fixes some lockups in btrfs reported with rc1.  It probably has
  some performance impact because it is backing off our spinning locks
  more often and switching to a blocking lock.  I'll be able to nail
  that down next week, but for now I want to get the lockups taken care
  of.

  Otherwise some more stack reduction and assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix wrong error handle when the device is missing or is not writeable
  Btrfs: fix deadlock when mounting a degraded fs
  Btrfs: use bio_endio_nodec instead of open code
  Btrfs: fix NULL pointer crash when running balance and scrub concurrently
  btrfs: Skip scrubbing removed chunks to avoid -ENOENT.
  Btrfs: fix broken free space cache after the system crashed
  Btrfs: make free space cache write out functions more readable
  Btrfs: remove unused wait queue in struct extent_buffer
  Btrfs: fix deadlocks with trylock on tree nodes

Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
"Fixes for a new regression from the xdr encoding rewrite, and a
  delegation problem we've had for a while (made somewhat more annoying
  by the vfs delegation support added in 3.13)"

* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
  NFSD: fix bug for readdir of pseudofs
  NFSD: Don't hand out delegations for 30 seconds after recalling them.

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"This is larger than usual: the main reason are the ARM symbol lookup
  speedups that came in late and were hard to resist.

  There's also a kprobes fix and various tooling fixes, plus the minimal
  re-enablement of the mmap2 support interface"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/kprobes: Fix build errors and blacklist context_track_user
  perf tests: Add test for closing dso objects on EMFILE error
  perf tests: Add test for caching dso file descriptors
  perf tests: Allow reuse of test_file function
  perf tests: Spawn child for each test
  perf tools: Add dso__data_* interface descriptons
  perf tools: Allow to close dso fd in case of open failure
  perf tools: Add file size check and factor dso__data_read_offset
  perf tools: Cache dso data file descriptor
  perf tools: Add global count of opened dso objects
  perf tools: Add global list of opened dso objects
  perf tools: Add data_fd into dso object
  perf tools: Separate dso data related variables
  perf tools: Cache register accesses for unwind processing
  perf record: Fix to honor user freq/interval properly
  perf timechart: Reflow documentation
  perf probe: Improve error messages in --line option
  perf probe: Improve an error message of perf probe --vars mode
  perf probe: Show error code and description in verbose mode
  perf probe: Improve error message for unknown member of data structure
  ...

Merge branch 'locking-urgent-for-linus.patch' of git://git./linux/kernel/git/tip/tip

Pull rtmutex fixes from Thomas Gleixner:
"Another three patches to make the rtmutex code more robust.  That's
  the last urgent fallout from the big futex/rtmutex investigation"

* 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rtmutex: Plug slow unlock race
  rtmutex: Detect changes in the pi lock chain
  rtmutex: Handle deadlock detection smarter

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 patches from Martin Schwidefsky:
"A couple of bug fixes, a debug change for qdio, an update for the
  default config, and one small extension.

  The watchdog module based on diagnose 0x288 is converted to the
  watchdog API and it now works under LPAR as well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ccwgroup: use ccwgroup_ungroup wrapper
  s390/ccwgroup: fix an uninitialized return code
  s390/ccwgroup: obtain extra reference for asynchronous processing
  qdio: Keep device-specific dbf entries
  s390/compat: correct ucontext layout for high gprs
  s390/cio: set device name as early as possible
  s390: update default configuration
  s390: avoid format strings leaking into names
  s390/airq: silence lockdep warning
  s390/watchdog: add support for LPAR operation (diag288)
  s390/watchdog: use watchdog API
  s390/sclp_vt220: Enable ASCII console per default
  s390/qdio: replace shift loop by ilog2
  s390/cio: silence lockdep warning
  s390/uaccess: always load the kernel ASCE after task switch
  s390/ap_bus: Make modules parameters visible in sysfs

Merge tag 'for-linus' of git://github.com/gxt/linux

Pull UniCore32 bug fixes from Guan Xuetao:
"This includes bugfixes to make unicore32 successfully build under
  defconfig, and some changes for allmodconfig (though not finished)"

* tag 'for-linus' of git://github.com/gxt/linux:
  unicore32: Remove ARCH_HAS_CPUFREQ config option
  UniCore32: Change git tree location information in MAINTAINERS
  arch: unicore32: ksyms: export '__cpuc_coherent_kern_range' to avoid compiling failure
  arch: unicore32: ksyms: export 'pm_power_off' to avoid compiling failure.
  arch: unicore32: ksyms: export additional find_first_*() to avoid compiling failure
  arch:unicore32:mm: add devmem_is_allowed() to support STRICT_DEVMEM
  unicore32: include: asm: add missing ')' for PAGE_* macros in pgtable.h
  arch/unicore32/kernel/setup.c: add generic 'screen_info' to avoid compiling failure
  drivers: scsi: mvsas: fix compiling issue by adding 'MVS_' for "enum pci_interrupt_cause"
  arch: unicore32: kernel: ksyms: remove 'bswapsi2' and 'muldi3' to avoid compiling failure
  arch/unicore32/kernel/ksyms.c: remove 2 export symbols to avoid compiling failure
  drivers/rtc/rtc-puv3.c: remove "&dev->" for typo issue MIME-Version: 1.0
  drivers/rtc/rtc-puv3.c: use dev_dbg() instead of dev_debug() for typo issue
  arch/unicore32/include/asm/io.h: add readl_relaxed() generic definition
  arch/unicore32/include/asm/ptrace.h: add generic definition for profile_pc()
  arch/unicore32/mm/alignment.c: include "asm/pgtable.h" to avoid compiling error
  arch/unicore32/kernel/clock.c: add readl() and writel() for 'PM_' macros
  arch/unicore32/kernel/module.c: use __vmalloc_node_range() instead of __vmalloc_area()
  arch/unicore32/kernel/ksyms.c: remove several undefined exported symbols

Merge tag 'char-misc-3.16-rc2' of git://git./linux/kernel/git/gregkh/char-misc

Pull char / misc driver fixes from Greg KH:
"Here are 3 patches, one a revert of the UIO patch you objected to in
  3.16-rc1 and that no one wanted to defend, a w1 driver bugfix, and a
  MAINTAINERS update for the vmware balloon driver.

  All of these, except for the MAINTAINERS update which just got added,
  have been in linux-next just fine"

* tag 'char-misc-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  MAINTAINERS: add entry for VMware Balloon driver
  w1: mxc_w1: Fix incorrect "presence" status
  Revert "uio: fix vma io range check in mmap"

Merge tag 'staging-3.16-rc2' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are a few fixes for staging and iio drivers that resolve issues
  reported in 3.16-rc1.

  All have been in linux-next just fine"

* tag 'staging-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  imx-drm: parallel-display: Fix DPMS default state.
  staging: android: timed_output: fix use after free of dev
  staging: comedi: addi_apci_1564: add addi_watchdog dependency
  staging: rtl8723au: Reference correct firmwarefiles with MODULE_FIRMWARE()
  staging: rtl8723au: Request correct firmware file for A-cut parts
  iio: adc: checking for NULL instead of IS_ERR() in probe
  iio: adc: at91: signedness bug in at91_adc_get_trigger_value_by_name()
  iio: mxs-lradc: fix divider
  iio: Fix endianness issue in ak8975_read_axis()
  staging/iio: IIO_SIMPLE_DUMMY_BUFFER neds IIO_BUFFER
  twl4030-madc: Request processed values in twl4030_get_madc_conversion
  staging: iio: tsl2x7x_core: fix proximity treshold
  iio: Fix two mpl3115 issues in measurement conversion
  iio: hid-sensors: Get feature report from sensor hub after changing power state

Merge tag 'tty-3.16-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial bugfixes from Greg KH:
"Here are some tty / serial driver bugfixes for 3.16-rc2 that resolve
  some reported issues.  The samsung driver build error itself has been
  reported by a bunch of people, sorry about that one.  The others are
  all tiny and everyone seems to like them in linux-next so far"

* tag 'tty-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty/serial: fix 8250 early console option passing to regular console
  tty: Correct INPCK handling
  serial: Fix IGNBRK handling
  serial: samsung: Fix build error

Merge tag 'usb-3.16-rc2' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some USB fixes for 3.16-rc2 that resolve some reported
  issues.  All of these have been in linux-next for a while with no
  problems"

* tag 'usb-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: usbtest: add a timeout for scatter-gather tests
  USB: EHCI: avoid BIOS handover on the HASEE E200
  usb: fix hub-port pm_runtime_enable() vs runtime pm transitions
  usb: quiet peer failure warning, disable poweroff
  usb: improve "not suspended yet" message in hub_suspend()
  xhci: Fix sleeping with IRQs disabled in xhci_stop_device()
  usb: fix ->update_hub_device() vs hdev->maxchild

MAINTAINERS: add entry for VMware Balloon driver

Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'pm+acpi-3.16-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"These are fixes mostly (ia64 regression related to the ACPI
  enumeration of devices, cpufreq regressions, fix for I2C controllers
  included in Intel SoCs, mvebu cpuidle driver fix related to sysfs)
  plus additional kernel command line arguments from Kees to make it
  possible to build kernel images with hibernation and the kernel
  address space randomization included simultaneously, a new ACPI
  battery driver quirk for a system with a broken BIOS and a couple of
  ACPI core cleanups.

  Specifics:

   - Fix for an ia64 regression introduced during the 3.11 cycle by a
     commit that modified the hardware initialization ordering and made
     device discovery fail on some systems.

   - Fix for a build problem on systems where the cpufreq-cpu0 driver is
     built-in and the cpu-thermal driver is modular from Arnd Bergmann.

   - Fix for a recently introduced computational mistake in the
     intel_pstate driver that leads to excessive rounding errors from
     Doug Smythies.

   - Fix for a failure code path in cpufreq_update_policy() that fails
     to unlock the locks acquired previously from Aaron Plattner.

   - Fix for the cpuidle mvebu driver to use shorter state names which
     will prevent the sysfs interface from returning mangled strings.
     From Gregory Clement.

   - ACPI LPSS driver fix to make sure that the I2C controllers included
     in BayTrail SoCs are not held in the reset state while they are
     being probed from Mika Westerberg.

   - New kernel command line arguments making it possible to build
     kernel images with hibernation and kASLR included at the same time
     and to select which of them will be used via the command line (they
     are still functionally mutually exclusive, though).  From Kees
     Cook.

   - ACPI battery driver quirk for Acer Aspire V5-573G that fails to
     send battery status change notifications timely from Alexander
     Mezin.

   - Two ACPI core cleanups from Christoph Jaeger and Fabian Frederick"

* tag 'pm+acpi-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: mvebu: Fix the name of the states
  cpufreq: unlock when failing cpufreq_update_policy()
  intel_pstate: Correct rounding in busy calculation
  ACPI: use kstrto*() instead of simple_strto*()
  ACPI / processor replace __attribute__((packed)) by __packed
  ACPI / battery: add quirk for Acer Aspire V5-573G
  ACPI / battery: use callback for setting up quirks
  ACPI / LPSS: Take I2C host controllers out of reset
  x86, kaslr: boot-time selectable with hibernation
  PM / hibernate: introduce "nohibernate" boot parameter
  cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependency
  ACPI / ia64 / sba_iommu: Restore the working initialization ordering

Merge tag 'sound-3.16-rc2' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"The significant part here is a few security fixes for ALSA core
  control API by Lars.  Besides that, there are a few fixes for ASoC
  sigmadsp (again by Lars) for building properly, and small fixes for
  ASoC rsnd, MMP, PXA and FSL, in addition to a fix for bogus WARNING in
  i915/HD-audio binding"

* tag 'sound-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: control: Make sure that id->index does not overflow
  ALSA: control: Handle numid overflow
  ALSA: control: Don't access controls outside of protected regions
  ALSA: control: Fix replacing user controls
  ALSA: control: Protect user controls against concurrent access
  drm/i915, HD-audio: Don't continue probing when nomodeset is given
  ASoC: fsl: Fix build problem
  ASoC: rsnd: fixup index of src/dst mod when capture
  ASoC: fsl_spdif: Fix integer overflow when calculating divisors
  ASoC: fsl_spdif: Fix incorrect usage of regmap_read()
  ASoC: dapm: Make sure register value is in sync with DAPM kcontrol state
  ASoC: sigmadsp: Split regmap and I2C support into separate modules
  ASoC: MMP audio needs sram support
  ASoC: pxa: add I2C dependencies as needed

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"This looks bigger than it is, as one of the nouveau firmware fixes
  ("drm/gf100-/gr: report class data to host on fwmthd failure")
  regenerates a bunch of the firmware files after changing the assembly
  by a few lines, without that, its more of a

    36 files changed, 370 insertions(+), 129 deletions(-)

  It contains some vt.c fixes acked by Greg, for rare hard hangs on i915
  loading, that also fixes hangs on reload and spurious register write
  errors.

  drm core: one fix for uninit memory

  nouveau: displayport rework caused a few regressions, Ben has been
     fixing them as the appear, along with some other fixes

  radeon: pageflipping regression fix, deep color fix, mode validation
     fixes

  i915: fbc disable, vga console kick off, backlight fix, divide-by-zero
     fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
  drm: fix uninitialized acquire_ctx fields (v2)
  drm/radeon: Fix radeon_irq_kms_pflip_irq_get/put() imbalance
  Revert "drm/radeon: remove drm_vblank_get|put from pflip handling"
  drm/radeon: improve dvi_mode_valid
  drm/radeon: update mode_valid testing for DP
  drm/radeon: Use dce5/6 hdmi deep color clock setup also on dce8+
  drm/nouveau/disp: fix oops in destructor with headless cards
  drm/gf117/i2c: no aux channels on this chipset
  drm/nouveau/doc: update the thermal documentation
  drm/nouveau/pwr: fix typo in fifo wrap handling
  drm/nv50/disp: fix a potential oops in supervisor handling
  drm/nouveau/disp/dp: don't touch link config after success
  drm/nouveau/kms: reference vblank for crtc during pageflip.
  drm/gk104/fb/ram: fixups from an earlier search+replace
  drm/nv50/gr: remove an unneeded write while initialising PGRAPH
  drm/nv50/gr: fix overlap while zeroing zcull regions
  drm/gf100-/gr: report class data to host on fwmthd failure
  drm/gk104/ibus: increase various random timeouts
  drm/gk104/clk: only touch divider for mode we'll be using
  drm/radeon: Bypass hw lut's for > 8 bpc framebuffer scanout.
  ...

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A smaller collection of fixes for the block core that would be nice to
  have in -rc2.  This pull request contains:

   - Fixes for races in the wait/wakeup logic used in blk-mq from
     Alexander.  No issues have been observed, but it is definitely a
     bit flakey currently.  Alternatively, we may drop the cyclic
     wakeups going forward, but that needs more testing.

   - Some cleanups from Christoph.

   - Fix for an oops in null_blk if queue_mode=1 and softirq completions
     are used.  From me.

   - A fix for a regression caused by the chunk size setting.  It
     inadvertently used max_hw_sectors instead of max_sectors, which is
     incorrect, and causes hangs on btrfs multi-disk setups (where hw
     sectors apparently isn't set).  From me.

   - Removal of WQ_POWER_EFFICIENT in the kblockd creation.  This was a
     recent addition as well, but it actually breaks blk-mq which relies
     on strict scheduling.  If the workqueue power_efficient mode is
     turned on, this breaks blk-mq.  From Matias.

   - null_blk module parameter description fix from Mike"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: bitmap tag: fix races in bt_get() function
  blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt
  blk-mq: bitmap tag: fix races on shared ::wake_index fields
  block: blk_max_size_offset() should check ->max_sectors
  null_blk: fix softirq completions for queue_mode == 1
  blk-mq: merge blk_mq_drain_queue and __blk_mq_drain_queue
  blk-mq: properly drain stopped queues
  block: remove WQ_POWER_EFFICIENT from kblockd
  null_blk: fix name and description of 'queue_mode' module parameter
  block: remove elv_abort_queue and blk_abort_flushes

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
"A first set of bug fixes that didn't make it for the merge window, and
  two Kconfig cleanups that still make sense at this point.

  Unfortunately, one of the two cleanups caused an unintended change in
  the original version, so we had to revert one part of it and do some
  more testing to ensure the rest is really fine.  There was also a
  last-minute rebase of the patches to remove another bad commit"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: use menuconfig for sub-arch menus
  ARM: multi_v7_defconfig: re-enable SDHCI drivers
  ARM: EXYNOS: Fix compilation warning
  ARM: exynos: move sysram info to exynos.c
  ARM: dts: Specify the NAND ECC scheme explicitly on Armada 385 DB board
  ARM: dts: Specify the NAND ECC scheme explicitly on Armada 375 DB board
  ARM: exynos: cleanup kconfig option display
  misc: vexpress: fix error handling vexpress_syscfg_regmap_init()
  ARM: Remove ARCH_HAS_CPUFREQ config option
  ARM: integrator: fix section mismatch problem
  ARM: mvebu: DT: fix OpenBlocks AX3-4 RAM size
  ARM: samsung: make SAMSUNG_DMADEV optional
  remoteproc: da8xx: don't select CMA on no-MMU
  bus/arm-cci: add dependency on OF && CPU_V7
  ARM: keystone requires ARM_PATCH_PHYS_VIRT
  ARM: omap2: fix am43xx dependency on l2x0 cache

w1: mxc_w1: Fix incorrect "presence" status

W1 reset_bus() should return zero if slave device is present.
This patch fix this issue.

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

imx-drm: parallel-display: Fix DPMS default state.

If connector->dpms is left untouched, it defaults
to DRM_MODE_DPMS_ON (0).

As a result, drm_helper_connector_dpms will exit when
it will be asked to set the state to DRM_MODE_DPMS_ON,
because it is already set.

That issue prevented displays from turning on at boot.

Signed-off-by: Denis Carikli <denis@eukrea.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: android: timed_output: fix use after free of dev

tdev->dev has been freed in device_destroy(), it's not right to
use dev_set_drvdata() after that;

Signed-off-by: Yi Zhang <yizhang@marvell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

unicore32: Remove ARCH_HAS_CPUFREQ config option

This config exists entirely to hide the cpufreq menu from the
kernel configuration unless a platform has selected it. Nothing
is actually built if this config is 'Y' and it just leads to more
patches that add a select under a platform Kconfig so that some
other CPUfreq option can be chosen. Let's remove the option so
that we can always enable CPUfreq drivers on unicore32 platforms.

Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

UniCore32: Change git tree location information in MAINTAINERS

UniCore32 git repo has moved to github.
Branch 'unicore32' is used for prepared patches, and automatically merged to linux-next.
Branch 'unicore32-working' is used for development.

Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>

arch: unicore32: ksyms: export '__cpuc_coherent_kern_range' to avoid compiling failure

flush_icache_range() is '__cpuc_coherent_kern_range' under unicore32,
and lkdtm.ko needs it. At present, '__cpuc_coherent_kern_range' is
still used by unicore32, so export it to avoid compiling failure.

The related error (with allmodconfig under unicore32):

ERROR: "__cpuc_coherent_kern_range" [drivers/misc/lkdtm.ko] undefined!

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch: unicore32: ksyms: export 'pm_power_off' to avoid compiling failure.

Two driver modules need 'pm_power_off', so export it.

The related error (with allmodconfig under unicore32):

    MODPOST 4039 modules
  ERROR: "pm_power_off" [drivers/mfd/retu-mfd.ko] undefined!
  ERROR: "pm_power_off" [drivers/char/ipmi/ipmi_poweroff.ko] undefined!

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch: unicore32: ksyms: export additional find_first_*() to avoid compiling failure

Some modules need find_first_bit() and find_first_zero_bit(), so export
them.

The related error (with allmodconfig under unicore32):

    MODPOST 4039 modules
  ERROR: "find_first_bit" [sound/soc/codecs/snd-soc-uda1380.ko] undefined!
  ERROR: "find_first_zero_bit" [net/sctp/sctp.ko] undefined!
  ...

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch:unicore32:mm: add devmem_is_allowed() to support STRICT_DEVMEM

unicore32 supports STRICT_DEVMEM, so it needs devmem_is_allowed(), like
some of other architectures have done (e.g. arm, powerpc, x86 ...).

The related error with allmodconfig:

    CC      drivers/char/mem.o
  drivers/char/mem.c: In function â\80\98range_is_allowedâ\80\99:
  drivers/char/mem.c:69: error: implicit declaration of function â\80\98devmem_is_allowedâ\80\99
  make[2]: *** [drivers/char/mem.o] Error 1
  make[1]: *** [drivers/char] Error 2
  make: *** [drivers] Error 2

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

unicore32: include: asm: add missing ')' for PAGE_* macros in pgtable.h

Missing related ')', the related compiling error:

    CC [M]  drivers/gpu/drm/udl/udl_fb.o
  drivers/gpu/drm/udl/udl_fb.c: In function Â‘udl_fb_mmapÂ’:
  drivers/gpu/drm/udl/udl_fb.c:273: error: expected Â‘)Â’ before Â‘returnÂ’
  drivers/gpu/drm/udl/udl_fb.c:281: error: expected expression before Â‘}Â’ token
  make[4]: *** [drivers/gpu/drm/udl/udl_fb.o] Error 1
  make[3]: *** [drivers/gpu/drm/udl] Error 2
  make[2]: *** [drivers/gpu/drm] Error 2
  make[1]: *** [drivers/gpu] Error 2
  make: *** [drivers] Error 2

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/kernel/setup.c: add generic 'screen_info' to avoid compiling failure

Add generic 'screen_info' just like another architectures have done
(e.g. tile, sh, score, ia64, hexagon, and cris).

The related error (with allmodconfig under unicore32):

    LD      init/built-in.o
  drivers/built-in.o: In function `vgacon_save_screen':
  powercap_sys.c:(.text+0x21788): undefined reference to `screen_info'
  drivers/built-in.o: In function `vgacon_resize':
  powercap_sys.c:(.text+0x21b54): undefined reference to `screen_info'
  drivers/built-in.o: In function `vgacon_switch':
  powercap_sys.c:(.text+0x21cb4): undefined reference to `screen_info'
  drivers/built-in.o: In function `vgacon_init':
  powercap_sys.c:(.text+0x2296c): undefined reference to `screen_info'
  drivers/built-in.o: In function `vgacon_startup':
  powercap_sys.c:(.text+0x22e80): undefined reference to `screen_info'

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

drivers: scsi: mvsas: fix compiling issue by adding 'MVS_' for "enum pci_interrupt_cause"

The direct cause is IRQ_SPI is already defined as a macro in unicore32
architecture (also, blackfin and mips architectures define it). The
related error (unicore32  with allmodconfig)

    CC [M]  drivers/scsi/mvsas/mv_94xx.o
  In file included from drivers/scsi/mvsas/mv_94xx.c:27:
  drivers/scsi/mvsas/mv_94xx.h:176: error: expected identifier before numeric constant

And IRQ_SAS_A and IRQ_SAS_B are used as 'u32' (although "enum
pci_interrupt_cause" is not used directly, now).

All together, need add 'MVS_' for "enum pci_interrupt_cause".

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch: unicore32: kernel: ksyms: remove 'bswapsi2' and 'muldi3' to avoid compiling failure

After check the code, 'bswapsi2' and 'muldi3' are useless for
unicore32, so can remove them to avoid compiling failure.

The related error (with allmodconfig under unicore32):

    LD      init/built-in.o
  arch/unicore32/kernel/built-in.o:(___ksymtab+__muldi3+0x0): undefined reference to `__muldi3'
  arch/unicore32/kernel/built-in.o:(___ksymtab+__bswapsi2+0x0): undefined reference to `__bswapsi2'

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/kernel/ksyms.c: remove 2 export symbols to avoid compiling failure

'csum_partial' and 'csum_partial_copy_from_user' have already been
exported in "lib/", so need not export them again, or it will cause
compiling error.

The related error (with allmodconfig under unicore32):

    LD      vmlinux.o
  lib/built-in.o:(___ksymtab+csum_partial+0x0): multiple definition of `__ksymtab_csum_partial'
  arch/unicore32/kernel/built-in.o:(___ksymtab+csum_partial+0x0): first defined here
  lib/built-in.o:(___ksymtab+csum_partial_copy_from_user+0x0): multiple definition of `__ksymtab_csum_partial_copy_from_user'
  arch/unicore32/kernel/built-in.o:(___ksymtab+csum_partial_copy_from_user+0x0): first defined here
  make: *** [vmlinux] Error 1

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

drivers/rtc/rtc-puv3.c: remove "&dev->" for typo issue MIME-Version: 1.0

It is only a typo issue, the related commit:

  "1fbc4c4 drivers/rtc/rtc-puv3.c: use dev_dbg() instead of pr_debug()"

The related error (for unicore32 with allmodconfig):

    CC [M]  drivers/rtc/rtc-puv3.o
  drivers/rtc/rtc-puv3.c: In function 'puv3_rtc_setalarm':
  drivers/rtc/rtc-puv3.c:143: error: 'struct device' has no member named 'dev'

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

drivers/rtc/rtc-puv3.c: use dev_dbg() instead of dev_debug() for typo issue

It is only a typo issue, the related commit:

  "1fbc4c4 drivers/rtc/rtc-puv3.c: use dev_dbg() instead of pr_debug()"

The related error (unicore32 with allmodconfig):

    CC [M]  drivers/rtc/rtc-puv3.o
  drivers/rtc/rtc-puv3.c: In function 'puv3_rtc_setpie':
  drivers/rtc/rtc-puv3.c:74: error: implicit declaration of function 'dev_debug'

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/include/asm/io.h: add readl_relaxed() generic definition

Need generic definition for readl_relaxed(), like other architectures
have done. Or can not pass compiling with allmodconfig, the related
error:

    CC [M]  drivers/message/fusion/mptbase.o
  drivers/message/fusion/mptbase.c: In function 'mpt_send_handshake_request':
  drivers/message/fusion/mptbase.c:1224: error: implicit declaration of function 'readl_relaxed'

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/include/asm/ptrace.h: add generic definition for profile_pc()

Add generic definition just like another architectures have done, or
can not pass compiling with allmodconfig, the related error:

    CC      kernel/profile.o
  kernel/profile.c: In function 'profile_tick':
  kernel/profile.c:419: error: implicit declaration of function 'profile_pc'
  make[1]: *** [kernel/profile.o] Error 1
  make: *** [kernel] Error 2

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/mm/alignment.c: include "asm/pgtable.h" to avoid compiling error

Need include "asm/pgtable.h" to include "asm-generic/pgtable-nopmd.h",
so can let 'pmd_t' defined. The related error with allmodconfig:

    CC      arch/unicore32/mm/alignment.o
  In file included from arch/unicore32/mm/alignment.c:24:
  arch/unicore32/include/asm/tlbflush.h:135: error: expected .). before .*. token
  arch/unicore32/include/asm/tlbflush.h:154: error: expected .). before .*. token
  In file included from arch/unicore32/mm/alignment.c:27:
  arch/unicore32/mm/mm.h:15: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
  arch/unicore32/mm/mm.h:20: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
  arch/unicore32/mm/mm.h:25: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
  make[1]: *** [arch/unicore32/mm/alignment.o] Error 1
  make: *** [arch/unicore32/mm] Error 2

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/kernel/clock.c: add readl() and writel() for 'PM_' macros

Add readl() and writel() for 'PM_' macros, just like another areas have
done within unicored32, or will cause compiling issue.

The related error (allmodconfig for unicored32):

    CC      arch/unicore32/kernel/clock.o
  arch/unicore32/kernel/clock.c: In function 'clk_set_rate':
  arch/unicore32/kernel/clock.c:182: warning: initialization makes integer from pointer without a cast
  arch/unicore32/kernel/clock.c:204: error: lvalue required as left operand of assignment
  arch/unicore32/kernel/clock.c:206: error: lvalue required as left operand of assignment
  arch/unicore32/kernel/clock.c:207: error: invalid operands to binary & (have 'void *' and 'long unsigned int')
  make[1]: *** [arch/unicore32/kernel/clock.o] Error 1
  make: *** [arch/unicore32/kernel] Error 2

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/kernel/module.c: use __vmalloc_node_range() instead of __vmalloc_area()

__vmalloc_area() has already been removed from upstream kernel, need
use __vmalloc_node_range() instead of.

The related commit: "d0a2126 mm: unify module_alloc code for vmalloc".

The related error (allmodconfig for unicore32):

    CC      arch/unicore32/kernel/module.o
  arch/unicore32/kernel/module.c: In function 'module_alloc' :
  arch/unicore32/kernel/module.c:34: error: implicit declaration of function '__vmalloc_area'
  arch/unicore32/kernel/module.c:34: warning: return makes pointer from integer without a cast
  make[1]: *** [arch/unicore32/kernel/module.o] Error 1
  make: *** [arch/unicore32/kernel] Error 2

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

arch/unicore32/kernel/ksyms.c: remove several undefined exported symbols

For 'csum_partial_copy_nocheck()', it has default definition in
'asm-generic'.

For '__raw_reads?()' and '__raw_writes?()' are used by the drivers
which no relationship with allmodconfig for unicode32, the related
modules are:

  drivers/mmc/host/omap.c
  drivers/mtd/nand/atmel_nand.c
  drivers/mtd/nand/pxa3xx_nand.c
  drivers/usb/gadget/at91_udc.c

Others are only within some architectures (not kernel wide).

The related error with allmodconfig for unicode32:

    CC      arch/unicore32/kernel/ksyms.o
  arch/unicore32/kernel/ksyms.c:29: error: ._backtrace. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:29: error: type defaults to .nt. in declaration of ._backtrace.
  arch/unicore32/kernel/ksyms.c:38: error: .sum_partial_copy_nocheck. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:38: error: type defaults to .nt. in declaration of .sum_partial_copy_nocheck.
  arch/unicore32/kernel/ksyms.c:39: error: ._csum_ipv6_magic. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:39: error: type defaults to .nt. in declaration of ._csum_ipv6_magic.
  arch/unicore32/kernel/ksyms.c:43: error: ._raw_readsb. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:43: error: type defaults to .nt. in declaration of ._raw_readsb.
  arch/unicore32/kernel/ksyms.c:46: error: ._raw_readsw. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:46: error: type defaults to .nt. in declaration of ._raw_readsw.
  arch/unicore32/kernel/ksyms.c:49: error: ._raw_readsl. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:49: error: type defaults to .nt. in declaration of ._raw_readsl.
  arch/unicore32/kernel/ksyms.c:52: error: ._raw_writesb. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:52: error: type defaults to .nt. in declaration of ._raw_writesb.
  arch/unicore32/kernel/ksyms.c:55: error: ._raw_writesw. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:55: error: type defaults to .nt. in declaration of ._raw_writesw.
  arch/unicore32/kernel/ksyms.c:58: error: ._raw_writesl. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:58: error: type defaults to .nt. in declaration of ._raw_writesl.
  arch/unicore32/kernel/ksyms.c:79: error: ._get_user_1. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:79: error: type defaults to .nt. in declaration of ._get_user_1.
  arch/unicore32/kernel/ksyms.c:80: error: ._get_user_2. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:80: error: type defaults to .nt. in declaration of ._get_user_2.
  arch/unicore32/kernel/ksyms.c:81: error: ._get_user_4. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:81: error: type defaults to .nt. in declaration of ._get_user_4.
  arch/unicore32/kernel/ksyms.c:83: error: ._put_user_1. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:83: error: type defaults to .nt. in declaration of ._put_user_1.
  arch/unicore32/kernel/ksyms.c:84: error: ._put_user_2. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:84: error: type defaults to .nt. in declaration of ._put_user_2.
  arch/unicore32/kernel/ksyms.c:85: error: ._put_user_4. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:85: error: type defaults to .nt. in declaration of ._put_user_4.
  arch/unicore32/kernel/ksyms.c:86: error: ._put_user_8. undeclared here (not in a function)
  arch/unicore32/kernel/ksyms.c:86: error: type defaults to .nt. in declaration of ._put_user_8.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>

Btrfs: fix wrong error handle when the device is missing or is not writeable

The original bio might be submitted, so we shoud increase bi_remaining to
account for it when we deal with the error that the device is missing or
is not writeable, or we would skip the endio handle.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix deadlock when mounting a degraded fs

The deadlock happened when we mount degraded filesystem, the reproduced
steps are following:
# mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1>
# echo 1 > /sys/block/`basename <dev0>`/device/delete
# mount -o degraded <dev1> <mnt>

The reason was that the counter -- bi_remaining was wrong. If the missing
or unwriteable device was the last device in the mapping array, we would
not submit the original bio, so we shouldn't increase bi_remaining of it
in btrfs_end_bio(), or we would skip the final endio handle.

Fix this problem by adding a flag into btrfs bio structure. If we submit
the original bio, we will set the flag, and we increase bi_remaining counter,
or we don't.

Though there is another way to fix it -- decrease bi_remaining counter of the
original bio when we make sure the original bio is not submitted, this method
need add more check and is easy to make mistake.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: use bio_endio_nodec instead of open code

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix NULL pointer crash when running balance and scrub concurrently

While running balance, scrub, fsstress concurrently we hit the
following kernel crash:

[56561.448845] BTRFS info (device sde): relocating block group 11005853696 flags 132
[56561.524077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[56561.524237] IP: [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
[56561.524297] PGD 9be28067 PUD 7f3dd067 PMD 0
[56561.524325] Oops: 0000 [#1] SMP
[....]
[56561.527237] Call Trace:
[56561.527309]  [<ffffffffa038980e>] scrub_enumerate_chunks+0x24e/0x490 [btrfs]
[56561.527392]  [<ffffffff810abe00>] ? abort_exclusive_wait+0x50/0xb0
[56561.527476]  [<ffffffffa038add4>] btrfs_scrub_dev+0x1a4/0x530 [btrfs]
[56561.527561]  [<ffffffffa0368107>] btrfs_ioctl+0x13f7/0x2a90 [btrfs]
[56561.527639]  [<ffffffff811c82f0>] do_vfs_ioctl+0x2e0/0x4c0
[56561.527712]  [<ffffffff8109c384>] ? vtime_account_user+0x54/0x60
[56561.527788]  [<ffffffff810f768c>] ? __audit_syscall_entry+0x9c/0xf0
[56561.527870]  [<ffffffff811c8551>] SyS_ioctl+0x81/0xa0
[56561.527941]  [<ffffffff815707f7>] tracesys+0xdd/0xe2
[...]
[56561.528304] RIP  [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
[56561.528395]  RSP <ffff88004c0f5be8>
[56561.528454] CR2: 0000000000000078

This is because in btrfs_relocate_chunk(), we will free @bdev directly while
scrub may still hold extent mapping, and may access freed memory.

Fix this problem by wrapping freeing @bdev work into free_extent_map() which
is based on reference count.

Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: Skip scrubbing removed chunks to avoid -ENOENT.

When run scrub with balance, sometimes -ENOENT will be returned, since
in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but
btrfs_lookup_block_group() will search block group in *MEMORY*, so if a
chunk is removed but not committed, -ENOENT will be returned.

However, there is no need to stop scrubbing since other chunks may be
scrubbed without problem.

So this patch changes the behavior to skip removed chunks and continue
to scrub the rest.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix broken free space cache after the system crashed

When we mounted the filesystem after the crash, we got the following
message:
  BTRFS error (device xxx): block group xxxx has wrong amount of free space
  BTRFS error (device xxx): failed to load free space cache for block group xxx

It is because we didn't update the metadata of the allocated space (in extent
tree) until the file data was written into the disk. During this time, there was
no information about the allocated spaces in either the extent tree nor the
free space cache. when we wrote out the free space cache at this time (commit
transaction), those spaces were lost. In fact, only the free space that is
used to store the file data had this problem, the others didn't because
the metadata of them is updated in the same transaction context.

There are many methods which can fix the above problem
- track the allocated space, and write it out when we write out the free
  space cache
- account the size of the allocated space that is used to store the file
  data, if the size is not zero, don't write out the free space cache.

The first one is complex and may make the performance drop down.
This patch chose the second method, we use a per-block-group variant to
account the size of that allocated space. Besides that, we also introduce
a per-block-group read-write semaphore to avoid the race between
the allocation and the free space cache write out.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: make free space cache write out functions more readable

This patch makes the free space cache write out functions more readable,
and beisdes that, it also reduces the stack space that the function --
__btrfs_write_out_cache uses from 194bytes to 144bytes.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: remove unused wait queue in struct extent_buffer

The lock_wq wait queue is not used anywhere, therefore just remove it.
On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320
bytes down to 296 bytes, which means a 4Kb page can now be used for
13 extent buffers instead of 12.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix deadlocks with trylock on tree nodes

The Btrfs tree trylock function is poorly named.  It always takes
the spinlock and backs off if the blocking lock is held.  This
can lead to surprising lockups because people expect it to really be a
trylock.

This commit makes it a pure trylock, both for the spinlock and the
blocking lock.  It also reworks the nested lock handling slightly to
avoid taking the read lock while a spinning write lock might be held.

Signed-off-by: Chris Mason <clm@fb.com>

tty/serial: fix 8250 early console option passing to regular console

In the conversion to generic early console, the passing of options from
the early 8250 console to the regular ttyS console was broken. This
resulted in the baud rate changing when switching consoles during boot.

This feature allows specifying a single console option on the kernel
command line rather than both an early console and regular serial tty
console. It would be nice to generalize this feature. However, it only
works if the correct baud rate can be probed early which is not the
case on many platforms which have non-standard UART clock rates. So for
now, this is left as an 8250 specific feature.

Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: Correct INPCK handling

If INPCK is not set, input parity detection should be disabled. This means
parity errors should not be received from the tty driver, and the data
received should be treated normally.

SUS v3, 11.2.2, General Terminal Interface - Input Modes, states:
  "If INPCK is set, input parity checking shall be enabled. If INPCK is
   not set, input parity checking shall be disabled, allowing output parity
   generation without input parity errors. Note that whether input parity
   checking is enabled or disabled is independent of whether parity detection
   is enabled or disabled (see Control Modes). If parity detection is enabled
   but input parity checking is disabled, the hardware to which the terminal
   is connected shall recognize the parity bit, but the terminal special file
   shall not check whether or not this bit is correctly set."

Ignore parity errors reported by the tty driver when INPCK is not set, and
handle the received data normally.

Fixes: Bugzilla #71681, 'Improvement of n_tty_receive_parity_error from n_tty.c'
Reported-by: Ivan <athlon_@mail.ru>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

serial: Fix IGNBRK handling

If IGNBRK is set without either BRKINT or PARMRK set, some uart
drivers send a 0x00 byte for BREAK without the TTYBREAK flag to the
line discipline, when it should send either nothing or the TTYBREAK flag
set. This happens because the read_status_mask masks out the BI
condition, which uart_insert_char() then interprets as a normal 0x00 byte.

SUS v3 is clear regarding the meaning of IGNBRK; Section 11.2.2, General
Terminal Interface - Input Modes, states:
  "If IGNBRK is set, a break condition detected on input shall be ignored;
   that is, not put on the input queue and therefore not read by any
   process."

Fix read_status_mask to include the BI bit if IGNBRK is set; the
lsr status retains the BI bit if a BREAK is recv'd, which is
subsequently ignored in uart_insert_char() when masked with the
ignore_status_mask.

Affected drivers:
8250 - all
serial_txx9
mfd
amba-pl010
amba-pl011
atmel_serial
bfin_uart
dz
ip22zilog
max310x
mxs-auart
netx-serial
pnx8xxx_uart
pxa
sb1250-duart
sccnxp
serial_ks8695
sirfsoc_uart
st-asc
vr41xx_siu
zs
sunzilog
fsl_lpuart
sunsab
ucc_uart
bcm63xx_uart
sunsu
efm32-uart
pmac_zilog
mpsc
msm_serial
m32r_sio

Unaffected drivers:
omap-serial
rp2
sa1100
imx
icom

Annotated for fixes:
altera_uart
mcf

Drivers without break detection:
21285
xilinx-uartps
altera_jtaguart
apbuart
arc-uart
clps711x
max3100
uartlite
msm_serial_hs
nwpserial
lantiq
vt8500_serial

Unknown:
samsung
mpc52xx_uart
bfin_sport_uart
cpm_uart/core

Fixes: Bugzilla #71651, '8250_core.c incorrectly handles IGNBRK flag'
Reported-by: Ivan <athlon_@mail.ru>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull security maintainership update from James Morris:
"Add Serge Hallyn as security subsystem co-maintainer"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: add Serge Hallyn as a maintainer

Merge tag 'stable/for-linus-3.16-rc1-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen fixes from David Vrabel:
"Xen regression and PVH fixes for 3.16-rc1

   - fix dom0 PVH memory setup on latest unstable Xen releases
   - fix 64-bit x86 PV guest boot failure on Xen 3.1 and earlier
   - fix resume regression on non-PV (auto-translated physmap) guests"

* tag 'stable/for-linus-3.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/grant-table: fix suspend for non-PV guests
  x86/xen: no need to explicitly register an NMI callback
  Revert "xen/pvh: Update E820 to work with PVH (v2)"
  x86/xen: fix memory setup for PVH dom0

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
"These are primarily bug fixes with a performance improvement patch for
  the GHASH crypto algorithm (which went in during this merging window)
  and dts/defconfig/Kconfig updates.

   - ftrace_return_addr() macro fix for arm (introduced earlier via the
     arm64 tree)
   - stack alignment exception entry code fix
   - GHASH crypto algorithm fix and performance improvement
   - CMA buffer limited to 32-bit (until a better way to describe the
     system topology in DT)
   - UAPI sigcontext.h build fix
   - __kernel_old_{gid,uid}_t definitions fix (affecting 32-bit LTP)
   - ptrace fixes (kernel fault and 32-bit arm core dump)
   - pte_mknotpresent() fix
   - dts updates (APM SoC)
   - defconfig and Kconfig update"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: remove broken &= operator from pmd_mknotpresent
  arm64: fix build error in sigcontext.h
  arm64: dts: Add more serial port nodes in APM X-Gene device tree
  arm64/dma: Removing ARCH_HAS_DMA_GET_REQUIRED_MASK macro
  arm64: ptrace: fix empty registers set in prstatus of aarch32 process core
  arm64: uid16: fix __kernel_old_{gid,uid}_t definitions
  arm64: ptrace: change fs when passing kernel pointer to regset code
  arm64: Limit the CMA buffer to 32-bit if ZONE_DMA
  arm/ftrace: fix ftrace_return_addr() to ftrace_return_address()
  arm64/crypto: improve performance of GHASH algorithm
  arm64/crypto: fix data corruption bug in GHASH algorithm
  arm64: defconfig update for LTP
  arm64: ftrace: Fix comment typo 'CONFIG_FUNCTION_GRAPH_FP_TEST'
  arm64: add ARCH_HAS_OPP to allow enabling OPP library
  arm64: restore alphabetic order in Kconfig
  arm64: Bug fix in stack alignment exception

Merge git://git./linux/kernel/git/davem/sparc-next

Pull sparc fixes from David Miller:
"Sparc sparse fixes from Sam Ravnborg"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits)
  sparc64: fix sparse warnings in int_64.c
  sparc64: fix sparse warning in ftrace.c
  sparc64: fix sparse warning in kprobes.c
  sparc64: fix sparse warning in kgdb_64.c
  sparc64: fix sparse warnings in compat_audit.c
  sparc64: fix sparse warnings in init_64.c
  sparc64: fix sparse warnings in aes_glue.c
  sparc: fix sparse warnings in smp_32.c + smp_64.c
  sparc64: fix sparse warnings in perf_event.c
  sparc64: fix sparse warnings in kprobes.c
  sparc64: fix sparse warning in tsb.c
  sparc64: clean up compat_sigset_t.seta handling
  sparc64: fix sparse "Should it be static?" warnings in signal32.c
  sparc64: fix sparse warnings in sys_sparc32.c
  sparc64: fix sparse warning in pci.c
  sparc64: fix sparse warnings in smp_64.c
  sparc64: fix sparse warning in prom_64.c
  sparc64: fix sparse warning in btext.c
  sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
  sparc64: fix sparse warning in process_64.c
  ...

Conflicts:
arch/sparc/include/asm/pgtable_64.h

Merge branches 'pm-cpufreq' and 'pm-cpuidle'

* pm-cpufreq:
  cpufreq: unlock when failing cpufreq_update_policy()
  intel_pstate: Correct rounding in busy calculation
  cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependency

* pm-cpuidle:
  cpuidle: mvebu: Fix the name of the states

Merge branch 'pm-sleep'

* pm-sleep:
x86, kaslr: boot-time selectable with hibernation
PM / hibernate: introduce "nohibernate" boot parameter

Merge branches 'acpi-general', 'acpi-processor', 'acpi-lpss' and 'acpi-battery'

* acpi-general:
  ACPI: use kstrto*() instead of simple_strto*()

* acpi-processor:
  ACPI / processor replace __attribute__((packed)) by __packed

* acpi-lpss:
  ACPI / LPSS: Take I2C host controllers out of reset

* acpi-battery:
  ACPI / battery: add quirk for Acer Aspire V5-573G
  ACPI / battery: use callback for setting up quirks