Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android

Merge tag 'v4.4.66' into linux-linaro-lsk-v4.4

This is the 4.4.66 stable release

Linux 4.4.66

ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram

commit 34a477e5297cbaa6ecc6e17c042a866e1cbe80d6 upstream.

On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function
graph tracing and then suspend to RAM, it will triple fault and reboot when
it resumes.

The first fault happens when booting a secondary CPU:

startup_32_smp()
  load_ucode_ap()
    prepare_ftrace_return()
      ftrace_graph_is_dead()
        (accesses 'kill_ftrace_graph')

The early head_32.S code calls into load_ucode_ap(), which has an an
ftrace hook, so it calls prepare_ftrace_return(), which calls
ftrace_graph_is_dead(), which tries to access the global
'kill_ftrace_graph' variable with a virtual address, causing a fault
because the CPU is still in real mode.

The fix is to add a check in prepare_ftrace_return() to make sure it's
running in protected mode before continuing.  The check makes sure the
stack pointer is a virtual kernel address.  It's a bit of a hack, but
it's not very intrusive and it works well enough.

For reference, here are a few other (more difficult) ways this could
have potentially been fixed:

- Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
  is enabled.  (No idea what that would break.)

- Track down load_ucode_ap()'s entire callee tree and mark all the
  functions 'notrace'.  (Probably not realistic.)

- Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
  or __cpu_up(), and ensure that the pause facility can be queried from
  real mode.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARCv2: save r30 on kernel entry as gcc uses it for code-gen

commit ecd43afdbe72017aefe48080631eb625e177ef4d upstream.

This is not exposed to userspace debugers yet, which can be done
independently as a seperate patch !

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: check for oversized NFSv2/v3 arguments

commit e6838a29ecb484c97e4efef9429643b9851fba6e upstream.

A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply.  This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE).  But a client that sends an incorrectly long reply
can violate those assumptions.  This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended.  That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: i8042 - add Clevo P650RS to the i8042 reset list

commit 7c5bb4ac2b76d2a09256aec8a7d584bf3e2b0466 upstream.

Clevo P650RS and other similar devices require i8042 to be reset in order
to detect Synaptics touchpad.

Reported-by: Paweł Bylica <chfast@gmail.com>
Tested-by: Ed Bordin <edbordin@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=190301
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

p9_client_readdir() fix

commit 71d6ad08379304128e4bdfaf0b4185d54375423e upstream.

Don't assume that server is sane and won't return more data than
asked for.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: Avoid BUG warning in arch_check_elf

commit c46f59e90226fa5bfcc83650edebe84ae47d454b upstream.

arch_check_elf contains a usage of current_cpu_data that will call
smp_processor_id() with preemption enabled and therefore triggers a
"BUG: using smp_processor_id() in preemptible" warning when an fpxx
executable is loaded.

As a follow-up to commit b244614a60ab ("MIPS: Avoid a BUG warning during
prctl(PR_SET_FP_MODE, ...)"), apply the same fix to arch_check_elf by
using raw_current_cpu_data instead. The rationale quoted from the previous
commit:

"It is assumed throughout the kernel that if any CPU has an FPU, then
all CPUs would have an FPU as well, so it is safe to perform the check
with preemption enabled - change the code to use raw_ variant of the
check to avoid the warning."

Fixes: 46490b572544 ("MIPS: kernel: elf: Improve the overall ABI and FPU mode checks")
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15951/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: KGDB: Use kernel context for sleeping threads

commit 162b270c664dca2e0944308e92f9fcc887151a72 upstream.

KGDB is a kernel debug stub and it can't be used to debug userland as it
can only safely access kernel memory.

On MIPS however KGDB has always got the register state of sleeping
processes from the userland register context at the beginning of the
kernel stack. This is meaningless for kernel threads (which never enter
userland), and for user threads it prevents the user seeing what it is
doing while in the kernel:

(gdb) info threads
  Id   Target Id         Frame
  ...
  3    Thread 2 (kthreadd) 0x0000000000000000 in ?? ()
  2    Thread 1 (init)   0x000000007705c4b4 in ?? ()
  1    Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201

Get the register state instead from the (partial) kernel register
context stored in the task's thread_struct for resume() to restore. All
threads now correctly appear to be in context_switch():

(gdb) info threads
  Id   Target Id         Frame
  ...
  3    Thread 2 (kthreadd) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
  2    Thread 1 (init)   context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
  1    Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201

Call clobbered registers which aren't saved and exception registers
(BadVAddr & Cause) which can't be easily determined without stack
unwinding are reported as 0. The PC is taken from the return address,
such that the state presented matches that found immediately after
returning from resume().

Fixes: 8854700115ec ("[MIPS] kgdb: add arch support for the kernel's kgdb core")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15829/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: seq: Don't break snd_use_lock_sync() loop by timeout

commit 4e7655fd4f47c23e5249ea260dc802f909a64611 upstream.

The snd_use_lock_sync() (thus its implementation
snd_use_lock_sync_helper()) has the 5 seconds timeout to break out of
the sync loop. It was introduced from the beginning, just to be
"safer", in terms of avoiding the stupid bugs.

However, as Ben Hutchings suggested, this timeout rather introduces a
potential leak or use-after-free that was apparently fixed by the
commit 2d7d54002e39 ("ALSA: seq: Fix race during FIFO resize"):
for example, snd_seq_fifo_event_in() -> snd_seq_event_dup() ->
copy_from_user() could block for a long time, and snd_use_lock_sync()
goes timeout and still leaves the cell at releasing the pool.

For fixing such a problem, we remove the break by the timeout while
still keeping the warning.

Suggested-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type

commit dfb00a56935186171abb5280b3407c3f910011f1 upstream.

An abstraction of asynchronous transaction for transmission of MIDI
messages was introduced in Linux v4.4. Each driver can utilize this
abstraction to transfer MIDI messages via fixed-length payload of
transaction to a certain unit address. Filling payload of the transaction
is done by callback. In this callback, each driver can return negative
error code, however current implementation assigns the return value to
unsigned variable.

This commit changes type of the variable to fix the bug.

Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Fixes: 585d7cba5e1f ("ALSA: firewire-lib: add helper functions for asynchronous transactions to transfer MIDI messages")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv6: check raw payload size correctly in ioctl

[ Upstream commit 105f5528b9bbaa08b526d3405a5bcd2ff0c953c8 ]

In situations where an skb is paged, the transport header pointer and
tail pointer can be the same because the skb contents are in frags.

This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
length of 0 when the length to receive is actually greater than zero.

skb->len is already correctly set in ip6_input_finish() with
pskb_pull(), so use skb->len as it always returns the correct result
for both linear and paged data.

Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv6: check skb->protocol before lookup for nexthop

[ Upstream commit 199ab00f3cdb6f154ea93fa76fd80192861a821d ]

Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
neigh key as an IPv6 address:

        neigh = dst_neigh_lookup(skb_dst(skb),
                                 &ipv6_hdr(skb)->daddr);
        if (!neigh)
                goto tx_err_link_failure;

        addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
        addr_type = ipv6_addr_type(addr6);

        if (addr_type == IPV6_ADDR_ANY)
                addr6 = &ipv6_hdr(skb)->daddr;

        memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));

Also the network header of the skb at this point should be still IPv4
for 4in6 tunnels, we shold not just use it as IPv6 header.

This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
is, we are safe to do the nexthop lookup using skb_dst() and
ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
dest address we can pick here, we have to rely on callers to fill it
from tunnel config, so just fall to ip6_route_output() to make the
decision.

Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

macvlan: Fix device ref leak when purging bc_queue

[ Upstream commit f6478218e6edc2a587b8f132f66373baa7b2497c ]

When a parent macvlan device is destroyed we end up purging its
broadcast queue without dropping the device reference count on
the packet source device. This causes the source device to linger.

This patch drops that reference count.

Fixes: 260916dfb48c ("macvlan: Fix potential use-after free for...")
Reported-by: Joe Ghalam <Joe.Ghalam@dell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ip6mr: fix notification device destruction

[ Upstream commit 723b929ca0f79c0796f160c2eeda4597ee98d2b8 ]

Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.

The trace from Andrey:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:6813!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: netns cleanup_net
task: ffff880069208000 task.stack: ffff8800692d8000
RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
FS: 0000000000000000(0000) GS:ffff88006cb00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
Call Trace:
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394
raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
call_netdevice_notifiers net/core/dev.c:1663
rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many net/core/dev.c:7880
default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
kthread+0x35e/0x430 kernel/kthread.c:231
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
---[ end trace e0b29c57e9b3292c ]---

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netpoll: Check for skb->queue_mapping

[ Upstream commit c70b17b775edb21280e9de7531acf6db3b365274 ]

Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.

One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.

e.g.
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000000001525
tsk->{mm,active_mm}->pgd = fff800130ff9a000
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
kworker/48:1(475): Oops [#1]
CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G           OE
4.11.0-rc3-davem-net+ #7
Workqueue: events queue_process
task: fff80013113299c0 task.stack: fff800131132c000
TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
00000000    Tainted: G           OE
TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
0000000000000001
g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
00000000000000c0
o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
0000000000000003
o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
000000000049ed94
RPC: <set_next_entity+0x34/0xb80>
l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
0000000000000000
l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
fff8001fa7605028
i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
0000000000000000
i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
00000000103fa4b0
I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
Call Trace:
[00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
[0000000000998c74] netpoll_start_xmit+0xf4/0x200
[0000000000998e10] queue_process+0x90/0x160
[0000000000485fa8] process_one_work+0x188/0x480
[0000000000486410] worker_thread+0x170/0x4c0
[000000000048c6b8] kthread+0xd8/0x120
[0000000000406064] ret_from_fork+0x1c/0x2c
[0000000000000000]           (null)
Disabling lock debugging due to kernel taint
Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
Caller[0000000000998e10]: queue_process+0x90/0x160
Caller[0000000000485fa8]: process_one_work+0x188/0x480
Caller[0000000000486410]: worker_thread+0x170/0x4c0
Caller[000000000048c6b8]: kthread+0xd8/0x120
Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
Caller[0000000000000000]:           (null)

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: ipv6: RTF_PCPU should not be settable from userspace

[ Upstream commit 557c44be917c322860665be3d28376afa84aa936 ]

Andrey reported a fault in the IPv6 route code:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880069809600 task.stack: ffff880062dc8000
RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
RSP: 0018:ffff880062dced30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
Call Trace:
ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
...

Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.

Fix by checking for the flag and failing with EINVAL.

Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dp83640: don't recieve time stamps twice

[ Upstream commit 9d386cd9a755c8293e8916264d4d053878a7c9c7 ]

This patch is prompted by a static checker warning about a potential
use after free.  The concern is that netif_rx_ni() can free "skb" and we
call it twice.

When I look at the commit that added this, it looks like some stray
lines were added accidentally.  It doesn't make sense to me that we
would recieve the same data two times.  I asked the author but never
recieved a response.

I can't test this code, but I'm pretty sure my patch is correct.

Fixes: 4b063258ab93 ("dp83640: Delay scheduled work.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tcp: clear saved_syn in tcp_disconnect()

[ Upstream commit 17c3060b1701fc69daedb4c90be6325d3d9fca8e ]

In the (very unlikely) case a passive socket becomes a listener,
we do not want to duplicate its saved SYN headers.

This would lead to double frees, use after free, and please hackers and
various fuzzers

Tested:
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, IPPROTO_TCP, TCP_SAVE_SYN, [1], 4) = 0
   +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0

   +0 bind(3, ..., ...) = 0
   +0 listen(3, 5) = 0

   +0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <...>
  +.1 < . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4

   +0 connect(4, AF_UNSPEC, ...) = 0
   +0 close(3) = 0
   +0 bind(4, ..., ...) = 0
   +0 listen(4, 5) = 0

   +0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <...>
  +.1 < . 1:1(0) ack 1 win 257

Fixes: cd8ae85299d5 ("tcp: provide SYN headers for passive connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sctp: listen on the sock only when it's state is listening or closed

[ Upstream commit 34b2789f1d9bf8dcca9b5cb553d076ca2cd898ee ]

Now sctp doesn't check sock's state before listening on it. It could
even cause changing a sock with any state to become a listening sock
when doing sctp_listen.

This patch is to fix it by checking sock's state in sctp_listen, so
that it will listen on the sock with right state.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given

[ Upstream commit a8801799c6975601fd58ae62f48964caec2eb83f ]

inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
ip_route_input when iif is given. If a multipath route is present for
the designated destination, ip_multipath_icmp_hash ends up being called,
which uses the source/destination addresses within the skb to calculate
a hash. However, those are not set in the synthetic skb, causing it to
return an arbitrary and incorrect result.

Instead, use UDP, which gets no such special treatment.

Signed-off-by: Florian Larysch <fl@n621.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

l2tp: fix PPP pseudo-wire auto-loading

[ Upstream commit 249ee819e24c180909f43c1173c8ef6724d21faf ]

PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP).

Fixes: f1f39f911027 ("l2tp: auto load type modules")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

l2tp: take reference on sessions being dumped

[ Upstream commit e08293a4ccbcc993ded0fdc46f1e57926b833d63 ]

Take a reference on the sessions returned by l2tp_session_find_nth()
(and rename it l2tp_session_get_nth() to reflect this change), so that
caller is assured that the session isn't going to disappear while
processing it.

For procfs and debugfs handlers, the session is held in the .start()
callback and dropped in .show(). Given that pppol2tp_seq_session_show()
dereferences the associated PPPoL2TP socket and that
l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
call the session's .ref() callback to prevent the socket from going
away from under us.

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Fixes: 0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info")
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/packet: fix overflow in check for tp_reserve

[ Upstream commit bcc5364bdcfe131e6379363f089e7b4108d35b70 ]

When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.

Fix by checking that tp_reserve <= INT_MAX on assign.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/packet: fix overflow in check for tp_frame_nr

[ Upstream commit 8f8d28e4d6d815a391285e121c3a53a0b6cb9e7b ]

When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow.

Add a check that tp_block_size * tp_block_nr <= UINT_MAX.

Since frames_per_block <= tp_block_size, the expression would
never overflow.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

l2tp: purge socket queues in the .destruct() callback

[ Upstream commit e91793bb615cf6cdd59c0b6749fe173687bb0947 ]

The Rx path may grab the socket right before pppol2tp_release(), but
nothing guarantees that it will enqueue packets before
skb_queue_purge(). Therefore, the socket can be destroyed without its
queues fully purged.

Fix this by purging queues in pppol2tp_session_destruct() where we're
guaranteed nothing is still referencing the socket.

Fixes: 9e9cb6221aa7 ("l2tp: fix userspace reception on plain L2TP sockets")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: phy: handle state correctly in phy_stop_machine

[ Upstream commit 49d52e8108a21749dc2114b924c907db43358984 ]

If the PHY is halted on stop, then do not set the state to PHY_UP. This
ensures the phy will be restarted later in phy_start when the machine is
started again.

Fixes: 00db8189d984 ("This patch adds a PHY Abstraction Layer to the Linux Kernel, enabling ethernet drivers to remain as ignorant as is reasonable of the connected PHY's design and operation details.")
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Xander Huff <xander.huff@ni.com>
Acked-by: Kyle Roeschley <kyle.roeschley@ni.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: neigh: guard against NULL solicit() method

[ Upstream commit 48481c8fa16410ffa45939b13b6c53c2ca609e5f ]

Dmitry posted a nice reproducer of a bug triggering in neigh_probe()
when dereferencing a NULL neigh->ops->solicit method.

This can happen for arp_direct_ops/ndisc_direct_ops and similar,
which can be used for NUD_NOARP neighbours (created when dev->header_ops
is NULL). Admin can then force changing nud_state to some other state
that would fire neigh timer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()

[ Upstream commit 9ae34dbd8afd790cb5f52467e4f816434379eafa ]

This commit moves sparc64's prototype of pmd_write() outside
of the CONFIG_TRANSPARENT_HUGEPAGE ifdef.

In 2013, commit a7b9403f0e6d ("sparc64: Encode huge PMDs using PTE
encoding.") exposed a path where pmd_write() could be called without
CONFIG_TRANSPARENT_HUGEPAGE defined.  This can result in the panic below.

The diff is awkward to read, but the changes are straightforward.
pmd_write() was moved outside of #ifdef CONFIG_TRANSPARENT_HUGEPAGE.
Also, __HAVE_ARCH_PMD_WRITE was defined.

kernel BUG at include/asm-generic/pgtable.h:576!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
oracle_8114_cdb(8114): Kernel bad sw trap 5 [#1]
CPU: 120 PID: 8114 Comm: oracle_8114_cdb Not tainted
4.1.12-61.7.1.el6uek.rc1.sparc64 #1
task: fff8400700a24d60 ti: fff8400700bc4000 task.ti: fff8400700bc4000
TSTATE: 0000004411e01607 TPC: 00000000004609f8 TNPC: 00000000004609fc Y:
00000005    Not tainted
TPC: <gup_huge_pmd+0x198/0x1e0>
g0: 000000000001c000 g1: 0000000000ef3954 g2: 0000000000000000 g3: 0000000000000001
g4: fff8400700a24d60 g5: fff8001fa5c10000 g6: fff8400700bc4000 g7: 0000000000000720
o0: 0000000000bc5058 o1: 0000000000000240 o2: 0000000000006000 o3: 0000000000001c00
o4: 0000000000000000 o5: 0000048000080000 sp: fff8400700bc6ab1 ret_pc: 00000000004609f0
RPC: <gup_huge_pmd+0x190/0x1e0>
l0: fff8400700bc74fc l1: 0000000000020000 l2: 0000000000002000 l3: 0000000000000000
l4: fff8001f93250950 l5: 000000000113f800 l6: 0000000000000004 l7: 0000000000000000
i0: fff8400700ca46a0 i1: bd0000085e800453 i2: 000000026a0c4000 i3: 000000026a0c6000
i4: 0000000000000001 i5: fff800070c958de8 i6: fff8400700bc6b61 i7: 0000000000460dd0
I7: <gup_pud_range+0x170/0x1a0>
Call Trace:
[0000000000460dd0] gup_pud_range+0x170/0x1a0
[0000000000460e84] get_user_pages_fast+0x84/0x120
[00000000006f5a18] iov_iter_get_pages+0x98/0x240
[00000000005fa744] do_direct_IO+0xf64/0x1e00
[00000000005fbbc0] __blockdev_direct_IO+0x360/0x15a0
[00000000101f74fc] ext4_ind_direct_IO+0xdc/0x400 [ext4]
[00000000101af690] ext4_ext_direct_IO+0x1d0/0x2c0 [ext4]
[00000000101af86c] ext4_direct_IO+0xec/0x220 [ext4]
[0000000000553bd4] generic_file_read_iter+0x114/0x140
[00000000005bdc2c] __vfs_read+0xac/0x100
[00000000005bf254] vfs_read+0x54/0x100
[00000000005bf368] SyS_pread64+0x68/0x80

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sparc64: kern_addr_valid regression

[ Upstream commit adfae8a5d833fa2b46577a8081f350e408851f5b ]

I encountered this bug when using /proc/kcore to examine the kernel. Plus a
coworker inquired about debugging tools. We computed pa but did
not use it during the maximum physical address bits test. Instead we used
the identity mapped virtual address which will always fail this test.

I believe the defect came in here:
[bpicco@zareason linus.git]$ git describe --contains bb4e6e85daa52
v3.18-rc1~87^2~4
.

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen/x86: don't lose event interrupts

commit c06b6d70feb32d28f04ba37aa3df17973fd37b6b upstream.

On slow platforms with unreliable TSC, such as QEMU emulated machines,
it is possible for the kernel to request the next event in the past. In
that case, in the current implementation of xen_vcpuop_clockevent, we
simply return -ETIME. To be precise the Xen returns -ETIME and we pass
it on. However the result of this is a missed event, which simply causes
the kernel to hang.

Instead it is better to always ask the hypervisor for a timer event,
even if the timeout is in the past. That way there are no lost
interrupts and the kernel survives. To do that, remove the
VCPU_SSHOTTMR_future flag.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize

commit 03d27ade4941076b34c823d63d91dc895731a595 upstream.

buflen by default (256) is smaller than wMaxPacketSize (512) in high-speed
devices.

That caused the OUT endpoint to freeze if the host send any data packet of
length greater than 256 bytes.

This is an example dump of what happended on that enpoint:
HOST:   [DATA][Length=260][...]
DEVICE: [NAK]
HOST:   [PING]
DEVICE: [NAK]
HOST:   [PING]
DEVICE: [NAK]
...
HOST:   [PING]
DEVICE: [NAK]

This patch fixes this problem by setting the minimum usb_request's buffer size
for the OUT endpoint as its wMaxPacketSize.

Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe F. Tonello <eu@felipetonello.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

regulator: core: Clear the supply pointer if enabling fails

commit 8e5356a73604f53da6a1e0756727cb8f9f7bba17 upstream.

During the resolution of a regulator's supply, we may attempt to enable
the supply if the regulator itself is already enabled. If enabling the
supply fails, then we will call _regulator_put() for the supply.
However, the pointer to the supply has not been cleared for the
regulator and this will cause a crash if we then unregister the
regulator and attempt to call regulator_put() a second time for the
supply. Fix this by clearing the supply pointer if enabling the supply
after fails when resolving the supply for a regulator.

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

RDS: Fix the atomicity for congestion map update

commit e47db94e10447fc467777a40302f2b393e9af2fa upstream.

Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Lets use atomics to avoid such an issue.

Full credit to Wengang <wen.gang.wang@oracle.com> for
finding the issue, analysing it and also pointing out
to offending code with spin lock based fix.

Reviewed-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net_sched: close another race condition in tcf_mirred_release()

commit dc327f8931cb9d66191f489eb9a852fc04530546 upstream.

We saw the following extra refcount release on veth device:

  kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1

Since we heavily use mirred action to redirect packets to veth, I think
this is caused by the following race condition:

CPU0:
tcf_mirred_release(): (in RCU callback)
struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);

CPU1:
mirred_device_event():
        spin_lock_bh(&mirred_list_lock);
        list_for_each_entry(m, &mirred_list, tcfm_list) {
                if (rcu_access_pointer(m->tcfm_dev) == dev) {
                        dev_put(dev);
                        /* Note : no rcu grace period necessary, as
                         * net_device are already rcu protected.
                         */
                        RCU_INIT_POINTER(m->tcfm_dev, NULL);
                }
        }
        spin_unlock_bh(&mirred_list_lock);

CPU0:
tcf_mirred_release():
        spin_lock_bh(&mirred_list_lock);
        list_del(&m->tcfm_list);
        spin_unlock_bh(&mirred_list_lock);
        if (dev)               // <======== Stil refers to the old m->tcfm_dev
                dev_put(dev);  // <======== dev_put() is called on it again

The action init code path is good because it is impossible to modify
an action that is being removed.

So, fix this by moving everything under the spinlock.

Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path")
Fixes: 6bd00b850635 ("act_mirred: fix a race condition on mirred_list")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata

commit 8e6ce7ebeb34f0992f56de078c3744fb383657fa upstream.

The label lio_xmit_failed is used 3 times through liquidio_xmit() but it
always makes a call to dma_unmap_single() using potentially
uninitialized variables from "ndata" variable. Out of the 3 gotos, 2 run
after ndata has been initialized, and had a prior dma_map_single() call.

Fix this by adding a new error label: lio_xmit_dma_failed which does
this dma_unmap_single() and then processed with the lio_xmit_failed
fallthrough.

Fixes: f21fb3ed364bb ("Add support of Cavium Liquidio ethernet adapters")
Reported-by: coverity (CID 1309740)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: Fix crash registers on non-crashing CPUs

commit c80e1b62ffca52e2d1d865ee58bc79c4c0c55005 upstream.

As part of handling a crash on an SMP system, an IPI is send to
all other CPUs to save their current registers and stop. It was
using task_pt_regs(current) to get the registers, but that will
only be accurate if the CPU was interrupted running in userland.
Instead allow the architecture to pass in the registers (all
pass NULL now, but allow for the future) and then use get_irq_regs()
which should be accurate as we are in an interrupt. Fall back to
task_pt_regs(current) if nothing else is available.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13050/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md:raid1: fix a dead loop when read from a WriteMostly disk

commit 816b0acf3deb6d6be5d0519b286fdd4bafade905 upstream.

If first_bad == this_sector when we get the WriteMostly disk
in read_balance(), valid disk will be returned with zero
max_sectors. It'll lead to a dead loop in make_request(), and
OOM will happen because of endless allocation of struct bio.

Since we can't get data from this disk in this case, so
continue for another disk.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()

commit 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35 upstream.

We aren't checking to see if the in-inode extended attribute is
corrupted before we try to expand the inode's extra isize fields.

This can lead to potential crashes caused by the BUG_ON() check in
ext4_xattr_shift_entries().

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: fix array out of bounds

commit 484f689fc9d4eb91c68f53e97dc355b1b06c3edb upstream.

When the initial value of i is greater than zero,
it may cause endless loop, resulting in array out
of bounds, fix it.

This is a port of the radeon fix to amdgpu.

Signed-off-by: tom will <os@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: testmgr - fix out of bound read in __test_aead()

commit abfa7f4357e3640fdee87dfc276fd0f379fb5ae6 upstream.

__test_aead() reads MAX_IVLEN bytes from template[i].iv, but the
actual length of the initialisation vector can be shorter.
The length of the IV is already calculated earlier in the
function. Let's just reuses that. Also the IV length is currently
calculated several time for no reason. Let's fix that too.
This fix an out-of-bound error detected by KASan.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: sunxi: Add apb0 gates for H3

commit 6e17b4181603d183d20c73f4535529ddf2a2a020 upstream.

This patch adds support for APB0 in H3. It seems to be compatible with
earlier SOCs. apb0 gates controls R_ block peripherals (R_PIO, R_IR,
etc).

Since this gates behave just like any Allwinner clock gate, add a generic
compatible that can be reused if we don't have any clock to protect.

Signed-off-by: Krzysztof Adamski <k@japko.eu>
[Maxime: Removed the H3 compatible from the simple-gates driver, reworked
the commit log a bit]
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: OMAP2+: timer: add probe for clocksources

commit 970f9091d25df14e9540ec7ff48a2f709e284cd1 upstream.

A few platforms are currently missing clocksource_probe() completely
in their time_init functionality. On OMAP3430 for example, this is
causing cpuidle to be pretty much dead, as the counter32k is not
going to be registered and instead a gptimer is used as a clocksource.
This will tick in periodic mode, preventing any deeper idle states.

While here, also drop one unnecessary check for populated DT before
existing clocksource_probe() call.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xc2028: unlock on error in xc2028_set_config()

commit 210bd104c6acd31c3c6b8b075b3f12d4a9f6b60d upstream.

We have to unlock before returning -ENOMEM.

Fixes: 8dfbcc4351a0 ('[media] xc2028: avoid use after free')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: do more integrity verification for superblock

commit 9a59b62fd88196844cee5fff851bee2cfd7afb6e upstream.

Do more sanity check for superblock during ->mount.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ANDROID: uid_sys_stats: fix access of task_uid(task)

struct task_struct *task should be proteced by tasklist_lock.

Change-Id: Iefcd13442a9b9d855a2bbcde9fd838a4132fee58
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>

BACKPORT: f2fs: sanity check log_blocks_per_seg

f2fs currently only supports 4KB block size and 2MB segment size.
Sanity check log_blocks_per_seg == 9, i.e. 2MB/4KB = (1 << 9)

Partially
(cherry-picked from commit 9a59b62fd88196844cee5fff851bee2cfd7afb6e)

f2fs: do more integrity verification for superblock

Do more sanity check for superblock during ->mount.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Bug: 36817013
Change-Id: I0be52e54fba82083068337ceb9f7ad985a87319f
Signed-off-by: Jin Qian <jinqian@google.com>

ANDROID: sdcardfs: Call lower fs's revalidate

We should be calling the lower filesystem's revalidate
inside of sdcardfs's revalidate, as wrapfs does.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959
Change-Id: I939d1c4192fafc1e21678aeab43fe3d588b8e2f4

ANDROID: sdcardfs: Avoid setting GIDs outside of valid ranges

When setting up the ownership of files on the lower filesystem,
ensure that these values are in reasonable ranges for apps. If
they aren't, default to AID_MEDIA_RW

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37516160
Change-Id: I0bec76a61ac72aff0b993ab1ad04be8382178a00

ANDROID: sdcardfs: Copy meta-data from lower inode

From wrapfs commit 3ee9b365e38c ("Wrapfs: properly copy meta-data after
AIO operations from lower inode")

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959
Change-Id: I9a789222e27a17b8d85ce61c45397d1839f9a675

Revert "Revert "Android: sdcardfs: Don't do d_add for lower fs""

This reverts commit ffa75fdb9c408f49b9622b6d55752ed99ff61488.

Turns out we just needed the right hash.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37231161
Change-Id: I6a6de7f7df99ad42b20fa062913b219f64020c31

ANDROID: sdcardfs: Use filesystem specific hash

We weren't accounting for FS specific hash functions,
causing us to miss negative dentries for any FS that
had one.

Similar to a patch from esdfs
commit 75bd25a9476d ("esdfs: support lower's own hash")

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I32d1ba304d728e0ca2648cacfb4c2e441ae63608

ANDROID: Update init/do_mounts_dm.c to the latest ChromiumOS version.

This is needed for AVB integration work.

Bug: 31796270
Test: Manually tested (other arch).
Change-Id: I32fd37c1578c6414e3e6ff277d16ad94df7886b8
Signed-off-by: David Zeuthen <zeuthen@google.com>

ANDROID: AVB error handler to invalidate vbmeta partition.

If androidboot.vbmeta.device is set and points to a device with vbmeta
magic, this header will be overwritten upon an irrecoverable dm-verity
error. The side-effect of this is that the slot will fail to verify on
next reboot, effectively triggering the boot loader to fallback to
another slot. This work both if the vbmeta struct is at the start of a
partition or if there's an AVB footer at the end.

This code is based on drivers/md/dm-verity-chromeos.c from ChromiumOS.

Example:

[    0.000000] Kernel command line: rootfstype=ext4 init=/init console=ttyS0,115200 androidboot.console=ttyS0 androidboot.hardware=uefi_x86_64 enforcing=0 androidboot.selinux=permissive androidboot.debuggable=1 buildvariant=eng dm="1 vroot none ro 1,0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks" root=0xfd00 androidboot.vbmeta.device=PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 androidboot.slot_suffix=_a androidboot.vbmeta.device_state=unlocked androidboot.vbmeta.hash_alg=sha256 androidboot.vbmeta.size=3200 androidboot.vbmeta.digest=14fe41c2b3696c31b7ad5eae7877d7d188995e1ab122c604aaaf4785850b91f7 skip_initramfs
[...]
[    0.612802] device-mapper: verity-avb: AVB error handler initialized with vbmeta device: PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552
[...]
[    1.213804] device-mapper: init: attempting early device configuration.
[    1.214752] device-mapper: init: adding target '0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks'
[    1.217643] device-mapper: init: dm-0 is ready
[    1.226694] device-mapper: verity: 8:6: data block 0 is corrupted
[    1.227666] device-mapper: verity-avb: AVB error handler called for PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552
[    1.234308] device-mapper: verity-avb: invalidate_vbmeta: found vbmeta partition
[    1.235848] device-mapper: verity-avb: invalidate_vbmeta: completed.
[...]

Bug: 31622239
Test: Manually tested (other arch).
Change-Id: Idf6be32d6a3d28e15de9302aa26ad6a516d663aa
Signed-off-by: David Zeuthen <zeuthen@google.com>

Revert "USB: gadget: u_ether: Fix data stall issue in RNDIS tethering mode"

This reverts commit 78281f6ed79413e5a16eac6edc4a72c6836ae5a9.

This data stall fix is no longer required in AOSP. It is already
skipped in android-4.9 patchset. Also core change from this
data stall fix is already undone by android-4.4 merge commit
324e88de4aba ("Merge tag 'v4.4.32' into android-4.4.y").

This revert patch just clean up the left overs. It also reverts the
compile fix from Change-Id: I38c4f4a850b0329fb4a06b2c7e45558e16d66151
40ceb2c69964 ("usb: gadget: Fix compilation problem with tx_qlen field").

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

ANDROID: uid_sys_stats: reduce update_io_stats overhead

Replaced read_lock with rcu_read_lock to reduce time that preemption
is disabled.

Added a function to update io stats for specific uid and moved
hash table lookup, user_namespace out of loops.

Bug: 37319300
Change-Id: I2b81b5cd3b6399b40d08c3c14b42cad044556970
Signed-off-by: Jin Qian <jinqian@google.com>

ANDROID: usb: gadget: fix MTP enumeration issue under super speed mode

MTP function doesn't show as a drive in Windows when the device
is connected to PC's USB3 port, because device fails to respond
ACK to BULK OUT transfer request.

This patch modifies MTP OUT request length as multiple of MaxPacketSize
per databook requirement in order to fix this issue.

Patchset: mtp

Change-Id: I090d7880ff00c499dc5ba7fd644b1fe7cd87fcb5
Signed-off-by: Jiebing Li <jiebing.li@intel.com>
Signed-off-by: Wang, Yu <yu.y.wang@intel.com>
Signed-off-by: Russ Weight <russell.h.weight@intel.com>

Android: sdcardfs: Don't complain in fixup_lower_ownership

Not all filesystems support changing the owner of a file.
We shouldn't complain if it doesn't happen.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37488099
Change-Id: I403e44ab7230f176e6df82f6adb4e5c82ce57f33

ANDROID: sdcardfs: ->iget fixes

Adapted from wrapfs
commit 8c49eaa0sb9c ("Wrapfs: ->iget fixes")

Change where we igrab/iput to ensure we always hold a valid lower_inode.
Return ENOMEM (not EACCES) if iget5_locked returns NULL.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959

Change-Id: Id8d4e0c0cbc685a0a77685ce73c923e9a3ddc094

Android: sdcardfs: Change cache GID value

Change-Id: Ieb955dd26493da26a458bc20fbbe75bca32b094f
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37193650

BACKPORT: [UPSTREAM] ext2: convert to mbcache2

(Cherry-pick from commit be0726d33cb8f411945884664924bed3cb8c70ee)

The conversion is generally straightforward. We convert filesystem from
a global cache to per-fs one. Similarly to ext4 the tricky part is that
xattr block corresponding to found mbcache entry can get freed before we
get buffer lock for that block. So we have to check whether the entry is
still valid after getting the buffer lock.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Bug: 32461228

BACKPORT [UPSTREAM] ext4: convert to mbcache2

(Cherry-pick from commit 82939d7999dfc1f1998c4b1c12e2f19edbdff272)

The conversion is generally straightforward. The only tricky part is
that xattr block corresponding to found mbcache entry can get freed
before we get buffer lock for that block. So we have to check whether
the entry is still valid after getting buffer lock.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Bug; 32461228

BACKPORT: [UPSTREAM] mbcache2: reimplement mbcache

(Cherry-pick from commit f9a61eb4e2471c56a63cd804c7474128138c38ac)

Original mbcache was designed to have more features than what ext?
filesystems ended up using. It supported entry being in more hashes, it
had a home-grown rwlocking of each entry, and one cache could cache
entries from multiple filesystems. This genericity also resulted in more
complex locking, larger cache entries, and generally more code
complexity.

This is reimplementation of the mbcache functionality to exactly fit the
purpose ext? filesystems use it for. Cache entries are now considerably
smaller (7 instead of 13 longs), the code is considerably smaller as
well (414 vs 913 lines of code), and IMO also simpler. The new code is
also much more lightweight.

I have measured the speed using artificial xattr-bench benchmark, which
spawns P processes, each process sets xattr for F different files, and
the value of xattr is randomly chosen from a pool of V values. Averages
of runtimes for 5 runs for various combinations of parameters are below.
The first value in each cell is old mbache, the second value is the new
mbcache.

V=10
F\P 1 2 4 8 16 32 64
10 0.158,0.157 0.208,0.196 0.500,0.277 0.798,0.400 3.258,0.584 13.807,1.047 61.339,2.803
100 0.172,0.167 0.279,0.222 0.520,0.275 0.825,0.341 2.981,0.505 12.022,1.202 44.641,2.943
1000 0.185,0.174 0.297,0.239 0.445,0.283 0.767,0.340 2.329,0.480 6.342,1.198 16.440,3.888

V=100
F\P 1 2 4 8 16 32 64
10 0.162,0.153 0.200,0.186 0.362,0.257 0.671,0.496 1.433,0.943 3.801,1.345 7.938,2.501
100 0.153,0.160 0.221,0.199 0.404,0.264 0.945,0.379 1.556,0.485 3.761,1.156 7.901,2.484
1000 0.215,0.191 0.303,0.246 0.471,0.288 0.960,0.347 1.647,0.479 3.916,1.176 8.058,3.160

V=1000
F\P 1 2 4 8 16 32 64
10 0.151,0.129 0.210,0.163 0.326,0.245 0.685,0.521 1.284,0.859 3.087,2.251 6.451,4.801
100 0.154,0.153 0.211,0.191 0.276,0.282 0.687,0.506 1.202,0.877 3.259,1.954 8.738,2.887
1000 0.145,0.179 0.202,0.222 0.449,0.319 0.899,0.333 1.577,0.524 4.221,1.240 9.782,3.579

V=10000
F\P 1 2 4 8 16 32 64
10 0.161,0.154 0.198,0.190 0.296,0.256 0.662,0.480 1.192,0.818 2.989,2.200 6.362,4.746
100 0.176,0.174 0.236,0.203 0.326,0.255 0.696,0.511 1.183,0.855 4.205,3.444 19.510,17.760
1000 0.199,0.183 0.240,0.227 1.159,1.014 2.286,2.154 6.023,6.039 ---,10.933 ---,36.620

V=100000
F\P 1 2 4 8 16 32 64
10 0.171,0.162 0.204,0.198 0.285,0.230 0.692,0.500 1.225,0.881 2.990,2.243 6.379,4.771
100 0.151,0.171 0.220,0.210 0.295,0.255 0.720,0.518 1.226,0.844 3.423,2.831 19.234,17.544
1000 0.192,0.189 0.249,0.225 1.162,1.043 2.257,2.093 5.853,4.997 ---,10.399 ---,32.198

We see that the new code is faster in pretty much all the cases and
starting from 4 processes there are significant gains with the new code
resulting in upto 20-times shorter runtimes. Also for large numbers of
cached entries all values for the old code could not be measured as the
kernel started hitting softlockups and died before the test completed.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Bug: 32461228

UPSTREAM: net: socket: Make unnecessarily global sockfs_setattr() static

Make sockfs_setattr() static as it is not used outside of net/socket.c

This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]

Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie613c441b3fe081bdaec8c480d3aade482873bf8
Fixes: Change-Id: Idbc3e9a0cec91c4c6e01916b967b6237645ebe59
("net: core: Add a UID field to struct sock.")
(cherry picked from commit dc647ec88e029307e60e6bf9988056605f11051a)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

UPSTREAM: net: ipv4: Don't crash if passing a null sk to ip_do_redirect.

Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
protocols.") made ip_do_redirect call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.

Fix this by getting the network namespace from the skb instead.

Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I16a3c343cb142c482ca6dd363c28b3a12d73a46d
Fixes: Change-Id: I910504b508948057912bc188fd1e8aca28294de3
("net: inet: Support UID-based routing in IP protocols.")
(cherry picked from commit 7d99569460eae28b187d574aec930a4cf8b90441)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

ANDROID: sdcardfs: Directly pass lower file for mmap

Instead of relying on a copy hack, pass the lower file
as private data. This lets the kernel find the vma
mapping for pages used by the file, allowing pages
used by mapping to be reclaimed.

This is adapted from following esdfs patches
commit 0647e638d: ("esdfs: store lower file in vm_file for mmap")
commit 064850866: ("esdfs: keep a counter for mmaped file")

Change-Id: I75b74d1e5061db1b8c13be38d184e118c0851a1a
Signed-off-by: Daniel Rosenberg <drosen@google.com>

UPSTREAM: checkpatch: special audit for revert commit line

Currently checkpatch.pl does not recognize git's default
commit revert message and will complain about the hash format.
Add special audit for revert commit message line to fix it.

Signed-off-by: Wei Wang <wvw@google.com>
Acked-by: Joe Perches <joe@perches.com>
Bug: 37158168
Test: checkpatch.pl --patch [diff] and no longer see failure
Change-Id: I65cf9a46874621dd6d5c349d2d3ca3b862d61ba3

UPSTREAM: PM / sleep: make PM notifiers called symmetrically

(cherry picked from commit ea00f4f4f00cc2bc3b63ad512a4e6df3b20832b9)

This makes pm notifier PREPARE/POST symmetrical: if PREPARE
fails, we will only undo what ever happened on PREPARE.

It fixes the unbalanced CPU hotplug enable in CPU PM notifier.

Change-Id: I01dce3cc95c5d6b8913b7b6be301f2909258c745
Signed-off-by: Lianwei Wang <lianwei.wang@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

BACKPORT: arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions

The core and the cluster sleep state entry latencies can't be same as
cluster sleep involves more work compared to core level e.g. shared
cache maintenance.

Experiments have shown on an average about 100us more latency for the
cluster sleep state compared to the core level sleep. This patch fixes
the entry latency for the cluster sleep state.

Fixes: 28e10a8f3a03 ("arm64: dts: juno: Add idle-states to device tree")
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: "Jon Medhurst (Tixy)" <tixy@linaro.org>
Reviewed-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: Change-Id: I7b2d81fa66f8ce8b229457cfefff06e9edd545c7
(arm64: dts: juno: Add idle-states to device tree)
(cherry picked from commit 909e481e2467f202b97d42beef246e8829416a85)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

Merge tag 'v4.4.65' into linux-linaro-lsk-v4.4

This is the 4.4.65 stable release

Linux 4.4.65

perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race

commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream.

Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.

The problem is exactly that described in commit:

f63a8daa5812 ("perf: Fix event->ctx locking")

... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.

That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.

So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).

Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.

Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias <joaodias@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Min Chong <mchong@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 4.4:
- Test perf_event::group_flags instead of group_caps
- Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ping: implement proper locking

commit 43a6684519ab0a6c52024b5e25322476cabad893 upstream.

We got a report of yet another bug in ping

http://www.openwall.com/lists/oss-security/2017/03/24/6

->disconnect() is not called with socket lock held.

Fix this by acquiring ping rwlock earlier.

Thanks to Daniel, Alexander and Andrey for letting us know this problem.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Jiang <danieljiang0415@gmail.com>
Reported-by: Solar Designer <solar@openwall.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/android/ion : fix a race condition in the ion driver

commit 9590232bb4f4cc824f3425a6e1349afbe6d6d2b7 upstream.

There is a use-after-free problem in the ion driver.
This is caused by a race condition in the ion_ioctl()
function.

A handle has ref count of 1 and two tasks on different
cpus calls ION_IOC_FREE simultaneously.

cpu 0                                   cpu 1
-------------------------------------------------------
ion_handle_get_by_id()
(ref == 2)
                            ion_handle_get_by_id()
                            (ref == 3)

ion_free()
(ref == 2)

ion_handle_put()
(ref == 1)

                            ion_free()
                            (ref == 0 so ion_handle_destroy() is
                            called
                            and the handle is freed.)

                            ion_handle_put() is called and it
                            decreases the slub's next free pointer

The problem is detected as an unaligned access in the
spin lock functions since it uses load exclusive
instruction. In some cases it corrupts the slub's
free pointer which causes a mis-aligned access to the
next free pointer.(kmalloc returns a pointer like
ffffc0745b4580aa). And it causes lots of other
hard-to-debug problems.

This symptom is caused since the first member in the
ion_handle structure is the reference count and the
ion driver decrements the reference after it has been
freed.

To fix this problem client->lock mutex is extended
to protect all the codes that uses the handle.

Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
index 7ff2a7ec871f..33b390e7ea31

vfio/pci: Fix integer overflows, bitmask check

commit 05692d7005a364add85c6e25a6c4447ce08f913a upstream.

The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
user-supplied integers, potentially allowing memory corruption. This
patch adds appropriate integer overflow checks, checks the range bounds
for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element
in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
vfio_pci_set_irqs_ioctl().

Furthermore, a kzalloc is changed to a kcalloc because the use of a
kzalloc with an integer multiplication allowed an integer overflow
condition to be reached without this patch. kcalloc checks for overflow
and should prevent a similar occurrence.

Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: check minimum bearer MTU

commit 3de81b758853f0b29c61e246679d20b513c4cfec upstream.

Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.

As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.

Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.4:
- Adjust context
- NETDEV_GOING_DOWN and NETDEV_CHANGEMTU cases in net notifier were combined]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nfnetlink: correctly validate length of batch messages

commit c58d6c93680f28ac58984af61d0a7ebf4319c241 upstream.

If nlh->nlmsg_len is zero then an infinite loop is triggered because
'skb_pull(skb, msglen);' pulls zero bytes.

The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len <
NLMSG_HDRLEN' which bypasses the length validation and will later
trigger an out-of-bound read.

If the length validation does fail then the malformed batch message is
copied back to userspace. However, we cannot do this because the
nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in
netlink_ack:

    [   41.455421] ==================================================================
    [   41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr ffff880119e79340
    [   41.456431] Read of size 4294967280 by task a.out/987
    [   41.456431] =============================================================================
    [   41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    [   41.456431] -----------------------------------------------------------------------------
    ...
    [   41.456431] Bytes b4 ffff880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00  ................
    [   41.456431] Object ffff880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00   ...............
    [   41.456431] Object ffff880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05  .......@EV."3...
    [   41.456431] Object ffff880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb  ................
                                            ^^ start of batch nlmsg with
                                               nlmsg_len=4294967280
    ...
    [   41.456431] Memory state around the buggy address:
    [   41.456431]  ffff880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [   41.456431]  ffff880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [   41.456431] >ffff880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
    [   41.456431]                                ^
    [   41.456431]  ffff880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [   41.456431]  ffff880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
    [   41.456431] ==================================================================

Fix this with better validation of nlh->nlmsg_len and by setting
NFNL_BATCH_FAILURE if any batch message fails length validation.

CAP_NET_ADMIN is required to trigger the bugs.

Fixes: 9ea2aa8b7dba ("netfilter: nfnetlink: validate nfnetlink header from batch")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xc2028: avoid use after free

commit 8dfbcc4351a0b6d2f2d77f367552f48ffefafe18 upstream.

If struct xc2028_config is passed without a firmware name,
the following trouble may happen:

[11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner
[11009.907491] ==================================================================
[11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr ffff8803bd78ab40
[11009.907992] Read of size 1 by task modprobe/28992
[11009.907994] =============================================================================
[11009.907997] BUG kmalloc-16 (Tainted: G        W      ): kasan: bad access detected
[11009.907999] -----------------------------------------------------------------------------

[11009.908008] INFO: Allocated in xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] age=0 cpu=3 pid=28992
[11009.908012] ___slab_alloc+0x581/0x5b0
[11009.908014] __slab_alloc+0x51/0x90
[11009.908017] __kmalloc+0x27b/0x350
[11009.908022] xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd]
[11009.908026] usb_hcd_submit_urb+0x1e8/0x1c60
[11009.908029] usb_submit_urb+0xb0e/0x1200
[11009.908032] usb_serial_generic_write_start+0xb6/0x4c0
[11009.908035] usb_serial_generic_write+0x92/0xc0
[11009.908039] usb_console_write+0x38a/0x560
[11009.908045] call_console_drivers.constprop.14+0x1ee/0x2c0
[11009.908051] console_unlock+0x40d/0x900
[11009.908056] vprintk_emit+0x4b4/0x830
[11009.908061] vprintk_default+0x1f/0x30
[11009.908064] printk+0x99/0xb5
[11009.908067] kasan_report_error+0x10a/0x550
[11009.908070] __asan_report_load1_noabort+0x43/0x50
[11009.908074] INFO: Freed in xc2028_set_config+0x90/0x630 [tuner_xc2028] age=1 cpu=3 pid=28992
[11009.908077] __slab_free+0x2ec/0x460
[11009.908080] kfree+0x266/0x280
[11009.908083] xc2028_set_config+0x90/0x630 [tuner_xc2028]
[11009.908086] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
[11009.908090] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
[11009.908094] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
[11009.908098] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
[11009.908101] em28xx_register_extension+0xd9/0x190 [em28xx]
[11009.908105] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
[11009.908108] do_one_initcall+0x141/0x300
[11009.908111] do_init_module+0x1d0/0x5ad
[11009.908114] load_module+0x6666/0x9ba0
[11009.908117] SyS_finit_module+0x108/0x130
[11009.908120] entry_SYSCALL_64_fastpath+0x16/0x76
[11009.908123] INFO: Slab 0xffffea000ef5e280 objects=25 used=25 fp=0x          (null) flags=0x2ffff8000004080
[11009.908126] INFO: Object 0xffff8803bd78ab40 @offset=2880 fp=0x0000000000000001

[11009.908130] Bytes b4 ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00  ....*....(......
[11009.908133] Object ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff  ...........j....
[11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G    B   W       4.5.0-rc1+ #43
[11009.908140] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
[11009.908142]  ffff8803bd78a000 ffff8802c273f1b8 ffffffff81932007 ffff8803c6407a80
[11009.908148]  ffff8802c273f1e8 ffffffff81556759 ffff8803c6407a80 ffffea000ef5e280
[11009.908153]  ffff8803bd78ab40 dffffc0000000000 ffff8802c273f210 ffffffff8155ccb4
[11009.908158] Call Trace:
[11009.908162]  [<ffffffff81932007>] dump_stack+0x4b/0x64
[11009.908165]  [<ffffffff81556759>] print_trailer+0xf9/0x150
[11009.908168]  [<ffffffff8155ccb4>] object_err+0x34/0x40
[11009.908171]  [<ffffffff8155f260>] kasan_report_error+0x230/0x550
[11009.908175]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908179]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908182]  [<ffffffff8155f5c3>] __asan_report_load1_noabort+0x43/0x50
[11009.908185]  [<ffffffff8155ea00>] ? __asan_register_globals+0x50/0xa0
[11009.908189]  [<ffffffff8194cea6>] ? strcmp+0x96/0xb0
[11009.908192]  [<ffffffff8194cea6>] strcmp+0x96/0xb0
[11009.908196]  [<ffffffffa13ba4ac>] xc2028_set_config+0x15c/0x630 [tuner_xc2028]
[11009.908200]  [<ffffffffa13bac90>] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
[11009.908203]  [<ffffffff8155ea78>] ? memset+0x28/0x30
[11009.908206]  [<ffffffffa13ba980>] ? xc2028_set_config+0x630/0x630 [tuner_xc2028]
[11009.908211]  [<ffffffffa157a59a>] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
[11009.908215]  [<ffffffffa157aa2a>] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb]
[11009.908219]  [<ffffffffa157a3a1>] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb]
[11009.908222]  [<ffffffffa01795ac>] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x]
[11009.908226]  [<ffffffffa01793e0>] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x]
[11009.908230]  [<ffffffff812e87d0>] ? ref_module.part.15+0x10/0x10
[11009.908233]  [<ffffffff812e56e0>] ? module_assert_mutex_or_preempt+0x80/0x80
[11009.908238]  [<ffffffffa157af92>] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
[11009.908242]  [<ffffffffa157a6ae>] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb]
[11009.908245]  [<ffffffff8195222d>] ? string+0x14d/0x1f0
[11009.908249]  [<ffffffff8195381f>] ? symbol_string+0xff/0x1a0
[11009.908253]  [<ffffffff81953720>] ? uuid_string+0x6f0/0x6f0
[11009.908257]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908260]  [<ffffffff8104b02f>] ? print_context_stack+0x7f/0xf0
[11009.908264]  [<ffffffff812e9846>] ? __module_address+0xb6/0x360
[11009.908268]  [<ffffffff8137fdc9>] ? is_ftrace_trampoline+0x99/0xe0
[11009.908271]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908275]  [<ffffffff81240a70>] ? debug_check_no_locks_freed+0x290/0x290
[11009.908278]  [<ffffffff8104a24b>] ? dump_trace+0x11b/0x300
[11009.908282]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908285]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908289]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908292]  [<ffffffff812404dd>] ? trace_hardirqs_on+0xd/0x10
[11009.908296]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908299]  [<ffffffff822dcbb0>] ? mutex_trylock+0x400/0x400
[11009.908302]  [<ffffffff810021a1>] ? do_one_initcall+0x131/0x300
[11009.908306]  [<ffffffff81296dc7>] ? call_rcu_sched+0x17/0x20
[11009.908309]  [<ffffffff8159e708>] ? put_object+0x48/0x70
[11009.908314]  [<ffffffffa1579f11>] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
[11009.908317]  [<ffffffffa13e81f9>] em28xx_register_extension+0xd9/0x190 [em28xx]
[11009.908320]  [<ffffffffa0150000>] ? 0xffffffffa0150000
[11009.908324]  [<ffffffffa0150010>] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
[11009.908327]  [<ffffffff810021b1>] do_one_initcall+0x141/0x300
[11009.908330]  [<ffffffff81002070>] ? try_to_run_init_process+0x40/0x40
[11009.908333]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908337]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908340]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908343]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908346]  [<ffffffff8155ea37>] ? __asan_register_globals+0x87/0xa0
[11009.908350]  [<ffffffff8144da7b>] do_init_module+0x1d0/0x5ad
[11009.908353]  [<ffffffff812f2626>] load_module+0x6666/0x9ba0
[11009.908356]  [<ffffffff812e9c90>] ? symbol_put_addr+0x50/0x50
[11009.908361]  [<ffffffffa1580037>] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb]
[11009.908366]  [<ffffffff812ebfc0>] ? module_frob_arch_sections+0x20/0x20
[11009.908369]  [<ffffffff815bc940>] ? open_exec+0x50/0x50
[11009.908374]  [<ffffffff811671bb>] ? ns_capable+0x5b/0xd0
[11009.908377]  [<ffffffff812f5e58>] SyS_finit_module+0x108/0x130
[11009.908379]  [<ffffffff812f5d50>] ? SyS_init_module+0x1f0/0x1f0
[11009.908383]  [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
[11009.908394]  [<ffffffff822e6936>] entry_SYSCALL_64_fastpath+0x16/0x76
[11009.908396] Memory state around the buggy address:
[11009.908398]  ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908401]  ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908403] >ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc
[11009.908405]                                            ^
[11009.908407]  ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908409]  ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908411] ==================================================================

In order to avoid it, let's set the cached value of the firmware
name to NULL after freeing it. While here, return an error if
the memory allocation fails.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mnt: Add a per mount namespace limit on the number of mounts

commit d29216842a85c7970c536108e093963f02714498 upstream.

CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.

    mkdir /tmp/1 /tmp/2
    mount --make-rshared /
    for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

Will create create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.

As such CVE-2016-6213 was assigned.

Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:

> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.

So I am setting the default number of mounts allowed per mount
namespace at 100,000.  This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts.  Which should be perfect to catch misconfigurations and
malfunctioning programs.

For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.

Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: fix socket timer deadlock

commit f1d048f24e66ba85d3dabf3d076cefa5f2b546b0 upstream.

We sometimes observe a 'deadly embrace' type deadlock occurring
between mutually connected sockets on the same node. This happens
when the one-hour peer supervision timers happen to expire
simultaneously in both sockets.

The scenario is as follows:

CPU 1:                          CPU 2:
--------                        --------
tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
  lock(sk1.slock)                 lock(sk2.slock)
  msg_create(probe)               msg_create(probe)
  unlock(sk1.slock)               unlock(sk2.slock)
  tipc_node_xmit_skb()            tipc_node_xmit_skb()
    tipc_node_xmit()                tipc_node_xmit()
      tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
        lock(sk2.slock)                 lock((sk1.slock)
        filter_rcv()                    filter_rcv()
          tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
            msg_create(probe_rsp)           msg_create(probe_rsp)
            tipc_sk_respond()               tipc_sk_respond()
              tipc_node_xmit_skb()            tipc_node_xmit_skb()
                tipc_node_xmit()                tipc_node_xmit()
                  tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                    lock((sk1.slock)                lock((sk2.slock)
                    ===> DEADLOCK                   ===> DEADLOCK

Further analysis reveals that there are three different locations in the
socket code where tipc_sk_respond() is called within the context of the
socket lock, with ensuing risk of similar deadlocks.

We now solve this by passing a buffer queue along with all upcalls where
sk_lock.slock may potentially be held. Response or rejected message
buffers are accumulated into this queue instead of being sent out
directly, and only sent once we know we are safely outside the slock
context.

Reported-by: GUNA <gbalasun@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: fix random link resets while adding a second bearer

commit d2f394dc4816b7bd1b44981d83509f18f19c53f0 upstream.

In a dual bearer configuration, if the second tipc link becomes
active while the first link still has pending nametable "bulk"
updates, it randomly leads to reset of the second link.

When a link is established, the function named_distribute(),
fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL)
with NAME_DISTRIBUTOR message for each PUBLICATION.
However, the function named_distribute() allocates the buffer by
increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
This consumes the space allocated for TUNNEL_PROTOCOL.

When establishing the second link, the link shall tunnel all the
messages in the first link queue including the "bulk" update.
As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds
the link mtu the transmission fails (-EMSGSIZE).

Thus, the synch point based on the message count of the tunnel
packets is never reached leading to link timeout.

In this commit, we adjust the size of name distributor message so that
they can be tunnelled.

Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gfs2: avoid uninitialized variable warning

commit 67893f12e5374bbcaaffbc6e570acbc2714ea884 upstream.

We get a bogus warning about a potential uninitialized variable
use in gfs2, because the compiler does not figure out that we
never use the leaf number if get_leaf_nr() returns an error:

fs/gfs2/dir.c: In function 'get_first_leaf':
fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/gfs2/dir.c: In function 'dir_split_leaf':
fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]

Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is
sufficient to let gcc understand that this is exactly the same
condition as in IS_ERR() so it can optimize the code path enough
to understand it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hostap: avoid uninitialized variable use in hfa384x_get_rid

commit 48dc5fb3ba53b20418de8514700f63d88c5de3a3 upstream.

The driver reads a value from hfa384x_from_bap(), which may fail,
and then assigns the value to a local variable. gcc detects that
in in the failure case, the 'rlen' variable now contains
uninitialized data:

In file included from ../drivers/net/wireless/intersil/hostap/hostap_pci.c:220:0:
drivers/net/wireless/intersil/hostap/hostap_hw.c: In function 'hfa384x_get_rid':
drivers/net/wireless/intersil/hostap/hostap_hw.c:842:5: warning: 'rec' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (le16_to_cpu(rec.len) == 0) {

This restructures the function as suggested by Russell King, to
make it more readable and get more reliable error handling, by
handling each failure mode using a goto.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: nozomi: avoid a harmless gcc warning

commit a4f642a8a3c2838ad09fe8313d45db46600e1478 upstream.

The nozomi wireless data driver has its own helper function to
transfer data from a FIFO, doing an extra byte swap on big-endian
architectures, presumably to bring the data back into byte-serial
order after readw() or readl() perform their implicit byteswap.

This helper function is used in the receive_data() function to
first read the length into a 32-bit variable, which causes
a compile-time warning:

drivers/tty/nozomi.c: In function 'receive_data':
drivers/tty/nozomi.c:857:9: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized]

The problem is that gcc is unsure whether the data was actually
read or not. We know that it is at this point, so we can replace
it with a single readl() to shut up that warning.

I am leaving the byteswap in there, to preserve the existing
behavior, even though this seems fishy: Reading the length of
the data into a cpu-endian variable should normally not use
a second byteswap on big-endian systems, unless the hardware
is aware of the CPU endianess.

There appears to be a lot more confusion about endianess in this
driver, so it probably has not worked on big-endian systems in
a long time, if ever, and I have no way to test it. It's well
possible that this driver has not been used by anyone in a while,
the last patch that looks like it was tested on the hardware is
from 2008.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: correct error in node fsm

commit c4282ca76c5b81ed73ef4c5eb5c07ee397e51642 upstream.

commit 88e8ac7000dc ("tipc: reduce transmission rate of reset messages
when link is down") revealed a flaw in the node FSM, as defined in
the log of commit 66996b6c47ed ("tipc: extend node FSM").

We see the following scenario:
1: Node B receives a RESET message from node A before its link endpoint
   is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This
   event will not change the node FSM state, but the (distinct) link FSM
   will move to state RESETTING.
2: As an effect of the previous event, the local endpoint on B will
   declare node A lost, and post the event SELF_DOWN to the its node
   FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning
   that no messages will be accepted from A until it receives another
   RESET message that confirms that A's endpoint has been reset. This
   is  wasteful, since we know this as a fact already from the first
   received RESET, but worse is that the link instance's FSM has not
   wasted this information, but instead moved on to state ESTABLISHING,
   meaning that it repeatedly sends out ACTIVATE messages to the reset
   peer A.
3: Node A will receive one of the ACTIVATE messages, move its link FSM
   to state ESTABLISHED, and start repeatedly sending out STATE messages
   to node B.
4: Node B will consistently drop these messages, since it can only accept
   accept a RESET according to its node FSM.
5: After four lost STATE messages node A will reset its link and start
   repeatedly sending out RESET messages to B.
6: Because of the reduced send rate for RESET messages, it is very
   likely that A will receive an ACTIVATE (which is sent out at a much
   higher frequency) before it gets the chance to send a RESET, and A
   may hence quickly move back to state ESTABLISHED and continue sending
   out STATE messages, which will again be dropped by B.
7: GOTO 5.
8: After having repeated the cycle 5-7 a number of times, node A will
   by chance get in between with sending a RESET, and the situation is
   resolved.

Unfortunately, we have seen that it may take a substantial amount of
time before this vicious loop is broken, sometimes in the order of
minutes.

We correct this by making a small correction to the node FSM: When a
node in state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now
moves directly back to state SELF_DOWN_PEER_DOWN, instead of as now
SELF_DOWN_PEER_LEAVING. This is logically consistent, since we don't
need to wait for RESET confirmation from of an endpoint that we alread
know has been reset. It also means that node B in the scenario above
will not be dropping incoming STATE messages, and the link can come up
immediately.

Finally, a symmetry comparison reveals that the  FSM has a similar
error when receiving the event PEER_DOWN in state PEER_UP_SELF_COMING.
Instead of moving to PERR_DOWN_SELF_LEAVING, it should move directly
to SELF_DOWN_PEER_DOWN. Although we have never seen any negative effect
of this logical error, we choose fix this one, too.

The node FSM looks as follows after those changes:

                           +----------------------------------------+
                           |                           PEER_DOWN_EVT|
                           |                                        |
  +------------------------+----------------+                       |
  |SELF_DOWN_EVT           |                |                       |
  |                        |                |                       |
  |              +-----------+          +-----------+               |
  |              |NODE_      |          |NODE_      |               |
  |   +----------|FAILINGOVER|<---------|SYNCHING   |-----------+   |
  |   |SELF_     +-----------+ FAILOVER_+-----------+   PEER_   |   |
  |   |DOWN_EVT   |          A BEGIN_EVT  A         |   DOWN_EVT|   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |FAILOVER_ |FAILOVER_   |SYNCH_   |SYNCH_     |   |
  |   |           |END_EVT   |BEGIN_EVT   |BEGIN_EVT|END_EVT    |   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |         +--------------+        |           |   |
  |   |           +-------->|   SELF_UP_   |<-------+           |   |
  |   |   +-----------------|   PEER_UP    |----------------+   |   |
  |   |   |SELF_DOWN_EVT    +--------------+   PEER_DOWN_EVT|   |   |
  |   |   |                    A        A                   |   |   |
  |   |   |                    |        |                   |   |   |
  |   |   |         PEER_UP_EVT|        |SELF_UP_EVT        |   |   |
  |   |   |                    |        |                   |   |   |
  V   V   V                    |        |                   V   V   V
+------------+       +-----------+    +-----------+       +------------+
|SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
|PEER_LEAVING|       |PEER_COMING|    |SELF_COMING|       |SELF_LEAVING|
+------------+       +-----------+    +-----------+       +------------+
       |               |       A        A       |                |
       |               |       |        |       |                |
       |       SELF_   |       |SELF_   |PEER_  |PEER_           |
       |       DOWN_EVT|       |UP_EVT  |UP_EVT |DOWN_EVT        |
       |               |       |        |       |                |
       |               |       |        |       |                |
       |               |    +--------------+    |                |
       |PEER_DOWN_EVT  +--->|  SELF_DOWN_  |<---+   SELF_DOWN_EVT|
       +------------------->|  PEER_DOWN   |<--------------------+
                            +--------------+

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: re-enable compensation for socket receive buffer double counting

commit 7c8bcfb1255fe9d929c227d67bdcd84430fd200b upstream.

In the refactoring commit d570d86497ee ("tipc: enqueue arrived buffers
in socket in separate function") we did by accident replace the test

if (sk->sk_backlog.len == 0)
atomic_set(&tsk->dupl_rcvcnt, 0);

with

if (sk->sk_backlog.len)
atomic_set(&tsk->dupl_rcvcnt, 0);

This effectively disables the compensation we have for the double
receive buffer accounting that occurs temporarily when buffers are
moved from the backlog to the socket receive queue. Until now, this
has gone unnoticed because of the large receive buffer limits we are
applying, but becomes indispensable when we reduce this buffer limit
later in this series.

We now fix this by inverting the mentioned condition.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: make dist queue pernet

commit 541726abe7daca64390c2ec34e6a203145f1686d upstream.

Nametable updates received from the network that cannot be applied
immediately are placed on a defer queue. This queue is global to the
TIPC module, which might cause problems when using TIPC in containers.
To prevent nametable updates from escaping into the wrong namespace,
we make the queue pernet instead.

Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: make sure IPv6 header fits in skb headroom

commit 9bd160bfa27fa41927dbbce7ee0ea779700e09ef upstream.

Expand headroom further in order to be able to fit the larger IPv6
header. Prior to this patch this caused a skb under panic for certain
tipc packets when using IPv6 UDP bearer(s).

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android

Merge tag 'v4.4.64' into linux-linaro-lsk-v4.4

This is the 4.4.64 stable release

Linux 4.4.64

tipc: fix crash during node removal

commit d25a01257e422a4bdeb426f69529d57c73b235fe upstream.

When the TIPC module is unloaded, we have identified a race condition
that allows a node reference counter to go to zero and the node instance
being freed before the node timer is finished with accessing it. This
leads to occasional crashes, especially in multi-namespace environments.

The scenario goes as follows:

CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2

1:                                          if(!mod_timer())
2: if (del_timer())
3:   tipc_node_put()                                        // ref -> 1
4: tipc_node_put()                                          // ref -> 0
5:   kfree_rcu(node);
6:                                               tipc_node_get(node)
7:                                               // BOOM!

We now clean up this functionality as follows:

1) We remove the node pointer from the node lookup table before we
   attempt deactivating the timer. This way, we reduce the risk that
   tipc_node_find() may obtain a valid pointer to an instance marked
   for deletion; a harmless but undesirable situation.

2) We use del_timer_sync() instead of del_timer() to safely deactivate
   the node timer without any risk that it might be reactivated by the
   timeout handler. There is no risk of deadlock here, since the two
   functions never touch the same spinlocks.

3: We remove a pointless tipc_node_get() + tipc_node_put() from the
   timeout handler.

Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: fix del_gendisk() vs blkdev_ioctl crash

commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.

When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.

The nvdimm_revalidate_disk() implementation depends on
disk->driverfs_dev to be valid at entry.  However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.

There's no reason for del_gendisk() to clear ->driverfs_dev.  That
device is the parent of the disk.  It is guaranteed to not be freed
until the disk, as a child, drops its ->parent reference.

We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)->parent, but lets fix it globally since
->driverfs_dev follows the lifetime of the parent.  Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
[..]
Call Trace:
  [<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
  [<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
  [<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
  [<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
  [<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
  [<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
  [<ffffffff812916dd>] block_ioctl+0x3d/0x50
  [<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
  [<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
  [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
  [<ffffffff81264f09>] SyS_ioctl+0x79/0x90
  [<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76

Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Reported-by: Robert Hu <robert.hu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions

commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd upstream.

Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.

Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.

Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hv: don't reset hv_context.tsc_page on crash

commit 56ef6718a1d8d77745033c5291e025ce18504159 upstream.

It may happen that secondary CPUs are still alive and resetting
hv_context.tsc_page will cause a consequent crash in read_hv_clock_tsc()
as we don't check for it being not NULL there. It is safe as we're not
freeing this page anyways.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Drivers: hv: balloon: account for gaps in hot add regions

commit cb7a5724c7e1bfb5766ad1c3beba14cc715991cf upstream.

I'm observing the following hot add requests from the WS2012 host:

hot_add_req: start_pfn = 0x108200 count = 330752
hot_add_req: start_pfn = 0x158e00 count = 193536
hot_add_req: start_pfn = 0x188400 count = 239616

As the host doesn't specify hot add regions we're trying to create
128Mb-aligned region covering the first request, we create the 0x108000 -
0x160000 region and we add 0x108000 - 0x158e00 memory. The second request
passes the pfn_covered() check, we enlarge the region to 0x108000 -
0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with the
third request as it starts at 0x188400 so there is a 0x200 gap which is
not covered. As the end of our region is 0x190000 now it again passes the
pfn_covered() check were we just adjust the covered_end_pfn and make it
0x188400 instead of 0x188200 which means that we'll try to online
0x188200-0x188400 pages but these pages were never assigned to us and we
crash.

We can't react to such requests by creating new hot add regions as it may
happen that the whole suggested range falls into the previously identified
128Mb-aligned area so we'll end up adding nothing or create intersecting
regions and our current logic doesn't allow that. Instead, create a list of
such 'gaps' and check for them in the page online callback.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Drivers: hv: balloon: keep track of where ha_region starts

commit 7cf3b79ec85ee1a5bbaaf936bb1d050dc652983b upstream.

Windows 2012 (non-R2) does not specify hot add region in hot add requests
and the logic in hot_add_req() is trying to find a 128Mb-aligned region
covering the request. It may also happen that host's requests are not 128Mb
aligned and the created ha_region will start before the first specified
PFN. We can't online these non-present pages but we don't remember the real
start of the region.

This is a regression introduced by the commit 5abbbb75d733 ("Drivers: hv:
hv_balloon: don't lose memory when onlining order is not natural"). While
the idea of keeping the 'moving window' was wrong (as there is no guarantee
that hot add requests come ordered) we should still keep track of
covered_start_pfn. This is not a revert, the logic is different.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Tools: hv: kvp: ensure kvp device fd is closed on exec

commit 26840437cbd6d3625ea6ab34e17cd34bb810c861 upstream.

KVP daemon does fork()/exec() (with popen()) so we need to close our fds
to avoid sharing them with child processes. The immediate implication of
not doing so I see is SELinux complaining about 'ip' trying to access
'/dev/vmbus/hv_kvp'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

commit 8b3405e345b5a098101b0c31b264c812bba045d9 upstream.

In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
unmap_stage2_range() on the entire memory range for the guest. This could
cause problems with other callers (e.g, munmap on a memslot) trying to
unmap a range. And since we have to unmap the entire Guest memory range
holding a spinlock, make sure we yield the lock if necessary, after we
unmap each PUD range.

Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Cc: Paolo Bonzini <pbonzin@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[ Avoid vCPU starvation and lockup detector warnings ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>