Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

Merge branch 'inet-frag-fixes'

Florian Westphal says:

====================
inet: ip defrag bug fixes

Johan Schuijt and Frank Schreuder reported crash and softlockup after the
inet workqueue eviction change:

general protection fault: 0000 [#1] SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.18.18-transip-1.5 #1
Workqueue: events inet_frag_worker
task: ffff880224935130 ti: ffff880224938000 task.ti: ffff880224938000
RIP: 0010:[<ffffffff8149288c>] [<ffffffff8149288c>] inet_evict_bucket+0xfc/0x160
RSP: 0018:ffff88022493bd58 EFLAGS: 00010286
RAX: ffff88021f4f3e80 RBX: dead000000100100 RCX: 000000000000006b
RDX: 000000000000006c RSI: ffff88021f4f3e80 RDI: dead0000001000a8
RBP: 0000000000000002 R08: ffff880222273900 R09: ffff880036e49200
R10: ffff8800c6e86500 R11: ffff880036f45500 R12: ffffffff81a87100
R13: ffff88022493bd70 R14: 0000000000000000 R15: ffff8800c9b26280
[..]
Call Trace:
[<ffffffff814929e0>] ? inet_frag_worker+0x60/0x210
[<ffffffff8107e3a2>] ? process_one_work+0x142/0x3b0
[<ffffffff8107eb94>] ? worker_thread+0x114/0x440
[..]

A second issue results in softlockup since the evictor may restart the
eviction loop for a (potentially) unlimited number of times while local
softirqs are disabled.

Frank reports that test system remained stable for 14 hours of testing
(before, crash occured within half an hour in their setup).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: remove INET_FRAG_EVICTED and use list_evictor for the test

We can simply remove the INET_FRAG_EVICTED flag to avoid all the flags
race conditions with the evictor and use a participation test for the
evictor list, when we're at that point (after inet_frag_kill) in the
timer there're 2 possible cases:

1. The evictor added the entry to its evictor list while the timer was
waiting for the chainlock
or
2. The timer unchained the entry and the evictor won't see it

In both cases we should be able to see list_evictor correctly due
to the sync on the chainlock.

Joint work with Florian Westphal.

Tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frag: don't wait for timer deletion when evicting

Frank reports 'NMI watchdog: BUG: soft lockup' errors when
load is high. Instead of (potentially) unbounded restarts of the
eviction process, just skip to the next entry.

One caveat is that, when a netns is exiting, a timer may still be running
by the time inet_evict_bucket returns.

We use the frag memory accounting to wait for outstanding timers,
so that when we free the percpu counter we can be sure no running
timer will trip over it.

Reported-and-tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frag: change *_frag_mem_limit functions to take netns_frags as argument

Followup patch will call it after inet_frag_queue was freed, so q->net
doesn't work anymore (but netf = q->net; free(q); mem_limit(netf) would).

Tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frag: don't re-use chainlist for evictor

commit 65ba1f1ec0eff ("inet: frags: fix a race between inet_evict_bucket
and inet_frag_kill") describes the bug, but the fix doesn't work reliably.

Problem is that ->flags member can be set on other cpu without chainlock
being held by that task, i.e. the RMW-Cycle can clear INET_FRAG_EVICTED
bit after we put the element on the evictor private list.

We can crash when walking the 'private' evictor list since an element can
be deleted from list underneath the evictor.

Join work with Nikolay Alexandrov.

Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Johan Schuijt <johan@transip.nl>
Tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Nikolay Alexandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings

Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from
RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the
(as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 ("net:
sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1].

Imho, it was not a good idea, and we should just revert that message
for a couple of reasons:

  1) It's uapi and therefore set in stone forever.

  2) To be able to run on older and newer kernels, an SCTP application
     would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/
     SCTP_RCVINFO support, so that on older kernels, it can make use
     of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO.
     In my (limited) experience, a lot of SCTP appliances are migrating
     to newer kernels only ve(ee)ry slowly.

  3) Some people don't have the chance to change their applications,
     f.e. due to proprietary legacy stuff. So, they'll hit this warning
     in fast path and are stuck with older kernels.

But i.e. due to point 1) I really fail to see the benefit of a warning.
So just revert that for now, the issue was reported up Jamal.

  [1] http://thread.gmane.org/gmane.linux.network/321960/

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Tuexen <tuexen@fh-muenster.de>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-fixes'

Or Gerlitz says:

====================
mlx4 driver fixes, July 22nd 2015

Just few mlx4 fixes.. the fix related to propagating port state
changes to VF should go to -stable of >= 3.11, all the rest just
to 4.2-rc
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Remove BUG_ON assert when checking if ring is full

In mlx4_en_is_ring_empty we check if ring surpassed its size.
Since the prod and cons indicators are u32, there might be a state where
prod wrapped around and cons, making this assert false, although no
actual bug exists (other code segment can cope with this state).

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Relieve cpu load average on the port sending flow

When a port is not attached, the FW requires a longer than usual time to
execute the SENSE_PORT command. In the command flow, the
wait_for_completion_timeout call used in mlx4_cmd_wait puts the kernel
thread into the uninterruptible state during this time. This, in turn,
due to the computation method, causes the CPU load average to increase.

Fix this by using wait_for_completion_interruptible_timeout() for the
SENSE_PORT command, which puts the thread in the interruptible state.
In this state, the thread does not contribute to the CPU load average.

Treat the interrupted case as if the SENSE_PORT command returned
port_type = NONE.

Fix suggested by Gideon Naim <gideonn@mellanox.com> and
Bart Van Assche <bart.vanassche@sandisk.com>.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix wrong index in propagating port change event to VFs

The port-change event processing in procedure mlx4_eq_int() uses "slave"
as the vf_oper array index. Since the value of "slave" is the PF function
index, the result is that the PF link state is used for deciding to
propagate the event for all the VFs. The VF link state should be used,
so the VF function index should be used here.

Fixes: 948e306d7d64 ('net/mlx4: Add VF link state support')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Use sink counter for the VF default as fallback

Some old PF drivers don't let VFs allocate counters, in that case, use
the sink counter so the VF can load and operate properly.

Fixes: 6de5f7f6a1fa ('net/mlx4_core: Allocate default counter per port')
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: netlink: fix slave_changelink/br_setport race conditions

Since slave_changelink support was added there have been a few race
conditions when using br_setport() since some of the port functions it
uses require the bridge lock. It is very easy to trigger a lockup due to
some internal spin_lock() usage without bh disabled, also it's possible to
get the bridge into an inconsistent state.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 3ac636b8591c ("bridge: implement rtnl_link_ops->slave_changelink")
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains ten Netfilter/IPVS fixes, they are:

1) Address refcount leak when creating an expectation from the ctnetlink
   interface.

2) Fix bug splat in the IDLETIMER target related to sysfs, from Dmitry
   Torokhov.

3) Resolve panic for unreachable route in IPVS with locally generated
   traffic in the output path, from Alex Gartrell.

4) Fix wrong source address in rare cases for tunneled traffic in IPVS,
   from Julian Anastasov.

5) Fix crash if scheduler is changed via ipvsadm -E, again from Julian.

6) Make sure skb->sk is unset for forwarded traffic through IPVS, again from
   Alex Gartrell.

7) Fix crash with IPVS sync protocol v0 and FTP, from Julian.

8) Reset sender cpu for forwarded traffic in IPVS, also from Julian.

9) Allocate template conntracks through kmalloc() to resolve netns dependency
   problems with the conntrack kmem_cache.

10) Fix zones with expectations that clash using the same tuple, from Joe
    Stringer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cgroup: net_cls: fix false-positive "suspicious RCU usage"

In dev_queue_xmit() net_cls protected with rcu-bh.

[  270.730026] ===============================
[  270.730029] [ INFO: suspicious RCU usage. ]
[  270.730033] 4.2.0-rc3+ #2 Not tainted
[  270.730036] -------------------------------
[  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[  270.730041] other info that might help us debug this:
[  270.730043] rcu_scheduler_active = 1, debug_locks = 1
[  270.730045] 2 locks held by dhclient/748:
[  270.730046]  #0:  (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960
[  270.730085]  #1:  (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960
[  270.730090] stack backtrace:
[  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[  270.730100]  0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007
[  270.730103]  ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00
[  270.730105]  ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8
[  270.730108] Call Trace:
[  270.730121]  [<ffffffff817ad487>] dump_stack+0x4c/0x65
[  270.730148]  [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120
[  270.730153]  [<ffffffff816a62d2>] task_cls_state+0x92/0xa0
[  270.730158]  [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[  270.730164]  [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0
[  270.730166]  [<ffffffff816ab573>] tc_classify+0x33/0x90
[  270.730170]  [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb]
[  270.730172]  [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960
[  270.730174]  [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960
[  270.730176]  [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20
[  270.730185]  [<ffffffff81787770>] dev_queue_xmit+0x10/0x20
[  270.730187]  [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760
[  270.730190]  [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0
[  270.730203]  [<ffffffff81665245>] ? sock_def_readable+0x5/0x190
[  270.730210]  [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40
[  270.730216]  [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640
[  270.730219]  [<ffffffff8165f367>] sock_sendmsg+0x47/0x50
[  270.730221]  [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0
[  270.730232]  [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0
[  270.730234]  [<ffffffff811fe5b8>] vfs_write+0xb8/0x190
[  270.730236]  [<ffffffff811fe8c2>] SyS_write+0x52/0xb0
[  270.730239]  [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_choke: drop all packets in queue during reset

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_plug: purge buffered packets during reset

Otherwise the skbuff related structures are not correctly
refcount'ed.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fib_select_default-fixes'

Julian Anastasov says:

====================
ipv4: fib_select_default changes

This patchset contains 2 changes for the alternative routes,
one to add tb_id/fa_slen check needed after the recent
fib_trie optimizations for fib aliases and the second
change attempts to support alternative routes with TOS
requirement.

Sorry that I don't have access to the original
report from Hagen Paul Pfeifer. I hope he will see this
change.

The second change adds fa_default field to the
fib aliases (which can be many) and if the feature to
filter the alternative routes by TOS is not worth it,
this second patch can be scrapped.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: consider TOS in fib_select_default

fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.

This patch solves the problem as follows:

- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.

- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.

For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:

tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1

Reported-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fib_select_default should match the prefix

fib_trie starting from 4.1 can link fib aliases from
different prefixes in same list. Make sure the alternative
gateways are in same table and for same prefix (0) by
checking tb_id and fa_slen.

Fixes: 79e5ad2ceb00 ("fib_trie: Remove leaf_info")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Don't use shared bluetooth antenna in iwlwifi driver for management
    frames, from Emmanuel Grumbach.

2) Fix device ID check in ath9k driver, from Felix Fietkau.

3) Off by one in xen-netback BUG checks, from Dan Carpenter.

4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann.

5) Fix races in setting peeked bit flag in SKBs during datagram
    receive.  If it's shared we have to clone it otherwise the value can
    easily be corrupted.  Fix from Herbert Xu.

6) Revert fec clock handling change, causes regressions.  From Fabio
    Estevam.

7) Fix use after free in fq_codel and sfq packet schedulers, from WANG
    Cong.

8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.)
    from WANG Cong and Konstantin Khlebnikov.

9) Memory leak in act_bpf packet action, from Alexei Starovoitov.

10) ARM bpf JIT bug fixes from Nicolas Schichan.

11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from
    Michael S Tsirkin.

12) Destruction of bond with different ARP header types not handled
    correctly, fix from Nikolay Aleksandrov.

13) Revert GRO receive support in ipv6 SIT tunnel driver, causes
    regressions because the GRO packets created cannot be processed
    properly on the GSO side if we forward the frame.  From Herbert Xu.

14) TCCR update race and other fixes to ravb driver from Sergei
    Shtylyov.

15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet.

16) Fix panics on packet scheduler filter replace, from Daniel Borkmann.

17) Make sure AF_PACKET sees properly IP headers in defragmented frames
    (via PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee.

18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian
    Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  ravb: fix ring memory allocation
  net: phy: dp83867: Fix warning check for setting the internal delay
  openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
  netlink: don't hold mutex in rcu callback when releasing mmapd ring
  ARM: net: fix vlan access instructions in ARM JIT.
  ARM: net: handle negative offsets in BPF JIT.
  ARM: net: fix condition for load_order > 0 when translating load instructions.
  tcp: suppress a division by zero warning
  drivers: net: cpsw: remove tx event processing in rx napi poll
  inet: frags: fix defragmented packet's IP header for af_packet
  net: mvneta: fix refilling for Rx DMA buffers
  stmmac: fix setting of driver data in stmmac_dvr_probe
  sched: cls_flow: fix panic on filter replace
  sched: cls_flower: fix panic on filter replace
  sched: cls_bpf: fix panic on filter replace
  net/mdio: fix mdio_bus_match for c45 PHY
  net: ratelimit warnings about dst entry refcount underflow or overflow
  caif: fix leaks and race in caif_queue_rcv_skb()
  qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355
  ravb: fix race updating TCCR
  ...

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull ARM64 fixes from Catalin Marinas:

- arm64 build fix following the move of the thread_struct to the end of
   task_struct and the asm offsets becoming too large for the AArch64
   ISA

- preparatory patch for moving irq_data struct members (applied now to
   reduce dependency for the next merging window)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64/irq: Use access helper irq_data_get_affinity_mask()
  arm64: switch_to: calculate cpu context pointer using separate register

netfilter: nf_conntrack: Support expectations in different zones

When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.

Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ARM64/irq: Use access helper irq_data_get_affinity_mask()

This is a preparatory patch for moving irq_data struct members.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: switch_to: calculate cpu context pointer using separate register

Commit 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
moved the thread_struct to the bottom of task_struct. As a result, the
offset is now too large to be used in an immediate add on arm64 with
some kernel configs:

arch/arm64/kernel/entry.S: Assembler messages:
arch/arm64/kernel/entry.S:588: Error: immediate out of range
arch/arm64/kernel/entry.S:597: Error: immediate out of range

This patch calculates the offset using an additional register instead of
an immediate offset.

Fixes: 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

ravb: fix ring memory allocation

The driver is written as if it can adapt to a low memory situation  allocating
less RX  skbs and TX aligned buffers than the respective RX/TX ring sizes.  In
reality  though  the driver  would malfunction in this case. Stop being overly
smart and just fail in such situation -- this is achieved by moving the memory
allocation from ravb_ring_format() to ravb_ring_init().

We leave dma_map_single() calls in place but make their failure non-fatal
by marking the corresponding RX descriptors  with zero data size which should
prevent DMA to an invalid addresses.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: Fix warning check for setting the internal delay

Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: don't hold mutex in rcu callback when releasing mmapd ring

Kirill A. Shutemov says:

This simple test-case trigers few locking asserts in kernel:

int main(int argc, char **argv)
{
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
                .nm_block_size          = block_size,
                .nm_block_nr            = 64,
                .nm_frame_size          = 16384,
                .nm_frame_nr            = 64 * block_size / 16384,
        };
        unsigned int ring_size;
int fd;

fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                exit(1);

ring_size = req.nm_block_nr * req.nm_block_size;
mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
#0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
#1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
#2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
<IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
[<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
[<ffffffff81085bed>] __might_sleep+0x4d/0x90
[<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
[<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
[<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
[<ffffffff817e484d>] __sk_free+0x1d/0x160
[<ffffffff817e49a9>] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold mutex lock in a rcu callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.

Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: Cong Wang <cwang@twopensource.com>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'arm-bpf-fixes'

Nicolas Schichan says:

====================
BPF JIT fixes for ARM

These patches are fixing bugs in the ARM JIT and should probably find
their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
are now passing OK (was 54 out of 60 before).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: net: fix vlan access instructions in ARM JIT.

This makes BPF_ANC | SKF_AD_VLAN_TAG and BPF_ANC | SKF_AD_VLAN_TAG_PRESENT
have the same behaviour as the in kernel VM and makes the test_bpf LD_VLAN_TAG
and LD_VLAN_TAG_PRESENT tests pass.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: net: handle negative offsets in BPF JIT.

Previously, the JIT would reject negative offsets known during code
generation and mishandle negative offsets provided at runtime.

Fix that by calling bpf_internal_load_pointer_neg_helper()
appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing
the execution flow to the slow path helpers when the offset is
negative.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: net: fix condition for load_order > 0 when translating load instructions.

To check whether the load should take the fast path or not, the code
would check that (r_skb_hlen - load_order) is greater than the offset
of the access using an "Unsigned higher or same" condition. For
halfword accesses and an skb length of 1 at offset 0, that test is
valid, as we end up comparing 0xffffffff(-1) and 0, so the fast path
is taken and the filter allows the load to wrongly succeed. A similar
issue exists for word loads at offset 0 and an skb length of less than
4.

Fix that by using the condition "Signed greater than or equal"
condition for the fast path code for load orders greater than 0.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: suppress a division by zero warning

Andrew Morton reported following warning on one ARM build
with gcc-4.4 :

net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero

Even guarded with a test on sizeof(spinlock_t), compiler does not
like current construct on a !CONFIG_SMP build.

Remove the warning by using a temporary variable.

Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()"

This reverts commit a2673b6e040663bf16a552f8619e6bde9f4b9acf.

Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:

"Thanks for report! You are right that my patch introduces a race
  between fsnotify kthread and fsnotify_destroy_group() which can result
  in leaking inotify event on group destruction.

  I haven't yet decided whether the right fix is not to queue events for
  dying notification group (as that is pointless anyway) or whether we
  should just fix the original problem differently...  Whenever I look
  at fsnotify code mark handling I get lost in the maze of locks, lists,
  and subtle differences between how different notification systems
  handle notification marks :( I'll think about it over night"

and after thinking about it, Jan says:

"OK, I have looked into the code some more and I found another
  relatively simple way of fixing the original oops.  It will be IMHO
  better than trying to fixup this issue which has more potential for
  breakage.  I'll ask Linus to revert the fsnotify fix he already merged
  and send a new fix"

Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Requested-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'wireless-drivers-for-davem-2015-07-20' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
ath9k:

* fix device ID check for AR956x

iwlwifi:

* bug fixes specific for 8000 series
* fix a crash in time events
* fix a crash in PCIe transport
* fix BT Coex code that prevented association on certain
devices (3160).
* revert the new RBD allocation model because it introduced
a bug when running on weak VM setups.
* new device IDs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'pinctrl-v4.2-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Here are some overly ripe pin control fixes for the v4.2 series.

  They got delayed because of various crap commits and having to clean
  and rinse the patch stack a few times.  Now they are however looking
  good.

   - some dead defines dropped from the Samsung driver, was targeted for
     -rc2 but got delayed
   - drop the strict mode from abx500, this was too strict
   - fix the R-Car sparse IRQs code to work as intended
   - fix the IRQ code for the pinctrl-single GPIO backend to not enforce
     threaded IRQs
   - clear the latched events/IRQs for the Broadcom BCM2835 driver
   - fix up debugfs for the Freescale imx1 driver
   - fix a typo bug in the Schmitt Trigger setup in the LPC18xx driver"

* tag 'pinctrl-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: lpc18xx: fix schmitt trigger setup
  Subject: pinctrl: imx1-core: Fix debug output in .pin_config_set callback
  pinctrl: bcm2835: Clear the event latch register when disabling interrupts
  pinctrl: single: ensure pcs irq will not be forced threaded
  sh-pfc: fix sparse GPIOs for R-Car SoCs
  pinctrl: abx500: remove strict mode
  pinctrl: samsung: Remove old unused defines

Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs

Pull UDF fix from Jan Kara:
"A fix for UDF corruption when certain disk-format feature is enabled"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Don't corrupt unalloc spacetable when writing it

Merge tag 'trace-v4.2-rc2-fix2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing sample code fix from Steven Rostedt:
"He Kuang noticed that the sample code using the trace_event helper
  function __get_dynamic_array_len() is broken.

  This only changes the sample code, and I'm pushing this now instead of
  later because I don't want others using the broken code as an example
  when using it for real"

* tag 'trace-v4.2-rc2-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix sample output of dynamic arrays

drivers: net: cpsw: remove tx event processing in rx napi poll

With commit c03abd84634d ("net: ethernet: cpsw: don't requests IRQs
we don't use") common isr and napi are separated into separate tx isr
and rx isr/napi, but still in rx napi tx events are handled. So removing
the tx event handling in rx napi.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: fix defragmented packet's IP header for af_packet

When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.

However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.

Also, IPv4 checksum is not corrected after reassembly.

Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: fix refilling for Rx DMA buffers

With the actual code, if a memory allocation error happens while
refilling a Rx descriptor, then the original Rx buffer is both passed
to the networking stack (in a SKB) and let in the Rx ring. This leads
to various kernel oops and crashes.

As a fix, this patch moves Rx descriptor refilling ahead of building
SKB with the associated Rx buffer. In case of a memory allocation
failure, data is dropped and the original DMA buffer is put back into
the Rx ring.

Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Tested-by: Yoann Sculo <yoann@sculo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: fix setting of driver data in stmmac_dvr_probe

Commit 803f8fc46274b ("stmmac: move driver data setting into
stmmac_dvr_probe") mistakenly set priv and not priv->dev as
driver data. This meant that the remove, resume and suspend
callbacks that fetched and tried to use this data would most
likely explode. Fix the issue by using the correct variable.

Fixes: 803f8fc46274b ("stmmac: move driver data setting into stmmac_dvr_probe")
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sch_panic'

Daniel Borkmann says:

====================
Couple of classifier fixes

This fixes a couple of panics in the form of (analogous for
cls_flow{,er}):

[  912.759276] BUG: unable to handle kernel NULL pointer dereference at (null)
[  912.759373] IP: [<ffffffffa09d4d6d>] cls_bpf_change+0x23d/0x268 [cls_bpf]
[  912.759441] PGD 8783c067 PUD 5f684067 PMD 0
[  912.759491] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[  912.759543] Modules linked in: cls_bpf(E) act_gact [...]
[  912.772734] CPU: 3 PID: 10489 Comm: tc Tainted: G        W   E   4.2.0-rc2+ #73
[  912.775004] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, BIOS MBA51.88Z.00EF.B02.1211271028 11/27/2012
[  912.777327] task: ffff88025eaa8000 ti: ffff88005f734000 task.ti: ffff88005f734000
[  912.779662] RIP: 0010:[<ffffffffa09d4d6d>]  [<ffffffffa09d4d6d>] cls_bpf_change+0x23d/0x268 [cls_bpf]
[  912.781991] RSP: 0018:ffff88005f7379c8  EFLAGS: 00010286
[  912.784183] RAX: ffff880201d64e48 RBX: 0000000000000000 RCX: ffff880201d64e40
[  912.786402] RDX: 0000000000000000 RSI: ffffffffa09d51c0 RDI: ffffffffa09d51a6
[  912.788625] RBP: ffff88005f737a68 R08: 0000000000000000 R09: 0000000000000000
[  912.790854] R10: 0000000000000001 R11: 0000000000000001 R12: ffff880078ab5a80
[  912.793082] R13: ffff880232b31570 R14: ffff88005f737ae0 R15: ffff8801e215d1d0
[  912.795181] FS:  00007f3c0c80d740(0000) GS:ffff880265400000(0000) knlGS:0000000000000000
[  912.797281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  912.799402] CR2: 0000000000000000 CR3: 000000005460f000 CR4: 00000000001407e0
[  912.799403] Stack:
[  912.799407]  ffffffff00000000 ffff88023ea18000 000000005f737a08 0000000000000000
[  912.799415]  ffffffff81f06140 ffff880201d64e40 0000000000000000 ffff88023ea1804c
[  912.799418]  0000000000000000 ffff88023ea18044 ffff88023ea18030 ffff88023ea18038
[  912.799418] Call Trace:
[  912.799437]  [<ffffffff816d5685>] tc_ctl_tfilter+0x335/0x910
[  912.799443]  [<ffffffff813622a8>] ? security_capable+0x48/0x60
[  912.799448]  [<ffffffff816b90e5>] rtnetlink_rcv_msg+0x95/0x240
[  912.799454]  [<ffffffff810f612d>] ? trace_hardirqs_on+0xd/0x10
[  912.799456]  [<ffffffff816b902f>] ? rtnetlink_rcv+0x1f/0x40
[  912.799459]  [<ffffffff816b902f>] ? rtnetlink_rcv+0x1f/0x40
[  912.799461]  [<ffffffff816b9050>] ? rtnetlink_rcv+0x40/0x40
[  912.799464]  [<ffffffff816df38f>] netlink_rcv_skb+0xaf/0xc0
[  912.799467]  [<ffffffff816b903e>] rtnetlink_rcv+0x2e/0x40
[  912.799469]  [<ffffffff816deaef>] netlink_unicast+0xef/0x1b0
[  912.799471]  [<ffffffff816defa0>] netlink_sendmsg+0x3f0/0x620
[  912.799476]  [<ffffffff81687028>] sock_sendmsg+0x38/0x50
[  912.799479]  [<ffffffff81687938>] ___sys_sendmsg+0x288/0x290
[  912.799482]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
[  912.799488]  [<ffffffff810265db>] ? native_sched_clock+0x2b/0x90
[  912.799493]  [<ffffffff8116135f>] ? __audit_syscall_entry+0xaf/0x100
[  912.799497]  [<ffffffff8116135f>] ? __audit_syscall_entry+0xaf/0x100
[  912.799501]  [<ffffffff8112aa19>] ? current_kernel_time+0x69/0xd0
[  912.799505]  [<ffffffff81266f16>] ? __fget_light+0x66/0x90
[  912.799508]  [<ffffffff81688812>] __sys_sendmsg+0x42/0x80
[  912.799510]  [<ffffffff81688862>] SyS_sendmsg+0x12/0x20
[  912.799515]  [<ffffffff817f9a6e>] entry_SYSCALL_64_fastpath+0x12/0x76
[  912.799540] Code: 4d 88 49 8b 57 08 48 89 51 08 49 8b 57 10 48 89 c8 48 83 c0 08 48
                     89 51 10 48 8b 51 10 48 c7 c6 c0 51 9d a0 48 c7 c7 a6 51 9d a0 <48>
                     89 02 48 8b 51 08 48 89 42 08 48 b8 00 02 20 00 00 00 ad de
[  912.799544] RIP  [<ffffffffa09d4d6d>] cls_bpf_change+0x23d/0x268 [cls_bpf]
[  912.799544]  RSP <ffff88005f7379c8>
[  912.799545] CR2: 0000000000000000
[  912.807380] ---[ end trace a6440067cfdc7c29 ]---

I've split them into 3 patches, so they can be backported easier
when needed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sched: cls_flow: fix panic on filter replace

The following test case causes a NULL pointer dereference in cls_flow:

  tc filter add dev foo parent 1: handle 0x1 flow hash keys dst action ok
  tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
            flow hash keys mark action drop

To be more precise, actually two different panics are fixed, the first
occurs because tcf_exts_init() is not called on the newly allocated
filter when we do a replace. And the second panic uncovered after that
happens since the arguments of list_replace_rcu() are swapped, the old
element needs to be the first argument and the new element the second.

Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: cls_flower: fix panic on filter replace

The following test case causes a NULL pointer dereference in cls_flower:

  tc filter add dev foo parent 1: flower eth_type ipv4 action ok flowid 1:1
  tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
            flower eth_type ipv6 action ok flowid 1:1

The problem is that commit 77b9900ef53a ("tc: introduce Flower classifier")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.

Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: cls_bpf: fix panic on filter replace

The following test case causes a NULL pointer dereference in cls_bpf:

  FOO="1,6 0 0 4294967295,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
  tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
            bpf bytecode "$FOO" flowid 1:1 action drop

The problem is that commit 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.

Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-for-davem-2015-07-17' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Some fixes for the current cycle:

1. Arik introduced an rtnl-locked regulatory API to be able
    to differentiate between place do/don't have the RTNL;
    this fixes missing locking in some of the code paths

2. Two small mesh bugfixes from Bob, one to avoid treating
    a certain malformed over-the-air frame and one to avoid
    sending a garbage field over the air.

3. A fix for powersave during WoWLAN suspend from Krishna Chaitanya.

4. A fix for a powersave vs. aggregation teardown race, from Michal.

5. Thomas reduced the loglevel of CRDA messages to avoid spamming
    the kernel log with mostly irrelevant information.

6. Tom fixed a dangling debugfs directory pointer that could cause
    crashes if subsequent addition of the same interface to debugfs
    failed for some reason.

7. A fix from myself for a list corruption issue in mac80211 during
    combined interface shutdown/removal - shut down interfaces first
    and only then remove them to avoid that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mdio: fix mdio_bus_match for c45 PHY

We store c45 PHY's id information in c45_ids, so it should be used to
check the matching between PHY driver and PHY device for c45 PHY.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ratelimit warnings about dst entry refcount underflow or overflow

Kernel generates a lot of warnings when dst entry reference counter
overflows and becomes negative. That bug was seen several times at
machines with outdated 3.10.y kernels. Most like it's already fixed
in upstream. Anyway that flood completely kills machine and makes
further debugging impossible.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: fix leaks and race in caif_queue_rcv_skb()

1) If sk_filter() is applied, skb was leaked (not freed)
2) Testing SOCK_DEAD twice is racy :
packet could be freed while already queued.
3) Remove obsolete comment about caching skb->len

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355

Sierra Wireless MC7305/MC7355 with USB ID 1199:9041 also provide a
second QMI/network interface like the MC73xx with USB ID 1199:68c0 on
USB interface #10 when used in the appropriate USB configuration.
Add the corresponding QMI_FIXED_INTF entry to the qmi_wwan driver.

Please note that the second QMI/network interface is not working for
early MC73xx firmware versions like 01.08.x as the device does not
respond to QMI messages on the second /dev/cdc-wdm port.

Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ravb: fix race updating TCCR

The TCCR.TSRQn bit may get clearead after TCCR gets read, so that TCCR write
would get skipped. We don't need to check this bit before setting.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: fix improper initialization in netcp_ndo_open()

The keystone qmss will raise interrupt when packet arrive at the
receive queue. Only control available to avoid interrupt from happening
is to keep the free descriptor queue (FDQ) empty in the receive side.
So the filling of descriptors into the FDQ has to happen after
request_irq() call is made as part of knav_queue_enable_notify(). So
move the function netcp_rxpool_refill() after this call.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: correct the MAC address for "follow" fail_over_mac policy

The "follow" fail_over_mac policy is useful for multiport devices that
either become confused or incur a performance penalty when multiple
ports are programmed with the same MAC address, but the same MAC
address still may happened by this steps for this policy:

1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
   bond0 has the same mac address with eth0, it is MAC1.

2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
   eth1 is backup, eth1 has MAC2.

3) ifconfig eth0 down
   eth1 became active slave, bond will swap MAC for eth0 and eth1,
   so eth1 has MAC1, and eth0 has MAC2.

4) ifconfig eth1 down
   there is no active slave, and eth1 still has MAC1, eth2 has MAC2.

5) ifconfig eth0 up
   the eth0 became active slave again, the bond set eth0 to MAC1.

Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same
MAC address, it will break this policy for ACTIVE_BACKUP mode.

This patch will fix this problem by finding the old active slave and
swap them MAC address before change active slave.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-4.2-20150716' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2015-07-16

this is a pull request of 2 patches by Stefan Agner. He fixes the resume
operation in the mcp251x driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "sit: Add gro callbacks to sit_offload"

This patch reverts 19424e052fb44da2f00d1a868cbb51f3e9f4bbb5 ("sit:
Add gro callbacks to sit_offload") because it generates packets
that cannot be handled even by our own GSO.

Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: bcm_sf2: do not use indirect reads and writes for 7445E0

7445E0 contains an ECO which disconnected the internal SF2 pseudo-PHY which was
known to conflict with the external pseudo-PHY of BCM53125 switches. This
motivated the need to utilize the internal SF2 MDIO controller via indirect
register reads/writes to control external Broadcom switches due to this address
conflict (both responded at address 30d).

For 7445E0, the internal pseudo-PHY of the SF2 switch got disconnected, and as
a consequence this prevents the internal SF2 MDIO bus controller from reading
data (reads back everything as 0) since the MDI line is tied low.

Fix this by making the indirect register reads and writes conditional to
7445D0, on 7445E0 we can utilize the SWITCH_MDIO controller (backed by
mdio-unimac and not the DSA created slave MII bus).

We utilize of_machine_is_compatible() here since this is the only way for use
to differentiate between these two chips in a way that does not violate layers
or becomes (too) vendor-specific.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: correctly handle bonding type change on enslave failure

If the bond is enslaving a device with different type it will be setup
by it, but if after being setup the enslave fails the bond doesn't
switch back its type and also keeps pointers to foreign structures that can
be long gone. Thus revert back any type changes if the enslave failed and
the bond had to change its type.
Example:
Before patch:
$ echo lo > bond0/bonding/slaves
-bash: echo: write error: Cannot assign requested address
$ ip l sh bond0
20: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default
    link/loopback 16:54:78:34:bd:41 brd 00:00:00:00:00:00
$ echo +eth1 > bond0/bonding/slaves
$ ip l sh bond0
20: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff
(notice the MASTER flag is gone)

After patch:
$ echo lo > bond0/bonding/slaves
-bash: echo: write error: Cannot assign requested address
$ ip l sh bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
    link/ether 6e:66:94:f6:07:fc brd ff:ff:ff:ff:ff:ff
$ echo +eth1 > bond0/bonding/slaves
$ ip l sh bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
    link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type")
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix destruction of bond with devices different from arphrd_ether

When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):
[  908.963051] eql: event: 9
[  908.963052] eql: IFF_SLAVE
[  908.963054] eql: event: 2
[  908.963056] eql: IFF_SLAVE
[  908.963058] eql: event: 6
[  908.963059] eql: IFF_SLAVE
[  908.963110] bond0: Releasing active interface eql
[  908.976168] bond0: Destroying bond bond0
[  908.976266] bond0 (unregistering): Released all slaves
[  908.984097] ------------[ cut here ]------------
[  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[  908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]

[  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
4.2.0-rc2+ #8
[  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
ffff8800358dfda8
[  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
ffff88003e3a4280
[  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
ffffffff8172ebd0
[  908.984181] Call Trace:
[  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
[bonding]
[  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[  908.984225]  [<ffffffff8142f52d>] ?
unregister_pernet_operations+0x8d/0xd0
[  908.984228]  [<ffffffff8142f58d>] ?
unregister_pernet_subsys+0x1d/0x30
[  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
[  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[  908.984244]  [<ffffffff8152b732>] ?
entry_SYSCALL_64_fastpath+0x16/0x75
[  908.984247] ---[ end trace 7c006ed4abbef24b ]---

Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit
f9399814927ad ("bonding: Don't allow bond devices to change network
namespaces.") that can't happen anyway.

Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
                      the netdev events")
Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix fid_mask when leaving bridge

The mv88e6xxx_priv_state structure contains an fid_mask, where 1 means
the FID is free to use, 0 means the FID is in use.

This patch fixes the bit clear in mv88e6xxx_leave_bridge() when
assigning a new FID to a port.

Example scenario: I have 7 ports, port 5 is CPU, port 6 is unused (no
PHY). After setting the ports 0, 1 and 2 in bridge br0, and ports 3 and
4 in bridge br1, I have the following fid_mask: 0b111110010110 (0xf96).

Indeed, br0 uses FID 0, and br1 uses FID 3.

After setting nomaster for port 0, I get the wrong fid_mask: 0b10 (0x2).

With this patch we correctly get 0b111110010100 (0xf94), meaning port 0
uses FID 1, br0 uses FID 0, and br1 uses FID 3.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: don't require ANY_LAYOUT with VERSION_1

ANY_LAYOUT is a compatibility feature. It's implied
for VERSION_1 devices, and non-transitional devices
might not offer it. Change code to behave accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fix from Martin Schwidefsky:
"Fast path fix for the thread_struct breakage"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: adapt entry.S to the move of thread_struct

Merge branch 'for-linus' of git://git./linux/kernel/git/egtvedt/linux-avr32

Pull AVR32 update from Hans-Christian Egtvedt.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
AVR32/time: Migrate to new 'set-state' interface

Merge tag 'ipvs-fixes-for-v4.2' of https://git./linux/kernel/git/horms/ipvs

Simon Horman says:

====================
IPVS Fixes for v4.2

please consider this fix for v4.2.
For reasons that are not clear to me it is a bumper crop.

It seems to me that they are all relevant to stable.
Please let me know if you need my help to get the fixes into stable.

* ipvs: fix ipv6 route unreach panic

  This problem appears to be present since IPv6 support was added to
  IPVS in v2.6.28.

* ipvs: skb_orphan in case of forwarding

  This appears to resolve a problem resulting from a side effect of
  41063e9dd119 ("ipv4: Early TCP socket demux.") which was included in v3.6.

* ipvs: do not use random local source address for tunnels

  This appears to resolve a problem introduced by
  026ace060dfe ("ipvs: optimize dst usage for real server") in v3.10.

* ipvs: fix crash if scheduler is changed

  This appears to resolve a problem introduced by
  ceec4c381681 ("ipvs: convert services to rcu") in v3.10.

  Julian has provided backports of the fix:
  * [PATCHv2 3.10.81] ipvs: fix crash if scheduler is changed
    http://www.spinics.net/lists/lvs-devel/msg04008.html
  * [PATCHv2 3.12.44,3.14.45,3.18.16,4.0.6] ipvs: fix crash if scheduler is changed
    http://www.spinics.net/lists/lvs-devel/msg04007.html

  Please let me know how you would like to handle guiding these
  backports into stable.

* ipvs: fix crash with sync protocol v0 and FTP

  This appears to resolve a problem introduced by
  749c42b620a9 ("ipvs: reduce sync rate with time thresholds") in v3.5
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: fix netns dependencies with conntrack templates

Quoting Daniel Borkmann:

"When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present, which is problematic i.e. if there is such bravery that
the priviledged user inside the netns is assumed untrusted.

Minimal example:

  ip netns add foo
  ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
  ip netns del foo

What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.

Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net->ct.count. The issue why they are not yet nf_ct_put() is
because the per netns .exit() handler from x_tables (which would eventually
invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
called in the dependency chain at a *later* point in time than the per
netns .exit() handler for the connection tracker.

This is clearly a chicken'n'egg problem: after the connection tracker
.exit() handler, we've teared down all the connection tracking
infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
invoked at a later point in time during the netns cleanup, as that would
lead to a use-after-free. At the same time, we cannot make x_tables depend
on the connection tracker module, so that the xt_ct_tg_destroy() would
be invoked earlier in the cleanup chain."

Daniel confirms this has to do with the order in which modules are loaded or
having compiled nf_conntrack as modules while x_tables built-in. So we have no
guarantees regarding the order in which netns callbacks are executed.

Fix this by allocating the templates through kmalloc() from the respective
SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
is marked as unlikely since conntrack templates are rarely allocated and only
from the configuration plane path.

Note that templates are not kept in any list to avoid further dependencies with
nf_conntrack anymore, thus, the tmpl larval list is removed.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>

s390: adapt entry.S to the move of thread_struct

git commit 0c8c0f03e3a292e031596484275c14cf39c0ab7a
"x86/fpu, sched: Dynamically allocate 'struct fpu'"
moved the thread_struct to the end of the task_struct.

This causes some of the offsets used in entry.S to overflow their
instruction operand field. To fix this use aghi to create a
dedicated pointer for the thread_struct.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

AVR32/time: Migrate to new 'set-state' interface

Migrate avr32 driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We want to call cpu_idle_poll_ctrl() in shutdown only if we were in
oneshot or resume state earlier. Create another variable to save this
information and check that in shutdown callback.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>

pinctrl: lpc18xx: fix schmitt trigger setup

The param_val variable is what determines if schmitt
trigger is enabled on a pin or not. A typo here mean
that schmitt trigger was always enabled for standard
and i2c pins.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Subject: pinctrl: imx1-core: Fix debug output in .pin_config_set callback

imx1_pinconf_set assumes that the array of pins in struct
imx1_pinctrl_soc_info can be indexed by pin id to get the
pinctrl_pin_desc for a pin. This used to be correct up to commit
607af165c047 which removed some entries from the array and so made it
wrong to access the array by pin id.

The result of this bug is a wrong pin name in the output for small pin
ids and an oops for the bigger ones.

This patch is the result of a discussion that includes patches by Markus
Pargmann and Chris Ruehl.

Fixes: 607af165c047 ("pinctrl: i.MX27: Remove nonexistent pad definitions")
Cc: stable@vger.kernel.org
Reported-by: Chris Ruehl <chris.ruehl@gtsys.com.hk>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

pinctrl: bcm2835: Clear the event latch register when disabling interrupts

It's possible to hit a race condition if interrupts are generated on a GPIO
pin when the IRQ line in question is being disabled.

If the interrupt is freed, bcm2835_gpio_irq_disable() is called which
disables the event generation sources (edge, level). If an event occurred
between the last disabling of hard IRQs and the write to the event
source registers, a bit would be set in the GPIO event detect register
(GPEDSn) which goes unacknowledged by bcm2835_gpio_irq_handler()
so Linux complains loudly.

There is no per-GPIO mask register, so when disabling GPIO interrupts
write 1 to the relevant bit in GPEDSn to clear out any stale events.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

pinctrl: single: ensure pcs irq will not be forced threaded

The PSC IRQ is requested using request_irq() API and as result it can
be forced to be threaded IRQ in RT-Kernel if PCS_QUIRK_HAS_SHARED_IRQ
is enabled for pinctrl domain.

As result, following 'possible irq lock inversion dependency' report
can be seen:
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.14.43-rt42-00360-g96ff499-dirty #24 Not tainted
---------------------------------------------------------
irq/369-pinctrl/927 just changed the state of lock:
(&pcs->lock){+.....}, at: [<c0375b54>] pcs_irq_handle+0x48/0x9c
but this lock was taken by another, HARDIRQ-safe lock in the past:
(&irq_desc_lock_class){-.....}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&pcs->lock);
                               local_irq_disable();
                               lock(&irq_desc_lock_class);
                               lock(&pcs->lock);
  <Interrupt>
    lock(&irq_desc_lock_class);

*** DEADLOCK ***

no locks held by irq/369-pinctrl/927.

the shortest dependencies between 2nd lock and 1st lock:
  -> (&irq_desc_lock_class){-.....} ops: 58724 {
     IN-HARDIRQ-W at:
                       [<c0090040>] lock_acquire+0x9c/0x158
                       [<c07065c8>] _raw_spin_lock+0x48/0x58
                       [<c009edac>] handle_fasteoi_irq+0x24/0x15c
                       [<c009abb0>] generic_handle_irq+0x3c/0x4c
                       [<c000f83c>] handle_IRQ+0x50/0xa0
                       [<c0008674>] gic_handle_irq+0x3c/0x6c
                       [<c0707a04>] __irq_svc+0x44/0x8c
                       [<c000fc44>] arch_cpu_idle+0x40/0x4c
                       [<c009aadc>] cpu_startup_entry+0x270/0x2e0
                       [<c06fcbf8>] rest_init+0xd4/0xe4
                       [<c0a44bfc>] start_kernel+0x3d0/0x3dc
                       [<80008084>] 0x80008084
     INITIAL USE at:
                      [<c0090040>] lock_acquire+0x9c/0x158
                      [<c070674c>] _raw_spin_lock_irqsave+0x54/0x68
                      [<c009aff8>] __irq_get_desc_lock+0x64/0xa4
                      [<c009e38c>] irq_set_chip+0x30/0x78
                      [<c009ec30>] irq_set_chip_and_handler_name+0x24/0x3c
                      [<c036ca10>] gic_irq_domain_map+0x48/0xb4
                      [<c00a0a80>] irq_domain_associate+0x84/0x1d4
                      [<c00a1154>] irq_create_mapping+0x80/0x11c
                      [<c00a1270>] irq_create_of_mapping+0x80/0x120
                      [<c05cdaa8>] irq_of_parse_and_map+0x34/0x3c
                      [<c0a4ea24>] omap_dm_timer_init_one+0x90/0x30c
                      [<c0a4eef0>] omap5_realtime_timer_init+0x8c/0x48c
                      [<c0a486b0>] time_init+0x28/0x38
                      [<c0a44a6c>] start_kernel+0x240/0x3dc
                      [<80008084>] 0x80008084
   }
   ... key      at: [<c1049ce0>] irq_desc_lock_class+0x0/0x8
   ... acquired at:
   [<c07065c8>] _raw_spin_lock+0x48/0x58
   [<c0375a90>] pcs_irq_unmask+0x58/0xa0
   [<c009ea48>] irq_enable+0x38/0x48
   [<c009ead0>] irq_startup+0x78/0x7c
   [<c009d440>] __setup_irq+0x4a8/0x4f4
   [<c009d5dc>] request_threaded_irq+0xb8/0x138
   [<c0415a5c>] omap_8250_startup+0x4c/0x148
   [<c041276c>] serial8250_startup+0x24/0x30
   [<c040d0ec>] uart_startup.part.9+0x5c/0x1b4
   [<c040dbcc>] uart_open+0xf4/0x16c
   [<c03f0540>] tty_open+0x170/0x61c
   [<c0157028>] chrdev_open+0xbc/0x1b4
   [<c0150494>] do_dentry_open+0x1e8/0x2bc
   [<c0150a84>] finish_open+0x44/0x5c
   [<c0160d50>] do_last.isra.47+0x710/0xca0
   [<c01613a4>] path_openat+0xc4/0x640
   [<c0162904>] do_filp_open+0x3c/0x98
   [<c0151bdc>] do_sys_open+0x114/0x1d8
   [<c0151cc8>] SyS_open+0x28/0x2c
   [<c0a44d70>] kernel_init_freeable+0x168/0x1e4
   [<c06fcc24>] kernel_init+0x1c/0xf8
   [<c000eee8>] ret_from_fork+0x14/0x20

-> (&pcs->lock){+.....} ops: 65 {
   HARDIRQ-ON-W at:
                    [<c0090040>] lock_acquire+0x9c/0x158
                    [<c07065c8>] _raw_spin_lock+0x48/0x58
                    [<c0375b54>] pcs_irq_handle+0x48/0x9c
                    [<c0375c5c>] pcs_irq_handler+0x1c/0x28
                    [<c009c458>] irq_forced_thread_fn+0x30/0x74
                    [<c009c784>] irq_thread+0x158/0x1c4
                    [<c0063fc4>] kthread+0xd4/0xe8
                    [<c000eee8>] ret_from_fork+0x14/0x20
   INITIAL USE at:
                   [<c0090040>] lock_acquire+0x9c/0x158
                   [<c070674c>] _raw_spin_lock_irqsave+0x54/0x68
                   [<c0375344>] pcs_enable+0x7c/0xe8
                   [<c0372a44>] pinmux_enable_setting+0x178/0x220
                   [<c036fecc>] pinctrl_select_state+0x110/0x194
                   [<c04732dc>] pinctrl_bind_pins+0x7c/0x108
                   [<c045853c>] driver_probe_device+0x70/0x254
                   [<c0458810>] __driver_attach+0x9c/0xa0
                   [<c045674c>] bus_for_each_dev+0x78/0xac
                   [<c0458030>] driver_attach+0x2c/0x30
                   [<c0457c78>] bus_add_driver+0x15c/0x204
                   [<c0458ee0>] driver_register+0x88/0x108
                   [<c045a168>] __platform_driver_register+0x64/0x6c
                   [<c0a8170c>] omap_hsmmc_driver_init+0x1c/0x20
                   [<c0008a94>] do_one_initcall+0x110/0x170
                   [<c0a44d48>] kernel_init_freeable+0x140/0x1e4
                   [<c06fcc24>] kernel_init+0x1c/0xf8
                   [<c000eee8>] ret_from_fork+0x14/0x20
}
... key      at: [<c1088a8c>] __key.18572+0x0/0x8
... acquired at:
   [<c008cdd4>] mark_lock+0x388/0x76c
   [<c008df40>] __lock_acquire+0x6d0/0x1f98
   [<c0090040>] lock_acquire+0x9c/0x158
   [<c07065c8>] _raw_spin_lock+0x48/0x58
   [<c0375b54>] pcs_irq_handle+0x48/0x9c
   [<c0375c5c>] pcs_irq_handler+0x1c/0x28
   [<c009c458>] irq_forced_thread_fn+0x30/0x74
   [<c009c784>] irq_thread+0x158/0x1c4
   [<c0063fc4>] kthread+0xd4/0xe8
   [<c000eee8>] ret_from_fork+0x14/0x20

stack backtrace:
CPU: 1 PID: 927 Comm: irq/369-pinctrl Not tainted 3.14.43-rt42-00360-g96ff499-dirty #24
[<c00177e0>] (unwind_backtrace) from [<c00130b0>] (show_stack+0x20/0x24)
[<c00130b0>] (show_stack) from [<c0702958>] (dump_stack+0x84/0xd0)
[<c0702958>] (dump_stack) from [<c008bcfc>] (print_irq_inversion_bug+0x1d0/0x21c)
[<c008bcfc>] (print_irq_inversion_bug) from [<c008bf18>] (check_usage_backwards+0xb4/0x11c)
[<c008bf18>] (check_usage_backwards) from [<c008cdd4>] (mark_lock+0x388/0x76c)
[<c008cdd4>] (mark_lock) from [<c008df40>] (__lock_acquire+0x6d0/0x1f98)
[<c008df40>] (__lock_acquire) from [<c0090040>] (lock_acquire+0x9c/0x158)
[<c0090040>] (lock_acquire) from [<c07065c8>] (_raw_spin_lock+0x48/0x58)
[<c07065c8>] (_raw_spin_lock) from [<c0375b54>] (pcs_irq_handle+0x48/0x9c)
[<c0375b54>] (pcs_irq_handle) from [<c0375c5c>] (pcs_irq_handler+0x1c/0x28)
[<c0375c5c>] (pcs_irq_handler) from [<c009c458>] (irq_forced_thread_fn+0x30/0x74)
[<c009c458>] (irq_forced_thread_fn) from [<c009c784>] (irq_thread+0x158/0x1c4)
[<c009c784>] (irq_thread) from [<c0063fc4>] (kthread+0xd4/0xe8)
[<c0063fc4>] (kthread) from [<c000eee8>] (ret_from_fork+0x14/0x20)

To fix it use IRQF_NO_THREAD to ensure that pcs irq will not be forced threaded.

Cc: Tony Lindgren <tony@atomide.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

sh-pfc: fix sparse GPIOs for R-Car SoCs

The PFC driver causes the kernel to hang on the R-Car gen2 SoC based  boards
when the CPU_ALL_PORT() macro is fixed to reflect the reality, i.e. when the
GPIO space becomes actually sparse.  This happens because the _GP_GPIO() macro
includes  an indexed initializer which causes the "holes" (array entries filled
with all 0s) between the groups  of the existing GPIOs; and the driver can't
cope with that.  There seems to  be no reason to use the indexed initializer,
so we can remove the index specifier and so avoid the "holes".

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

pinctrl: abx500: remove strict mode

Commit a21763a0b1e5a5ab8310f581886d04beadc16616
"pinctrl: nomadik: activate strict mux mode"
put all Nomadik pin controllers to strict mode. This was
not good on the Snowball platform: the muxing of GPIOs to
different pins is done with hogs in the DTS file, and then
these GPIOs are used by offset, relying on hogs to mux the
pins. Since that means the pin controller "owns" the pins
and at the same time we have a GPIO user, this pin controller
is by definition not strict.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Linux 4.2-rc3

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Two fairly simple fixes: one is a change that causes us to have a very
  low queue depth leading to performance issues and the other is a null
  deref occasionally in tapes thanks to use after put"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: fix host max depth checking for the 'queue_depth' sysfs interface
  st: null pointer dereference panic caused by use after kref_put by st_open

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.2.

  Things are looking quite decent at this stage but the recent work on
  the FPU support took its toll:

   - fix an incorrect overly restrictive ifdef

   - select O32 64-bit FP support for O32 binary compatibility

   - remove workarounds for Sibyte SB1250 Pass1 parts.  There are rare
     fixing the workarounds is not worth the effort.

   - patch up an outdated and now incorrect comment"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU
  MIPS: SB1: Remove support for Pass 1 parts.
  MIPS: Require O32 FP64 support for MIPS64 with O32 compat
  MIPS: asm-offset.c: Patch up various comments refering to the old filename.

Merge branch 'parisc-4.2-2' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fix from Helge Deller:
"A memory leak fix from Christophe Jaillet which was introduced with
kernel 4.0 and which leads to kernel crashes on parisc after 1-3 days"

* 'parisc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: mm: Fix a memory leak related to pmd not attached to the pgd

Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"By far most of the fixes here are updates to DTS files to deal with
  some mostly minor bugs.

  There's also a fix to deal with non-PM kernel configs on i.MX, a
  regression fix for ethernet on PXA platforms and a dependency fix for
  OMAP"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: keystone: dts: rename pcie nodes to help override status
  ARM: keystone: dts: fix dt bindings for PCIe
  ARM: pxa: fix dm9000 platform data regression
  ARM: dts: Correct audio input route & set mic bias for am335x-pepper
  ARM: OMAP2+: Add HAVE_ARM_SCU for AM43XX
  MAINTAINERS: digicolor: add dts files
  ARM: ux500: fix MMC/SD card regression
  ARM: ux500: define serial port aliases
  ARM: dts: OMAP5: Add #iommu-cells property to IOMMUs
  ARM: dts: OMAP4: Add #iommu-cells property to IOMMUs
  ARM: dts: Fix frequency scaling on Gumstix Pepper
  ARM: dts: configure regulators for Gumstix Pepper
  ARM: dts: omap3: overo: Update LCD panel names
  ARM: dts: cros-ec-keyboard: Add support for some Japanese keys
  ARM: imx6: gpc: always enable PU domain if CONFIG_PM is not set
  ARM: dts: imx53-qsb: fix TVE entry
  ARM: dts: mx23: fix iio-hwmon support
  ARM: dts: imx27: Adjust the GPT compatible string
  ARM: socfpga: dts: Fix entries order
  ARM: socfpga: dts: Fix adxl34x formating and compatible string

MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU

Commit 6134d94923d0 ("MIPS: asm: fpu: Allow 64-bit FPU on MIPS32 R6")
added support for 64-bit FPU on a 32-bit MIPS R6 processor but it missed
the 64-bit CPU case leading to FPU failures when requesting FR=1 mode
(which is always the case for MIPS R6 userland) when running a 32-bit
kernel on a 64-bit CPU. We also fix the MIPS R2 case.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Fixes: 6134d94923d0 ("MIPS: asm: fpu: Allow 64-bit FPU on MIPS32 R6")
Reviewed-by: Paul Burton <paul.burton@imgtec.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10734/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

parisc: mm: Fix a memory leak related to pmd not attached to the pgd

Commit 0e0da48dee8d ("parisc: mm: don't count preallocated pmds")
introduced a memory leak.

After this commit, the 'return' statement in pmd_free is executed in all
cases. Even for pmd that are not attached to the pgd. So 'free_pages'
can never be called anymore, leading to a memory leak.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Helge Deller <deller@gmx.de>

Merge tag 'pxa-fixes-v4.2-rc2' of https://github.com/rjarzmik/linux into fixesD

Merge "pxa fixes for v4.2" from Robert Jarzmik:

ARM: pxa: fixes for v4.2-rc2

This single fix reenables ethernet cards for several pxa boards,
broken by regulator addition to dm9000 driver.

* tag 'pxa-fixes-v4.2-rc2' of https://github.com/rjarzmik/linux:
ARM: pxa: fix dm9000 platform data regression

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"A small set of ARM fixes for -rc3, most of them not far off
  one-liners, with the exception of fixing the V7 cache invalidation for
  incoming SMP processors which was causing problems for SoCFPGA
  devices"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: fix __virt_to_idmap build error on !MMU
  ARM: invalidate L1 before enabling coherency
  ARM: 8404/1: dma-mapping: fix off-by-one error in bitmap size check
  ARM: 8402/1: perf: Don't use of_node after putting it
  ARM: 8400/1: use virt_to_idmap to get phys_reset address

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Two families of fixes:

   - Fix an FPU context related boot crash on newer x86 hardware with
     larger context sizes than what most people test.  To fix this
     without ugly kludges or extensive reverts we had to touch core task
     allocator, to allow x86 to determine the task size dynamically, at
     boot time.

     I've tested it on a number of x86 platforms, and I cross-built it
     to a handful of architectures:

                                        (warns)               (warns)
       testing     x86-64:  -git:  pass (    0),  -tip:  pass (    0)
       testing     x86-32:  -git:  pass (    0),  -tip:  pass (    0)
       testing        arm:  -git:  pass ( 1359),  -tip:  pass ( 1359)
       testing       cris:  -git:  pass ( 1031),  -tip:  pass ( 1031)
       testing       m32r:  -git:  pass ( 1135),  -tip:  pass ( 1135)
       testing       m68k:  -git:  pass ( 1471),  -tip:  pass ( 1471)
       testing       mips:  -git:  pass ( 1162),  -tip:  pass ( 1162)
       testing    mn10300:  -git:  pass ( 1058),  -tip:  pass ( 1058)
       testing     parisc:  -git:  pass ( 1846),  -tip:  pass ( 1846)
       testing      sparc:  -git:  pass ( 1185),  -tip:  pass ( 1185)

     ... so I hope the cross-arch impact 'none', as intended.

     (by Dave Hansen)

   - Fix various NMI handling related bugs unearthed by the big asm code
     rewrite and generally make the NMI code more robust and more
     maintainable while at it.  These changes are a bit late in the
     cycle, I hope they are still acceptable.

     (by Andy Lutomirski)"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
  x86/fpu, sched: Dynamically allocate 'struct fpu'
  x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
  x86/nmi/64: Make the "NMI executing" variable more consistent
  x86/nmi/64: Minor asm simplification
  x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
  x86/nmi/64: Reorder nested NMI checks
  x86/nmi/64: Improve nested NMI comments
  x86/nmi/64: Switch stacks on userspace NMI entry
  x86/nmi/64: Remove asm code that saves CR2
  x86/nmi: Enable nested do_nmi() handling for 64-bit kernels

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"Fix for a misplaced export that can cause build failures in certain
(rare) Kconfig situations"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Move the export of tick_broadcast_oneshot_control to the proper place

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
"A oneliner rq throttling fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Test list head instead of list entry in throttle_cfs_rq()

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus a static key fix fixing /sys/devices/cpu/rdpmc"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Really allow to specify custom CC, AR or LD
  perf auxtrace: Fix misplaced check for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
  perf hists browser: Take the --comm, --dsos, etc filters into account
  perf symbols: Store if there is a filter in place
  x86, perf: Fix static_key bug in load_mm_cr4()
  tools: Copy lib/hweight.c from the kernel sources
  perf tools: Fix the detached tarball wrt rbtree copy
  perf thread_map: Fix the sizeof() calculation for map entries
  tools lib: Improve clean target
  perf stat: Fix shadow declaration of close
  perf tools: Fix lockup using 32-bit compat vdso

Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:
"Misc irq fixes:

   - two driver fixes
   - a Xen regression fix
   - a nested irq thread crash fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gicv3-its: Fix mapping of LPIs to collections
  genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
  genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
  gpio/davinci: Fix race in installing chained irq handler

Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"25 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
  lib/decompress: set the compressor name to NULL on error
  mm/cma_debug: correct size input to bitmap function
  mm/cma_debug: fix debugging alloc/free interface
  mm/page_owner: set correct gfp_mask on page_owner
  mm/page_owner: fix possible access violation
  fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
  /proc/$PID/cmdline: fixup empty ARGV case
  dma-debug: skip debug_dma_assert_idle() when disabled
  hexdump: fix for non-aligned buffers
  checkpatch: fix long line messages about patch context
  mm: clean up per architecture MM hook header files
  MAINTAINERS: uclinux-h8-devel is moderated for non-subscribers
  mailmap: update Sudeep Holla's email id
  Update Viresh Kumar's email address
  mm, meminit: suppress unused memory variable warning
  configfs: fix kernel infoleak through user-controlled format string
  include, lib: add __printf attributes to several function prototypes
  s390/hugetlb: add hugepages_supported define
  mm: hugetlb: allow hugepages_supported to be architecture specific
  revert "s390/mm: make hugepages_supported a boot time decision"
  ...

Merge branch 'for-linus-4.2' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"These are all from Filipe, and cover a few problems we've had reported
  on the list recently (along with ones he found on his own)"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix file corruption after cloning inline extents
  Btrfs: fix order by which delayed references are run
  Btrfs: fix list transaction->pending_ordered corruption
  Btrfs: fix memory leak in the extent_same ioctl
  Btrfs: fix shrinking truncate when the no_holes feature is enabled

Merge tag 'rtc-v4.2-2' of git://git./linux/kernel/git/abelloni/linux

Pull rtc fixes from Alexandre Belloni:
"A few fixes for the RTC susbsystem for 4.2.

  The mt6397 driver was introduce in 4.2 so it is worth fixing before
  the final release.  I though the compilation warning for armada38x was
  fixed by akpm in commit f98b733e93e0 ("rtc-armada38x.c: remove unused
  local `flags'") but he actually missed some occurrences of the
  variables.  Since I received 4 patches for that, I think we can
  include it now.

  Summary:
   - fix mt6397 wakealarm creation
   - remove a compilation warning for armada38x that was forgotten"

* tag 'rtc-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: armada38x: Remove unused variable from armada38x_rtc_set_time()
  rtc: mt6397: enable wakeup before registering rtc device

Merge tag 'dm-4.2-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- revert a request-based DM core change that caused IO latency to
   increase and adversely impact both throughput and system load

- fix for a use after free bug in DM core's device cleanup

- a couple DM btree removal fixes (used by dm-thinp)

- a DM thinp fix for order-5 allocation failure

- a DM thinp fix to not degrade to read-only metadata mode when in
   out-of-data-space mode for longer than the 'no_space_timeout'

- fix a long-standing oversight in both dm-thinp and dm-cache by now
   exporting 'needs_check' in status if it was set in metadata

- fix an embarrassing dm-cache busy-loop that caused worker threads to
   eat cpu even if no IO was actively being issued to the cache device

* tag 'dm-4.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: avoid calls to prealloc_free_structs() if possible
  dm cache: avoid preallocation if no work in writeback_some_dirty_blocks()
  dm cache: do not wake_worker() in free_migration()
  dm cache: display 'needs_check' in status if it is set
  dm thin: display 'needs_check' in status if it is set
  dm thin: stay in out-of-data-space mode once no_space_timeout expires
  dm: fix use after free crash due to incorrect cleanup sequence
  Revert "dm: only run the queue on completion if congested or no requests pending"
  dm btree: silence lockdep lock inversion in dm_btree_del()
  dm thin: allocate the cell_sort_array dynamically
  dm btree remove: fix bug in redistribute3

x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86

Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/fpu, sched: Dynamically allocate 'struct fpu'

The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already. This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct. But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

lib/decompress: set the compressor name to NULL on error

Without this we end up using the previous name of the compressor in the
loop in unpack_rootfs. For example we get errors like "compression
method gzip not configured" even when we have CONFIG_DECOMPRESS_GZIP
enabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/cma_debug: correct size input to bitmap function

In CMA, 1 bit in bitmap means 1 << order_per_bits pages so size of
bitmap is cma->count >> order_per_bits rather than just cma->count.
This patch fixes it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stefan Strogin <stefan.strogin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/cma_debug: fix debugging alloc/free interface

CMA has alloc/free interface for debugging. It is intended that
alloc/free occurs in specific CMA region, but, currently, alloc/free
interface is on root dir due to the bug so we can't select CMA region
where alloc/free happens.

This patch fixes this problem by making alloc/free interface per CMA
region.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stefan Strogin <stefan.strogin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_owner: set correct gfp_mask on page_owner

Currently, we set wrong gfp_mask to page_owner info in case of isolated
freepage by compaction and split page. It causes incorrect mixed
pageblock report that we can get from '/proc/pagetypeinfo'. This metric
is really useful to measure fragmentation effect so should be accurate.
This patch fixes it by setting correct information.

Without this patch, after kernel build workload is finished, number of
mixed pageblock is 112 among roughly 210 movable pageblocks.

But, with this fix, output shows that mixed pageblock is just 57.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_owner: fix possible access violation

When I tested my new patches, I found that page pointer which is used
for setting page_owner information is changed.  This is because page
pointer is used to set new migratetype in loop.  After this work, page
pointer could be out of bound.  If this wrong pointer is used for
page_owner, access violation happens.  Below is error message that I
got.

  BUG: unable to handle kernel paging request at 0000000000b00018
  IP: [<ffffffff81025f30>] save_stack_address+0x30/0x40
  PGD 1af2d067 PUD 166e0067 PMD 0
  Oops: 0002 [#1] SMP
  ...snip...
  Call Trace:
    print_context_stack+0xcf/0x100
    dump_trace+0x15f/0x320
    save_stack_trace+0x2f/0x50
    __set_page_owner+0x46/0x70
    __isolate_free_page+0x1f7/0x210
    split_free_page+0x21/0xb0
    isolate_freepages_block+0x1e2/0x410
    compaction_alloc+0x22d/0x2d0
    migrate_pages+0x289/0x8b0
    compact_zone+0x409/0x880
    compact_zone_order+0x6d/0x90
    try_to_compact_pages+0x110/0x210
    __alloc_pages_direct_compact+0x3d/0xe6
    __alloc_pages_nodemask+0x6cd/0x9a0
    alloc_pages_current+0x91/0x100
    runtest_store+0x296/0xa50
    simple_attr_write+0xbd/0xe0
    __vfs_write+0x28/0xf0
    vfs_write+0xa9/0x1b0
    SyS_write+0x46/0xb0
    system_call_fastpath+0x16/0x75

This patch fixes this error by moving up set_page_owner().

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and we dereference free
memory in the loop there.

Fix the problem by keeping mark_mutex held in
fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that
we need to call a ->freeing_mark() callback which may acquire mark_mutex
again. To avoid this and similar lock inversion issues, we move the call
to ->freeing_mark() callback to the kthread destroying the mark.

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

/proc/$PID/cmdline: fixup empty ARGV case

/proc/*/cmdline code checks if it should look at ENVP area by checking
last byte of ARGV area:

rv = access_remote_vm(mm, arg_end - 1, &c, 1, 0);
if (rv <= 0)
goto out_free_page;

If ARGV is somehow made empty (by doing execve(..., NULL, ...) or
manually setting ->arg_start and ->arg_end to equal values), the decision
will be based on byte which doesn't even belong to ARGV/ENVP.

So, quickly check if ARGV area is empty and report 0 to match previous
behaviour.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>