Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

projects / firefly-linux-kernel-4.4.55.git / log

Tom Herbert [Thu, 30 Oct 2014 15:40:56 +0000 (08:40 -0700)]

gre: Use inner mac length when computing tunnel length

Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 30 Oct 2014 23:49:20 +0000 (19:49 -0400)]

Merge branch 'mellanox-net'

Or Gerlitz says:

====================
mlx4 driver encapsulation/steering fixes

The 1st patch fixes a bug in the TX path that supports offloading the
TX checksum of (VXLAN) encapsulated TCP packets. It turns out that the
bug is revealed only when the receiver runs in non-offloaded mode, so
we somehow missed it so far... please queue it for -stable >= 3.14

The 2nd patch makes sure not to leak steering entry on error flow,
please queue it to 3.17-stable
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Or Gerlitz [Thu, 30 Oct 2014 13:59:28 +0000 (15:59 +0200)]

mlx4: Avoid leaking steering rules on flow creation error flow

If mlx4_ib_create_flow() attempts to create > 1 rules with the
firmware, and one of these registrations fail, we leaked the
already created flow rules.

One example of the leak is when the registration of the VXLAN ghost
steering rule fails, we didn't unregister the original rule requested
by the user, introduced in commit d2fce8a9060d "mlx4: Set
user-space raw Ethernet QPs to properly handle VXLAN traffic".

While here, add dump of the VXLAN portion of steering rules
so it can actually be seen when flow creation fails.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Or Gerlitz [Thu, 30 Oct 2014 13:59:27 +0000 (15:59 +0200)]

net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN

For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
both the outer UDP TX checksum and the inner TCP/UDP TX checksum.

The driver doesn't advertize SKB_GSO_UDP_TUNNEL_CSUM, however we are wrongly
telling the HW to offload the outer UDP checksum for encapsulated packets,
fix that.

Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP
offloads of vxlan tunneling')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 30 Oct 2014 23:46:33 +0000 (19:46 -0400)]

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-10-30

This series contains updates to e1000, igb and ixgbe.

Francesco Ruggeri fixes an issue with e1000 where in a VM the driver did
not support unicast filtering.

Roman Gushchin fixes an issue with igb where the driver was re-using
mapped pages so that packets were still getting dropped even if all
the memory issues are gone and there is free memory.

Junwei Zhang found where in the ixgbe_clean_rx_ring() we were repeating
the assignment of NULL to the receive buffer skb and fixes it.

Emil fixes a race condition between setup_link and SFP detection routine
in the watchdog when setting the advertised speed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nicolas Cavallari [Thu, 30 Oct 2014 09:09:53 +0000 (10:09 +0100)]

ipv4: Do not cache routing failures due to disabled forwarding.

If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not. Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them. The opposite
is also true.

This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Anish Bhatt [Thu, 30 Oct 2014 00:54:03 +0000 (17:54 -0700)]

cxgb4 : Fix missing initialization of win0_lock

win0_lock was being used un-initialized, resulting in warning traces
being seen when lock debugging is enabled (and just wrong)

Fixes : fc5ab0209650 ('cxgb4: Replaced the backdoor mechanism to access the HW
memory with PCIe Window method')

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 30 Oct 2014 19:49:05 +0000 (15:49 -0400)]

Merge branch 'r8152-net'

Hayes Wang says:

====================
r8152: patches for autosuspend

There are unexpected processes when enabling autosuspend.
These patches are used to fix them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

hayeswang [Wed, 29 Oct 2014 03:12:17 +0000 (11:12 +0800)]

r8152: check WORK_ENABLE in suspend function

Avoid unnecessary behavior when autosuspend occurs during open().
The relative processes should only be run after finishing open().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

hayeswang [Wed, 29 Oct 2014 03:12:16 +0000 (11:12 +0800)]

r8152: reset tp->speed before autoresuming in open function

If (tp->speed & LINK_STATUS) is not zero, the rtl8152_resume()
would call rtl_start_rx() before enabling the tx/rx. Avoid this
by resetting it to zero.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

hayeswang [Wed, 29 Oct 2014 03:12:15 +0000 (11:12 +0800)]

r8152: clear SELECTIVE_SUSPEND when autoresuming

The flag of SELECTIVE_SUSPEND should be cleared when autoresuming.
Otherwise, when the system suspend and resume occur, it may have
the wrong flow.

Besides, because the flag of SELECTIVE_SUSPEND couldn't be used
to check if the hw enables the relative feature, it should alwayes
be disabled in close().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Emil Tantilov [Tue, 28 Oct 2014 05:50:03 +0000 (05:50 +0000)]

ixgbe: fix race when setting advertised speed

Following commands:

modprobe ixgbe
ifconfig ethX up
ethtool -s ethX advertise 0x020

can lead to "setup link failed with code -14" error due to the setup_link
call racing with the SFP detection routine in the watchdog.

This patch resolves this issue by protecting the setup_link call with check
for __IXGBE_IN_SFP_INIT.

Reported-by: Scott Harrison <scoharr2@cisco.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Junwei Zhang [Wed, 22 Oct 2014 15:29:03 +0000 (15:29 +0000)]

ixgbe: need not repeat init skb with NULL

Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Roman Gushchin [Thu, 23 Oct 2014 03:32:27 +0000 (03:32 +0000)]

igb: don't reuse pages with pfmemalloc flag

Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.

Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.

If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.

This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.

In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.

Fix this by avoiding reuse of such pages.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Tested-by: Aaron Brown "aaron.f.brown@intel.com"
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Francesco Ruggeri [Wed, 22 Oct 2014 15:29:24 +0000 (15:29 +0000)]

e1000: unset IFF_UNICAST_FLT on WMware 82545EM

VMWare's e1000 implementation does not seem to support unicast filtering.
This can be observed by configuring a macvlan interface on eth0 in a VM in
VMWare Fusion 5.0.5, and trying to use that interface instead of eth0.
Tested on 3.16.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Nikolay Aleksandrov [Tue, 28 Oct 2014 09:44:01 +0000 (10:44 +0100)]

inet: frags: remove the WARN_ON from inet_evict_bucket

The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nikolay Aleksandrov [Tue, 28 Oct 2014 09:30:34 +0000 (10:30 +0100)]

inet: frags: fix a race between inet_evict_bucket and inet_frag_kill

When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
- then the timer ref is removed, but inet_evict_bucket will not add
   this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
- then the timer ref is kept until the evictor "expires" the queue, but
   inet_frag_kill will remove the initial ref and will set
   INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
   its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tej Parkash [Tue, 28 Oct 2014 05:18:15 +0000 (01:18 -0400)]

cnic: Update the rcu_access_pointer() usages

1. Remove the rcu_read_lock/unlock around rcu_access_pointer
2. Replace the rcu_dereference with rcu_access_pointer

Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hariprasad Shenai [Mon, 27 Oct 2014 17:52:10 +0000 (23:22 +0530)]

cxgb4vf: Replace repetitive pci device ID's with right ones

Replaced repetive Device ID's which got added in commit b961f9a48844ecf3
("cxgb4vf: Remove superfluous "idx" parameter of CH_DEVICE() macro")

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lubomir Rintel [Mon, 27 Oct 2014 16:39:16 +0000 (17:39 +0100)]

ipv6: notify userspace when we added or changed an ipv6 token

NetworkManager might want to know that it changed when the router advertisement
arrives.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

WANG Cong [Fri, 24 Oct 2014 23:55:58 +0000 (16:55 -0700)]

sch_pie: schedule the timer after all init succeed

Cc: Vijay Subramanian <vijaynsu@cisco.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>

commit | commitdiff | tree

David S. Miller [Tue, 28 Oct 2014 21:26:24 +0000 (17:26 -0400)]

Merge branch 'cdc-ether'

Olivier Blin says:

====================
cdc-ether: handle promiscuous mode

Since kernel 3.16, my Lenovo USB network adapters (RTL8153) using
cdc-ether are not working anymore in a bridge.

This is due to commit c472ab68ad67db23c9907a27649b7dc0899b61f9, which
resets the packet filter when the device is bound.

The default packet filter set by cdc-ether does not include
promiscuous, while the adapter seemed to have promiscuous enabled by
default.

This patch series allows to support promiscuous mode for cdc-ether, by
hooking into set_rx_mode.

Incidentally, maybe this device should be handled by the r8152 driver,
but this patch series is still nice for other adapters.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Oliver Neukum <oneukum@suse.de>

commit | commitdiff | tree

Olivier Blin [Fri, 24 Oct 2014 17:43:02 +0000 (19:43 +0200)]

cdc-ether: handle promiscuous mode with a set_rx_mode callback

Promiscuous mode was not supported anymore with my Lenovo adapters
(RTL8153) since commit c472ab68ad67db23c9907a27649b7dc0899b61f9
(cdc-ether: clean packet filter upon probe).

It was not possible to use them in a bridge anymore.

Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Also-analyzed-by: Loïc Yhuel <loic.yhuel@softathome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Olivier Blin [Fri, 24 Oct 2014 17:43:01 +0000 (19:43 +0200)]

cdc-ether: extract usbnet_cdc_update_filter function

This will be used by the set_rx_mode callback.

Also move a comment about multicast filtering in this new function.

Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Olivier Blin [Fri, 24 Oct 2014 17:43:00 +0000 (19:43 +0200)]

usbnet: add a callback for set_rx_mode

To delegate promiscuous mode and multicast filtering to the subdriver.

Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 28 Oct 2014 21:08:56 +0000 (17:08 -0400)]

Merge branch 'systemport-net'

Florian Fainelli says:

====================
net: systemport: RX path and suspend fixes

These two patches fix a race condition where we have our RX interrupts
enabled, but not NAPI for the RX path, and the second patch fixes an
issue for packets stuck in RX fifo during a suspend/resume cycle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Tue, 28 Oct 2014 18:12:01 +0000 (11:12 -0700)]

net: systemport: reset UniMAC coming out of a suspend cycle

bcm_sysport_resume() was missing an UniMAC reset which can lead to
various receive FIFO corruptions coming out of a suspend cycle. If the
RX FIFO is stuck, it will deliver corrupted/duplicate packets towards
the host CPU interface.

This could be reproduced on crowded network and when Wake-on-LAN is
enabled for this particular interface because the switch still forwards
packets towards the host CPU interface (SYSTEMPORT), and we had to leave
the UniMAC RX enable bit on to allow matching MagicPackets.

Once we re-enter the resume function, there is a small window during
which the UniMAC receive is still enabled, and we start queueing
packets, but the RDMA and RBUF engines are not ready, which leads to
having packets stuck in the UniMAC RX FIFO, ultimately delivered towards
the host CPU as corrupted.

Fixes: 40755a0fce17 ("net: systemport: add suspend and resume support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Tue, 28 Oct 2014 18:12:00 +0000 (11:12 -0700)]

net: systemport: enable RX interrupts after NAPI

There is currently a small window during which the SYSTEMPORT adapter
enables its RX interrupts without having enabled its NAPI handler, which
can result in packets to be discarded during interface bringup.

A similar but more serious window exists in bcm_sysport_resume() during
which we can have the RDMA engine not fully prepared to receive packets
and yet having RX interrupts enabled.

Fix this my moving the RX interrupt enable down to
bcm_sysport_netif_start() after napi_enable() for the RX path is called,
which fixes both call sites: bcm_sysport_open() and
bcm_sysport_resume().

Fixes: b02e6d9ba7ad ("net: systemport: add bcm_sysport_netif_{enable,stop}")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Randy Dunlap [Mon, 27 Oct 2014 02:14:06 +0000 (19:14 -0700)]

skbuff.h: fix kernel-doc warning for headers_end

Fix kernel-doc warning in <linux/skbuff.h> by making both headers_start
and headers_end private fields.

Warning(..//include/linux/skbuff.h:654): No description found for parameter 'headers_end[0]'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vince Bridgers [Sun, 26 Oct 2014 19:22:24 +0000 (14:22 -0500)]

net: phy: Add SGMII Configuration for Marvell 88E1145 Initialization

Marvell phy 88E1145 configuration & initialization was missing a case
for initializing SGMII mode. This patch adds that case.

Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Mugunthan V N [Fri, 24 Oct 2014 13:21:33 +0000 (18:51 +0530)]

drivers: net:cpsw: fix probe_dt when only slave 1 is pinned out

when slave 0 has no phy and slave 1 connected to phy, driver probe will
fail as there is no phy id present for slave 0 device tree, so continuing
even though no phy-id found, also moving mac-id read later to ensure
mac-id is read from device tree even when phy-id entry in not found.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 28 Oct 2014 19:30:15 +0000 (15:30 -0400)]

Merge tag 'master-2014-10-27' of git://git./linux/kernel/git/linville/wireless

John W. Linville says:

====================
pull request: wireless 2014-10-28

Please pull this batch of fixes intended for the 3.18 stream!

For the mac80211 bits, Johannes says:

"Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes."

For the iwlwifi bits, Emmanuel says:

"I revert here a patch that caused interoperability issues.
dvm gets a fix for a bug that was reported by many users.
Two minor fixes for BT Coex and platform power fix that helps
reducing latency when the PCIe link goes to low power states."

In addition...

Felix Fietkau adds a couple of ath code fixes related to regulatory
rule enforcement.

Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS
is not set.

Karsten Wiese provides a trio of minor fixes for rtl8192cu.

Kees Cook prevents a potential information leak in rtlwifi.

Larry Finger also brings a trio of minor fixes for rtlwifi.

Rafał Miłecki adds a device ID to the bcma bus driver.

Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac
to eliminate non-terminated string issues.

Sujith Manoharan avoids some ath9k stalls by enabling HW queue control
only for MCC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 28 Oct 2014 19:28:30 +0000 (15:28 -0400)]

Merge branch 'dsa-net'

Andrew Lunn says:

====================
DSA tagging mismatches

The second patch is a fix, which should be applied to -rc. It is
possible to get a DSA configuration which does not work. The patch
stops this happening.

The first patch detects this situation, and errors out the probe of
DSA, making it more obvious something is wrong. It is not required to
apply it -rc.

v2 fixes the use case pointed out by Florian, that a switch driver
may use DSA_TAG_PROTO_NONE which the patch did not correctly handle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Andrew Lunn [Fri, 24 Oct 2014 21:44:05 +0000 (23:44 +0200)]

dsa: mv88e6171: Fix tagging protocol/Kconfig

The mv88e6171 can support two different tagging protocols, DSA and
EDSA. The switch driver structure only allows one protocol to be
enumerated, and DSA was chosen. However the Kconfig entry ensures the
EDSA tagging code is built. With a minimal configuration, we then end
up with a mismatch. The probe is successful, EDSA tagging is used, but
the switch is configured for DSA, resulting in mangled packets.

Change the switch driver structure to enumerate EDSA, fixing the
mismatch.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 42f272539487 ("net: DSA: Marvell mv88e6171 switch driver")
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Andrew Lunn [Fri, 24 Oct 2014 21:44:04 +0000 (23:44 +0200)]

net: dsa: Error out on tagging protocol mismatches

If there is a mismatch between enabled tagging protocols and the
protocol the switch supports, error out, rather than continue with a
situation which is unlikely to work.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
cc: alexander.h.duyck@intel.com
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexei Starovoitov [Fri, 24 Oct 2014 01:41:08 +0000 (18:41 -0700)]

bpf: split eBPF out of NET

introduce two configs:
- hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
  depend on
- visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use

that solves several problems:
- tracing and others that wish to use eBPF don't need to depend on NET.
  They can use BPF_SYSCALL to allow loading from userspace or select BPF
  to use it directly from kernel in NET-less configs.
- in 3.18 programs cannot be attached to events yet, so don't force it on
- when the rest of eBPF infra is there in 3.19+, it's still useful to
  switch it off to minimize kernel size

bloat-o-meter on x64 shows:
add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)

tested with many different config combinations. Hopefully didn't miss anything.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Oct 2014 23:00:16 +0000 (19:00 -0400)]

Merge branch 'cxgb4-net'

Anish Bhatt says:

====================
cxgb4 : DCBx fixes for apps/host lldp agents

This patchset contains some minor fixes for cxgb4 DCBx code. Chiefly, cxgb4
was not cleaning up any apps added to kernel app table when link was lost.
Disabling DCBx in firmware would automatically set DCBx state to host-managed
and enabled, we now wait for an explicit enable call from an lldp agent instead

First patch was originally sent to net-next, but considering it applies to
correcting behaviour of code already in net, I think it qualifies as a bug fix.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Anish Bhatt [Thu, 23 Oct 2014 21:37:31 +0000 (14:37 -0700)]

cxgb4 : Handle dcb enable correctly

Disabling DCBx in firmware automatically enables DCBx for control via host
lldp agents. Wait for an explicit setstate call from an lldp agents to enable
DCBx instead.

Fixes: 76bcb31efc06 ("cxgb4 : Add DCBx support codebase and dcbnl_ops")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Anish Bhatt [Thu, 23 Oct 2014 21:37:30 +0000 (14:37 -0700)]

cxgb4 : Improve handling of DCB negotiation or loss thereof

Clear out any DCB apps we might have added to kernel table when we lose DCB
sync (or IEEE equivalent event). These were previously left behind and not
cleaned up correctly. IEEE allows individual components to work independently,
so improve check for IEEE completion by specifying individual components.

Fixes: 10b0046685ab ("cxgb4: IEEE fixes for DCBx state machine")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Oct 2014 22:47:40 +0000 (18:47 -0400)]

Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Allow to recycle a TCP port in conntrack when the change role from
   server to client, from Marcelo Leitner.

2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch
   from Dan Carpenter.

3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables
   chain statistic updates. From Sabrina Dubroca.

4) Don't compile ip options in bridge netfilter, this mangles the packet
   and bridge should not alter layer >= 3 headers when forwarding packets.
   Patch from Herbert Xu and tested by Florian Westphal.

5) Account the final NLMSG_DONE message when calculating the size of the
   nflog netlink batches. Patch from Florian Westphal.

6) Fix a possible netlink attribute length overflow with large packets.
   Again from Florian Westphal.

7) Release the skbuff if nfnetlink_log fails to put the final
   NLMSG_DONE message. This fixes a leak on error. This shouldn't ever
   happen though, otherwise this means we miscalculate the netlink batch
   size, so spot a warning if this ever happens so we can track down the
   problem. This patch from Houcheng Lin.

8) Look at the right list when recycling targets in the nft_compat,
   patch from Arturo Borrero.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Arturo Borrero [Sun, 26 Oct 2014 11:22:40 +0000 (12:22 +0100)]

netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()

The code looks for an already loaded target, and the correct list to search
is nft_target_list, not nft_match_list.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

John W. Linville [Mon, 27 Oct 2014 17:38:15 +0000 (13:38 -0400)]

Merge tag 'mac80211-for-john-2014-10-23' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

John W. Linville [Mon, 27 Oct 2014 17:35:59 +0000 (13:35 -0400)]

Merge tag 'iwlwifi-for-john-2014-10-23' of git://git./linux/kernel/git/iwlwifi/iwlwifi-fixes

Emmanuel Grumbach <egrumbach@gmail.com> says:

"I revert here a patch that caused interoperability issues.
dvm gets a fix for a bug that was reported by many users.
Two minor fixes for BT Coex and platform power fix that helps
reducing latency when the PCIe link goes to low power states."

Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Eric Dumazet [Thu, 23 Oct 2014 13:30:30 +0000 (06:30 -0700)]

net: napi_reuse_skb() should check pfmemalloc

Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Oct 2014 02:46:08 +0000 (22:46 -0400)]

Merge branch 'mellanox'

Eli Cohen says:

====================
irq sync fixes

This two patch series fixes a race where an interrupt handler could access a
freed memory.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eli Cohen [Thu, 23 Oct 2014 12:57:27 +0000 (15:57 +0300)]

net/mlx4_core: Call synchronize_irq() before freeing EQ buffer

After moving the EQ ownership to software effectively destroying it, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In the case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eli Cohen [Thu, 23 Oct 2014 12:57:26 +0000 (15:57 +0300)]

net/mlx5_core: Call synchronize_irq() before freeing EQ buffer

After destroying the EQ, the object responsible for generating interrupts, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer. This patch solves a very
rare case when we get panic on driver unload.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In the case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Geert Uytterhoeven [Thu, 23 Oct 2014 08:25:53 +0000 (10:25 +0200)]

drivers: net: xgene: Rewrite buggy loop in xgene_enet_ecc_init()

drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c: In function ‘xgene_enet_ecc_init’:
drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c:126: warning: ‘data’ may be used uninitialized in this function

Depending on the arbitrary value on the stack, the loop may terminate
too early, and cause a bogus -ENODEV failure.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Dan Carpenter [Thu, 23 Oct 2014 03:06:29 +0000 (20:06 -0700)]

i40e: _MASK vs _SHIFT typo in i40e_handle_mdd_event()

We accidentally mask by the _SHIFT variable. It means that "event" is
always zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Thu, 23 Oct 2014 02:43:46 +0000 (19:43 -0700)]

macvlan: fix a race on port dismantle and possible skb leaks

We need to cancel the work queue after rcu grace period,
otherwise it can be rescheduled by incoming packets.

We need to purge queue if some skbs are still in it.

We can use __skb_queue_head_init() variant in
macvlan_process_broadcast()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 412ca1550cbec ("macvlan: Move broadcasts into a work queue")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Thu, 23 Oct 2014 19:58:58 +0000 (12:58 -0700)]

tcp: md5: do not use alloc_percpu()

percpu tcp_md5sig_pool contains memory blobs that ultimately
go through sg_set_buf().

-> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

This requires that whole area is in a physically contiguous portion
of memory. And that @buf is not backed by vmalloc().

Given that alloc_percpu() can use vmalloc() areas, this does not
fit the requirements.

Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool
is small anyway, there is no gain to dynamically allocate it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 765cf9976e93 ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Sat, 25 Oct 2014 18:15:25 +0000 (14:15 -0400)]

Merge branch 'xen-netback'

David Vrabel says:

====================
xen-netback: guest Rx queue drain and stall fixes

This series fixes two critical xen-netback bugs.

1. Netback may consume all of host memory by queuing an unlimited
   number of skb on the internal guest Rx queue.  This behaviour is
   guest triggerable.

2. Carrier flapping under high traffic rates which reduces
   performance.

The first patch is a prerequite.  Removing support for frontends with
feature-rx-notify makes it easier to reason about the correctness of
netback since it no longer has to support this outdated and broken
mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Vrabel [Wed, 22 Oct 2014 13:08:55 +0000 (14:08 +0100)]

xen-netback: reintroduce guest Rx stall detection

If a frontend not receiving packets it is useful to detect this and
turn off the carrier so packets are dropped early instead of being
queued and drained when they expire.

A to-guest queue is stalled if it doesn't have enough free slots for a
an extended period of time (default 60 s).

If at least one queue is stalled, the carrier is turned off (in the
expectation that the other queues will soon stall as well). The
carrier is only turned on once all queues are ready.

When the frontend connects, all the queues start in the stalled state
and only become ready once the frontend queues enough Rx requests.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Vrabel [Wed, 22 Oct 2014 13:08:54 +0000 (14:08 +0100)]

xen-netback: fix unlimited guest Rx internal queue and carrier flapping

Netback needs to discard old to-guest skb's (guest Rx queue drain) and
it needs detect guest Rx stalls (to disable the carrier so packets are
discarded earlier), but the current implementation is very broken.

1. The check in hard_start_xmit of the slot availability did not
   consider the number of packets that were already in the guest Rx
   queue.  This could allow the queue to grow without bound.

   The guest stops consuming packets and the ring was allowed to fill
   leaving S slot free.  Netback queues a packet requiring more than S
   slots (ensuring that the ring stays with S slots free).  Netback
   queue indefinately packets provided that then require S or fewer
   slots.

2. The Rx stall detection is not triggered in this case since the
   (host) Tx queue is not stopped.

3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback
   will consider this an Rx purge event which may result in it taking
   the carrier down unnecessarily.  It also considers a queue with
   only 1 slot free as unstalled (even though the next packet might
   not fit in this).

The internal guest Rx queue is limited by a byte length (to 512 Kib,
enough for half the ring).  The (host) Tx queue is stopped and started
based on this limit.  This sets an upper bound on the amount of memory
used by packets on the internal queue.

This allows the estimatation of the number of slots for an skb to be
removed (it wasn't a very good estimate anyway).  Instead, the guest
Rx thread just waits for enough free slots for a maximum sized packet.

skbs queued on the internal queue have an 'expires' time (set to the
current time plus the drain timeout).  The guest Rx thread will detect
when the skb at the head of the queue has expired and discard expired
skbs.  This sets a clear upper bound on the length of time an skb can
be queued for.  For a guest being destroyed the maximum time needed to
wait for all the packets it sent to be dropped is still the drain
timeout (10 s) since it will not be sending new packets.

Rx stall detection is reintroduced in a later commit.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Vrabel [Wed, 22 Oct 2014 13:08:53 +0000 (14:08 +0100)]

xen-netback: make feature-rx-notify mandatory

Frontends that do not provide feature-rx-notify may stall because
netback depends on the notification from frontend to wake the guest Rx
thread (even if can_queue is false).

This could be fixed but feature-rx-notify was introduced in 2006 and I
am not aware of any frontends that do not implement this.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Richard Cochran [Wed, 22 Oct 2014 19:35:15 +0000 (21:35 +0200)]

ptp: restore the makefile for building the test program.

This patch brings back the makefile called testptp.mk which was removed
in commit adb19fb66eee (Documentation: add makefiles for more targets).

While the idea of that commit was to improve build coverage of the
examples, the new Makefile is unable to cross compile the testptp program.
In contrast, the deleted makefile was able to do this just fine.

This patch fixes the regression by restoring the original makefile.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Houcheng Lin [Thu, 23 Oct 2014 08:36:08 +0000 (10:36 +0200)]

netfilter: nf_log: release skbuff on nlmsg put failure

The kernel should reserve enough room in the skb so that the DONE
message can always be appended. However, in case of e.g. new attribute
erronously not being size-accounted for, __nfulnl_send() will still
try to put next nlmsg into this full skbuf, causing the skb to be stuck
forever and blocking delivery of further messages.

Fix issue by releasing skb immediately after nlmsg_put error and
WARN() so we can track down the cause of such size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]

Signed-off-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Florian Westphal [Thu, 23 Oct 2014 08:36:07 +0000 (10:36 +0200)]

netfilter: nfnetlink_log: fix maximum packet length logged to userspace

don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.

This patch is similar to
9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Florian Westphal [Thu, 23 Oct 2014 08:36:06 +0000 (10:36 +0200)]

netfilter: nf_log: account for size of NLMSG_DONE attribute

We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can result in nflog to stop working, as __nfulnl_send() re-tries
sending forever if it failed to append NLMSG_DONE (which will never
work if buffer is not large enough).

Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Herbert Xu [Sat, 4 Oct 2014 14:18:02 +0000 (22:18 +0800)]

bridge: Do not compile options in br_parse_ip_options

Commit 462fb2af9788a82a534f8184abfde31574e1cfa0

bridge : Sanitize skb before it enters the IP stack

broke when IP options are actually used because it mangles the
skb as if it entered the IP stack which is wrong because the
bridge is supposed to operate below the IP stack.

Since nobody has actually requested for parsing of IP options
this patch fixes it by simply reverting to the previous approach
of ignoring all IP options, i.e., zeroing the IPCB.

If and when somebody who uses IP options and actually needs them
to be parsed by the bridge complains then we can revisit this.

Reported-by: David Newall <davidn@davidnewall.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Emmanuel Grumbach [Thu, 23 Oct 2014 05:53:21 +0000 (08:53 +0300)]

iwlwifi: pcie: fix polling in various places

iwl_poll_bit may return a strictly positive value when the
poll doesn't match on the first try.
This was caught when WoWLAN started failing upon resume
even if the poll_bit actually succeeded.

Also change a wrong print. If we reach the end of
iwl_pcie_prepare_card_hw, it means that we couldn't
get the devices.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Emmanuel Grumbach [Mon, 20 Oct 2014 05:29:55 +0000 (08:29 +0300)]

Revert "iwlwifi: mvm: treat EAPOLs like mgmt frames wrt rate"

This reverts commit aa11bbf3df026d6b1c6b528bef634fd9de7c2619.
This commit was causing connection issues and is not needed
if IWL_MVM_RS_RSSI_BASED_INIT_RATE is set to false by default.

Regardless of the issues mentioned above, this patch added the
following WARNING:

WARNING: CPU: 0 PID: 3946 at drivers/net/wireless/iwlwifi/mvm/tx.c:190 iwl_mvm_set_tx_params+0x60a/0x6f0 [iwlmvm]()
Got an HT rate for a non data frame 0x8
CPU: 0 PID: 3946 Comm: wpa_supplicant Tainted: G O 3.17.0+ #6
Hardware name: LENOVO 20ANCTO1WW/20ANCTO1WW, BIOS GLET71WW (2.25 ) 07/02/2014
0000000000000009 ffffffff814fa911 ffff8804288db8f8 ffffffff81064f52
0000000000001808 ffff8804288db948 ffff88040add8660 ffff8804291b5600
0000000000000000 ffffffff81064fb7 ffffffffa07b73d0 0000000000000020
Call Trace:
[<ffffffff814fa911>] ? dump_stack+0x41/0x51
[<ffffffff81064f52>] ? warn_slowpath_common+0x72/0x90
[<ffffffff81064fb7>] ? warn_slowpath_fmt+0x47/0x50
[<ffffffffa07a39ea>] ? iwl_mvm_set_tx_params+0x60a/0x6f0 [iwlmvm]
[<ffffffffa07a3cf8>] ? iwl_mvm_tx_skb+0x48/0x3c0 [iwlmvm]
[<ffffffffa079cb9b>] ? iwl_mvm_mac_tx+0x7b/0x180 [iwlmvm]
[<ffffffffa0746ce9>] ? __ieee80211_tx+0x2b9/0x3c0 [mac80211]
[<ffffffffa07492f3>] ? ieee80211_tx+0xb3/0x100 [mac80211]
[<ffffffffa0749c49>] ? ieee80211_subif_start_xmit+0x459/0xca0 [mac80211]
[<ffffffff814116e7>] ? dev_hard_start_xmit+0x337/0x5f0
[<ffffffff81430d46>] ? sch_direct_xmit+0x96/0x1f0
[<ffffffff81411ba3>] ? __dev_queue_xmit+0x203/0x4f0
[<ffffffff8142f670>] ? ether_setup+0x70/0x70
[<ffffffff814e96a1>] ? packet_sendmsg+0xf81/0x1110
[<ffffffff8140625c>] ? skb_free_datagram+0xc/0x40
[<ffffffff813f7538>] ? sock_sendmsg+0x88/0xc0
[<ffffffff813f7274>] ? move_addr_to_kernel.part.20+0x14/0x60
[<ffffffff811c47c2>] ? __inode_wait_for_writeback+0x62/0xb0
[<ffffffff813f7a91>] ? SYSC_sendto+0xf1/0x180
[<ffffffff813f88f9>] ? __sys_recvmsg+0x39/0x70
[<ffffffff8150066d>] ? system_call_fastpath+0x1a/0x1f
---[ end trace cc19a150d311fc63 ]---

which was reported here: https://bugzilla.kernel.org/show_bug.cgi?id=85691

CC: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Emmanuel Grumbach [Sun, 5 Oct 2014 06:11:14 +0000 (09:11 +0300)]

iwlwifi: dvm: drop non VO frames when flushing

When mac80211 wants to ensure that a frame is sent, it calls
the flush() callback. Until now, iwldvm implemented this by
waiting that all the frames are sent (ACKed or timeout).
In case of weak signal, this can take a significant amount
of time, delaying the next connection (in case of roaming).
Many users have reported that the flush would take too long
leading to the following error messages to be printed:

iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 2
iwlwifi 0000:03:00.0: Current SW read_ptr 161 write_ptr 201
iwl data: 00000000: 00 00 00 00 00 00 00 00 fe ff 01 00 00 00 00 00
[snip]
iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000
[snip]
iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [9,9]
[snip]

Instead of waiting for these packets, simply drop them. This
significantly improves the responsiveness of the network.
Note that all the queues are flushed, but the VO one. This
is not typically used by the applications and it likely
contains management frames that are useful for connection
or roaming.

This bug is tracked here:
https://bugzilla.kernel.org/show_bug.cgi?id=56581

But it is duplicated in distributions' trackers.
A simple search in Ubuntu's database led to these bugs:

https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1270808
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1305406
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356236
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1360597
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1361809

Cc: <stable@vger.kernel.org>
Depends-on: 77be2c54c5bd ("mac80211: add vif to flush call")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Matti Gottlieb [Mon, 29 Sep 2014 08:46:04 +0000 (11:46 +0300)]

iwlwifi: mvm: ROC - bug fixes around time events and locking

Don't add the time event to the list. We added it several
times the same time event, which leads to an infinite loop
when walking the list.

Since we (currently) don't support more than one ROC for STA
vif at a time, enforce this and don't add the time event
to any list.

We were also missing the locking of the mutex which led to
a lockdep splat - fix that.

Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Haim Dreyfuss [Sun, 14 Sep 2014 09:40:00 +0000 (12:40 +0300)]

iwlwifi: mvm: Add tx power condition to bss_info_changed_ap_ibss

The tx power should be limited from many reasons.
currently, setting the tx power is available by the mvm only for
station interface. Adding the tx power condition to
bss_info_changed_ap_ibss make it available also for AP.

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Emmanuel Grumbach [Mon, 22 Sep 2014 09:03:41 +0000 (12:03 +0300)]

iwlwifi: mvm: BT coex - fix BT prio for probe requests

The probe requests sent during scan must get BT prio 3.
Fix that.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Emmanuel Grumbach [Mon, 22 Sep 2014 13:12:24 +0000 (16:12 +0300)]

iwlwifi: mvm: BT Coex - update the MPLUT Boost register value

Cc: <stable@vger.kernel.org> [3.16+]
Fixes: 2adc8949efab ("iwlwifi: mvm: BT Coex - fix boost register / LUT values")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Liad Kaufman [Tue, 23 Sep 2014 12:15:17 +0000 (15:15 +0300)]

iwlwifi: 8000: fix string given to MODULE_FIRMWARE

I changed the string but forgot to update the fix also to
MODULE_FIRMWARE().

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Emmanuel Grumbach [Tue, 23 Sep 2014 20:02:41 +0000 (23:02 +0300)]

iwlwifi: configure the LTR

The LTR is the handshake between the device and the root
complex about the latency allowed when the bus exits power
save. This configuration was missing and this led to high
latency in the link power up. The end user could experience
high latency in the network because of this.

Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit | commitdiff | tree

Larry Finger [Thu, 23 Oct 2014 16:27:09 +0000 (11:27 -0500)]

rtlwifi: Add check for get_btc_status callback

Drivers that do not use the get_btc_status() callback may not define a
dummy routine. The caller needs to check before making the call.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Murilo Opsfelder Araujo <mopsfelder@gmail.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Thadeu Cascardo <cascardo@cascardo.eti.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Felix Fietkau [Wed, 22 Oct 2014 16:17:35 +0000 (18:17 +0200)]

ath9k_common: always update value in ath9k_cmn_update_txpow

In some cases the limit may be the same as reg->power_limit, but the
actual value that the hardware uses is not up to date. In that case, a
wrong value for current tx power is tracked internally.
Fix this by unconditionally updating it.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Karsten Wiese [Wed, 22 Oct 2014 13:47:34 +0000 (15:47 +0200)]

rtl8192cu: Prevent Ooops under rtl92c_set_fw_rsvdpagepkt

rtl92c_set_fw_rsvdpagepkt is used by rtl8192cu and its pci sibling rtl8192ce.
rtl_cmd_send_packet crashes when called inside rtl8192cu because it works on
memory allocated only by rtl8192ce.
Fix the crash by calling a dummy function when used in rtl8192cu.
Comparision with the realtek vendor driver makes me think, something is missing in
the dummy function.
Short test as WPA2 station show good results connected to an 802.11g basestation.
Traffic stops after few MBytes as WPA2 station connected to an 802.11n basestation.

Signed-off-by: Karsten Wiese <fzuuzf@googlemail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Karsten Wiese [Wed, 22 Oct 2014 13:47:33 +0000 (15:47 +0200)]

rtl8192cu: Call ieee80211_register_hw from rtl_usb_probe

In a previous patch the call to ieee80211_register_hw was moved from the
load firmware callback to the rtl_pci_probe only.
rt8192cu also uses this callback. Currently it doesnt create a wlan%d device.
Fill in the call to ieee80211_register_hw in rtl_usb_probe.

Signed-off-by: Karsten Wiese <fzuuzf@googlemail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Karsten Wiese [Wed, 22 Oct 2014 13:47:32 +0000 (15:47 +0200)]

rtl8192cu: Fix for rtlwifi's bluetooth coexist functionality

Initialize function pointer with a function indicating bt coexist is not there.
Prevents Ooops.

Signed-off-by: Karsten Wiese <fzuuzf@googlemail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Felix Fietkau [Wed, 22 Oct 2014 13:27:53 +0000 (15:27 +0200)]

ath: use CTL region from cfg80211 if unset in EEPROM

Many AP devices do not have the proper regulatory domain programmed in
EEPROM. Instead they expect the software to set the appropriate region.
For these devices, the country code defaults to US, and the driver uses
the US CTL tables as well.
On devices bought in Europe this can lead to tx power being set too high
on the band edges, even if the cfg80211 regdomain is set correctly.
Fix this issue by taking into account the DFS region, but only when the
EEPROM regdomain is set to default.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Larry Finger [Tue, 21 Oct 2014 15:52:51 +0000 (10:52 -0500)]

rtlwifi: rtl8821ae: Fix possible array overrun

The kbuild test robot reported a possible array overrun. The affected code
checks for overruns, but fails to take the steps necessary to fix them.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Sujith Manoharan [Tue, 21 Oct 2014 13:53:02 +0000 (19:23 +0530)]

ath9k: Enable HW queue control only for MCC

Enabling HW queue control for normal (non-mcc) mode
causes problems with queue management, resulting
in traffic stall. Since it is mainly required for
fairness in MCC mode, disable it for the general case.

Bug: https://dev.openwrt.org/ticket/18164

Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Kees Cook [Mon, 20 Oct 2014 21:57:08 +0000 (14:57 -0700)]

rtlwifi: prevent format string usage from leaking

Use "%s" in the workqueue allocation to make sure the rtl_hal_cfg name
can never accidentally leak information via a format string.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Rafał Miłecki [Wed, 15 Oct 2014 05:51:44 +0000 (07:51 +0200)]

bcma: add another PCI ID of device with BCM43228

It was found attached to the BCM47081A0 SoC. Log:
bcma: bus0: Found chip with id 43228, rev 0x00 and package 0x08

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Rickard Strandqvist [Sun, 12 Oct 2014 11:42:14 +0000 (13:42 +0200)]

brcmfmac: dhd_sdio.c: Cleaning up missing null-terminate in conjunction with strncpy

Replacing strncpy with strlcpy to avoid strings that lacks null terminate.
And changed from using strncat to strlcat to simplify code.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Larry Finger [Sat, 11 Oct 2014 17:59:53 +0000 (12:59 -0500)]

rtlwifi: rtl8192ee: Prevent log spamming for switch statements

The driver logs a message when the default branch of switch statements are
taken. Such information is useful when debugging, but these log items should
not be seen for standard usage.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Hauke Mehrtens [Thu, 9 Oct 2014 21:39:41 +0000 (23:39 +0200)]

bcma: fix build when CONFIG_OF_ADDRESS is not set

Commit 2101e533f41a ("bcma: register bcma as device tree driver")
introduces a hard dependency on OF_ADDRESS into the bcma driver.
OF_ADDRESS is specifically disabled for the sparc architecture.
This results in the following error when building sparc64:allmodconfig.

drivers/bcma/main.c: In function 'bcma_of_find_child_device':
drivers/bcma/main.c:150:3: error: implicit declaration of function 'of_translate_address'

Fixes: 2101e533f41a ("bcma: register bcma as device tree driver")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit | commitdiff | tree

Haiyang Zhang [Wed, 22 Oct 2014 20:47:18 +0000 (13:47 -0700)]

hyperv: Fix the total_data_buflen in send path

total_data_buflen is used by netvsc_send() to decide if a packet can be put
into send buffer. It should also include the size of RNDIS message before the
Ethernet frame. Otherwise, a messge with total size bigger than send_section_size
may be copied into the send buffer, and cause data corruption.

[Request to include this patch to the Stable branches]

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 22 Oct 2014 21:50:39 +0000 (17:50 -0400)]

Merge branch 'amd-xgbe'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver fixes 2014-10-22

The following series of patches includes fixes to the driver.

- Properly handle feature changes via ethtool by using correctly sized
variables
- Perform proper napi packet counting and budget checking

This patch series is based on net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lendacky, Thomas [Wed, 22 Oct 2014 16:26:17 +0000 (11:26 -0500)]

amd-xgbe: Fix napi Rx budget accounting

Currently the amd-xgbe driver increments the packets processed counter
each time a descriptor is processed. Since a packet can be represented
by more than one descriptor incrementing the counter in this way is not
appropriate. Also, since multiple descriptors cause the budget check
to be short circuited, sometimes the returned value from the poll
function would be larger than the budget value resulting in a WARN_ONCE
being triggered.

Update the polling logic to properly account for the number of packets
processed and exit when the budget value is reached.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lendacky, Thomas [Wed, 22 Oct 2014 16:26:11 +0000 (11:26 -0500)]

amd-xgbe: Properly handle feature changes via ethtool

The ndo_set_features callback function was improperly using an unsigned
int to save the current feature value for features such as NETIF_F_RXCSUM.
Since that feature is in the upper 32 bits of a 64 bit variable the
result was always 0 making it not possible to actually turn off the
hardware RX checksum support. Change the unsigned int type to the
netdev_features_t type in order to properly capture the current value
and perform the proper operation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Philipp Zabel [Wed, 22 Oct 2014 14:34:35 +0000 (16:34 +0200)]

net: fec: ptp: fix NULL pointer dereference if ptp_clock is not set

Since commit 278d24047891 (net: fec: ptp: Enable PPS output based on ptp clock)
fec_enet_interrupt calls fec_ptp_check_pps_event unconditionally, which calls
into ptp_clock_event. If fep->ptp_clock is NULL, ptp_clock_event tries to
dereference the NULL pointer.
Since on i.MX53 fep->bufdesc_ex is not set, fec_ptp_init is never called,
and fep->ptp_clock is NULL, which reliably causes a kernel panic.

This patch adds a check for fep->ptp_clock == NULL in fec_enet_interrupt.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sathya Perla [Wed, 22 Oct 2014 16:12:01 +0000 (21:42 +0530)]

net: fix saving TX flow hash in sock for outgoing connections

The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.

But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.

This patch fixes this problem for IPv4/6 by invoking the the above routines
after the source port is available for the socket.

Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Li RongQing [Wed, 22 Oct 2014 09:09:53 +0000 (17:09 +0800)]

xfrm6: fix a potential use after free in xfrm6_policy.c

pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
oboslete, so recompute the nd and exthdr

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

LEROY Christophe [Wed, 22 Oct 2014 07:05:47 +0000 (09:05 +0200)]

net: fs_enet: set back promiscuity mode after restart

After interface restart (eg: after link disconnection/reconnection), the bridge
function doesn't work anymore. This is due to the promiscuous mode being cleared
by the restart.

The mac-fcc already includes code to set the promiscuous mode back during the restart.
This patch adds the same handling to mac-fec and mac-scc.

Tested with bridge function on MPC885 with FEC.

Reported-by: Germain Montoies <germain.montoies@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Karl Beldan [Tue, 21 Oct 2014 14:06:05 +0000 (16:06 +0200)]

net: tso: fix unaligned access to crafted TCP header in helper API

The crafted header start address is from a driver supplied buffer, which
one can reasonably expect to be aligned on a 4-bytes boundary.
However ATM the TSO helper API is only used by ethernet drivers and
the tcp header will then be aligned to a 2-bytes only boundary from the
header start address.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jon Cooper [Tue, 21 Oct 2014 13:50:29 +0000 (14:50 +0100)]

sfc: remove incorrect EFX_BUG_ON_PARANOID check

write_count and insert_count can wrap around, making > check invalid.

Fixes: 70b33fb0ddec827cbbd14cdc664fc27b2ef4a6b6 ("sfc: add support for
skb->xmit_more").

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Tue, 21 Oct 2014 09:08:21 +0000 (11:08 +0200)]

netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation

alloc_percpu returns NULL on failure, not a negative error code.

Fixes: ff3cd7b3c922 ("netfilter: nf_tables: refactor chain statistic routines")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Dan Carpenter [Tue, 21 Oct 2014 08:28:12 +0000 (11:28 +0300)]

netfilter: ipset: off by one in ip_set_nfnl_get_byindex()

The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Marcelo Leitner [Mon, 13 Oct 2014 16:09:28 +0000 (13:09 -0300)]

netfilter: nf_conntrack: allow server to become a client in TW handling

When a port that was used to listen for inbound connections gets closed
and reused for outgoing connections (like rsh ends up doing for stderr
flow), current we may reject the SYN/ACK packet for the new connection
because tcp_conntracks states forbirds a port to become a client while
there is still a TIME_WAIT entry in there for it.

As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
for it is 120s, there is a ~60s window that the application can end up
opening a port that conntrack will end up blocking.

This patch fixes this by simply allowing such state transition: if we
see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
that the rest of the code already handles this situation, more
specificly in tcp_packet(), first switch clause.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Sabrina Dubroca [Tue, 21 Oct 2014 09:23:30 +0000 (11:23 +0200)]

net: sched: initialize bstats syncp

Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.

Fixes: 22e0f8b9322c "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexei Starovoitov [Mon, 20 Oct 2014 21:54:57 +0000 (14:54 -0700)]

bpf: fix bug in eBPF verifier

while comparing for verifier state equivalency the comparison
was missing a check for uninitialized register.
Make sure it does so and add a testcase.

Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thomas Graf [Tue, 21 Oct 2014 20:05:38 +0000 (22:05 +0200)]

netlink: Re-add locking to netlink_lookup() and seq walker

The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ying Xue [Mon, 20 Oct 2014 06:46:35 +0000 (14:46 +0800)]

tipc: fix lockdep warning when intra-node messages are delivered

When running tipcTC&tipcTS test suite, below lockdep unsafe locking
scenario is reported:

[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:
[ 1109.998762]  Possible unsafe locking scenario:
[ 1109.998762]
[ 1109.998762]        CPU0
[ 1109.998762]        ----
[ 1109.998762]   lock(slock-AF_TIPC);
[ 1109.998762]   <Interrupt>
[ 1109.998762]     lock(slock-AF_TIPC);
[ 1109.998762]
[ 1109.998762]  *** DEADLOCK ***
[ 1109.998762]
[ 1109.998762] 2 locks held by swapper/7/0:
[ 1109.998762]  #0:  (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]
[ 1109.998762] stack backtrace:
[ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
[ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1109.998762]  ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
[ 1109.998762]  ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
[ 1109.998762]  ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
[ 1109.998762] Call Trace:
[ 1109.998762]  <IRQ>  [<ffffffff81a209eb>] dump_stack+0x4e/0x68
[ 1109.998762]  [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202
[ 1109.998762]  [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50
[ 1109.998762]  [<ffffffff810a406c>] mark_lock+0x28c/0x2f0
[ 1109.998762]  [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
[ 1109.998762]  [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80
[ 1109.998762]  [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10
[ 1109.998762]  [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0
[ 1109.998762]  [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc]
[ 1109.998762]  [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
[ 1109.998762]  [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]  [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70
[ 1109.998762]  [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0
[ 1109.998762]  [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70
[ 1109.998762]  [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0
[ 1109.998762]  [<ffffffff81785518>] napi_gro_receive+0xd8/0x240
[ 1109.998762]  [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530
[ 1109.998762]  [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffff817842b1>] net_rx_action+0x141/0x310
[ 1109.998762]  [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150
[ 1109.998762]  [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0
[ 1109.998762]  [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]  [<ffffffff81a30d07>] do_IRQ+0x67/0x110
[ 1109.998762]  [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f
[ 1109.998762]  <EOI>  [<ffffffff8100d2b7>] ? default_idle+0x37/0x250
[ 1109.998762]  [<ffffffff8100d2b5>] ? default_idle+0x35/0x250
[ 1109.998762]  [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20
[ 1109.998762]  [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0
[ 1109.998762]  [<ffffffff81034c78>] start_secondary+0x188/0x1f0

When intra-node messages are delivered from one process to another
process, tipc_link_xmit() doesn't disable BH before it directly calls
tipc_sk_rcv() on process context to forward messages to destination
socket. Meanwhile, if messages delivered by remote node arrive at the
node and their destinations are also the same socket, tipc_sk_rcv()
running on process context might be preempted by tipc_sk_rcv() running
BH context. As a result, the latter cannot obtain the socket lock as
the lock was obtained by the former, however, the former has no chance
to be run as the latter is owning the CPU now, so headlock happens. To
avoid it, BH should be always disabled in tipc_sk_rcv().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ying Xue [Mon, 20 Oct 2014 06:44:25 +0000 (14:44 +0800)]

tipc: fix a potential deadlock

Locking dependency detected below possible unsafe locking scenario:

           CPU0                          CPU1
T0:  tipc_named_rcv()                tipc_rcv()
T1:  [grab nametble write lock]*     [grab node lock]*
T2:  tipc_update_nametbl()           tipc_node_link_up()
T3:  tipc_nodesub_subscribe()        tipc_nametbl_publish()
T4:  [grab node lock]*               [grab nametble write lock]*

The opposite order of holding nametbl write lock and node lock on
above two different paths may result in a deadlock. If we move the
the updating of the name table after link state named out of node
lock, the reverse order of holding locks will be eliminated, and
as a result, the deadlock risk.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom