Veaceslav Falico [Fri, 28 Feb 2014 11:39:19 +0000 (12:39 +0100)]
bonding: send arp requests even if there's no route to them
Currently we're only sending arp requests if we have a route to the target
(and, thus, can find out the source ip address).
There are some use cases, however, where we don't want/need to set an ip
address (or set up a specific route) for bonding to use arp monitoring *for
traffic generation*. We can easily send arp probes (arp requests with src
ip == 0) to generate arp broadcast responses from the target ip and use
them for determining if the target is up.
This, obviously, won't work with arp validation - because we don't have the
ip address set and, thus, will filter out the responses. So in that case -
print a warning.
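As a rough illustration of what such a probe looks like on the wire, the sketch below builds the 28-byte ARP payload of an Ethernet/IPv4 request whose sender IP is 0.0.0.0. The function name and the sample addresses are made up for the example; this is not the bonding driver's code.

```python
import socket
import struct

def build_arp_probe(src_mac: bytes, target_ip: str) -> bytes:
    """Return the 28-byte ARP payload of a probe (request with sender IP 0.0.0.0)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: request
        src_mac,                      # sender MAC
        bytes(4),                     # sender IP 0.0.0.0 marks this as a probe
        bytes(6),                     # target MAC is unknown
        socket.inet_aton(target_ip),  # target IP being monitored
    )

# Hypothetical MAC and documentation-range IP, for illustration only.
probe = build_arp_probe(bytes.fromhex("020000000001"), "192.0.2.1")
```

Because the sender IP field is all zeroes, the target has no unicast address to answer, which is why the message above relies on broadcast responses.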
CC: François CACHEREUL <f.cachereul@alphalink.fr>
CC: Zhenjie Chen <zhchen@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Feb 2014 22:05:32 +0000 (17:05 -0500)]
Merge branch '6lowpan'
Alexander Aring says:
====================
6lowpan: reimplementation of fragmentation handling
This patch series reimplements the fragmentation handling of 6lowpan
according to rfc4944 [1].
The first big note is that the current fragmentation behaviour isn't RFC
compliant. The main issue is a wrong datagram_size value, which needs to be:
datagram_size = ipv6_payload + ipv6 header + (maybe compressed transport header,
currently only udp is supported)
but the current datagram_size value is calculated as:
datagram_size = ipv6_payload
As a result, fragmentation currently works in linux<->linux communication only.
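A small worked example of the two formulas above, as a sketch only. It assumes that ipv6_payload means the bytes that follow the (possibly compressed) headers; the function names are hypothetical and the constants are the standard IPv6 and UDP header sizes.

```python
IPV6_HEADER_LEN = 40  # fixed IPv6 header size in bytes
UDP_HEADER_LEN = 8    # UDP header size in bytes

def datagram_size_old(ipv6_payload: int) -> int:
    """The value the current code puts into the fragment header."""
    return ipv6_payload

def datagram_size_new(ipv6_payload: int, udp_compressed: bool) -> int:
    """The value described above: payload plus the uncompressed header sizes."""
    size = ipv6_payload + IPV6_HEADER_LEN
    if udp_compressed:
        # The compressed transport header is counted at its uncompressed size.
        size += UDP_HEADER_LEN
    return size

old = datagram_size_old(100)
new = datagram_size_new(100, udp_compressed=True)
```

With 100 bytes of payload and a compressed UDP header, the two values differ by 48 bytes, so a receiver expecting one formula cannot reassemble datagrams sent with the other.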
Why reimplementation?
I reimplemented the reassembly side only. The current behaviour is to allocate an
skb with the reassembled size and hold all fragments in a list, protected by a
spinlock. After we have received all fragments (detected by the sum of all fragment
sizes), it begins to place all fragments into the allocated skb.
This reassembly implementation has some race conditions. Additionally, I make it more
RFC compliant. The current implementation matches on the tag value inside the frag
header only, but rfc4944 says we need to match on dst addr (mac), src addr (mac),
tag value and datagram_size value. [2]
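The matching rule can be sketched as a lookup key built from all four fields. This is a user-space illustration with hypothetical names and addresses, not the inet_frag code.

```python
from collections import defaultdict

def frag_key(src_mac: bytes, dst_mac: bytes, tag: int, datagram_size: int):
    """Fragments belong to the same datagram only if all four fields match."""
    return (src_mac, dst_mac, tag, datagram_size)

# One reassembly queue per key; each entry is (offset, data).
queues = defaultdict(list)

def enqueue(src_mac, dst_mac, tag, datagram_size, offset, data):
    queues[frag_key(src_mac, dst_mac, tag, datagram_size)].append((offset, data))

sender_a = bytes.fromhex("0200000000000001")
sender_b = bytes.fromhex("0200000000000002")
receiver = bytes.fromhex("0200000000000003")

enqueue(sender_a, receiver, 7, 120, 0, b"first")
enqueue(sender_b, receiver, 7, 120, 0, b"other")  # same tag, different sender
```

Matching on the tag alone would have put both fragments into one queue; with the full key they end up in two separate queues.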
The new reassembly handling uses the inet_frag api (I mean the callback interface
of ipv6 and ipv4 reassembly). I looked into ipv6 to see how it deals with
reassembly, and based my code on that implementation.
On the sending side, I improved the fragment generation code to use the
nearest payload length divisible by 8. (We can do that because the mac header has a
dynamic size, so how big the payload can be depends on mac_header.)
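The rounding can be sketched as follows. This is a simplified model: frame_room (the bytes left in a frame after the mac header) is a hypothetical parameter, and the 4- and 5-byte fragment header sizes are the FRAG1 and FRAGN header lengths from rfc4944.

```python
FRAG1_HDR_LEN = 4  # rfc4944 first-fragment header
FRAGN_HDR_LEN = 5  # rfc4944 subsequent-fragment header (adds the offset byte)

def max_frag_payload(frame_room: int, frag_hdr_len: int) -> int:
    """Largest payload that fits behind the fragment header and is a multiple of 8."""
    return (frame_room - frag_hdr_len) // 8 * 8

first = max_frag_payload(102, FRAG1_HDR_LEN)
later = max_frag_payload(100, FRAGN_HDR_LEN)
```

A shorter mac header leaves more frame_room, so the usable payload grows in steps of 8 bytes rather than being fixed at a worst-case value.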
Of course, I also fix the reassembly/sending side to be RFC compliant now.
Regards
Alexander Aring
[1] http://tools.ietf.org/html/rfc4944
[2] http://tools.ietf.org/html/rfc4944#section-5.3
changes since v2:
- rework checkpatch code style issue patch.
Merge two pr_debugs into one pr_debug.
changes since v3:
- rename 6lowpan.ko to 6lowpan_rtnl.c in commit msg of patch 5/8.
changes since v4:
- Add a new patch 2/8 to introduce lowpan_uncompress_size function. Also
improving this function a little bit.
- Add a new patch 4/8 to change tag value to __be16.
- use skb_header_reset function on FRAG1 only, which should have the
lowpan header. See lowpan_get_frag_info function. (slightly improving
of fragmentation header parsing).
- changes types of variables to u16 in lowpan_skb_fragmentation.
- use lowpan_uncompress_size instead of storing necessary information
in skb control block, this can be destroyed after dev_queue_xmit call.
Thanks David for this hint.
- remove Tested-by: Martin Townsend <martin.townsend@xsilon.com>, because
too much functionality changed.
changes since v5:
- handle lowpan_addr_mode_size with lookup table.
changes since v6:
- remove unnecessary parameter in lowpan_frag_queue.
- fix commit message in patch 8/8 which included a description of adding the
lowpan_uncompress_size function. This was split into a separate patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:50 +0000 (07:32 +0100)]
6lowpan: handling 6lowpan fragmentation via inet_frag api
This patch drops the current way of 6lowpan fragmentation on the receiving
side and replaces it with an implementation which uses the inet_frag api.
The old fragmentation handling has some race conditions and isn't
rfc4944 compatible. This patch also adds support for matching fragments on
destination address, source address, tag value and datagram_size,
which is missing in the current implementation.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:49 +0000 (07:32 +0100)]
net: ns: add ieee802154_6lowpan namespace
This patch adds the necessary ieee802154 6lowpan namespace to provide the
inet_frag information. This is initial support for handling 6lowpan
fragmentation with the inet_frag api.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:48 +0000 (07:32 +0100)]
6lowpan: fix some checkpatch issues
Detected with:
./scripts/checkpatch.pl --strict -f net/ieee802154/6lowpan_rtnl.c
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:47 +0000 (07:32 +0100)]
6lowpan: move 6lowpan.c to 6lowpan_rtnl.c
We have a 6lowpan.c file and a 6lowpan.ko file. To avoid confusion we
should move 6lowpan.c to 6lowpan_rtnl.c. Then we can support multiple
source files for the 6lowpan module.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:46 +0000 (07:32 +0100)]
6lowpan: change tag type to __be16
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:45 +0000 (07:32 +0100)]
6lowpan: fix fragmentation on sending side
This patch fixes the fragmentation on the sending side according to rfc4944.
It also adds an improvement to use the full payload of a PDU, which calculates
the nearest payload length divisible by 8 for the fragmentation datagram
size attribute.
The main issue is that the datagram size in the fragmentation header uses the
ipv6 payload length, but rfc4944 says it's the ipv6 payload length including the
network header size (and the transport header size if compressed).
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:44 +0000 (07:32 +0100)]
6lowpan: add uncompress header size function
This patch adds a lookup function for the uncompressed 6LoWPAN header
size. This is needed to estimate the real size after uncompressing the
6LoWPAN header.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:43 +0000 (07:32 +0100)]
6lowpan: add frag information struct
This patch adds a 6lowpan fragmentation struct into cb of skb which
is necessary to hold fragmentation information.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 28 Feb 2014 05:48:16 +0000 (14:48 +0900)]
net: w5100: Use devm_ioremap_resource()
Use devm_ioremap_resource() in order to make the code simpler.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 28 Feb 2014 05:47:47 +0000 (14:47 +0900)]
net: w5300: Use devm_ioremap_resource()
Use devm_ioremap_resource() in order to make the code simpler.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 28 Feb 2014 01:22:06 +0000 (02:22 +0100)]
packet: allow to transmit +4 byte in TX_RING slot for VLAN case
Commit 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes
for VLAN packets.") added the possibility for non-mmaped frames to
send an extra 4 bytes for the VLAN header, so the MTU increases from 1500 to
1504 bytes, for example.
Commit cbd89acb9eb2 ("af_packet: fix for sending VLAN frames via
packet_mmap") attempted to fix that for the mmap part but was
reverted as it caused regressions while using eth_type_trans()
on the output path.
Let's just act analogously to 57f89bfa2140 and add similar logic
to TX_RING. We presume size_max as overcharged with +4 bytes and
later on, after the skb has been built by tpacket_fill_skb(), check
for the ETH_P_8021Q header on packets larger than the normal MTU. This can
be easily reproduced with a slightly modified trafgen in mmap(2)
mode, test cases:
{ fill(0xff, 12) const16(0x8100) fill(0xff, <1504|1505>) }
{ fill(0xff, 12) const16(0x0806) fill(0xff, <1500|1501>) }
Note that we need to do the test right after tpacket_fill_skb()
as sockets can have PACKET_LOSS set where we would not fail but
instead just continue to traverse the ring.
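The length check can be modelled in user space roughly as below. This is a sketch of the rule described above, not the af_packet code; frame_len_ok and make_frame are hypothetical helpers, and make_frame mirrors the trafgen test cases (12 address bytes, a 16-bit ethertype, then filler).

```python
import struct

ETH_HLEN = 14        # Ethernet header: two MACs plus ethertype
VLAN_HLEN = 4        # extra bytes carried by an 802.1Q tag
ETH_P_8021Q = 0x8100

def frame_len_ok(frame: bytes, mtu: int) -> bool:
    """Accept frames up to mtu + ETH_HLEN, or 4 bytes more if VLAN tagged."""
    limit = mtu + ETH_HLEN
    if len(frame) <= limit:
        return True
    # Oversized frame: only tolerate the overshoot for an 802.1Q header.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return ethertype == ETH_P_8021Q and len(frame) <= limit + VLAN_HLEN

def make_frame(ethertype: int, fill_len: int) -> bytes:
    return b"\xff" * 12 + struct.pack("!H", ethertype) + b"\xff" * fill_len

results = [
    frame_len_ok(make_frame(0x8100, 1504), 1500),
    frame_len_ok(make_frame(0x8100, 1505), 1500),
    frame_len_ok(make_frame(0x0806, 1500), 1500),
    frame_len_ok(make_frame(0x0806, 1501), 1500),
]
```

In each test pair the smaller size is accepted and the size one byte larger is rejected, and only the VLAN-tagged frame gets the extra 4 bytes.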
Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Phil Sutter <phil@nwl.cc>
Tested-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Feb 2014 17:42:13 +0000 (12:42 -0500)]
Merge branch 'intel-next'
Aaron Brown says:
====================
This series contains updates to ixgbe and ixgbevf.
Don provides an update to change a hard coded timeout interval to
a system-wide timeout one, collects AUTOC register functions into
one place and fixes some firmware bit handling.
Emil resolves a tx handling error introduced in a recent commit and
adds a check for CHECKSUM_PARTIAL to avoid an skb_is_gso check.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Fri, 28 Feb 2014 04:32:45 +0000 (20:32 -0800)]
ixgbevf: add check for CHECKSUM_PARTIAL when doing TSO
This patch adds a check for CHECKSUM_PARTIAL to avoid the skb_is_gso check
in ixgbevf_tso(). It should reduce overhead for workloads that are not using
TSO or checksum offloads. It is the same as in ixgbe.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Fri, 28 Feb 2014 04:32:44 +0000 (20:32 -0800)]
ixgbevf: fix handling of tx checksumming
This patch resolves an issue introduced by:
commit 7ad1a093519e37fb673579819bf6af122641c397
ixgbevf: make the first tx_buffer a repository for most of the skb info
Incorrect check for the result of ixgbevf_tso() can lead to calling
ixgbevf_tx_csum() which can spawn 2 context descriptors and result in
performance degradation and/or corrupted packets.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:43 +0000 (20:32 -0800)]
ixgbe: Add check for FW veto bit
The driver will now honor the MNG FW veto bit in blocking link resets.
This patch will affect x520 and x540 systems.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:42 +0000 (20:32 -0800)]
ixgbe: fix bit toggled for 82599 reset fix.
The current code doesn't toggle the correct bit to reset the data pipeline
on Restart_AN assertion. This patch corrects that.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:41 +0000 (20:32 -0800)]
ixgbe: collect all 82599 AUTOC code in one function
When reading or writing to the AUTOC register on 82599 devices we need to
perform various operations that aren't needed for other MAC types. This
patch will collect all of that code into one place to minimize MAC checks
in common code paths.
While doing this I also clean up some cases where we weren't holding the
SW/FW semaphore during a read/modify/write of AUTOC.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:40 +0000 (20:32 -0800)]
ixgbe: fix to use correct timeout interval for memory read completion
Currently we always poll for a hard-coded 80 ms and do not
respect the system-wide timeout interval. Since all devices have so far
been tested with this 80 ms value, we continue to use it
as a hard minimum.
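The "hard minimum" rule amounts to a simple clamp, sketched below. How the system-wide value is obtained is not described in this message, so system_timeout_ms is just a hypothetical input.

```python
MIN_TIMEOUT_MS = 80  # the previously hard-coded value, kept as a floor

def poll_timeout_ms(system_timeout_ms: int) -> int:
    """Use the system-wide timeout, but never poll for less than 80 ms."""
    return max(system_timeout_ms, MIN_TIMEOUT_MS)

short = poll_timeout_ms(50)
long_ = poll_timeout_ms(200)
```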
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 27 Feb 2014 13:20:29 +0000 (14:20 +0100)]
ipv6: addrconf: silence sparse endianness warnings
Avoid the following sparse __CHECK_ENDIAN__ warnings:
include/net/addrconf.h:318:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:318:70: warning: restricted __be64 degrades to integer
include/net/addrconf.h:330:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:330:70: warning: restricted __be64 degrades to integer
include/net/addrconf.h:347:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:348:26: warning: restricted __be64 degrades to integer
include/net/addrconf.h:349:18: warning: restricted __be64 degrades to integer
The warnings are false but they make it harder to spot real
bugs.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Duan Jiong [Thu, 27 Feb 2014 09:03:03 +0000 (17:03 +0800)]
neigh: directly goto out after setting nud_state to NUD_FAILED
Because those following if conditions will not be matched.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2014 21:31:54 +0000 (16:31 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
This is the rework of the IPsec virtual tunnel interface
for ipv4 to support inter address family tunneling and
namespace crossing. The only change to the last RFC version
is a compile fix for an odd configuration where CONFIG_XFRM
is set but CONFIG_INET is not set.
1) Add and use an IPsec protocol multiplexer.
2) Add xfrm_tunnel_skb_cb to the skb common buffer
to store a receive callback there.
3) Make vti work with i_key set by not including the i_key
when computing the hash for the tunnel lookup in case of
vti tunnels.
4) Update ip_vti to use its own receive hook.
5) Remove xfrm_tunnel_notifier, this is replaced by the IPsec
protocol multiplexer.
6) We need to be protocol family independent, so use the dst_entry
returned by xfrm_lookup instead of the ipv4 rtable in vti_tunnel_xmit().
7) Add support for inter address family tunneling.
8) Check if the tunnel endpoints of the xfrm state and the vti interface
are matching and return an error otherwise.
9) Enable namespace crossing for vti devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2014 20:59:13 +0000 (15:59 -0500)]
Merge branch 'kdoc'
Luis R. Rodriguez says:
====================
net: start kdoc'ifying net_device
While working on extending some functionality I felt restricted
with the amount of documentation I can add. Part of this is that
the existing style in the header files doesn't let me be verbose.
This starts addressing that by using kdoc for the net_device
flags, and as Ben noted, the priv_flags can be moved out from
UAPI.
Luis R. Rodriguez (2):
net: kdoc struct net_device flags and priv_flags
net: move net_device priv_flags out from UAPI
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Luis R. Rodriguez [Wed, 26 Feb 2014 01:15:13 +0000 (17:15 -0800)]
net: move net_device priv_flags out from UAPI
These are private to the kernel, and they're unstable
anyway and can be shuffled at will (see 080e4130b1fb),
so any userspace application relying on them is on crack.
Test compiled with allyesconfig.
mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ make allyesconfig
mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ time make -j 20
...
BUILD arch/x86/boot/bzImage
Setup is 16992 bytes (padded to 17408 bytes).
System is 56153 kB
CRC 721d2751
Kernel: arch/x86/boot/bzImage is ready (#1)
real 19m35.744s
user 280m37.984s
sys 27m54.104s
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luis R. Rodriguez [Wed, 26 Feb 2014 01:15:12 +0000 (17:15 -0800)]
net: kdoc struct net_device flags and priv_flags
We have documentation for these flags but it is scattered
all over the place. #defines don't allow documentation to be
written easily, so to start bringing some documentation
together, use the enum kdoc practice but keep the defines so
that userspace can still #ifdef them.
I've verified the same values are assigned before and after
with a simple userspace test program [0] and checksumming the
output.
[0] http://drvbp1.linux-foundation.org/~mcgrof/kdoc/netdev_flags/
mcgrof@gnat ~/tmp $ ./check-flags | sha1sum
0ec5b6b1840aa3bb9ce464e61c564820871c92c3 -
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 27 Feb 2014 18:51:54 +0000 (19:51 +0100)]
atm: nicstar: remove interruptible_sleep_on_timeout
We are trying to finally kill off interruptible_sleep_on_timeout.
The two uses in the nicstar driver can be trivially replaced
with wait_event_interruptible_lock_irq_timeout, which prevents the
wake-up race and is able to check the buffer state with scq->lock
held.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 26 Feb 2014 22:02:48 +0000 (14:02 -0800)]
tcp: switch rtt estimations to usec resolution
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.
FQ/pacing in DC environments also requires this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'.
TCP_CONG_RTT_STAMP is no longer needed.
As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.
In this example ss command displays a srtt of 32 usecs (10Gbit link)
lpk51:~# ./ss -i dst lpk52
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp ESTAB 0 1 10.246.11.51:42959 10.246.11.52:64614
cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10 send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559
The updated iproute2 ip command displays:
lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51
The old binary displays:
lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51
With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 26 Feb 2014 22:02:11 +0000 (14:02 -0800)]
net: add skb_mstamp infrastructure
ktime_get() is too expensive in some cases, and we'd like to get
usec resolution timestamps in the TCP stack.
This patch adds a light weight facility using a combination of
local_clock() and jiffies samples.
Instead of :
    ktime_t t0, t1;
    t0 = ktime_get();
    // stuff
    t1 = ktime_get();
    delta_us = ktime_us_delta(t1, t0);
use :
    struct skb_mstamp t0, t1;
    skb_mstamp_get(&t0);
    // stuff
    skb_mstamp_get(&t1);
    delta_us = skb_mstamp_us_delta(&t1, &t0);
Note : local_clock() might have a (bounded) drift between cpus.
Do not use this infra in place of ktime_get() without understanding the
issues.
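To make the idea of combining a fine-grained and a coarse sample concrete, here is a user-space model. It is a sketch under stated assumptions: the field names, the HZ value and the fallback rule (use the jiffies delta when the usec delta is not positive) are illustrative guesses and are not taken from this message.

```python
from dataclasses import dataclass

HZ = 1000                        # assumed tick rate for the example
USEC_PER_JIFFY = 1_000_000 // HZ

@dataclass
class Mstamp:
    stamp_us: int       # fine-grained sample (local_clock() in the commit)
    stamp_jiffies: int  # coarse sample taken at the same moment

def us_delta(t1: Mstamp, t0: Mstamp) -> int:
    """Elapsed microseconds between two stamps."""
    delta_us = t1.stamp_us - t0.stamp_us
    if delta_us <= 0:
        # Assumed fallback: a non-positive fine delta is distrusted
        # (for example because of clock drift), so use the coarse one.
        delta_us = (t1.stamp_jiffies - t0.stamp_jiffies) * USEC_PER_JIFFY
    return delta_us

normal = us_delta(Mstamp(1500, 10), Mstamp(1000, 10))
drifted = us_delta(Mstamp(900, 12), Mstamp(1000, 10))
```

The second call shows why a coarse sample is worth carrying along: when the fine-grained clock appears to run backwards, the jiffies difference still gives a usable, if less precise, answer.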
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Dooks [Wed, 26 Feb 2014 11:48:00 +0000 (11:48 +0000)]
phy: micrel: add of configuration for LED mode
Add support for the led-mode property for the following PHYs
which have a single LED mode configuration value.
KSZ8001 and KSZ8041 which both use register 0x1e bits 15,14 and
KSZ8021, KSZ8031 and KSZ8051 which use register 0x1f bits 5,4
to control the LED configuration.
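Both register layouts are a two-bit field, so the update is the same read-modify-write with a different shift. The helper below is a hypothetical sketch of that bit manipulation only; it does not touch hardware and is not the driver's code.

```python
def set_led_mode(reg_value: int, mode: int, shift: int) -> int:
    """Replace the two-bit LED mode field that starts at bit `shift`."""
    if not 0 <= mode <= 3:
        raise ValueError("LED mode must fit in two bits")
    mask = 0x3 << shift
    return (reg_value & ~mask) | (mode << shift)

# shift 14 selects bits 15,14 (register 0x1e); shift 4 selects bits 5,4 (register 0x1f)
reg_1e = set_led_mode(0x0000, 1, 14)
reg_1f = set_led_mode(0xFFFF, 0, 4)
```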
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:55 +0000 (12:01 +0100)]
isdn: fix multiple sleep_on races
The isdn core code uses a couple of wait queues with
interruptible_sleep_on, which is racy and about to get
removed from the kernel. Fortunately, we know for each case
what we are waiting for, so they can all be converted to
the better wait_event_interruptible interface.
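The race and its fix can be illustrated with a user-space analogue. This is only an analogy using Python's threading.Condition, not kernel code: the point is that the waiter tests an explicit condition under the lock, so an event that fires before the waiter goes to sleep is not lost.

```python
import threading

cond = threading.Condition()
state = {"ready": False}

def producer():
    """Signal the event; this may run before anyone is waiting."""
    with cond:
        state["ready"] = True
        cond.notify_all()

def wait_ready(timeout: float) -> bool:
    # The predicate is checked under the lock before sleeping, so a
    # wake-up that already happened is not missed.
    with cond:
        return cond.wait_for(lambda: state["ready"], timeout=timeout)

producer()                # the event fires first
result = wait_ready(1.0)  # returns at once instead of timing out
```

A bare "go to sleep until woken" call in the same situation would have slept through the whole timeout, which is the behaviour these conversions remove.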
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:54 +0000 (12:01 +0100)]
isdn: divert, hysdn: fix interruptible_sleep_on race
These two drivers use identical code for their procfs status
file handling, which contains a small race against status
data becoming available while reading the file.
This uses wait_event_interruptible instead to fix this
particular race and eventually get rid of all sleep_on
instances. There seems to be another race involving
multiple concurrent readers of the same procfs file, which
I don't try to fix here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:53 +0000 (12:01 +0100)]
isdn: hisax/elsa: fix sleep_on race in elsa FSM
The state machine code in the elsa driver uses interruptible_sleep_on
to wait for state changes, which is racy. A closer look at the possible
states reveals that it is always used to wait for getting back into
ARCOFI_NOP, so we can use wait_event_interruptible instead.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:52 +0000 (12:01 +0100)]
isdn: pcbit: fix interruptible_sleep_on race
interruptible_sleep_on is racy and going away. In case of pcbit,
the driver would run into a timeout if the card is initialized
before we start waiting for it. This uses wait_event to fix the
race. In order to do this, the state machine handling for the
timeout case has to get trivially reorganized so we actually know
whether the timeout has occurred or not.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:51 +0000 (12:01 +0100)]
atm: firestream: fix interruptible_sleep_on race
interruptible_sleep_on is racy and going away. This replaces the one use
in the firestream driver with the appropriate wait_event_interruptible
variant.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Feb 2014 20:55:53 +0000 (15:55 -0500)]
Merge branch 'intel-next'
Aaron Brown says:
====================
Intel Wired LAN Driver Updates
This series contains updates to ixgbe, igb and documentation. The
first four have been sent up as part of other series where 1 or more
in the series were rejected and either dropped or still being worked
on for reasons unrelated to these patches.
Don makes recovery from a HW ECC error just schedule a reset as it turns
out the previous behaviour of forcing the user to reload is not necessary.
Mark adds WoL support to port 0 of a new device. Jacob removes a magic
number from the ptp_caps.name and updates the SubmittingPatches
documentation with details on the Fixes: tag. And Carolyn updates igb
files to remove the FSF physical mail address.
[ DaveM Note: SubmittingPatches change omitted, will go via LKML ]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Carolyn Wyborny [Wed, 26 Feb 2014 01:58:57 +0000 (17:58 -0800)]
igb: Update license text to remove FSF address and update copyright.
This patch updates the license text to remove address of Free Software
Foundation and refer users to www.gnu.org instead. This patch also updates
the copyright dates in appropriate igb driver files.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Wed, 26 Feb 2014 01:58:56 +0000 (17:58 -0800)]
igb: make local functions static and remove dead code
Based on Stephen Hemminger's original patch.
Make local functions static, and remove unused functions.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 26 Feb 2014 01:58:55 +0000 (17:58 -0800)]
ixgbe: Add WoL support for a new device
Add WoL support for port 0 of a new 82599-based device.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 26 Feb 2014 01:58:54 +0000 (17:58 -0800)]
ixgbe: don't use magic size number to assign ptp_caps.name
Rather than using a magic size number, just use sizeof since that will
work and is more robust to future changes.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Wed, 26 Feb 2014 01:58:53 +0000 (17:58 -0800)]
ixgbe: modify behavior on receiving a HW ECC error.
Currently, when we notice a HW ECC error, we request that the user reload
the driver to force a reset of the part. This was done due to the mistaken
belief that a normal reset would not be sufficient. It turns out that it
is, so now we just schedule a reset upon seeing the ECC error.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Wed, 26 Feb 2014 00:20:43 +0000 (01:20 +0100)]
ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT
This option has the same semantics as IP_PMTUDISC_OMIT for IPv4, which
was recently introduced. It doesn't honor the path mtu discovered by the
host but, in contrast to IPV6_PMTUDISC_INTERFACE, allows the generation of
fragments if the packet size exceeds the MTU of the outgoing
interface.
Fixes: 93b36cf3425b9b ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Wed, 26 Feb 2014 00:20:42 +0000 (01:20 +0100)]
ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.
This patch adds yet another new IP_MTU_DISCOVER option that does not honor any
path mtu information and does not accept new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.
As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.
The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.
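The difference between the two modes can be summarised in a toy decision function. It models only what is stated above for IP_PMTUDISC_INTERFACE and IP_PMTUDISC_OMIT; the function name and the returned strings are hypothetical, and how a rejected send is reported to the application is not modelled.

```python
def output_action(mode: str, packet_len: int, iface_mtu: int) -> str:
    """Model only the two modes compared above; both ignore the path MTU."""
    if mode not in ("INTERFACE", "OMIT"):
        raise ValueError(f"mode not modelled: {mode}")
    if packet_len <= iface_mtu:
        return "send"
    # The packet exceeds the outgoing interface MTU.
    return "fragment" if mode == "OMIT" else "reject"

omit = output_action("OMIT", 4000, 1500)
interface = output_action("INTERFACE", 4000, 1500)
```

Note that the discovered path MTU is not an input at all, which reflects the advantage both modes share.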
Fixes: 482fc6094afad5 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Wed, 26 Feb 2014 00:20:41 +0000 (01:20 +0100)]
ipv4: use ip_skb_dst_mtu to determine mtu in ip_fragment
ip_skb_dst_mtu mostly falls back to ip_dst_mtu_maybe_forward if no socket
is attached to the skb (in case of forwarding) or determines the mtu like
we do in ip_finish_output, which actually checks if we should branch to
ip_fragment. Thus use the same function to determine the mtu here, too.
This is important for the introduction of IP_PMTUDISC_OMIT, where we
want the packets getting cut in pieces of the size of the outgoing
interface mtu. IPv6 already does this correctly.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timo Teräs [Wed, 26 Feb 2014 09:43:04 +0000 (11:43 +0200)]
neigh: probe application via netlink in NUD_PROBE
iproute2 arpd seems to expect this as there's code and comments
to handle netlink probes with NUD_PROBE set. It is used to flush
the arpd cached mappings.
opennhrp instead turns off unicast probes (so it can handle all
neighbour discovery). Without this change it will not see NUD_PROBE
probes and cannot reconfirm the mapping. Thus, currently the neigh entry
will just fail, which can cause a few packets to be dropped until broadcast
discovery is restarted.
Earlier discussion on the subject:
http://marc.info/?t=139305877100001&r=1&w=2
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Wed, 26 Feb 2014 05:38:29 +0000 (22:38 -0700)]
ieee802154: fix new function declaration
The commit 8fad346f366a7 ("ieee802154: add basic support for RF212 to
at86rf230 driver") introduced the new function is_rf212() with some
minor issues in declaration:
1) Fix the function type by changing it to bool as the function
definition returns a boolean value. Additionally both callers of
is_rf212() are expected to return a boolean value.
2) Fix the function specifier by deleting the inline keyword as the
compiler takes care of that.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Tue, 25 Feb 2014 20:11:02 +0000 (21:11 +0100)]
ipv6: log src and dst along with "udp checksum is 0"
These info messages are rather pointless without any means to identify
the source of the bogus packets. Logging the src and dst addresses and
ports may help a bit.
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Tue, 25 Feb 2014 16:46:10 +0000 (17:46 +0100)]
vxlan: remove unused port variable in vxlan_udp_encap_recv()
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Feb 2014 20:38:18 +0000 (15:38 -0500)]
Merge branch 'mlx4'
Amir Vadai says:
====================
net, net/mlx4: Add sysfs file for port number
Modern distros use biosdevname to rename interfaces to names based on
slot/port number.
biosdevname can't get the port number of devices that have multiple ports that
share the same PCI function.
This patch adds a sysfs file under: /sys/devices/.../net/<interface>/dev_port,
that contains the port number (0 based) - to be used by biosdevname.
Also, dev_id was wrongly used in mlx4_en driver - added a patch that fixes it.
This patch was tested and applied over commit
51adfcc "net: bcmgenet: remove unused bh_lock member"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 25 Feb 2014 16:17:52 +0000 (18:17 +0200)]
net/mlx4_en: Fix bad use of dev_id
dev_id should be set for multiple netdev's sharing the same MAC, which
is not the case here.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 25 Feb 2014 16:17:51 +0000 (18:17 +0200)]
net/mlx4_en: Expose port number through sysfs
Initialize dev_port with port number (0 based) to be accessed through
sysfs from user space.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 25 Feb 2014 16:17:50 +0000 (18:17 +0200)]
net: Add sysfs file for port number
Add a sysfs file to enable user space to query the device
port number used by a netdevice instance. This is needed for
devices that have multiple ports on the same PCI function.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
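As an illustration of how user space might consume the dev_port attribute described above, here is a minimal Python sketch. The helper name read_dev_port and the default base directory /sys/class/net (the usual symlinked view of /sys/devices/.../net/) are assumptions made for this example, not something taken from the patch.

```python
import os


def read_dev_port(iface, sysfs_net="/sys/class/net"):
    """Return the 0-based port number of `iface`, or None if unavailable.

    `sysfs_net` is a parameter only so the helper can be pointed at a
    test directory; it is an assumption of this sketch.
    """
    path = os.path.join(sysfs_net, iface, "dev_port")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Attribute missing (older kernel, unknown interface) or unparsable.
        return None
```

A tool such as biosdevname could call a helper like this once per interface and fall back to its previous naming logic when None is returned.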
David S. Miller [Wed, 26 Feb 2014 20:28:08 +0000 (15:28 -0500)]
Merge branch 'bnx2x'
Michal Schmidt says:
====================
bnx2x: minimize RAM usage in kdump
kdump kernels usually have only a small amount of memory reserved.
bnx2x can be memory-hungry. Let's minimize its memory usage when
running in kdump.
I detect kdump by looking at the "reset_devices" flag. A couple of
storage drivers (cciss, hpsa) use it for the same purpose. I am not sure
this is the best way to solve the problem, but it works.
Should it be made more generic by, say, looking at the total amount
of lowmem instead? Not using TPA by default when lowmem is small and/or
defaulting to fewer queues would help 32bit systems where a driver for
a multi-function multi-queue NIC can consume a significant amount
of available memory. Or do we want no such heuristics?
Is this something to consider doing for other network drivers too?
====================
Acked-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Tue, 25 Feb 2014 15:04:26 +0000 (16:04 +0100)]
bnx2x: save RAM in kdump kernel by disabling TPA
When running in a kdump kernel, disable TPA. This saves memory, which
tends to be scarce in kdump.
TPA, being a receive acceleration, is unlikely to be useful for kdump,
whose purpose is to send the memory image out.
This saves an additional 5 MB in the kdump environment.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Tue, 25 Feb 2014 15:04:25 +0000 (16:04 +0100)]
bnx2x: save RAM in kdump kernel by using a single queue
When running in a kdump kernel, make sure to use only a single ethernet
queue even if a num_queues option in /etc/modprobe.d/*.conf would specify
otherwise. This saves memory, which tends to be scarce in kdump.
This saves about 40 MB in the kdump environment on a setup with
num_queues=8 in the config file.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Tue, 25 Feb 2014 15:04:24 +0000 (16:04 +0100)]
bnx2x: clamp num_queues to prevent passing a negative value
Use the clamp() macro to make the calculation of the number of queues
slightly easier to understand. It also avoids a crash when someone
accidentally passes a negative value in the num_queues= module parameter.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
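The effect of clamping can be sketched in a few lines of Python. This is only a model of the idea, not the driver code: calc_num_queues, default_queues and max_queues are hypothetical names, and the assumption that 0 means "option not set" is made for illustration.

```python
def clamp(val, lo, hi):
    """Bound val to the closed interval [lo, hi], like the kernel clamp() macro."""
    return min(max(val, lo), hi)


def calc_num_queues(requested, default_queues, max_queues):
    """Pick a queue count (hypothetical helper).

    A requested value of 0 is treated as "not set" and falls back to the
    default; anything else, including negative input, is clamped.
    """
    n = requested if requested else default_queues
    return clamp(n, 1, max_queues)
```

With this shape a negative module parameter simply yields one queue instead of propagating a negative count into later allocations.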
Florian Westphal [Tue, 25 Feb 2014 13:34:32 +0000 (14:34 +0100)]
net: tcp: add mib counters to track zero window transitions
Three counters are added:
- one to track when we went from non-zero to zero window
- one to track the reverse
- one counter incremented when we want to announce zero window,
but can't because we would shrink current window.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
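The three counters can be modelled roughly as follows. This Python sketch is a simplified assumption of the bookkeeping, not the kernel implementation; the class name and the counter keys to_zero, from_zero and want_zero are invented for the example.

```python
from collections import Counter


class WindowTracker:
    """Toy model of receive-window announcements (all names hypothetical)."""

    def __init__(self, initial_window):
        self.advertised = initial_window  # last window announced to the peer
        self.stats = Counter()

    def announce(self, wanted, unconsumed):
        """Announce a window and update the counters.

        wanted:     window we would like to advertise now
        unconsumed: part of the previous offer the peer may still use;
                    advertising less than this would shrink the window
        """
        new_win = wanted
        if new_win < unconsumed:
            if new_win == 0:
                # We wanted zero but must not shrink the current window.
                self.stats["want_zero"] += 1
            new_win = unconsumed
        if new_win == 0:
            if self.advertised != 0:
                self.stats["to_zero"] += 1      # non-zero -> zero
        elif self.advertised == 0:
            self.stats["from_zero"] += 1        # zero -> non-zero
        self.advertised = new_win
        return new_win
```

For example, announce(0, 500) on a fresh tracker bumps want_zero and still advertises 500, a later announce(0, 0) bumps to_zero, and the next non-zero announcement bumps from_zero.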
Neil Jerram [Tue, 25 Feb 2014 11:17:25 +0000 (11:17 +0000)]
net: order MPLS ethertypes numerically
All ethertypes other than ETH_P_MPLS_UC, ETH_P_MPLS_MC and
ETH_P_ATMMPOA were already ordered numerically. This commit moves
those three ETH_P_... values into correct numerical order too.
Signed-off-by: Neil Jerram <Neil.Jerram@metaswitch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
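A quick way to check such an ordering is sketched below in Python. The table is a small excerpt with the well-known ethertype values written out by hand, and out_of_order is a hypothetical helper; neither is part of the patch.

```python
# Excerpt of ethertype definitions in the order they should appear.
ETHERTYPES = [
    ("ETH_P_IPV6", 0x86DD),
    ("ETH_P_PAUSE", 0x8808),
    ("ETH_P_SLOW", 0x8809),
    ("ETH_P_WCCP", 0x883E),
    ("ETH_P_MPLS_UC", 0x8847),
    ("ETH_P_MPLS_MC", 0x8848),
    ("ETH_P_ATMMPOA", 0x884C),
    ("ETH_P_PPP_DISC", 0x8863),
    ("ETH_P_PPP_SES", 0x8864),
]


def out_of_order(entries):
    """Return names whose value is smaller than the entry just before them."""
    return [
        name
        for (_, prev), (name, val) in zip(entries, entries[1:])
        if val < prev
    ]
```

An empty result from out_of_order(ETHERTYPES) means the excerpt is in numerical order.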
Joe Perches [Thu, 20 Feb 2014 21:25:51 +0000 (13:25 -0800)]
bnx2x: Remove hidden flow control goto from BNX2X_ALLOC macros
BNX2X_ALLOC macros use "goto alloc_mem_err"
so these labels appear unused in some functions.
Expand these macros in-place via coccinelle and
some typing.
Update the macros to use statement expressions
and remove the BNX2X_ALLOC macro.
This adds some > 80 char lines.
$ cat bnx2x_pci_alloc.cocci
@@
expression e1;
expression e2;
expression e3;
@@
- BNX2X_PCI_ALLOC(e1, e2, e3);
+ e1 = BNX2X_PCI_ALLOC(e2, e3); if (!e1) goto alloc_mem_err;
@@
expression e1;
expression e2;
expression e3;
@@
- BNX2X_PCI_FALLOC(e1, e2, e3);
+ e1 = BNX2X_PCI_FALLOC(e2, e3); if (!e1) goto alloc_mem_err;
@@
expression e1;
expression e2;
@@
- BNX2X_ALLOC(e1, e2);
+ e1 = kzalloc(e2, GFP_KERNEL); if (!e1) goto alloc_mem_err;
@@
expression e1;
expression e2;
expression e3;
@@
- kzalloc(sizeof(e1) * e2, e3)
+ kcalloc(e2, sizeof(e1), e3)
@@
expression e1;
expression e2;
expression e3;
@@
- kzalloc(e1 * sizeof(e2), e3)
+ kcalloc(e1, sizeof(e2), e3)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
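The last two rules in the semantic patch above swap an open-coded multiplication for kcalloc(). One reason this form is generally preferred is that kcalloc() checks the multiplication for overflow. The Python sketch below models that difference only; it assumes a 64-bit size_t, and the function names are made up for the illustration.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def kzalloc_size(n, size):
    """Byte count that kzalloc(n * size, ...) would see: the product wraps."""
    return (n * size) & SIZE_MAX


def kcalloc_size(n, size):
    """Byte count kcalloc(n, size, ...) would allocate, or None on overflow.

    None stands in for the NULL that the real allocator returns when
    n * size does not fit in size_t.
    """
    if size != 0 and n > SIZE_MAX // size:
        return None
    return n * size
```

For ordinary inputs both helpers agree; for a huge element count the open-coded product silently wraps to a small number while the checked variant refuses the request.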
Steffen Klassert [Fri, 21 Feb 2014 07:41:11 +0000 (08:41 +0100)]
vti4: Enable namespace changing
vti4 is now fully namespace aware, so allow namespace changing
for vti devices.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:11 +0000 (08:41 +0100)]
vti4: Check the tunnel endpoints of the xfrm state and the vti interface
The tunnel endpoints of the xfrm_state we got from the xfrm_lookup
must match the tunnel endpoints of the vti interface. This patch
ensures this matching.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:10 +0000 (08:41 +0100)]
vti4: Support inter address family tunneling.
With this patch we can tunnel ipv6 traffic via a vti4
interface. A vti4 interface can now have an ipv6 address
and ipv6 traffic can be routed via a vti4 interface.
The resulting traffic is xfrm transformed and tunneled
through ipv4 if matching IPsec policies and states are
present.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:10 +0000 (08:41 +0100)]
vti4: Use the on xfrm_lookup returned dst_entry directly
We need to be protocol family independent to support
inter address family tunneling with vti. So use a
dst_entry instead of the ipv4 rtable in vti_tunnel_xmit.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:10 +0000 (08:41 +0100)]
xfrm4: Remove xfrm_tunnel_notifier
This was used from vti and is replaced by the IPsec protocol
multiplexer hooks. It is now unused, so remove it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:10 +0000 (08:41 +0100)]
vti: Update the ipv4 side to use it's own receive hook.
With this patch, vti uses the IPsec protocol multiplexer to
register its own receive side hooks for ESP, AH and IPCOMP.
Vti now does the following on receive side:
1. Do an input policy check for the IPsec packet we received.
This is required because this packet could already have been
processed by IPsec, so an inbound policy check is needed.
2. Mark the packet with the i_key. The policy and the state
must match this key now. Policy and state belong to the outer
namespace and policy enforcement is done at the further layers.
3. Call the generic xfrm layer to do decryption and decapsulation.
4. Wait for a callback from the xfrm layer to properly clean the
skb to not leak information on namespace and to update the
device statistics.
On transmit side:
1. Mark the packet with the o_key. The policy and the state
must match this key now.
2. Do an xfrm_lookup on the original packet with the mark applied.
3. Check if we got an IPsec route.
4. Clean the skb to not leak information on namespace
transitions.
5. Attach the dst_entry we got from the xfrm_lookup to the skb.
6. Call dst_output to do the IPsec processing.
7. Do the device statistics.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:09 +0000 (08:41 +0100)]
ip_tunnel: Make vti work with i_key set
Vti uses the o_key to mark packets that were transmitted or received
by a vti interface. Unfortunately we can't apply different marks
to in and outbound packets with only one key available. Vti interfaces
typically use wildcard selectors for vti IPsec policies. On forwarding,
the same output policy will match for both directions. This generates
a loop between the IPsec gateways until the ttl of the packet is
exceeded.
The gre i_key/o_key are usually there to find the right gre tunnel
during a lookup. When vti uses the i_key to mark packets, the tunnel
lookup does not work any more because vti does not use the gre keys
as a hash key for the lookup.
This patch works around this by not including the i_key when computing
the hash for the tunnel lookup in case of vti tunnels.
With this we have separate keys available for the transmitting and
receiving side of the vti interface.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:09 +0000 (08:41 +0100)]
xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer
IPsec vti_rcv needs to remember the tunnel pointer to
check it later at the vti_rcv_cb callback. So add
this pointer to the IPsec common buffer, initialize
it and check it to avoid transport state matching of
a tunneled packet.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:09 +0000 (08:41 +0100)]
ipcomp4: Use the IPsec protocol multiplexer API
Switch ipcomp4 to use the new IPsec protocol multiplexer.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:09 +0000 (08:41 +0100)]
ah4: Use the IPsec protocol multiplexer API
Switch ah4 to use the new IPsec protocol multiplexer.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:08 +0000 (08:41 +0100)]
esp4: Use the IPsec protocol multiplexer API
Switch esp4 to use the new IPsec protocol multiplexer.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert [Fri, 21 Feb 2014 07:41:08 +0000 (08:41 +0100)]
xfrm4: Add IPsec protocol multiplexer
This patch adds an IPsec protocol multiplexer. With this
it is possible to add alternative protocol handlers as
needed for IPsec virtual tunnel interfaces.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Florian Fainelli [Tue, 25 Feb 2014 00:56:13 +0000 (16:56 -0800)]
net: bcmgenet: remove unused bh_lock member
bh_lock spinlock is unused, remove it from the private driver structure.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 25 Feb 2014 00:56:12 +0000 (16:56 -0800)]
net: bcmgenet: remove commented code in bcmgenet_xmit()
This code is commented out since it is unused, left over from the very first
time this driver was merged.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 25 Feb 2014 00:56:11 +0000 (16:56 -0800)]
net: bcmgenet: drop checks on priv->phydev
Drop all the checks on priv->phydev since we will refuse probing the
driver if we cannot attach to a PHY device. This also fixes some smatch
issues reported by Dan Carpenter.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Feb 2014 00:38:53 +0000 (19:38 -0500)]
Merge branch 'gianfar'
Claudiu Manoil says:
====================
gianfar: Device reset and reconfig fixes
These patches end up fixing some notable device reset & reconfig
related problems. One issue is on-the-fly (Rx/Tx on) programming
of interrupt coalescing (IC) registers on the processing path,
against HW recommendation. This is an old issue that became visible
after BQL introduction, as under certain conditions (low traffic)
one TX interrupt gets lost and BQL fires Tx timeout as a result.
Another notable issue is a race on the Tx path (xmit, clean_tx)
during device reset (i.e. during Tx timeout watchdog firing)
that leads to NULL access.
Fixing the problematic on-the-fly register writes (i.e. the IC regs)
required the implementation of a MAC soft reset procedure.
The race leading to NULL access was addressed by fixing the
stop_gfar()/startup_gfar() pair (disable/enable napi a.s.o.)
and adding the device state DOWN to sync with the TX path.
v2: Refactored if() clauses from gfar_set_features(), PATCH 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Mon, 24 Feb 2014 10:13:46 +0000 (12:13 +0200)]
gianfar: Fix Tx int miss, dont write IC on-the-fly
Programming the interrupt coalescing (IC) registers while
the controller/DMA is on may incur the loss of one Tx
confirmation interrupt, under certain conditions. This is
a subtle hw race because it does not occur during a burst
of Tx packets. It has been observed on p2020 devices that,
if just one packet is being xmit'ed, the Tx confirmation
doesn't trigger and BQL eventually blocks the Tx queues,
followed by Tx timeout and an un-responsive device.
This issue was not apparent prior to introducing BQL
support, as a late Tx confirmation was not an issue back then
and the next burst of Tx frames would have triggered the
Tx confirmation/Tx ring cleanup anyway.
Bottom line, the hw specifications state that the IC registers
should not be programmed while the Rx/Tx blocks (the DMA) are
enabled. Furthermore, these registers are currently re-written
with the same values on the processing path, over and over again.
To fix this, rewriting the IC registers has been removed from
the processing path (napi poll). A complete MAC reset procedure
has been implemented for the ethtool -c option instead, to
reliably update these registers while the controller is stopped.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Mon, 24 Feb 2014 10:13:45 +0000 (12:13 +0200)]
gianfar: Fix device reset races (oops) for Tx
The device reset procedure, stop_gfar()/startup_gfar(), has
concurrency issues.
"Kernel access of bad area" oopses show up during Tx timeout
device reset or other reset cases (like changing MTU) that
happen while the interface still has traffic. The oopses
happen in start_xmit and clean_tx_ring when accessing tx_queue->
tx_skbuff which is NULL. The race comes from de-allocating the
tx_skbuff while transmission and napi processing are still
active. Though the Tx queues get temporarily stopped when Tx
timeout occurs, they get re-enabled as a result of Tx congestion
handling inside the napi context (see clean_tx_ring()). Not
disabling the napi during reset is also a bug, because
clean_tx_ring() will try to access tx_skbuff while it is being
de-alloc'ed and re-alloc'ed.
To fix this, stop_gfar() needs to disable napi processing
after stopping the Tx queues. However, in order to prevent
clean_tx_ring() from re-enabling the Tx queue before the napi
gets disabled, the device state DOWN has been introduced.
It prevents the Tx congestion management from re-enabling the
de-congested Tx queue while the device is brought down.
An additional locking state, RESETTING, has been introduced
to prevent simultaneous resets or to prevent configuring the
device while it is resetting.
The bogus 'rxlock's (for each Rx queue) have been removed since
their purpose is not justified, as they don't prevent nor are
suited to prevent device reset/reconfig races (such as this one).
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
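The interplay of the DOWN and RESETTING states described in this commit can be sketched as a small user-space model in Python. Everything here is an assumption for illustration: the class, its attribute names, and the use of a threading.Lock in place of the driver's own synchronization do not come from the patch.

```python
import threading


class Device:
    """Toy model of the reset handshake (all names hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for atomic bit operations
        self.down = False
        self.resetting = False
        self.tx_queue_stopped = False
        self.napi_enabled = True
        self.resets = 0

    def try_begin_reset(self):
        """Test-and-set of RESETTING: only the first caller gets True."""
        with self._lock:
            if self.resetting:
                return False
            self.resetting = True
            return True

    def end_reset(self):
        with self._lock:
            self.resetting = False

    def stop(self):
        self.down = True              # mark DOWN before touching the queues
        self.tx_queue_stopped = True
        self.napi_enabled = False     # napi disabled after the queues stop

    def start(self):
        self.napi_enabled = True
        self.tx_queue_stopped = False
        self.down = False

    def clean_tx_ring(self):
        """Tx congestion handling: wake the queue unless the device is DOWN."""
        if self.tx_queue_stopped and not self.down:
            self.tx_queue_stopped = False

    def reset(self):
        """Full reset; refused while another reset is in progress."""
        if not self.try_begin_reset():
            return False
        try:
            self.stop()
            self.start()
            self.resets += 1
        finally:
            self.end_reset()
        return True
```

The point of the model is the check in clean_tx_ring(): once DOWN is set, a late Tx cleanup can no longer wake a queue that the reset path has just stopped.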
Claudiu Manoil [Mon, 24 Feb 2014 10:13:44 +0000 (12:13 +0200)]
gianfar: Don't free/request irqs on device reset
Resetting the device (stop_gfar()/startup_gfar()) should
be fast and to the point, in order to timely recover
from an error condition (like Tx timeout) or during
device reconfig. The irq free/request routines are just
redundant here, and they should be part of the device
close/open routines instead.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Mon, 24 Feb 2014 10:13:43 +0000 (12:13 +0200)]
gianfar: Fix on-the-fly vlan and mtu updates
The RCTRL and TCTRL registers should not be changed
on-the-fly, while the controller is running, otherwise
unexpected behaviour occurs. But that's exactly what
gfar_vlan_mode() does, updating the VLAN acceleration
bits inside RCTRL/TCTRL. The attempt to lock these
operations doesn't help, but only adds to the confusion.
There's also a dependency for Rx FCB insertion (activating/
de-activating the TOE offload block on Rx) which might
change the required rx buffer size. This makes matters
worse as gfar_vlan_mode() ends up calling gfar_change_mtu(),
though the MTU size remains the same. Note that there are
other situations that may affect the required rx buffer size,
like changing RXCSUM or rx hw timestamping, but erroneously
the rx buffer size is not recomputed/updated in the process.
To fix this, do the vlan updates properly inside the MAC
reset and reconfiguration procedure, which takes care of
the rx buffer size dependency and the rx TOE block (PRSDEP)
activation/deactivation as well (in the correct order).
As a consequence, MTU/rx buff size updates are done now
by the same MAC reset and reconfig procedure, so that out
of context updates to MAXFRM, MRBLR, and MACCFG inside
change_mtu() are no longer needed. The rx buffer size
dependency on Rx FCB is now handled for the other cases too
(RXCSUM and rx hw timestamping).
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Mon, 24 Feb 2014 10:13:42 +0000 (12:13 +0200)]
gianfar: Implement MAC reset and reconfig procedure
The main MAC config registers like: RCTRL/TCTRL, MRBLR,
MAXFRM, RXIC/TXIC, most fields of MACCFG1/2, should not
be changed on-the-fly, but at least after stopping the
DMA and disabling the Rx/Tx blocks and, for increased
reliability, after a MAC soft reset.
Implement a complete MAC soft reset and reconfig procedure
following the latest HW advisories - gfar_mac_reset() - to
replace gfar_mac_init() and (the confusing) init_registers()
functions.
Factor out separate config functions for RCTRL and TCTRL,
ensure programming order of the relevant config regs after
MAC soft reset.
Split gfar_hw_init() into gfar_mac_reset() and the remaining
global regs that don't need to be reconfigured after MAC soft
reset (FIFOCFG, ATTRELI, HW counters a.s.o).
As gfar_hw_init() now makes all the register writes @probe()
time, based on all the device flags and config options, it
must be moved further down, just before register_netdev(),
as the last config step when the config values are committed
to HW. Also, move netif_carrier_off() after register_netdev(),
because it has no effect if called before.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Feb 2014 00:33:27 +0000 (19:33 -0500)]
bcmgenet: Deleted unnecessary select_queue method.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Mon, 24 Feb 2014 03:47:24 +0000 (00:47 -0300)]
net: bcmgenet: Use devm_ioremap_resource()
According to Documentation/driver-model/devres.txt, devm_request_and_ioremap()
is deprecated, so use devm_ioremap_resource() instead.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sun, 23 Feb 2014 08:05:26 +0000 (00:05 -0800)]
bridge: netfilter: Use ether_addr_copy
Convert the uses of memcpy to ether_addr_copy because
for some architectures it is smaller and faster.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sun, 23 Feb 2014 08:05:25 +0000 (00:05 -0800)]
bridge: Use ether_addr_copy and ETH_ALEN
Convert the more obvious uses of memcpy to ether_addr_copy.
There are still uses of memcpy that could be converted but
these addresses are __aligned(2).
Convert a couple uses of 6 in br_private.h to ETH_ALEN.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 23 Feb 2014 00:03:24 +0000 (00:03 +0000)]
cgxb4: Stop using ethtool SPEED_* constants
ethtool speed values are just numbers of megabits and there is no need
to add SPEED_40000. To be consistent, use integer constants directly
for all speeds.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 22 Feb 2014 17:37:53 +0000 (18:37 +0100)]
tools: bpf_dbg: various misc code cleanups
Let's clean up bpf_dbg a bit and improve its code slightly
in various areas: i) Get rid of some macros as there's no
good reason for keeping them, ii) remove one unused variable
and reduce scope of various variables found by cppcheck,
iii) Close non-default file descriptors when exiting the shell.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 22 Feb 2014 13:01:53 +0000 (14:01 +0100)]
loopback: sctp: add NETIF_F_SCTP_CSUM to device features
Drivers are allowed to set NETIF_F_SCTP_CSUM if they have
hardware crc32c checksumming support for the SCTP protocol.
Currently, NETIF_F_SCTP_CSUM flag is available in igb,
ixgbe, i40e/i40evf drivers and for vlan devices.
If we don't have NETIF_F_SCTP_CSUM then crc32c is done
through CPU instructions, invoked from crypto layer, or
if not available as slow-path fallback in software.
Currently, loopback device propagates checksum offloading
feature flags in dev->features, but is missing SCTP checksum
offloading. Therefore, account for NETIF_F_SCTP_CSUM as
well.
Before patch:
./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
4194304 4194304 4096 10.00 4683.50
After patch:
./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
4194304 4194304 4096 10.00 15348.26
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mathias Krause [Fri, 21 Feb 2014 20:38:36 +0000 (21:38 +0100)]
pktgen: document all supported flags
The documentation misses a few of the supported flags. Fix this. Also
respect the dependency to CONFIG_XFRM for the IPSEC flag.
Cc: Fan Du <fan.du@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mathias Krause [Fri, 21 Feb 2014 20:38:35 +0000 (21:38 +0100)]
pktgen: simplify error handling in pgctrl_write()
The 'out' label is just a relic from earlier times, when pgctrl_write()
had multiple error paths. Get rid of it and simply return right away
on errors.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mathias Krause [Fri, 21 Feb 2014 20:38:34 +0000 (21:38 +0100)]
pktgen: fix out-of-bounds access in pgctrl_write()
If a privileged user writes an empty string to /proc/net/pktgen/pgctrl
the code for stripping the (then non-existent) '\n' actually writes the
zero byte at index -1 of data[]. The then still uninitialized array will
very likely fail the command matching tests and the pr_warning() at the
end will therefore leak stack bytes to the kernel log.
Fix those issues by simply ensuring we're passed a non-empty string as
the user API apparently expects a trailing '\n' for all commands.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
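The off-by-one described in this commit can be reproduced in a small Python model. This is a sketch under stated assumptions, not the kernel function: a bytearray plays the role of the stack, `base` marks where data[] starts, DATA_SIZE is an illustrative buffer size, and the check_empty switch exists only to show the behaviour before and after the fix.

```python
DATA_SIZE = 128  # illustrative size of the on-stack buffer (assumption)
EINVAL = 22


def pgctrl_write(stack, base, user_buf, check_empty=True):
    """Model of the write handler; stack[base:base + DATA_SIZE] is data[].

    `base` must be >= 1 so that the byte in front of data[] exists in the
    model. Returns the number of bytes consumed, or -EINVAL.
    """
    count = len(user_buf)
    if check_empty and count == 0:
        return -EINVAL  # the fix: reject empty writes up front
    count = min(count, DATA_SIZE)
    stack[base:base + count] = user_buf[:count]
    # Strip the trailing '\n'. With count == 0 this lands on index
    # base - 1, i.e. the byte just before data[].
    stack[base + count - 1] = 0
    return count
```

Calling the model with an empty buffer and check_empty=False overwrites the guard byte in front of data[], which is the out-of-bounds write the commit message describes; with the check enabled the empty write is rejected and nothing is touched.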
David S. Miller [Mon, 24 Feb 2014 23:44:05 +0000 (18:44 -0500)]
Merge branch 'qlcnic-next'
Shahed Shaikh says:
====================
qlcnic: Re-factoring and enhancements
This patch series includes the following changes -
* Re-factored firmware minidump template header handling
* Support for making 8 vNIC mode applications work with 16 vNIC mode
* Enhance error message logging when adapter is in failed state and
when adapter lock access fails.
* Allow vlan0 traffic
* update MAINTAINERS
Please apply this series to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Fri, 21 Feb 2014 18:20:16 +0000 (13:20 -0500)]
Update MAINTAINERS for qlcnic driver
Keep myself as only maintainer for qlcnic driver and update
group email alias to Dept-HSGLinuxNICDev@qlogic.com
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Fri, 21 Feb 2014 18:20:15 +0000 (13:20 -0500)]
qlcnic: Update version to 5.3.56
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harish Patil [Fri, 21 Feb 2014 18:20:14 +0000 (13:20 -0500)]
qlcnic: Enhance semaphore lock access failure error message
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 21 Feb 2014 18:20:13 +0000 (13:20 -0500)]
qlcnic: Allow vlan0 traffic
o Adapter allows vlan0 traffic in case of SR-IOV after setting
QLC_SRIOV_ALLOW_VLAN0 bit even though we do not add vlan0 filters.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Fri, 21 Feb 2014 18:20:12 +0000 (13:20 -0500)]
qlcnic: Enhance driver message in failed state.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Fri, 21 Feb 2014 18:20:11 +0000 (13:20 -0500)]
qlcnic: Updates to QLogic application/driver interface for virtual NIC configuration
The QLogic application interface in the driver, which supports
configurations of more than 8 vNICs, has been updated to handle the
following cases:
o Only 8 or fewer total vNICs were enabled within the vNIC 0-7 range
o vNICs were enabled in the vNIC 0-15 range such that enabled vNICs were
not contiguous and only 8 or fewer total vNICs were enabled
o Disconnect in the vNIC mapping between application and driver when the
enabled vNICs were discontiguous
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Fri, 21 Feb 2014 18:20:10 +0000 (13:20 -0500)]
qlcnic: Re-factor firmware minidump template header handling
Treat firmware minidump template headers for 82xx and 83xx/84xx adapters separately,
as it may change for 82xx and 83xx/84xx adapter type independently.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Feb 2014 23:38:27 +0000 (18:38 -0500)]
Merge branch 'mlx4'
Amir Vadai says:
====================
net/mlx4: Mellanox driver update 01-01-2014
This small patchset has a fix to a bogus usage of
netif_get_num_default_rss_queues() in mlx4_en driver.
Changes from V1:
- Removed affinity_hint patch, to make it generic instead of mlx specific
Changes from V0:
- Instead of reverting the netif_get_num_default_rss_queues() in mlx4_en,
fixing it to limit the actual number of receive queues instead of limiting
the number of IRQ's.
Patchset was applied and tested against commit:
cb6e926 "ipv6:fix checkpatch errors with assignment in if condition"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Shamay [Fri, 21 Feb 2014 10:39:18 +0000 (12:39 +0200)]
net/mlx4: Fix limiting number of IRQ's instead of RSS queues
This fixes a performance bug introduced by commit 90b1ebe "mlx4: set
maximal number of default RSS queues", which limits the number of IRQs
opened by the core module.
The limit should be on the number of queues in the indirection table -
rx_rings, and not on the number of IRQ's. Also, limiting on mlx4_core
initialization instead of in mlx4_en, prevented using "ethtool -L" to
utilize all the CPUs when performance mode is preferred, since limiting
this number to 8 reduces overall packet rate by 15%-50% in multiple TCP
streams applications.
For example, after running ethtool -L <ethx> rx 16
                Packet rate
Before the fix  897799
After the fix   1142070
Results were obtained using netperf:
S=200 ; ( for i in $(seq 1 $S) ; do ( \
netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
wait ; done | grep "1 1" | awk '{SUM+=$6} END {print SUM}' )
CC: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>