David S. Miller [Mon, 31 Aug 2015 19:34:00 +0000 (12:34 -0700)]
Merge branch 'per-route-dctcp-receive-side'
Daniel Borkmann says:
====================
tcp: receive-side per route dctcp handling
Original cover letter:
Currently, the following case doesn't use DCTCP, even if it should:
- responder has f.e. cubic as system wide default
- 'ip route congctl dctcp $src' was set
Then, DCTCP is NOT used if a DCTCP sender attempts to connect from a
host in the $src range: ECT(0) is set, but listen_sk is not dctcp, so
we fail the INET_ECN_is_not_ect sanity check.
We also have to examine the dst used for the SYN/ACK reply to make
this case work.
In order to minimize additional cost, store the 'ecn is must have'
information is the dst_features field.
The set targets -next instead of -net since this doesn't seem to be a
serious bug and to give the change more soak time until it hits linus
tree.
v1 -> v2:
- Addressed Dave's feedback, not exposing any bits to user space
- Added patch 3 to reject incorrect configurations
- Rest as is, rebased and retested
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 31 Aug 2015 13:58:47 +0000 (15:58 +0200)]
tcp: use dctcp if enabled on the route to the initiator
Currently, the following case doesn't use DCTCP, even if it should:
A responder has f.e. Cubic as system wide default, but for a specific
route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
initiating host then uses DCTCP as congestion control, but since the
initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
and we have to fall back to Reno after 3WHS completes.
We were thinking on how to solve this in a minimal, non-intrusive
way without bloating tcp_ecn_create_request() needlessly: lets cache
the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
to only do a single metric feature lookup inside tcp_ecn_create_request().
Joint work with Florian Westphal.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 31 Aug 2015 13:58:46 +0000 (15:58 +0200)]
fib, fib6: reject invalid feature bits
Feature bits that are invalid should not be accepted by the kernel,
only the lower 4 bits may be configured, but not the remaining ones.
Even from these 4, 2 of them are unused.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 31 Aug 2015 13:58:45 +0000 (15:58 +0200)]
net: fib6: reduce identation in ip6_convert_metrics
Reduce the identation a bit, there's no need to artificically have
it increased.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 31 Aug 2015 13:58:44 +0000 (15:58 +0200)]
net: fib: move metrics parsing to a helper
fib_create_info() is already quite large, so before adding more
code to the metrics section move that to a helper, similar to
ip6_convert_metrics.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philip Downey [Mon, 31 Aug 2015 10:30:38 +0000 (11:30 +0100)]
IGMP: Document igmp_link_local_mcast_reports
Document the addition of a new sysctl variable which controls the
generation of IGMP reports for link local multicast groups in the
224.0.0.X range.
IGMP reports for local multicast groups can now be optionally
inhibited by setting the value to zero e.g.:
echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports
To retain backwards compatibility the previous behaviour is retained
by default on system boot or reverted by setting the value back to
non-zero.
Signed-off-by: Philip Downey <pdowney@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Mon, 31 Aug 2015 01:09:38 +0000 (18:09 -0700)]
ip-tunnel: Use API to access tunnel metadata options.
Currently tun-info options pointer is used in few cases to
pass options around. But tunnel options can be accessed using
ip_tunnel_info_opts() API without using the pointer. Following
patch removes the redundant pointer and consistently make use
of API.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Mon, 31 Aug 2015 12:46:07 +0000 (15:46 +0300)]
ipv4: fix 32b build
Address remaining issue after
80ec192.
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Aug 2015 05:40:44 +0000 (22:40 -0700)]
ipv4: Fix 32-bit build.
net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64':
>> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function)
v = *(((u64 *)bhptr) + offt);
^
net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in
net/ipv4/af_inet.c: In function 'snmp_fold_field64':
>> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function)
res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
^
>> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field'
res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
^
net/ipv4/af_inet.c:1455:5: note: declared here
u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt)
^
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ken-ichirou MATSUZAWA [Sun, 30 Aug 2015 22:54:49 +0000 (07:54 +0900)]
netlink: rx mmap: fix POLLIN condition
Poll() returns immediately after setting the kernel current frame
(ring->head) to SKIP from user space even though there is no new
frame. And in a case of all frames is VALID, user space program
unintensionally sets (only) kernel current frame to UNUSED, then
calls poll(), it will not return immediately even though there are
VALID frames.
To avoid situations like above, I think we need to scan all frames
to find VALID frames at poll() like netlink_alloc_skb(),
netlink_forward_ring() finding an UNUSED frame at skb allocation.
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Aug 2015 04:54:13 +0000 (21:54 -0700)]
Merge branch 'thunderx-features-fixes'
Aleksey Makarov says:
====================
net: thunderx: New features and fixes
v2:
- The unused affinity_mask field of the structure cmp_queue
has been deleted. (thanks to David Miller)
- The unneeded initializers have been dropped. (thanks to Alexey Klimov)
- The commit message "net: thunderx: Rework interrupt handling"
has been fixed. (thanks to Alexey Klimov)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 30 Aug 2015 09:29:16 +0000 (12:29 +0300)]
net: thunderx: Support for internal loopback mode
Support for setting VF's corresponding BGX LMAC in internal
loopback mode. This mode can be used for verifying basic HW
functionality such as packet I/O, RX checksum validation,
CQ/RBDR interrupts, stats e.t.c. Useful when DUT has no external
network connectivity.
'loopback' mode can be enabled or disabled via ethtool.
Note: This feature is not supported when no of VFs enabled are
morethan no of physical interfaces i.e active BGX LMACs
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 30 Aug 2015 09:29:15 +0000 (12:29 +0300)]
net: thunderx: Support for upto 96 queues for a VF
This patch adds support for handling multiple qsets assigned to a
single VF. There by increasing no of queues from earlier 8 to max
no of CPUs in the system i.e 48 queues on a single node and 96 on
dual node system. User doesn't have option to assign which Qsets/VFs
to be merged. Upon request from VF, PF assigns next free Qsets as
secondary qsets. To maintain current behavior no of queues is kept
to 8 by default which can be increased via ethtool.
If user wants to unbind NICVF driver from a secondary Qset then it
should be done after tearing down primary VF's interface.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 30 Aug 2015 09:29:14 +0000 (12:29 +0300)]
net: thunderx: Rework interrupt handling
Rework interrupt handler to avoid checking IRQ affinity of
CQ interrupts. Now separate handlers are registered for each IRQ
including RBDR. Register interrupt handlers for only those
which are being used. Add nicvf_dump_intr_status() and use it
in irq handlers.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 30 Aug 2015 09:29:13 +0000 (12:29 +0300)]
net: thunderx: Support for HW VLAN stripping
This patch configures HW to strip 802.1Q header if found in a
receiving packet. The stripped VLAN ID and TCI information is
passed on to software via CQE_RX. Also sets netdev's 'vlan_features'
so that other HW offload features can be used for tagged packets.
This offload feature can be enabled or disabled via ethtool.
Network stack normally ignores RPS for 802.1Q packets and hence low
throughput. With this offload enabled throughput for tagged packets
will be almost same as normal packets.
Note: This patch doesn't enable HW VLAN insertion for transmit packets.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 30 Aug 2015 09:29:12 +0000 (12:29 +0300)]
net: thunderx: Receive hashing HW offload support
Adding support for receive hashing HW offload by using RSS_ALG
and RSS_TAG fields of CQE_RX descriptor. Also removed dependency
on minimum receive queue count to configure RSS so that hash is
always generated.
This hash is used by RPS logic to distribute flows across multiple
CPUs. Offload can be disabled via ethtool.
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 30 Aug 2015 09:29:11 +0000 (12:29 +0300)]
net: thunderx: mailboxes: remove code duplication
Use the nicvf_send_msg_to_pf() function in the mailbox code.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 30 Aug 2015 09:29:10 +0000 (12:29 +0300)]
net: thunderx: Add receive error stats reporting via ethtool
Added ethtool support to dump receive packet error statistics reported
in CQE. Also made some small fixes
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aleksey Makarov [Sun, 30 Aug 2015 09:29:09 +0000 (12:29 +0300)]
net: thunderx: fix MAINTAINERS
The liquidio and thunder drivers have different maintainers.
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Aug 2015 04:48:59 +0000 (21:48 -0700)]
Merge branch 'snmp-stat-aggregation'
Raghavendra K T says:
====================
Optimize the snmp stat aggregation for large cpus
While creating 1000 containers, perf is showing lot of time spent in
snmp_fold_field on a large cpu system.
The current patch tries to improve by reordering the statistics gathering.
Please note that similar overhead was also reported while creating
veth pairs https://lkml.org/lkml/2013/3/19/556
Changes in V4:
- remove 'item' variable and use IPSTATS_MIB_MAX to avoid sparse
warning (Eric) also remove 'item' parameter (Joe)
- add missing memset of padding.
Changes in V3:
- use memset to initialize temp buffer in leaf function. (David)
- use memcpy to copy the buffer data to stat instead of unalign_pu (Joe)
- Move buffer definition to leaf function __snmp6_fill_stats64() (Eric)
-
Changes in V2:
- Allocate the stat calculation buffer in stack. (Eric)
Setup:
160 cpu (20 core) baremetal powerpc system with 1TB memory
1000 docker containers was created with command
docker run -itd ubuntu:15.04 /bin/bash in loop
observation:
Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) perf data showed, creating veth interfaces resulting in
the below code path was taking more time.
rtnl_fill_ifinfo
-> inet6_fill_link_af
-> inet6_fill_ifla6_attrs
-> snmp_fold_field
proposed idea:
currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data to of an item (iteratively for around 36 items).
The patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.
Performance of docker creation improved by around more than 2x
after the patch.
before the patch:
================
3f45ba571a42e925c4ec4aaee0e48d7610a9ed82a4c931f83324d41822cf6617
real 0m6.836s
user 0m0.095s
sys 0m0.011s
perf record -a docker run -itd ubuntu:15.04 /bin/bash
=======================================================
50.73% docker [kernel.kallsyms] [k] snmp_fold_field
9.07% swapper [kernel.kallsyms] [k] snooze_loop
3.49% docker [kernel.kallsyms] [k] veth_stats_one
2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock
1.37% docker docker [.] backtrace_qsort
1.31% docker docker [.] strings.FieldsFunc
cache-misses: 2.7%
after the patch:
=============
9178273e9df399c8290b6c196e4aef9273be2876225f63b14a60cf97eacfafb5
real 0m3.249s
user 0m0.088s
sys 0m0.020s
perf record -a docker run -itd ubuntu:15.04 /bin/bash
=======================================================
10.57% docker docker [.] scanblock
8.37% swapper [kernel.kallsyms] [k] snooze_loop
6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field
6.67% docker [kernel.kallsyms] [k] veth_stats_one
3.96% docker docker [.] runtime_MSpan_Sweep
2.47% docker docker [.] strings.FieldsFunc
cache-misses: 1.41 %
Please let me know if you have suggestions/comments.
Thanks Eric, Joe and David for the comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghavendra K T [Sun, 30 Aug 2015 05:59:42 +0000 (11:29 +0530)]
net: Optimize snmp stat aggregation by walking all the percpu data at once
Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.
reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 36 items).
idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.
Docker creation got faster by more than 2x after the patch.
Result:
Before After
Docker creation time 6.836s 3.25s
cache miss 2.7% 1.41%
perf before:
50.73% docker [kernel.kallsyms] [k] snmp_fold_field
9.07% swapper [kernel.kallsyms] [k] snooze_loop
3.49% docker [kernel.kallsyms] [k] veth_stats_one
2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock
perf after:
10.57% docker docker [.] scanblock
8.37% swapper [kernel.kallsyms] [k] snooze_loop
6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field
6.67% docker [kernel.kallsyms] [k] veth_stats_one
changes/ideas suggested:
Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
place of unaligned_put (Joe).
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghavendra K T [Sun, 30 Aug 2015 05:59:41 +0000 (11:29 +0530)]
net: Introduce helper functions to get the per cpu data
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Aug 2015 04:45:01 +0000 (21:45 -0700)]
Merge git://git./linux/kernel/git/davem/net
David S. Miller [Sun, 30 Aug 2015 02:07:16 +0000 (19:07 -0700)]
Merge branch 'ovs-vport-cleanup'
Pravin B Shelar says:
====================
openvswitch: Cleanup post vport conversion.
After converting all vport to netdev implmentations there
is no need for some of vport functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Sun, 30 Aug 2015 00:44:08 +0000 (17:44 -0700)]
openvswitch: Remove vport-net
This structure is not used anymore.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Sun, 30 Aug 2015 00:44:07 +0000 (17:44 -0700)]
openvswitch: Remove vport stats.
Since all vport types are now backed by netdev, we can directly
use netdev stats. Following patch removes redundant stat
from vport.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Sun, 30 Aug 2015 00:44:06 +0000 (17:44 -0700)]
openvswitch: Remove egress_tun_info.
tun info is passed using skb-dst pointer. Now we have
converted all vports to netdev based implementation so
Now we can remove redundant pointer to tun-info from OVS_CB.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Sun, 30 Aug 2015 00:44:05 +0000 (17:44 -0700)]
openvswitch: Remove vport get_name()
Remove unused get_name() function pointer from vport ops.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 28 Aug 2015 23:54:40 +0000 (16:54 -0700)]
geneve: Use GRO cells infrastructure.
Geneve can benefit from GRO at the device level in a manner similar
to other tunnels, especially as hardware offloads are still emerging.
After this patch, aggregated frames are seen on the tunnel interface.
Single stream throughput nearly doubles in ideal circumstances (on
old hardware).
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Sat, 29 Aug 2015 00:02:21 +0000 (09:02 +0900)]
openvswitch: retain parsed IPv6 header fields in flow on error skipping extension headers
When an error occurs skipping IPv6 extension headers retain the already
parsed IP protocol and IPv6 addresses in the flow. Also assume that the
packet is not a fragment in the absence of information to the contrary;
that is always use the frag_off value set by ipv6_skip_exthdr().
This allows matching on the IP protocol and IPv6 addresses of packets
with malformed extension headers.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Aug 2015 20:15:03 +0000 (13:15 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-08-28
One more bunch of Bluetooth patches for 4.3:
- Crash fix for hci_bcm driver
- Enhancements to hci_intel driver (e.g. baudrate configuration)
- Fix for SCO link type after multiple connect attempts
- Cleanups & minor fixes in a few other places
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lindgren [Fri, 28 Aug 2015 18:50:15 +0000 (11:50 -0700)]
net/smsc911x: Fix deferred probe for interrupt
The interrupt handler may not be available when smsc911x probes if the
interrupt handler is a GPIO controller for example. Let's fix that
by adding handling for -EPROBE_DEFER.
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Aug 2015 20:07:55 +0000 (13:07 -0700)]
Merge branch 'tnl-ipv4-ipv6'
Jiri Benc says:
====================
tunnels: fix incorrect IPv4/v6 headers interpretation
With tunneling, it is currently possible to get an IPv6 header and interpret
it as an IPv4 header, or to interpret an IPv6 address as an IPv4 address
(and vice versa). This leads to things like sending packets to incorrect
address, IPv6 flow label being interpreted as IP packet length, etc.
Fix several places where this can happen.
Most of this is net-next only. The third patch affects net, too, but it
doesn't seem there's anything in user space that sets the attribute at all
currently, thus net-next is fine.
Changelog:
v2: fixed geneve after incorrect rebase on top of Pravin's patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Fri, 28 Aug 2015 18:48:22 +0000 (20:48 +0200)]
vxlan: do not receive IPv4 packets on IPv6 socket
By default (subject to the sysctl settings), IPv6 sockets listen also for
IPv4 traffic. Vxlan is not prepared for that and expects IPv6 header in
packets received through an IPv6 socket.
In addition, it's currently not possible to have both IPv4 and IPv6 vxlan
tunnel on the same port (unless bindv6only sysctl is enabled), as it's not
possible to create and bind both IPv4 and IPv6 vxlan interfaces and there's
no way to specify both IPv4 and IPv6 remote/group IP addresses.
Set IPV6_V6ONLY on vxlan sockets to fix both of these issues. This is not
done globally in udp_tunnel, as l2tp and tipc seems to work okay when
receiving IPv4 packets on IPv6 socket and people may rely on this behavior.
The other tunnels (geneve and fou) do not support IPv6.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Fri, 28 Aug 2015 18:48:21 +0000 (20:48 +0200)]
fou: reject IPv6 config
fou does not really support IPv6 encapsulation. After an UDP socket is
created in fou_create, the encap_rcv callback is set either to fou_udp_recv
or to gue_udp_recv. Both of those unconditionally assume that the received
packet has an IPv4 header and access the data at network_header as it was an
IPv4 header. This leads to IPv6 flow label being interpreted as IP packet
length, etc.
Disallow fou tunnel to be configured as IPv6 until real IPv6 support is
added to fou.
CC: Tom Herbert <tom@herbertland.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Fri, 28 Aug 2015 18:48:20 +0000 (20:48 +0200)]
ip_tunnels: record IP version in tunnel info
There's currently nothing preventing directing packets with IPv6
encapsulation data to IPv4 tunnels (and vice versa). If this happens,
IPv6 addresses are incorrectly interpreted as IPv4 ones.
Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
in ip_tunnel_info. Reject packets at appropriate places if they are supposed
to be encapsulated into an incompatible protocol.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Fri, 28 Aug 2015 18:48:19 +0000 (20:48 +0200)]
ip_tunnels: convert the mode field of ip_tunnel_info to flags
The mode field holds a single bit of information only (whether the
ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags.
This allows more mode flags to be added.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 28 Aug 2015 15:42:09 +0000 (08:42 -0700)]
net: FIB tracepoints
A few useful tracepoints developing VRF driver.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Sat, 29 Aug 2015 01:23:39 +0000 (21:23 -0400)]
sctp: Do not try to search for the transport twice
When removing an non-primary transport during ASCONF
processing, we end up traversing the transport list
twice: once in sctp_cmd_del_non_primary, and once in
sctp_assoc_del_peer. We can avoid the second
search and call sctp_assoc_rm_peer() instead.
Found by code inspection during code reviews.
Signed-off-by: Vladislav Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Fri, 28 Aug 2015 22:05:32 +0000 (15:05 -0700)]
bonding: fix bond_poll_controller bh_enable warning
The problem is rcu_read_unlock_bh() which triggers a warning when irqs are
disabled. ndo_poll_controller should run with irqs disabled always so we
can drop the rcu_read_lock_bh.
[ 98.502922] bond0: making interface eth1 the new active one
[ 98.503039] ------------[ cut here ]------------
[ 98.503039] WARNING: CPU: 0 PID: 1744 at kernel/softirq.c:150 __local_bh_enable_ip+0x96/0xc0()
[ 98.503039] Modules linked in: bonding(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole ppdev joydev parport_pc serio_raw parport i2c_piix4 video acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_net e1000 ata_generic pcnet32 mii virtio_pci virtio_ring virtio pata_acpi
[ 98.503039] CPU: 0 PID: 1744 Comm: ifenslave Tainted: G OE 4.2.0-rc7+ #56
[ 98.503039] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 98.503039]
0000000000000000 00000000e96ba230 ffff880020c236b8 ffffffff8183f105
[ 98.503039]
0000000000000000 0000000000000000 ffff880020c236f8 ffffffff810a9496
[ 98.503039]
ffff88002ea99e08 0000000000000200 ffffffffa02a8e06 ffff88002ea99e08
[ 98.503039] Call Trace:
[ 98.503039] [<
ffffffff8183f105>] dump_stack+0x4c/0x65
[ 98.503039] [<
ffffffff810a9496>] warn_slowpath_common+0x86/0xc0
[ 98.503039] [<
ffffffffa02a8e06>] ? bond_poll_controller+0x146/0x250 [bonding]
[ 98.503039] [<
ffffffff810a95ca>] warn_slowpath_null+0x1a/0x20
[ 98.503039] [<
ffffffff810ae376>] __local_bh_enable_ip+0x96/0xc0
[ 98.503039] [<
ffffffffa02a8e2f>] bond_poll_controller+0x16f/0x250 [bonding]
[ 98.503039] [<
ffffffffa02a8cf3>] ? bond_poll_controller+0x33/0x250 [bonding]
[ 98.503039] [<
ffffffff810feaed>] ? trace_hardirqs_off+0xd/0x10
[ 98.503039] [<
ffffffff81848afb>] ? _raw_spin_unlock_irqrestore+0x5b/0x60
[ 98.503039] [<
ffffffff816ec48e>] netpoll_poll_dev+0x6e/0x350
[ 98.503039] [<
ffffffff816eb977>] ? netpoll_start_xmit+0x137/0x1d0
[ 98.503039] [<
ffffffff816b2e8b>] ? __alloc_skb+0x5b/0x210
[ 98.503039] [<
ffffffff816ec89d>] netpoll_send_skb_on_dev+0x12d/0x2a0
[ 98.503039] [<
ffffffff816eccde>] netpoll_send_udp+0x2ce/0x430
[ 98.503039] [<
ffffffffa0190850>] write_msg+0xb0/0xf0 [netconsole]
[ 98.503039] [<
ffffffff81116b63>] call_console_drivers.constprop.25+0x133/0x260
[ 98.503039] [<
ffffffff81117934>] console_unlock+0x2f4/0x580
[ 98.503039] [<
ffffffff81117ea5>] ? vprintk_emit+0x2e5/0x630
[ 98.503039] [<
ffffffff81117ee5>] vprintk_emit+0x325/0x630
[ 98.503039] [<
ffffffff81118379>] vprintk_default+0x29/0x40
[ 98.503039] [<
ffffffff8183de4f>] printk+0x55/0x6b
[ 98.503039] [<
ffffffff816c754c>] __netdev_printk+0x16c/0x260
[ 98.503039] [<
ffffffff816c7a12>] netdev_info+0x62/0x80
[ 98.503039] [<
ffffffffa02ab464>] bond_change_active_slave+0x134/0x6a0 [bonding]
[ 98.503039] [<
ffffffffa02aba95>] bond_select_active_slave+0xc5/0x310 [bonding]
[ 98.503039] [<
ffffffffa02aeb78>] bond_enslave+0x1088/0x10c0 [bonding]
[ 98.503039] [<
ffffffffa02af46b>] bond_do_ioctl+0x37b/0x400 [bonding]
[ 98.503039] [<
ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10
[ 98.503039] [<
ffffffff816dc437>] ? rtnl_lock+0x17/0x20
[ 98.503039] [<
ffffffff816e5fd1>] dev_ifsioc+0x331/0x3e0
[ 98.503039] [<
ffffffff816e62dc>] dev_ioctl+0xec/0x6c0
[ 98.503039] [<
ffffffff816a6c6a>] sock_do_ioctl+0x4a/0x60
[ 98.503039] [<
ffffffff816a7300>] sock_ioctl+0x1c0/0x250
[ 98.503039] [<
ffffffff81271bfe>] do_vfs_ioctl+0x2ee/0x540
[ 98.503039] [<
ffffffff810fd943>] ? up_read+0x23/0x40
[ 98.503039] [<
ffffffff81070993>] ? __do_page_fault+0x1d3/0x420
[ 98.503039] [<
ffffffff8127e246>] ? __fget_light+0x66/0x90
[ 98.503039] [<
ffffffff81271ec9>] SyS_ioctl+0x79/0x90
[ 98.503039] [<
ffffffff8184936e>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 98.503039] ---[ end trace
00cfa804b0670051 ]---
Fixes: 616f45416ca0 ("bonding: implement bond_poll_controller()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 28 Aug 2015 13:56:01 +0000 (16:56 +0300)]
sh_eth: propagate platform_get_irq() error upstream
The driver overrides the error returned by platform_get_irq() with -ENODEV
which e.g. precludes the deferred probing from working. Propagate the real
error code to the driver core instead.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 28 Aug 2015 13:55:10 +0000 (16:55 +0300)]
ravb: propagate platform_get_irq() error upstream
The driver overrides the error returned by platform_get_irq() with -ENODEV
which e.g. precludes the deferred probing from working. Propagate the real
error code to the driver core instead.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
lucien [Fri, 28 Aug 2015 09:45:58 +0000 (17:45 +0800)]
sctp: ASCONF-ACK with Unresolvable Address should be sent
RFC 5061:
This is an opaque integer assigned by the sender to identify each
request parameter. The receiver of the ASCONF Chunk will copy this
32-bit value into the ASCONF Response Correlation ID field of the
ASCONF-ACK response parameter. The sender of the ASCONF can use this
same value in the ASCONF-ACK to find which request the response is
for. Note that the receiver MUST NOT change this 32-bit value.
Address Parameter: TLV
This field contains an IPv4 or IPv6 address parameter, as described
in Section 3.3.2.1 of [RFC4960].
ASCONF chunk with Error Cause Indication Parameter (Unresolvable Address)
should be sent if the Delete IP Address is not part of the association.
Endpoint A Endpoint B
(ESTABLISHED) (ESTABLISHED)
ASCONF ----------------->
(Delete IP Address)
<----------------- ASCONF-ACK
(Unresolvable Address)
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ken-ichirou MATSUZAWA [Fri, 28 Aug 2015 07:05:20 +0000 (16:05 +0900)]
netlink: mmap: fix lookup frame position
__netlink_lookup_frame() was always called with the same "pos"
value in netlink_forward_ring(). It will look at the same ring entry
header over and over again, every time through this loop. Then cycle
through the whole ring, advancing ring->head, not "pos" until it
equals the "ring->head != head" loop test fails.
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe Ricard [Fri, 28 Aug 2015 05:07:48 +0000 (07:07 +0200)]
netlink: add NETLINK_CAP_ACK socket option
Since commit
c05cdb1b864f ("netlink: allow large data transfers from
user-space"), the kernel may fail to allocate the necessary room for the
acknowledgment message back to userspace. This patch introduces a new
socket option that trims off the payload of the original netlink message.
The netlink message header is still included, so the user can guess from
the sequence number what is the message that has triggered the
acknowledgment.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Sat, 29 Aug 2015 02:22:11 +0000 (19:22 -0700)]
openvswitch: Fix conntrack compilation without mark.
Fix build with !CONFIG_NF_CONNTRACK_MARK && CONFIG_OPENVSWITCH_CONNTRACK
Fixes: 182e304 ("openvswitch: Allow matching on conntrack mark")
Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Tested-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Aug 2015 23:29:59 +0000 (16:29 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next tree.
In sum, patches to address fallout from the previous round plus updates from
the IPVS folks via Simon Horman, they are:
1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm
directs network connections to the server with the highest weight that is
currently available and overflows to the next when active connections exceed
the node's weight. From Raducu Deaconu.
2) Fix locking ordering in IPVS, always take rtnl_lock in first place. Patch
from Julian Anastasov.
3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From
Julian Anastasov.
4) Enhance multicast configuration for the IPVS state sync daemon. Also from
Julian.
5) Resolve sparse warnings in the nf_dup modules.
6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set.
7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative
subsets of code 1. From Andreas Herz.
8) Revert the jumpstack size calculation from mark_source_chains due to chain
depth miscalculations, from Florian Westphal.
9) Calm down more sparse warning around the Netfilter tree, again from Florian
Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Aug 2015 23:27:28 +0000 (16:27 -0700)]
Merge branch 'bpf_trace_printk-percent-s'
Alexei Starovoitov says:
====================
support for '%s' in bpf_trace_printk
v2->v3:
fix the comment to mention that strncpy_from_unsafe() returns
the length of the string including the trailing NUL.
v1->v2:
patch 1: generalize FETCH_FUNC_NAME(memory, string) into
strncpy_from_unsafe()
patch 2: use it in bpf_trace_printk
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 28 Aug 2015 22:56:23 +0000 (15:56 -0700)]
bpf: add support for %s specifier to bpf_trace_printk()
%s specifier makes bpf program and kernel debugging easier.
To make sure that trace_printk won't crash the unsafe string
is copied into stack and unsafe pointer is substituted.
The following C program:
#include <linux/fs.h>
int foo(struct pt_regs *ctx, struct filename *filename)
{
void *name = 0;
bpf_probe_read(&name, sizeof(name), &filename->name);
bpf_trace_printk("executed %s\n", name);
return 0;
}
when attached to kprobe do_execve()
will produce output in /sys/kernel/debug/tracing/trace_pipe :
make-13492 [002] d..1 3250.997277: : executed /bin/sh
sh-13493 [004] d..1 3250.998716: : executed /usr/bin/gcc
gcc-13494 [002] d..1 3250.999822: : executed /usr/lib/gcc/x86_64-linux-gnu/4.7/cc1
gcc-13495 [002] d..1 3251.006731: : executed /usr/bin/as
gcc-13496 [002] d..1 3251.011831: : executed /usr/lib/gcc/x86_64-linux-gnu/4.7/collect2
collect2-13497 [000] d..1 3251.012941: : executed /usr/bin/ld
Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 28 Aug 2015 22:56:22 +0000 (15:56 -0700)]
lib: introduce strncpy_from_unsafe()
generalize FETCH_FUNC_NAME(memory, string) into
strncpy_from_unsafe() and fix sparse warnings that were
present in original implementation.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Fri, 28 Aug 2015 22:44:25 +0000 (15:44 -0700)]
netpoll: warn on netpoll_send_udp users who haven't disabled irqs
Make sure we catch future netpoll_send_udp users who use it without
disabling irqs and also as a hint for poll_controller users.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Aug 2015 21:15:25 +0000 (14:15 -0700)]
Merge branch 'phylib-simplifications'
Sergei Shtylyov says:
====================
Some phylib simplifications
Here's 2 patches against DaveM's 'net-next.git' repo. We simplify a bogus
string of type casts in the 1st patch and make the code respect some coding
standards of the networking code in the 2nd one. I may follow with fixing of
checkpatch.pl's complaints. if I have time..
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 28 Aug 2015 18:35:14 +0000 (21:35 +0300)]
phylib: simplify NULL checks
Fix scripts/checkpatch.pl's messages like:
CHECK: Comparison to NULL could be written "!phydrv->read_mmd_indirect"
BTW, it doesn't detect the reversed comparisons (which I've fixed as well).
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 28 Aug 2015 18:34:34 +0000 (21:34 +0300)]
phylib: simplify bogus phy_device_create() result
Get rid of the bogus string of type casts where ERR_PTR() is enough.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 28 Aug 2015 16:46:39 +0000 (18:46 +0200)]
net: sched: don't break line in tc_classify loop notification
Just some minor noise follow-up to address some stylistic issues of
commit
3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}").
Accidentally v1 instead of v2 of that commit got applied, so this
patch adds the relative diff.
Suggested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shradha Shah [Fri, 28 Aug 2015 09:55:42 +0000 (10:55 +0100)]
sfc: Allow driver to cope with a lower number of VIs than it needs for RSS
Previously, the driver would refuse to load if it couldn't secure
enough VIs from the MC to fulfill its RSS requirements.
This was causing probe to fail on later functions in
configurations where we'd run out of VIs, such as having many
VFs.
This change allows the driver to load with fewer VIs, down to a
minimum of 2. A warning will be printed saying that RSS
requirements were not met, possibly affecting performance.
efx->max_tx_channels needs to be set to avoid going down the
failure path in efx_probe_nic() immediately in the loop after the
probe() NIC-type function.
Also, Set rc=ENOSPC when bombing out of efx_probe_nic due to lack
of VIs.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Fri, 28 Aug 2015 05:47:12 +0000 (11:17 +0530)]
cxgb4: Force uninitialized state if FW in adapter is unsupported
Forcing uninitialized state allows us to upgrade and reinitialize
the adapter.
FW_VERSION_T4 = 1.4.0.0
FW_VERSION_T5 = 0.0.0.0
FW_VERSION_T6 = 0.0.0.0
At this point driver supports above and greater than above version.
If FW in adapter < min FW_VERSION driver supports tries to upgrade the FW
If FW in adapter >= FW_VERSION driver supports then it follows normal path
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Aug 2015 20:43:33 +0000 (13:43 -0700)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- code beautification
- remove obsolete 'deleted' attribute for bat-gw node
- increase internal version number
- prevent potential access to netdev object after deregistration
- set needed_head/tail_room for batman virtual interface
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Valentin Rothberg [Fri, 28 Aug 2015 08:39:56 +0000 (10:39 +0200)]
openswitch: fix typo CONFIG_NF_CONNTRACK_LABEL
Fix typo in conntrack.c
s/CONFIG_NF_CONNTRACK_LABEL/CONFIG_NF_CONNTRACK_LABELS/
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Aug 2015 20:32:37 +0000 (13:32 -0700)]
Merge branch 'vrf-inetpeer'
David Ahern says:
====================
net: Refactor inetpeer cache and add support for VRFs
Per Dave's comment on the version 1 patch adding VRF support to inetpeer
cache by explicitly making the address + index a key. Refactored the
inetpeer code in the process; mostly impacts the use by tcp_metrics.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 27 Aug 2015 23:07:03 +0000 (16:07 -0700)]
net: Add support for VRFs to inetpeer cache
inetpeer caches based on address only, so duplicate IP addresses within
a namespace return the same cached entry. Enhance the ipv4 address key
to contain both the IPv4 address and VRF device index.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 27 Aug 2015 23:07:02 +0000 (16:07 -0700)]
net: Refactor inetpeer address struct
Move the inetpeer_addr_base union to inetpeer_addr and drop
inetpeer_addr_base.
Both the a6 and in6_addr overlays are not needed; drop the __be32 version
and rename in6 to a6 for consistency with ipv4. Add a new u32 array to
the union which removes the need for the typecast in the compare function
and the use of a consistent arg for both ipv4 and ipv6 addresses which
makes the compare function more readable.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 27 Aug 2015 23:07:01 +0000 (16:07 -0700)]
net: Add helper function to compare inetpeer addresses
tcp_metrics and inetpeer both have functions to compare inetpeer
addresses. Consolidate into 1 version.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 27 Aug 2015 23:07:00 +0000 (16:07 -0700)]
net: Add set,get helpers for inetpeer addresses
Use inetpeer set,get helpers in tcp_metrics rather than peeking into
the inetpeer_addr struct.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 27 Aug 2015 23:06:59 +0000 (16:06 -0700)]
net: Introduce ipv4_addr_hash and use it for tcp metrics
Refactors a common line into helper function.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 27 Aug 2015 17:10:50 +0000 (10:10 -0700)]
net: Add ethernet header for pass through VRF device
The change to use a custom dst broke tcpdump captures on the VRF device:
$ tcpdump -n -i vrf10
...
05:32:29.009362 IP 10.2.1.254 > 10.2.1.2: ICMP echo request, id 21989, seq 1, length 64
05:32:29.009855 00:00:40:01:8d:36 > 45:00:00:54:d6:6f, ethertype Unknown (0x0a02), length 84:
0x0000: 0102 0a02 01fe 0000 9181 55e5 0001 bd11 ..........U.....
0x0010: da55 0000 0000 bb5d 0700 0000 0000 1011 .U.....]........
0x0020: 1213 1415 1617 1819 1a1b 1c1d 1e1f 2021 ...............!
0x0030: 2223 2425 2627 2829 2a2b 2c2d 2e2f 3031 "#$%&'()*+,-./01
0x0040: 3233 3435 3637 234567
Local packets going through the VRF device are missing an ethernet header.
Fix by adding one and then stripping it off before pushing back to the IP
stack. With this patch you get the expected dumps:
...
05:36:15.713944 IP 10.2.1.254 > 10.2.1.2: ICMP echo request, id 23795, seq 1, length 64
05:36:15.714160 IP 10.2.1.2 > 10.2.1.254: ICMP echo reply, id 23795, seq 1, length 64
...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chas Williams [Thu, 27 Aug 2015 16:28:46 +0000 (12:28 -0400)]
net/xen-netfront: only napi_synchronize() if running
If an interface isn't running napi_synchronize() will hang forever.
[ 392.248403] rmmod R running task 0 359 343 0x00000000
[ 392.257671]
ffff88003760fc88 ffff880037193b40 ffff880037193160 ffff88003760fc88
[ 392.267644]
ffff880037610000 ffff88003760fcd8 0000000100014c22 ffffffff81f75c40
[ 392.277524]
0000000000bc7010 ffff88003760fca8 ffffffff81796927 ffffffff81f75c40
[ 392.287323] Call Trace:
[ 392.291599] [<
ffffffff81796927>] schedule+0x37/0x90
[ 392.298553] [<
ffffffff8179985b>] schedule_timeout+0x14b/0x280
[ 392.306421] [<
ffffffff810f91b9>] ? irq_free_descs+0x69/0x80
[ 392.314006] [<
ffffffff811084d0>] ? internal_add_timer+0xb0/0xb0
[ 392.322125] [<
ffffffff81109d07>] msleep+0x37/0x50
[ 392.329037] [<
ffffffffa00ec79a>] xennet_disconnect_backend.isra.24+0xda/0x390 [xen_netfront]
[ 392.339658] [<
ffffffffa00ecadc>] xennet_remove+0x2c/0x80 [xen_netfront]
[ 392.348516] [<
ffffffff81481c69>] xenbus_dev_remove+0x59/0xc0
[ 392.356257] [<
ffffffff814e7217>] __device_release_driver+0x87/0x120
[ 392.364645] [<
ffffffff814e7cf8>] driver_detach+0xb8/0xc0
[ 392.371989] [<
ffffffff814e6e69>] bus_remove_driver+0x59/0xe0
[ 392.379883] [<
ffffffff814e84f0>] driver_unregister+0x30/0x70
[ 392.387495] [<
ffffffff814814b2>] xenbus_unregister_driver+0x12/0x20
[ 392.395908] [<
ffffffffa00ed89b>] netif_exit+0x10/0x775 [xen_netfront]
[ 392.404877] [<
ffffffff81124e08>] SyS_delete_module+0x1d8/0x230
[ 392.412804] [<
ffffffff8179a8ee>] system_call_fastpath+0x12/0x71
Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philip Downey [Thu, 27 Aug 2015 15:46:26 +0000 (16:46 +0100)]
IGMP: Inhibit reports for local multicast groups
The range of addresses between 224.0.0.0 and 224.0.0.255 inclusive, is
reserved for the use of routing protocols and other low-level topology
discovery or maintenance protocols, such as gateway discovery and
group membership reporting. Multicast routers should not forward any
multicast datagram with destination addresses in this range,
regardless of its TTL.
Currently, IGMP reports are generated for this reserved range of
addresses even though a router will ignore this information since it
has no purpose. However, the presence of reserved group addresses in
an IGMP membership report uses up network bandwidth and can also
obscure addresses of interest when inspecting membership reports using
packet inspection or debug messages.
Although the RFCs for the various version of IGMP (e.g.RFC 3376 for
v3) do not specify that the reserved addresses be excluded from
membership reports, it should do no harm in doing so. In particular
there should be no adverse effect in any IGMP snooping functionality
since 224.0.0.x is specifically excluded as per RFC 4541 (IGMP and MLD
Snooping Switches Considerations) section 2.1.2. Data Forwarding
Rules:
2) Packets with a destination IP (DIP) address in the 224.0.0.X
range which are not IGMP must be forwarded on all ports.
IGMP reports for local multicast groups can now be optionally
inhibited by means of a system control variable (by setting the value
to zero) e.g.:
echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports
To retain backwards compatibility the previous behaviour is retained
by default on system boot or reverted by setting the value back to
non-zero e.g.:
echo 1 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports
Signed-off-by: Philip Downey <pdowney@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corinna Vinschen [Thu, 27 Aug 2015 15:11:48 +0000 (17:11 +0200)]
r8169: Add software counter for multicast packages
The multicast hardware counter on 8168/8111 chips is only 32 bit while the
statistics in struct rtnl_link_stats64 are 64 bit. Given that statistics
are requested on an irregular basis, an overflow of the hardware counter
can go unnoticed. To count even very large numbers of multicast packets
reliably, add a software counter and remove previously applied code to
fill the multicast field requested by @rtl8169_get_stats64 with the values
read from the rx_multicast hardware counter.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederic Danis [Fri, 28 Aug 2015 13:44:00 +0000 (15:44 +0200)]
Bluetooth: hci_bcm: Fix crash on suspend
If bcm_suspend is called whithout device opened there is a crash as
it tries to use bdev->hu which is NULL.
Rename bcm_device_list_lock to bcm_device_lock as it does not only apply
to bcm_device_list.
Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Florian Westphal [Thu, 27 Aug 2015 22:16:21 +0000 (00:16 +0200)]
netfilter: reduce sparse warnings
bridge/netfilter/ebtables.c:290:26: warning: incorrect type in assignment (different modifiers)
-> remove __pure annotation.
ipv6/netfilter/ip6t_SYNPROXY.c:240:27: warning: cast from restricted __be16
-> switch ntohs to htons and vice versa.
netfilter/core.c:391:30: warning: symbol 'nfq_ct_nat_hook' was not declared. Should it be static?
-> delete it, got removed
net/netfilter/nf_synproxy_core.c:221:48: warning: cast to restricted __be32
-> Use __be32 instead of u32.
Tested with objdiff that these changes do not affect generated code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Wed, 26 Aug 2015 21:20:51 +0000 (23:20 +0200)]
Revert "netfilter: xtables: compute exact size needed for jumpstack"
This reverts commit
98d1bd802cdbc8f56868fae51edec13e86b59515.
mark_source_chains will not re-visit chains, so
*filter
:INPUT ACCEPT [365:25776]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [217:45832]
:t1 - [0:0]
:t2 - [0:0]
:t3 - [0:0]
:t4 - [0:0]
-A t1 -i lo -j t2
-A t2 -i lo -j t3
-A t3 -i lo -j t4
# -A INPUT -j t4
# -A INPUT -j t3
# -A INPUT -j t2
-A INPUT -j t1
COMMIT
Will compute a chain depth of 2 if the comments are removed.
Revert back to counting the number of chains for the time being.
Reported-by: Cong Wang <cwang@twopensource.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Kuba Pawlak [Fri, 28 Aug 2015 12:05:22 +0000 (13:05 +0100)]
Bluetooth: Fix SCO link type handling on connection complete
Synchronous connections are initially created with type eSCO.
Link manager may reject proposed link parameters, which triggers
connection setup retry with a different set. Link type embedded
in responses should be disregarded until Synchronous Connect Complete
returns Success (0x00). Current code updates link type every time
which creates an issue when link type changes to SCO and back to eSCO
on further attepts.
Issue happens with BlackBerry 9100 and 9700 with Intel WilkinsPeak
on third connection setup attept
2015-05-18 01:27:57.332242 < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
handle 256 voice setting 0x0060 ptype 0x0380
2015-05-18 01:27:57.333604 > HCI Event: Command Status (0x0f) plen 4
Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
2015-05-18 01:27:57.334614 > HCI Event: Synchronous Connect Complete (0x2c) plen 17
status 0x1a handle 0 bdaddr 30:7C:30:B3:A8:86 type SCO
Error: Unsupported Remote Feature / Unsupported LMP Feature
2015-05-18 01:27:57.334895 < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
handle 256 voice setting 0x0060 ptype 0x0380
2015-05-18 01:27:57.335601 > HCI Event: Command Status (0x0f) plen 4
Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
2015-05-18 01:27:57.336610 > HCI Event: Synchronous Connect Complete (0x2c) plen 17
status 0x1a handle 0 bdaddr 30:7C:30:B3:A8:86 type SCO
Error: Unsupported Remote Feature / Unsupported LMP Feature
2015-05-18 01:27:57.336685 < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
handle 256 voice setting 0x0060 ptype 0x03c8
2015-05-18 01:27:57.337603 > HCI Event: Command Status (0x0f) plen 4
Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
2015-05-18 01:27:57.342608 > HCI Event: Max Slots Change (0x1b) plen 3
handle 256 slots 1
2015-05-18 01:27:57.377631 > HCI Event: Synchronous Connect Complete (0x2c) plen 17
status 0x00 handle 257 bdaddr 30:7C:30:B3:A8:86 type eSCO
Air mode: CVSD
Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Loic Poulain [Thu, 27 Aug 2015 05:21:51 +0000 (07:21 +0200)]
Bluetooth: hci_intel: Add support for platform driver
A platform device can be used to provide some specific resources in
order to manage the controller. In this first patch we retrieve the
reset gpio which is used to power on/off the controller.
The main issue is to match the current tty with the correct pdev.
In case of ACPI, we can easily find the right tty/pdev pair because
they are both child of the same UART port.
If controller is powered-on from the driver, we need to wait for a
HCI boot event before being able to send any command.
Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Marcel Holtmann [Thu, 27 Aug 2015 06:57:39 +0000 (08:57 +0200)]
Bluetooth: btintel: Add MODULE_FIRMWARE entries for iBT 3.0 controllers
The iBT 3.0 controllers need intel/ibt-11-5.sfi and intel/ibt-11-5.ddc
firmware files from linux-firmware repository.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Minjune Kim [Thu, 27 Aug 2015 04:21:52 +0000 (13:21 +0900)]
Bluetooth: btusb: Correct typos based on checkpatch.pl
Signed-off-by: Minjune Kim <infinite.minjune.kim@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Loic Poulain [Tue, 25 Aug 2015 15:55:44 +0000 (17:55 +0200)]
Bluetooth: hci_intel: Add Intel baudrate configuration support
Implement the set_baudrate callback for hci_intel.
- Controller requires a read Intel version command before updating
its baudrate.
- The operation consists in an async cmd since the controller does
not respond at the same speed.
- Wait 100ms to let the controller change its baudrate.
- Clear RTS until we change our own UART speed
Manage speed change in the setup function, we need to restore the oper
speed once chip has booted on patched firmware.
Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Loic Poulain [Mon, 24 Aug 2015 16:57:57 +0000 (18:57 +0200)]
Bluetooth: hci_uart: Fix zero len data packet reception issue
Packets with a variable length value equal to zero were not received.
Since no more data expected (and input buffer entirely consumed), we
need to complete/forward the packet immediately instead of waiting for
more data.
Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Stefan Schmidt [Thu, 20 Aug 2015 10:09:47 +0000 (12:09 +0200)]
nl802154: stricter input checking for boolean inputs
So far we handled boolean input by forcing them with !! and assigning
them into a bool. This allowed userspace to send values > 1 which were
used as 1. We should be stricter here and return -EINVAL for all but
0 or 1.
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Nicholas Krause [Wed, 19 Aug 2015 01:23:01 +0000 (21:23 -0400)]
Bluetooth: Make the function sco_conn_del have a return type of void
This makes the function sco_conn_del have a return type of void now
due to this function always running successfully and thus never
needing to signal its caller when a non recoverable internal failure
occurs by returning a error code to its respective caller.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Shengzhen Li [Wed, 19 Aug 2015 10:12:19 +0000 (03:12 -0700)]
Bluetooth: btmrvl: change device pointer passed to dev_coredumpv
This change ensures we will get driver name as 'btmrvl_sdio'
in udev event.
Signed-off-by: Shengzhen Li <szli@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
David S. Miller [Fri, 28 Aug 2015 04:45:31 +0000 (21:45 -0700)]
Merge git://git./linux/kernel/git/davem/net
Linus Torvalds [Fri, 28 Aug 2015 00:59:17 +0000 (17:59 -0700)]
Merge tag 'powerpc-4.2-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix MSI/MSI-X on pseries from Guilherme"
* tag 'powerpc-4.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/PCI: Disable MSI/MSI-X interrupts at PCI probe time in OF case
PCI: Make pci_msi_setup_pci_dev() non-static for use by arch code
Linus Torvalds [Fri, 28 Aug 2015 00:52:38 +0000 (17:52 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Some straggler bug fixes here:
1) Netlink_sendmsg() doesn't check iterator type properly in mmap
case, from Ken-ichirou MATSUZAWA.
2) Don't sleep in atomic context in bcmgenet driver, from Florian
Fainelli.
3) The pfkey_broadcast() code patch can't actually ever use anything
other than GFP_ATOMIC. And the cases that right now pass
GFP_KERNEL or similar will currently trigger an RCU splat. Just
use GFP_ATOMIC unconditionally. From David Ahern.
4) Fix FD bit timings handling in pcan_usb driver, from Marc
Kleine-Budde.
5) Cache dst leaked in ip6_gre tunnel removal, fix from Huaibin Wang.
6) Traversal into drivers/net/ethernet/renesas should be triggered by
CONFIG_NET_VENDOR_RENESAS, not a particular driver's config
option. From Kazuya Mizuguchi.
7) Fix regression in handling of igmp_join errors in vxlan, from
Marcelo Ricardo Leitner.
8) Make phy_{read,write}_mmd_indirect() properly take the mdio_lock
mutex when programming the registers. From Russell King.
9) Fix non-forced handling in u32_destroy(), from WANG Cong.
10) Test the EVENT_NO_RUNTIME_PM flag before it is cleared in
usbnet_stop(), from Eugene Shatokhin.
11) In sfc driver, don't fetch statistics firmware isn't capable of,
from Bert Kenward.
12) Verify ASCONF address parameter location in SCTP, from Xin Long"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state
sctp: asconf's process should verify address parameter is in the beginning
sfc: only use vadaptor stats if firmware is capable
net: phy: fixed: propagate fixed link values to struct
usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared
drivers: net: xgene: fix: Oops in linkwatch_fire_event
cls_u32: complete the check for non-forced case in u32_destroy()
net: fec: use reinit_completion() in mdio accessor functions
net: phy: add locking to phy_read_mmd_indirect()/phy_write_mmd_indirect()
vxlan: re-ignore EADDRINUSE from igmp_join
net: compile renesas directory if NET_VENDOR_RENESAS is configured
ip6_gre: release cached dst on tunnel removal
phylib: Make PHYs children of their MDIO bus, not the bus' parent.
can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters
net: Fix RCU splat in af_key
net: bcmgenet: fix uncleaned dma flags
net: bcmgenet: Avoid sleeping in bcmgenet_timeout
netlink: mmap: fix tx type check
Linus Torvalds [Fri, 28 Aug 2015 00:46:06 +0000 (17:46 -0700)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull nvdimm fixlet from Dan Williams:
"This is a libnvdimm ABI fixup.
I pushed back on this change quite hard given the late date, that it
appears to be purely cosmetic, sysfs is not necessarily meant to be a
user friendly UI, and the kernel interprets the reversed polarity of
the ACPI_NFIT_MEM_ARMED flag correctly. When this flag is set, the
energy source of an NVDIMM is not armed and any new writes to the DIMM
may not be preserved.
However, Bob Moore warned me that it is important to get these things
named correctly wherever they appear otherwise we run the risk of a
less than cautious firmware engineer implementing the polarity the
wrong way. Once a mistake like that escapes into production platforms
the flag becomes useless and we need to move to a new bit position.
Bob has agreed to take a change through ACPICA to rename
ACPI_NFIT_MEM_ARMED to ACPI_NFIT_MEM_NOT_ARMED, and the patch below
from Toshi brings the sysfs representation of these flags in line with
their respective polarities.
Please pull for 4.2 as this is the first kernel to expose the ACPI
NFIT sysfs representation, and this is likely a kernel that firmware
developers will be using for checking out their NVDIMM enabling"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: Clarify memory device state flags strings
David S. Miller [Fri, 28 Aug 2015 00:14:30 +0000 (17:14 -0700)]
Merge branch 'iff_no_queue_fixups'
Phil Sutter says:
====================
fixup IFF_NO_QUEUE conversion
This series serves two purposes:
On one hand it fixes a quite embarrassing bug around the warning I added for
drivers still setting tx_queue_len = 0 to achieve noqueue operation. It turned
out to be quite useless as due to using alloc_netdev(), many in-kernel drivers
fell into the trap by accident, as well. Instead this place serves pretty well
as a sanitizing point to set IFF_NO_QUEUE for drivers not initializing
tx_queue_len, which in turn allows to drop all special treatment of the latter
being zero since that can not happen anymore without IFF_NO_QUEUE being set.
On the other hand, it provides a better solution for Eric Dumazet's concern
regarding how to assign noqueue to an interface which does not default to it
already. In order to make this possible, noqueue is being registered so users
can 'tc qd add dev eth0 root noqueue'. In addition, it resolves the ugly
situation of 'tc qd show' not showing noqueue. Finally, the former changes
allow for some code cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 27 Aug 2015 19:21:39 +0000 (21:21 +0200)]
net: sched: simplify attach_one_default_qdisc()
Now that noqueue qdisc can be attached just like any other qdisc, no
special treatment is necessary anymore when attaching it as default
qdisc.
This change has the added benefit that 'tc qdisc show' prints noqueue
instead of nothing for devices defaulting to noqueue.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 27 Aug 2015 19:21:38 +0000 (21:21 +0200)]
net: sched: register noqueue qdisc
This way users can attach noqueue just like any other qdisc using tc
without having to mess with tx_queue_len first.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 27 Aug 2015 19:21:37 +0000 (21:21 +0200)]
net: sched: ignore tx_queue_len when assigning default qdisc
Since alloc_netdev_mqs() sets IFF_NO_QUEUE for drivers not initializing
tx_queue_len, it is safe to assume that if tx_queue_len is zero,
dev->priv flags always contains IFF_NO_QUEUE.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 27 Aug 2015 19:21:36 +0000 (21:21 +0200)]
net: fix IFF_NO_QUEUE for drivers using alloc_netdev
Printing a warning in alloc_netdev_mqs() if tx_queue_len is zero and
IFF_NO_QUEUE not set is not appropriate since drivers may use one of the
alloc_netdev* macros instead of alloc_etherdev*, thereby not
intentionally leaving tx_queue_len uninitialized. Instead check here if
tx_queue_len is zero and set IFF_NO_QUEUE, so the value of tx_queue_len
can be ignored in net/sched_generic.c.
Fixes: 906470c ("net: warn if drivers set tx_queue_len = 0")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Fri, 28 Aug 2015 00:05:49 +0000 (18:05 -0600)]
sock: fix kernel doc error
The symbol '__sk_reclaim' is not present in the current tree. Apparently
'__sk_reclaim' was meant to be '__sk_mem_reclaim', so fix it with the
right symbol name for the kernel doc.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lucien [Wed, 26 Aug 2015 20:52:20 +0000 (04:52 +0800)]
sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state
Commit
f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not
resetting the association overall_error_count. This allowed the association
to better enforce assoc.max_retrans limit.
However, the same issue still exists when the association is in SHUTDOWN_RECEIVED
state. In this state, HB-ACKs will continue to reset the overall_error_count
for the association would extend the lifetime of association unnecessarily.
This patch solves this by resetting the overall_error_count whenever the current
state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we
end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT
states, but they are not really impacted because we disable Heartbeats in those
states.
Fixes: Commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carol L Soto [Thu, 27 Aug 2015 19:43:26 +0000 (14:43 -0500)]
net/mlx4_core: Fix unintialized variable used in error path
The uninitialized value name in mlx4_en_activate_cq was used in order
to print an error message. Fixing it by replacing it with cq->vector.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carol L Soto [Thu, 27 Aug 2015 19:43:25 +0000 (14:43 -0500)]
net/mlx4_core: Capping number of requested MSIXs to MAX_MSIX
We currently manage IRQs in pool_bm which is a bit field
of MAX_MSIX bits. Thus, allocating more than MAX_MSIX
interrupts can't be managed in pool_bm.
Fixing this by capping number of requested MSIXs to
MAX_MSIX.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Thu, 27 Aug 2015 21:19:20 +0000 (14:19 -0700)]
bridge: fdb: rearrange net_bridge_fdb_entry
While looking into fixing the local entries scalability issue I noticed
that the structure is badly arranged because vlan_id would fall in a
second cache line while keeping rcu which is used only when deleting
in the first, so re-arrange the structure and push rcu to the end so we
can get 16 bytes which can be used for other fields (by pushing rcu
fully in the second 64 byte chunk). With this change all the core
necessary information when doing fdb lookups will be available in a
single cache line.
pahole before (note vlan_id):
struct net_bridge_fdb_entry {
struct hlist_node hlist; /* 0 16 */
struct net_bridge_port * dst; /* 16 8 */
struct callback_head rcu; /* 24 16 */
long unsigned int updated; /* 40 8 */
long unsigned int used; /* 48 8 */
mac_addr addr; /* 56 6 */
unsigned char is_local:1; /* 62: 7 1 */
unsigned char is_static:1; /* 62: 6 1 */
unsigned char added_by_user:1; /* 62: 5 1 */
unsigned char added_by_external_learn:1; /* 62: 4 1 */
/* XXX 4 bits hole, try to pack */
/* XXX 1 byte hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
__u16 vlan_id; /* 64 2 */
/* size: 72, cachelines: 2, members: 11 */
/* sum members: 65, holes: 1, sum holes: 1 */
/* bit holes: 1, sum bit holes: 4 bits */
/* padding: 6 */
/* last cacheline: 8 bytes */
}
pahole after (note vlan_id):
struct net_bridge_fdb_entry {
struct hlist_node hlist; /* 0 16 */
struct net_bridge_port * dst; /* 16 8 */
long unsigned int updated; /* 24 8 */
long unsigned int used; /* 32 8 */
mac_addr addr; /* 40 6 */
__u16 vlan_id; /* 46 2 */
unsigned char is_local:1; /* 48: 7 1 */
unsigned char is_static:1; /* 48: 6 1 */
unsigned char added_by_user:1; /* 48: 5 1 */
unsigned char added_by_external_learn:1; /* 48: 4 1 */
/* XXX 4 bits hole, try to pack */
/* XXX 7 bytes hole, try to pack */
struct callback_head rcu; /* 56 16 */
/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
/* size: 72, cachelines: 2, members: 11 */
/* sum members: 65, holes: 1, sum holes: 7 */
/* bit holes: 1, sum bit holes: 4 bits */
/* last cacheline: 8 bytes */
}
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Aug 2015 23:35:51 +0000 (16:35 -0700)]
Merge branch 'ovs-v6-build-err'
Joe Stringer says:
====================
OPENVSWITCH && !NETFILTER build fix.
Fix issues reported by kbuild test robot:
All error/warnings (new ones prefixed by >>):
net/openvswitch/actions.c: In function 'ovs_fragment':
>> net/openvswitch/actions.c:705:16: error: implicit declaration of
function 'nf_get_ipv6_ops' [-Werror=implicit-function-declaration]
const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
^
>> net/openvswitch/actions.c:705:37: warning: initialization makes
pointer from integer without a cast
const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
^
>> net/openvswitch/actions.c:707:19: error: storage size of 'ovs_rt'
isn't known
struct rt6_info ovs_rt;
^
>> net/openvswitch/actions.c:724:8: error: dereferencing pointer to
incomplete type
v6ops->fragment(skb->sk, skb, ovs_vport_output);
^
>> net/openvswitch/actions.c:707:19: warning: unused variable 'ovs_rt'
[-Wunused-variable]
struct rt6_info ovs_rt;
^
cc1: some warnings being treated as errors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Thu, 27 Aug 2015 22:25:46 +0000 (15:25 -0700)]
openvswitch: Include ip6_fib.h.
kbuild test robot reports that certain configurations will not
automatically pick up on the "struct rt6_info" definition, so explicitly
include the header for this structure.
Fixes: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Thu, 27 Aug 2015 22:25:45 +0000 (15:25 -0700)]
netfilter: Define v6ops in !CONFIG_NETFILTER case.
When CONFIG_OPENVSWITCH is set, and CONFIG_NETFILTER is not set, the
openvswitch IPv6 fragmentation handling cannot refer to ipv6_ops because
it isn't defined. Add a dummy version to avoid #ifdefs in source files.
Fixes: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Aug 2015 23:31:17 +0000 (16:31 -0700)]
Merge branch 'mlxsw-small-updates'
Jiri Pirko says:
====================
mlxsw: small driver update
Ido Schimmel (2):
mlxsw: Remove duplicate included header
mlxsw: Make mailboxes 4KB aligned
Jiri Pirko (1):
mlxsw: adjust transmit fail log message level in __mlxsw_emad_transmit
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 27 Aug 2015 15:59:57 +0000 (17:59 +0200)]
mlxsw: Make mailboxes 4KB aligned
The HW-SW contract requires mailboxes passed to the firmware to be 4KB
aligned. Previously, these mailboxes were mapped using streaming DMA
routines, which do not guarantee the bus addresses to be 4KB aligned.
Under certain conditions this constraint was indeed violated and errors
were observed.
By using consistent DMA mapping routines together with a mailbox size of
4KB we are guaranteed not to violate the constraint.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>