firefly-linux-kernel-4.4.55.git
tile: support rx_dropped/rx_errors in tilepro net driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support rx_dropped/rx_errors in tilepro net driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tile: set hw_features and vlan_features in setup
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: set hw_features and vlan_features in setup

This change allows the user to configure various features of the tile
networking drivers on and off.  There is no change to the default
initialization state of either the tilegx or tilepro drivers.

Neither driver needs the ndo_fix_features or ndo_set_features callbacks,
since the generic code already handles the dependencies for
fix_features, and there is no hardware state to tweak in set_features.
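
A minimal user-space sketch of what hw_features enables follows. The bits in hw_features are the ones a user may toggle, and requests for other bits are refused. The feature bits, `struct fake_netdev` and `request_features()` are hypothetical. They are loosely patterned on how the generic ethtool code restricts changes to hw_features; this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_RXCSUM (1u << 0)
#define FEAT_TXCSUM (1u << 1)
#define FEAT_SG     (1u << 2)
#define FEAT_TSO    (1u << 3)

struct fake_netdev {
    uint32_t hw_features;   /* features the user may toggle */
    uint32_t features;      /* features currently enabled */
};

/* Apply the user's request for the bits in 'valid'. Bits outside
 * hw_features are left untouched. Returns 0 if every requested bit
 * could be changed, -1 if some were not toggleable. */
static int request_features(struct fake_netdev *dev, uint32_t valid,
                            uint32_t requested)
{
    int ret = 0;

    if (valid & ~dev->hw_features) {
        valid &= dev->hw_features;   /* drop non-toggleable bits */
        ret = -1;
    }
    dev->features = (dev->features & ~valid) | (requested & valid);
    return ret;
}
```

Because the default initialization state is unchanged, only the set of user-toggleable bits grows. The driver needs no set_features hook when there is no hardware state to update.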

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gianfar: Remove unused field grp_id from gfar_priv_grp
Claudiu Manoil [Thu, 1 Aug 2013 07:07:05 +0000 (10:07 +0300)]
gianfar: Remove unused field grp_id from gfar_priv_grp

grp->grp_id is obsolete. It has no use in the current driver.
Remove it from gfar_priv_grp and put the 'rstat' member
in its place, in the 2nd cache line, as rstat needs fast access.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: add a temporary sanity check in skb_orphan()
Eric Dumazet [Thu, 1 Aug 2013 18:43:08 +0000 (11:43 -0700)]
net: add a temporary sanity check in skb_orphan()

David suggested adding a BUG_ON() to catch any layer that
sets the skb->sk pointer without a corresponding destructor.

As an skb can sit in a queue, it's mandatory to make sure the
socket cannot disappear; this is usually done by taking a
reference on the socket, then releasing it from the skb
destructor.

This patch is a follow-up to commit c34a761231b5
("net: skb_orphan() changes") and will be reverted once any
possible offenders have been caught.
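
The following user-space model illustrates the invariant this check enforces. It uses simplified stand-ins; `struct fake_sock`, `struct fake_skb` and the `fake_*` helpers are hypothetical. Attaching an skb to a socket takes a reference and installs a destructor that drops it. Orphaning an skb that has `sk` set but no destructor trips an assertion, which stands in for `BUG_ON()`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for struct sock / struct sk_buff. */
struct fake_sock {
    int refcnt;
};

struct fake_skb {
    struct fake_sock *sk;
    void (*destructor)(struct fake_skb *skb);
};

/* Destructor: drop the reference taken when the skb was charged to sk. */
static void fake_sock_wfree(struct fake_skb *skb)
{
    skb->sk->refcnt--;
}

/* Attach skb to sk: take a reference and install the destructor that
 * releases it, so the socket cannot vanish while the skb is queued. */
static void fake_skb_set_owner(struct fake_skb *skb, struct fake_sock *sk)
{
    sk->refcnt++;
    skb->sk = sk;
    skb->destructor = fake_sock_wfree;
}

/* Same shape as skb_orphan() with the temporary sanity check:
 * a set sk without a destructor is treated as a bug. */
static void fake_skb_orphan(struct fake_skb *skb)
{
    if (skb->destructor) {
        skb->destructor(skb);
        skb->destructor = NULL;
        skb->sk = NULL;
    } else {
        assert(skb->sk == NULL);   /* stands in for BUG_ON(skb->sk) */
    }
}
```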

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: fib6_rules should return exact return value
Hannes Frederic Sowa [Thu, 1 Aug 2013 06:54:47 +0000 (08:54 +0200)]
ipv6: fib6_rules should return exact return value

With the addition of the suppress operation
(7764a45a8f1fe74d4f7d301eaca2e558e7e2831a ("fib_rules: add .suppress
operation")), we rely on accurate error reporting of the fib_rules actions.

fib6_rule_action always returned -EAGAIN if we could not find a
matching route and 0 if a rule was matched. The latter also included matches
for blackhole or prohibit rule actions, which could get suppressed by
the new logic.

So adapt fib6_rule_action to always return the correct error code, as
its counterpart fib4_rule_action does. This also fixes a possible
NULL pointer dereference when we don't find a table, so rt == NULL. Because
the condition rt != ip6_null_entry still holds, it seems we could later
dereference rt->dst through a NULL pointer.
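
The sketch below is a simplified version of the return convention described here. The action names and `fake_rule_action()` are hypothetical. As I understand fib4_rule_action, it returns the errno values used below for unreachable, prohibit and blackhole actions. A missing table yields -EAGAIN ("try the next rule") instead of a NULL route.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the FR_ACT_* rule actions. */
enum rule_action { ACT_TO_TBL, ACT_UNREACHABLE, ACT_BLACKHOLE, ACT_PROHIBIT };

struct fake_table { int has_route; };

/* 0 on a match, -EAGAIN to continue with the next rule, or a specific
 * negative errno for terminal actions, so the suppress logic can tell
 * a real match from a blackhole/prohibit result. */
static int fake_rule_action(enum rule_action action,
                            const struct fake_table *tbl)
{
    switch (action) {
    case ACT_TO_TBL:
        break;
    case ACT_UNREACHABLE:
        return -ENETUNREACH;
    case ACT_PROHIBIT:
        return -EACCES;
    case ACT_BLACKHOLE:
    default:
        return -EINVAL;
    }

    /* No table: report "try next rule" rather than using a NULL route. */
    if (tbl == NULL)
        return -EAGAIN;
    return tbl->has_route ? 0 : -EAGAIN;
}
```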

v2:
a) Fixed a brain fart in the commit msg (the rule => a table, etc). No
   changes to the patch.

Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:39 +0000 (17:31 -0700)]
cls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
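
Here is a tiny example of the equivalence. In C, a function declaration has external linkage by default, so both prototypes in this block declare the same function. `add_ints()` is a made-up name.

```c
#include <assert.h>

/* Both declarations declare the same function with external linkage;
 * 'extern' on a function prototype adds nothing. */
extern int add_ints(int a, int b);
int add_ints(int a, int b);

int add_ints(int a, int b)
{
    return a + b;
}
```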

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
checksum: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:38 +0000 (17:31 -0700)]
checksum: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cfg80211.h/mac80211.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:37 +0000 (17:31 -0700)]
cfg80211.h/mac80211.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ax25.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:36 +0000 (17:31 -0700)]
ax25.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
arp/neighbour.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:35 +0000 (17:31 -0700)]
arp/neighbour.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af_rxrpc.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:34 +0000 (17:31 -0700)]
af_rxrpc.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af_unix.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:33 +0000 (17:31 -0700)]
af_unix.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
addrconf.h: Remove extern function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:32 +0000 (17:31 -0700)]
addrconf.h: Remove extern function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation: add networking/netdev-FAQ.txt
Paul Gortmaker [Wed, 31 Jul 2013 19:16:20 +0000 (15:16 -0400)]
Documentation: add networking/netdev-FAQ.txt

A collection of expectations and operational details about how
networking development takes place in the context of the netdev
mailing list.

The content is meant to capture specific items that are unique
to netdev workflow, and not re-document generic linux expectations
that are already captured elsewhere.

This was originally proposed[1] as a regular posting mailing list
FAQ, but it probably is more universally accessible here in tree.

[1] https://lwn.net/Articles/559211/

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_rules: add .suppress operation
Stefan Tomanek [Thu, 1 Aug 2013 00:17:15 +0000 (02:17 +0200)]
fib_rules: add .suppress operation

This change adds a new operation to the fib_rules_ops struct; it allows the
suppression of routing decisions if certain criteria are not met by its
results.

The first implemented constraint is a minimum prefix length added to the
structures of routing rules. If a rule is added with a minimum prefix length
>0, only routes meeting this threshold will be considered. Any other (more
general) routing table entries will be ignored.

When configuring a system with multiple network uplinks and default routes, it
is often convenient to reference the main routing table multiple times, while
omitting the default route. Using this patch and a modified "ip" utility, this
can be achieved by using the following command sequence:

  $ ip route add table secuplink default via 10.42.23.1

  $ ip rule add pref 100            table main prefixlength 1
  $ ip rule add pref 150 fwmark 0xA table secuplink

With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By using a minimal prefixlength of 1, the default route (/0) of the
table "main" is hidden from packets processed by rule 100; packets traveling to
destinations with more specific routing entries are processed as usual.
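
The sketch below shows the suppression test as this message describes it. A route found in the table is ignored when its prefix length is below the rule's minimum, so evaluation falls through to the next rule. `struct fake_rule`, `prefixlen_min` and `rule_suppresses()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of a table lookup: the matched route's prefix length. */
struct lookup_result {
    unsigned char prefixlen;   /* 0 for a default route */
};

/* Hypothetical rule carrying the minimum prefix length from this patch. */
struct fake_rule {
    unsigned char prefixlen_min;   /* 0 means "no constraint" */
};

/* True if the routing decision must be suppressed, so that
 * rule evaluation continues with the next rule. */
static bool rule_suppresses(const struct fake_rule *rule,
                            const struct lookup_result *res)
{
    return res->prefixlen < rule->prefixlen_min;
}
```

With a minimum of 1, the /0 default route of table main is suppressed for rule 100 while, for example, a /24 route still matches. The secuplink example relies on exactly this behaviour.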

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: Remove extern from include/net/ scheduling prototypes
Joe Perches [Wed, 31 Jul 2013 05:47:13 +0000 (22:47 -0700)]
net: Remove extern from include/net/ scheduling prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: skb_orphan() changes
Eric Dumazet [Tue, 30 Jul 2013 23:11:15 +0000 (16:11 -0700)]
net: skb_orphan() changes

It is illegal to set skb->sk without corresponding destructor.

It's therefore safe for skb_orphan() not to clear skb->sk if
skb->destructor is not set.

Also avoid clearing skb->destructor if already NULL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netem: Introduce skb_orphan_partial() helper
Eric Dumazet [Wed, 31 Jul 2013 00:55:08 +0000 (17:55 -0700)]
netem: Introduce skb_orphan_partial() helper

Commit 547669d483e578 ("tcp: xps: fix reordering issues") introduced
unexpected reordering when netem is used in an MQ setup for a
high-performance test bed.

ETH=eth0
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 32`
do
 tc qd add dev $ETH parent 1:$i netem delay 100ms
done

As all TCP packets are orphaned by netem, the TCP stack believes it can
set skb->ooo_okay on all packets.

In order to allow producers to send more packets, we want to
keep sk_wmem_alloc from reaching sk_sndbuf limit.

We can do that by accounting one byte per skb in netem queues,
so that the TCP stack is not fooled too much.
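
The following user-space sketch shows the accounting idea. Instead of fully orphaning the skb, the helper returns all but one byte of its truesize to the socket. The types are hypothetical stand-ins. As I understand the real helper, it does this only for skbs owned through the socket write-free destructors and falls back to a full skb_orphan() otherwise; that branch is omitted here.

```c
#include <assert.h>

struct fake_sock { long wmem_alloc; };            /* bytes charged to the socket */
struct fake_skb  { struct fake_sock *sk; long truesize; };

/* Release truesize - 1 bytes now, so the producer can queue more data
 * without hitting its send-buffer limit, while the skb still counts
 * for one byte until it finally leaves the netem queue. */
static void fake_skb_orphan_partial(struct fake_skb *skb)
{
    skb->sk->wmem_alloc -= skb->truesize - 1;
    skb->truesize = 1;
}
```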

Tested:

With the above MQ/netem setup, scaling the number of concurrent flows gives
linear results and no reorders/retransmits

lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100
 do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done
n:1 198.46
n:10 2002.69
n:20 4000.98
n:30 6006.35
n:40 8020.93
n:50 10032.3
n:60 12081.9
n:70 13971.3
n:80 16009.7
n:90 17117.3
n:100 17425.5

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: split rt_genid for ipv4 and ipv6
fan.du [Tue, 30 Jul 2013 00:33:53 +0000 (08:33 +0800)]
net: split rt_genid for ipv4 and ipv6

The current network namespace has only one genid, shared by IPv4 and IPv6,
which has the following drawbacks:

- Adding or deleting an IPv4 address invalidates all IPv6 routing table entries.
- Inserting or removing an XFRM policy also invalidates both IPv4 and IPv6 routing
  table entries, even when the policy applies to only one address family.

Thus, this patch attempts to split the single genid into two, one for IPv4 and
one for IPv6, so that invalidation is finer-grained.
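
Below is a minimal model of per-family generation counters. All names (`struct fake_netns`, `bump_genid()`, `route_is_valid()`) are hypothetical. A cached route records the genid of its own family, so bumping the IPv4 counter no longer invalidates IPv6 entries.

```c
#include <assert.h>
#include <stdbool.h>

enum family { FAM_IPV4, FAM_IPV6, FAM_MAX };

/* One generation counter per address family (previously one shared). */
struct fake_netns {
    unsigned int rt_genid[FAM_MAX];
};

struct cached_route {
    enum family fam;
    unsigned int genid;   /* generation the entry was created in */
};

static void cache_route(const struct fake_netns *net, struct cached_route *rt,
                        enum family fam)
{
    rt->fam = fam;
    rt->genid = net->rt_genid[fam];
}

/* An address or policy change bumps only the affected family. */
static void bump_genid(struct fake_netns *net, enum family fam)
{
    net->rt_genid[fam]++;
}

static bool route_is_valid(const struct fake_netns *net,
                           const struct cached_route *rt)
{
    return rt->genid == net->rt_genid[rt->fam];
}
```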

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sh_eth: r8a7790: Handle the RFE (Receive FIFO overflow Error) interrupt
Laurent Pinchart [Wed, 31 Jul 2013 07:42:11 +0000 (16:42 +0900)]
sh_eth: r8a7790: Handle the RFE (Receive FIFO overflow Error) interrupt

The RFE interrupt is enabled for the r8a7790 but isn't handled,
resulting in the interrupt core noticing unhandled interrupts and
eventually disabling the ethernet IRQ.

Fix it by adding RFE to the bitmask of error interrupts to be handled
for r8a7790.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Wed, 31 Jul 2013 20:37:47 +0000 (13:37 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe and pci.

The first patch for ixgbe from Greg Rose is the second submission.  The
first submission of "ixgbe: Retain VLAN filtering in promiscuous + VT
mode" had a typo, which Joe Perches pointed out and is fixed in this
submission.

Alex updates the ixgbe driver to use the generic helper pci_vfs_assigned
instead of the driver specific function ixgbe_vfs_are_assigned.

Don Skidmore provides 4 patches for ixgbe, the first being a fix for
flow control ethtool reporting.  Originally ixgbe_device_supports_autoneg_fc()
was expected to be called only by copper devices, which led to false
information being displayed via ethtool.  Two other patches add support
for fixed fiber for SFP+ devices and for a quad-port x520
adapter.  The last patch simply bumps the driver version.

Emil Tantilov provides 3 fixes for ixgbe, two of which resolve
semaphore lock issues.  The third fix resolves several issues in the
previous implementation of the SFF data dumps of SFP+ modules.

The remaining ixgbe and pci patches are from Jacob Keller.  The pci
patches expose bus speed, link speed and bus width so that drivers
can take advantage of this information.  In addition, they add a pci function
which obtains the minimum link width and speed.  Jacob also provides the
ixgbe patch to incorporate the pci function. He provides a patch that
fixes a lockdep issue created due to ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been created properly with
INIT_WORK. This issue was found and reported by Stephen Hemminger.

-v2-
* fix patch 3 to be a bool function based on David Miller's feedback
* fix patch 4 debug message based on David Miller's feedback
* fix patch 8 moved the extern declarations to pci.h based on Bjorn
  Helgaas's feedback
* fix patch 11 update the error message to include encoding loss based
* fix patch 8/9/10 title based on Bjorn's feedback
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Remove unused tcpct declarations and comments
Dmitry Popov [Wed, 31 Jul 2013 09:39:45 +0000 (13:39 +0400)]
tcp: Remove unused tcpct declarations and comments

Remove declaration, 4 defines and confusing comment that are no longer used
since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Acked-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
ixgbe: add support for quad-port x520 adapter
Don Skidmore [Sat, 27 Jul 2013 06:25:38 +0000 (06:25 +0000)]
ixgbe: add support for quad-port x520 adapter

This is an x520-based quad-port (4x10Gbps) NIC with a single QSFP+
connector.  Our identify functions needed changes because of a
different EEPROM address; those changes are also included here.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: clear semaphore bits on timeouts
Emil Tantilov [Tue, 23 Jul 2013 01:57:03 +0000 (01:57 +0000)]
ixgbe: clear semaphore bits on timeouts

This patch changes the error code path in ixgbe_acquire_swfw_sync() to deal
with cases where acquiring SW semaphore times out.

In cases where the SW/FW semaphore bits were set (e.g. due to a crash), the
driver will hang on load. With this patch, the driver will clear
the stuck bits if the semaphore was not acquired in the allotted time.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: rename LL_EXTENDED_STATS to use queue instead of q
Jacob Keller [Tue, 16 Jul 2013 07:57:46 +0000 (07:57 +0000)]
ixgbe: rename LL_EXTENDED_STATS to use queue instead of q

This patch renames the stats introduced by the busy poll feature so that they
are more in line with the current statistics naming schemes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: fix lockdep annotation issue for ptp's work item
Jacob Keller [Fri, 21 Jun 2013 08:14:32 +0000 (08:14 +0000)]
ixgbe: fix lockdep annotation issue for ptp's work item

This patch fixes a lockdep issue caused by ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been initialized with
INIT_WORK. This happened because ixgbe_ptp_stop did not first check that
PTP was actually running. The new implementation introduces a state bit in the
&adapter->state field to indicate that PTP is running (this replaces the
IXGBE_FLAG2_PTP_ENABLED flag). This state uses the atomic set_bit,
test_bit, and test_and_clear_bit functions. ixgbe_ptp_stop checks that
PTP was enabled and, if not, does not attempt to undo any of the setup
work from ixgbe_ptp_init. This resolves the lockdep annotation warning
found by Stephen Hemminger.
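
The following user-space sketch shows the state-bit pattern described here. C11 atomics stand in for set_bit() and test_and_clear_bit(). `STATE_PTP_RUNNING`, `struct fake_adapter` and the `fake_ptp_*` functions are hypothetical, and the counter takes the place of cancel_work_sync().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STATE_PTP_RUNNING 0   /* hypothetical bit index */

struct fake_adapter {
    atomic_ulong state;
    int work_cancelled;       /* counts cancel_work_sync()-style calls */
};

static void fake_set_bit(int nr, atomic_ulong *addr)
{
    atomic_fetch_or(addr, 1UL << nr);
}

/* Atomically clear bit nr and report whether it was set before. */
static bool fake_test_and_clear_bit(int nr, atomic_ulong *addr)
{
    unsigned long old = atomic_fetch_and(addr, ~(1UL << nr));
    return old & (1UL << nr);
}

static void fake_ptp_init(struct fake_adapter *a)
{
    /* INIT_WORK() and clock registration would happen here. */
    fake_set_bit(STATE_PTP_RUNNING, &a->state);
}

static void fake_ptp_stop(struct fake_adapter *a)
{
    /* Only tear down what init actually set up. */
    if (!fake_test_and_clear_bit(STATE_PTP_RUNNING, &a->state))
        return;
    a->work_cancelled++;      /* stands in for cancel_work_sync() */
}
```

Because the test and the clear happen in one atomic step, a second stop call finds the bit already clear and does nothing.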

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: call pcie_get_minimum_link to check if device has enough bandwidth
Jacob Keller [Wed, 31 Jul 2013 06:53:31 +0000 (06:53 +0000)]
ixgbe: call pcie_get_minimum_link to check if device has enough bandwidth

This patch uses the new pcie_get_minimum_link function to check that
the adapter is plugged into a slot capable of providing the
necessary bandwidth. This check supersedes the original method, which only
checked the current PCI device. The new method can determine the
minimum link speed and width of an entire PCI chain.

-v2-
* update the error message to include encoding loss

CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
PCI: Add function to obtain minimum link width and speed
Jacob Keller [Wed, 31 Jul 2013 06:53:26 +0000 (06:53 +0000)]
PCI: Add function to obtain minimum link width and speed

A PCI Express device can potentially report a link width and speed which it will
not properly fulfill due to being plugged into a slower link higher in the
chain. This function walks up the PCI bus chain and calculates the minimum link
width and speed of this entire chain. This can be useful to enable a device to
determine if it has enough bandwidth for optimum functionality.
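
Here is a sketch of the chain walk, assuming a hypothetical `struct fake_pcie_dev` whose `parent` points at the upstream device. The narrowest width and slowest speed seen on the way to the root bound what the endpoint can actually use. Speeds are kept as tenths of GT/s so they compare as integers; the real function works on the kernel's pci_bus_speed and pcie_link_width enum values instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device link description. */
struct fake_pcie_dev {
    const struct fake_pcie_dev *parent;   /* upstream device, NULL at root */
    unsigned int width;                   /* negotiated lanes, e.g. 8 for x8 */
    unsigned int speed_dgts;              /* 25 = 2.5 GT/s, 50 = 5.0 GT/s */
};

/* Walk towards the root, keeping the narrowest width and slowest speed. */
static void fake_get_minimum_link(const struct fake_pcie_dev *dev,
                                  unsigned int *width, unsigned int *speed)
{
    *width = ~0u;
    *speed = ~0u;
    for (; dev != NULL; dev = dev->parent) {
        if (dev->width < *width)
            *width = dev->width;
        if (dev->speed_dgts < *speed)
            *speed = dev->speed_dgts;
    }
}
```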

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
net: remove an unneeded check
Dan Carpenter [Mon, 29 Jul 2013 19:15:19 +0000 (22:15 +0300)]
net: remove an unneeded check

"ifa->ifa_label" is an array inside the in_ifaddr struct.  It can never
be NULL so we can remove this check.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_dissector: add support for IPPROTO_IPV6
Tom Herbert [Mon, 29 Jul 2013 18:07:42 +0000 (11:07 -0700)]
flow_dissector: add support for IPPROTO_IPV6

Support IPPROTO_IPV6 similar to IPPROTO_IPIP

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_dissector: clean up IPIP case
Tom Herbert [Mon, 29 Jul 2013 18:07:36 +0000 (11:07 -0700)]
flow_dissector: clean up IPIP case

Explicitly set proto to ETH_P_IP and jump directly to ip processing.
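
The sketch below shows the tunnel handling these two flow_dissector commits describe. When the IP payload protocol is itself IPv4 (IPIP) or IPv6, the dissector restarts with the matching EtherType. The constants repeat the standard values of IPPROTO_IPIP (4), IPPROTO_IPV6 (41), ETH_P_IP (0x0800) and ETH_P_IPV6 (0x86DD). `inner_ethertype()` is a hypothetical helper, not the flow_dissector code.

```c
#include <assert.h>
#include <stdint.h>

/* Standard values, repeated locally so the sketch compiles anywhere. */
#define PROTO_IPIP      4        /* IPPROTO_IPIP */
#define PROTO_IPV6      41       /* IPPROTO_IPV6 */
#define ETHERTYPE_IP    0x0800   /* ETH_P_IP */
#define ETHERTYPE_IPV6  0x86DD   /* ETH_P_IPV6 */

/* For an IP payload that is itself an IP header, return the EtherType
 * to restart dissection with, or 0 when there is no inner IP header. */
static uint16_t inner_ethertype(uint8_t ip_proto)
{
    switch (ip_proto) {
    case PROTO_IPIP:
        return ETHERTYPE_IP;     /* IPv4 inside: jump to IPv4 processing */
    case PROTO_IPV6:
        return ETHERTYPE_IPV6;   /* IPv6 inside: jump to IPv6 processing */
    default:
        return 0;
    }
}
```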

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI: move enum pcie_link_width into pci.h
Jacob Keller [Wed, 31 Jul 2013 06:53:21 +0000 (06:53 +0000)]
PCI: move enum pcie_link_width into pci.h

pcie_link_width is the enum used to define the link width values for a pcie
device. This enum should not be contained solely in pci_hotplug.h, and this
patch moves it next to pci_bus_speed in pci.h

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
PCI: expose pcie_link_speed and pcix_bus_speed arrays
Jacob Keller [Wed, 31 Jul 2013 06:53:16 +0000 (06:53 +0000)]
PCI: expose pcie_link_speed and pcix_bus_speed arrays

pcie_link_speed and pcix_bus_speed are arrays used by probe.c to correctly
convert lnksta register values into the pci_bus_speed enum. These static arrays
are also useful outside probe.c for this purpose. This patch makes them
const arrays and exposes them via extern declarations in drivers/pci/pci.h.
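
The following sketch shows the conversion these arrays perform. In the PCIe Link Status register, bits 3:0 hold the current link speed and bits 9:4 the negotiated link width. Linux names these masks PCI_EXP_LNKSTA_CLS and PCI_EXP_LNKSTA_NLW. The enum, table and helper names below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum pci_bus_speed values. */
enum fake_bus_speed { SPEED_UNKNOWN, SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT };

#define LNKSTA_CLS        0x000f   /* Current Link Speed, bits 3:0 */
#define LNKSTA_NLW        0x03f0   /* Negotiated Link Width, bits 9:4 */
#define LNKSTA_NLW_SHIFT  4

/* Lookup table indexed by the 4-bit CLS field; the remaining encodings
 * are zero-initialized, i.e. SPEED_UNKNOWN. */
static const unsigned char fake_pcie_link_speed[16] = {
    SPEED_UNKNOWN, SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT,
};

static enum fake_bus_speed lnksta_speed(uint16_t lnksta)
{
    return (enum fake_bus_speed)fake_pcie_link_speed[lnksta & LNKSTA_CLS];
}

static unsigned int lnksta_width(uint16_t lnksta)
{
    return (lnksta & LNKSTA_NLW) >> LNKSTA_NLW_SHIFT;
}
```

For example, a register value of 0x0082 decodes to an x8 link at 5.0 GT/s.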

-v2-
* move extern declarations to drivers/pci/pci.h

CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: fix SFF data dumps of SFP+ modules
Emil Tantilov [Wed, 29 May 2013 06:23:10 +0000 (06:23 +0000)]
ixgbe: fix SFF data dumps of SFP+ modules

This patch fixes several issues with the previous implementation of the
SFF data dump of SFP+ modules:

- removed the __IXGBE_READ_I2C flag - I2C access locking is handled in the
  HW specific routines

- fixed the read loop to read data from ee->offset to ee->len

- the reads fail if __IXGBE_IN_SFP_INIT is set; this is
  needed because on some HW, I2C operations can take a long time and disrupt
  the SFP and link detection process

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: fix semaphore lock for I2C read/writes on 82598
Emil Tantilov [Wed, 29 May 2013 06:23:05 +0000 (06:23 +0000)]
ixgbe: fix semaphore lock for I2C read/writes on 82598

ixgbe_read/write_i2c_phy_82598() does not hold the SWFW_SYNC
semaphore for the entire function. Instead the lock is held only
during the phy.ops.read/write_reg operations. As a result, when the
function is called simultaneously, the I2C reads/writes can
be corrupted.

The following patch introduces the SWFW_SYNC semaphore for the
entire ixgbe_read/write_i2c_phy_82598() function. To accomplish
this I had to create 2 separate functions:

ixgbe_read_phy_reg_mdi()
ixgbe_write_phy_reg_mdi()

Those functions are identical to ixgbe_read/write_phy_reg_generic()
sans the locking, and can be used in ixgbe_read/write_i2c_phy_82598()
with the SWFW_SYNC semaphore being held.
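
Below is a single-threaded model of the locking change, assuming a hypothetical `struct fake_hw` whose flag stands in for the SWFW_SYNC semaphore. The `_mdi` helpers assert that the caller already holds it. The I2C routine takes it once around the whole multi-register sequence. The register numbers and the address/data sequence are made up for illustration and do not reflect the 82598's actual I2C-over-PHY protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the SW/FW semaphore: a simple ownership flag. */
struct fake_hw {
    bool swfw_held;
    int acquisitions;        /* how many times the semaphore was taken */
    uint16_t regs[32];
};

static int fake_acquire_swfw(struct fake_hw *hw)
{
    if (hw->swfw_held)
        return -1;           /* real hardware would wait and time out */
    hw->swfw_held = true;
    hw->acquisitions++;
    return 0;
}

static void fake_release_swfw(struct fake_hw *hw)
{
    hw->swfw_held = false;
}

/* "_mdi" variants: no locking of their own; caller must hold the lock. */
static uint16_t fake_read_phy_reg_mdi(struct fake_hw *hw, int reg)
{
    assert(hw->swfw_held);
    return hw->regs[reg];
}

static void fake_write_phy_reg_mdi(struct fake_hw *hw, int reg, uint16_t val)
{
    assert(hw->swfw_held);
    hw->regs[reg] = val;
}

/* The whole address/data sequence runs under one acquisition, so a
 * concurrent caller cannot slip in between the two register accesses. */
static int fake_read_i2c_phy(struct fake_hw *hw, int addr_reg, int data_reg,
                             uint16_t addr, uint16_t *out)
{
    if (fake_acquire_swfw(hw))
        return -1;
    fake_write_phy_reg_mdi(hw, addr_reg, addr);   /* select I2C address */
    *out = fake_read_phy_reg_mdi(hw, data_reg);   /* fetch the result */
    fake_release_swfw(hw);
    return 0;
}
```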

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: bump version number
Don Skidmore [Wed, 15 May 2013 07:34:50 +0000 (07:34 +0000)]
ixgbe: bump version number

Bump the version number to better match with a similar version of the
out of tree driver.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: add new media type.
Don Skidmore [Wed, 31 Jul 2013 02:17:40 +0000 (02:17 +0000)]
ixgbe: add new media type.

This patch adds support for a new media type, fiber_fixed.  This lets
devices with fixed fiber, which need not worry about hot plugging, avoid
the SFP+ hot plug support path entirely.  This patch is needed for a
following patch that adds support for "fiber_fixed" devices.

v2: cleaned up logging message based on feedback from David Miller

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Merge branch 'phys_port'
David S. Miller [Wed, 31 Jul 2013 00:32:12 +0000 (17:32 -0700)]
Merge branch 'phys_port'

Jiri Pirko says:

====================
This patchset is based on a patch by Narendra_K@Dell.com.
Once a device that can change its phys port id during its lifetime adopts this,
a NETDEV_CHANGEPHYSPORTID event will be added and the driver will call
call_netdevice_notifiers(NETDEV_CHANGEPHYSPORTID, dev) to propagate
the change to userspace.

v1->v2: as suggested by Ben, handle -EOPNOTSUPP in rtnl code (wrapped up ndo call)
v2->v3: adjusted patch 1 commit message
v3->v4: used "%phN" for sysfs printf as suggested by DaveM
        added igb/igbvf implementation as requested by Or Gerlitz
v4->v5: used prandom_u32 to generate id in igb_probe
        removed duplicate code in igbvf_probe
        pushed dev_err string into one line in igbvf_refresh_ppid
v5->v6: use uuid_le_gen for generating 16-byte phys port id for igb/igbvf
        as suggested by BenH

1) Why do we need this, and why do existing facilities fail to provide
   a way to accomplish this?

Currently it is very hard to tell whether two netdevs are using the same physical
port. For SR-IOV this can be determined via sysfs. For other mechanisms, like
NPAR, it is very hard to do (one must learn it from the NIC BIOS). And even for
SR-IOV there is no way to tell whether two netdevs are using the same phys port
when they are passed through to virtual guests.

This patchset provides a generic way of exposing this information to
userspace. Apps like NetworkManager, teamd, Wicked, the ovs daemon, etc.
can use it to make smarter bonding decisions.

2) Why is the physical port ID defined as a 32 byte opaque cookie?
   What formats and layouts need to be accommodated, and which
   influenced the design of the ID?

To determine whether two netdevs are using the same port, a user only needs
to compare their phys port ids. Nothing else is needed. The id has no
structure for security reasons: a VF should not know anything about its PF.
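
The following sketch shows the comparison userspace is expected to make. Two IDs match only if they have the same length and identical bytes, with no further interpretation. The struct follows the id-bytes-plus-length idea and the 32-byte size discussed in this cover letter, but its names are assumptions here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PHYS_PORT_ID_LEN 32   /* the 32-byte opaque cookie */

/* Opaque ID: raw bytes plus their length (names assumed). */
struct phys_port_id {
    unsigned char id[MAX_PHYS_PORT_ID_LEN];
    unsigned char id_len;
};

/* Two netdevs share a physical port iff their IDs are byte-equal;
 * an empty ID (length 0) never matches anything. */
static bool same_phys_port(const struct phys_port_id *a,
                           const struct phys_port_id *b)
{
    return a->id_len != 0 && a->id_len == b->id_len &&
           memcmp(a->id, b->id, a->id_len) == 0;
}
```

From a shell, the same comparison can be made on the hex strings read from /sys/class/net/<dev>/phys_port_id.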

3) Are IDs globally unique?  Why or why not?  If IDs should be
   globally unique, but only in certain cases, what exactly are those
   cases.

Most of the time, uniqueness is needed only within the scope of a single machine.
There might be cases where the id should be unique across a couple of machines
in a virtualization environment. Since igb/igbvf, for example, use a 16-byte
UUID, this case is covered as well. But each driver can implement this
differently, according to its hardware capabilities and needs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
net: export physical port id via sysfs
Jiri Pirko [Mon, 29 Jul 2013 16:16:51 +0000 (18:16 +0200)]
net: export physical port id via sysfs

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl: export physical port id via RT netlink
Jiri Pirko [Mon, 29 Jul 2013 16:16:50 +0000 (18:16 +0200)]
rtnl: export physical port id via RT netlink

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: add ndo to get id of physical port of the device
Jiri Pirko [Mon, 29 Jul 2013 16:16:49 +0000 (18:16 +0200)]
net: add ndo to get id of physical port of the device

This patch adds an ndo for getting the physical port of the device. A driver
that is aware of being a virtual function of some physical port should
implement this ndo. This applies not only to IOV but to other
solutions (NPAR, multichannel) as well: basically, whenever it is possible
to have multiple netdevs on a single hw port.
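
The sketch below shows, in user space, how an optional ndo of this kind is typically wrapped. The types are hypothetical (`struct fake_netdev_ops`, `fake_dev_get_phys_port_id()`), and the 4-byte ID is made up. A driver that cannot map itself to a physical port leaves the callback NULL, and callers get -EOPNOTSUPP. According to the cover letter, the rtnl code handles that value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

struct fake_ppid { unsigned char id[32]; unsigned char id_len; };

struct fake_netdev;

/* Hypothetical subset of net_device_ops: the new callback is optional. */
struct fake_netdev_ops {
    int (*ndo_get_phys_port_id)(struct fake_netdev *dev,
                                struct fake_ppid *ppid);
};

struct fake_netdev {
    const struct fake_netdev_ops *ops;
};

/* Core wrapper: a NULL callback means "not supported". */
static int fake_dev_get_phys_port_id(struct fake_netdev *dev,
                                     struct fake_ppid *ppid)
{
    if (!dev->ops->ndo_get_phys_port_id)
        return -EOPNOTSUPP;
    return dev->ops->ndo_get_phys_port_id(dev, ppid);
}

/* Example driver callback reporting a fixed, made-up 4-byte ID. */
static int vf_get_phys_port_id(struct fake_netdev *dev,
                               struct fake_ppid *ppid)
{
    static const unsigned char port[4] = { 0xde, 0xad, 0xbe, 0xef };

    (void)dev;
    memcpy(ppid->id, port, sizeof(port));
    ppid->id_len = sizeof(port);
    return 0;
}
```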

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ixgbe: fix fc autoneg ethtool reporting.
Don Skidmore [Wed, 31 Jul 2013 02:19:24 +0000 (02:19 +0000)]
ixgbe: fix fc autoneg ethtool reporting.

Originally ixgbe_device_supports_autoneg_fc() was only expected to
be called by copper devices.  This led to false information being
displayed via ethtool.

v2: changed ixgbe_device_supports_autoneg_fc() to return bool,
    based on feedback from David Miller

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: Use pci_vfs_assigned instead of ixgbe_vfs_are_assigned
Alexander Duyck [Tue, 26 Mar 2013 00:03:21 +0000 (00:03 +0000)]
ixgbe: Use pci_vfs_assigned instead of ixgbe_vfs_are_assigned

This change makes it so that the ixgbe driver uses the generic helper
pci_vfs_assigned instead of the ixgbe specific function
ixgbe_vfs_are_assigned.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe: Retain VLAN filtering in promiscuous + VT mode
Greg Rose [Fri, 22 Feb 2013 02:14:39 +0000 (02:14 +0000)]
ixgbe: Retain VLAN filtering in promiscuous + VT mode

When using the new bridge FDB interface to allow SR-IOV virtual function
network devices to communicate with SW bridged network devices, the
physical function is placed into promiscuous mode and hardware VLAN
filtering is disabled.  This defeats the ability to use VLAN tagging
to isolate user networks.  When the device is in promiscuous mode and
VT mode simultaneously, ensure that VLAN hardware filtering remains
enabled.
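
The resulting policy can be sketched as a small decision function. This is an illustration under assumptions, not ixgbe code: `vlan_filter_wanted` is a hypothetical name, filtering is assumed to be on outside promiscuous mode, and the real driver programs hardware registers rather than returning a flag.

```c
#include <stdbool.h>

/*
 * Decide whether hardware VLAN filtering should stay enabled.
 * Plain promiscuous mode turns filtering off so every VLAN is seen,
 * but with VT mode active the filter is kept so that VLAN tags can
 * still isolate the virtual functions' networks.
 */
static bool vlan_filter_wanted(bool promisc, bool vt_mode)
{
	if (promisc && !vt_mode)
		return false;
	return true;
}
```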

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Thomas Petazzoni [Mon, 29 Jul 2013 13:21:28 +0000 (15:21 +0200)]
net: mvneta: support big endian

Use the "swap descriptor" feature of the hardware to properly swap the
descriptors when running in big endian mode. Since the swapping occurs
on 64-bit words, we also need to provide separate structure layouts
for the DMA descriptors in little endian and big endian mode, as is
done in the mv643xx_eth driver.
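
The effect of the swap can be illustrated in portable C. The sketch assumes a simplified, hypothetical descriptor word (not the exact mvneta layout) holding a 32-bit `status` in its low half, a 16-bit reserved field, and a 16-bit `data_size` on top. With the swap enabled, the hardware stores each 64-bit word byte-reversed, so a big-endian CPU finds `data_size` at offset 0 and `status` at offset 4, which is why the field order differs between the two structure layouts.

```c
#include <stdint.h>

/* Store a 64-bit descriptor word in little- or big-endian byte order. */
static void store64(uint8_t *p, uint64_t v, int big_endian)
{
	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? 8 * (7 - i) : 8 * i;
		p[i] = (uint8_t)(v >> shift);
	}
}

/* Load an n-byte integer (n <= 4) at p in the given byte order. */
static uint32_t load(const uint8_t *p, int n, int big_endian)
{
	uint32_t v = 0;
	for (int i = 0; i < n; i++) {
		int idx = big_endian ? i : n - 1 - i; /* most significant first */
		v = (v << 8) | p[idx];
	}
	return v;
}

struct desc_fields {
	uint32_t status;
	uint16_t data_size;
};

/* Little-endian layout: status at offset 0, data_size at offset 6. */
static struct desc_fields read_le_layout(const uint8_t *w)
{
	struct desc_fields f = { load(w, 4, 0), (uint16_t)load(w + 6, 2, 0) };
	return f;
}

/* Big-endian layout after the 64-bit swap: data_size at 0, status at 4. */
static struct desc_fields read_be_layout(const uint8_t *w)
{
	struct desc_fields f = { load(w + 4, 4, 1), (uint16_t)load(w, 2, 1) };
	return f;
}
```

Both readers recover the same field values from the same logical word, even though the bytes sit in different places in memory.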

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Mon, 29 Jul 2013 13:21:27 +0000 (15:21 +0200)]
net: mvneta: move the RX and TX desc macros outside of the structs

The macros used for the various fields of the RX and TX descriptors
are currently declared next to those fields within the structure
definitions of the RX and TX descriptors.

However, in order to support big endian, we'll have to use the "swap
descriptors" feature of the hardware, which swaps every byte within
each 64-bit word of the descriptors. This requires separate
definitions of the RX and TX descriptor structures for little and big
endian, as is done in the mv643xx_eth driver. Those macros can therefore
no longer be defined inside those structures.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Mon, 29 Jul 2013 11:44:15 +0000 (13:44 +0200)]
pktgen: Require CONFIG_INET due to use of IPv4 checksum function

Unlike the IPv6 checksum functions, the IPv4 checksum functions are
only available if CONFIG_INET is set.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Fri, 26 Jul 2013 15:43:23 +0000 (17:43 +0200)]
tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies

| If you want to test which effects syncookies have on your
| network connections you can set this knob to 2 to enable
| unconditional generation of syncookies.

Original idea and first implementation by Eric Dumazet.
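
The three values of the knob can be sketched as a decision function. Mode 1 (send cookies only when the SYN backlog overflows) is the existing behaviour of `net.ipv4.tcp_syncookies`; mode 2 is the one added here. The function name and the `syn_queue_full` parameter are hypothetical simplifications of the kernel's actual checks.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the tcp_syncookies sysctl:
 *   0 - never send syncookies
 *   1 - send them only when the SYN queue has overflowed
 *   2 - send them unconditionally (useful for testing their effects)
 */
static bool want_syncookie(int sysctl_tcp_syncookies, bool syn_queue_full)
{
	switch (sysctl_tcp_syncookies) {
	case 2:
		return true;
	case 1:
		return syn_queue_full;
	default:
		return false;
	}
}
```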

Cc: Florian Westphal <fw@strlen.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Thu, 25 Jul 2013 18:14:01 +0000 (23:44 +0530)]
drivers: net: cpsw: Add support for set MAC address

Add support for setting the MAC address of the cpsw device via ndo_set_mac_address.
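
Drivers implementing `ndo_set_mac_address` commonly reject multicast and all-zero addresses before touching the hardware. The sketch below models that validation in userspace; `sketch_set_mac` and the `hw_mac` buffer are hypothetical, and the real cpsw callback also has to program the new address into the device, which is not shown.

```c
#include <errno.h>
#include <string.h>

#define ETH_ALEN 6

/* Valid unicast: group bit (LSB of first octet) clear and not all zeros. */
static int valid_unicast_ether(const unsigned char *addr)
{
	static const unsigned char zero[ETH_ALEN];
	return !(addr[0] & 0x01) && memcmp(addr, zero, ETH_ALEN) != 0;
}

/* Hypothetical set-MAC handler: validate, then store the new address. */
static int sketch_set_mac(unsigned char hw_mac[ETH_ALEN],
			  const unsigned char *new_mac)
{
	if (!valid_unicast_ether(new_mac))
		return -EADDRNOTAVAIL;
	memcpy(hw_mac, new_mac, ETH_ALEN);
	/* A real driver would now write hw_mac into the device. */
	return 0;
}
```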

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Metcalf [Thu, 25 Jul 2013 16:41:15 +0000 (12:41 -0400)]
tile: handle 64-bit statistics in tilepro network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andi Shyti [Thu, 25 Jul 2013 08:54:24 +0000 (10:54 +0200)]
9p: client: remove unused code and any reference to "cancelled" function

This patch reverts commit

80b45261a0b263536b043c5ccfc4ba4fc27c2acc

which implemented a 'cancelled' function to notify that a cancelled
request will not receive a reply.

This implementation was not used anywhere and is therefore removed.

Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 25 Jul 2013 14:10:55 +0000 (16:10 +0200)]
be2net: don't use dev_err when AER enabling fails

The driver uses dev_err when enabling AER fails (e.g. when PCIe AER is not
supported). dev_info is more appropriate and avoids console pollution.

Cc: sathya.perla@emulex.com
Cc: subbu.seetharaman@emulex.com
Cc: ajit.khaparde@emulex.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:40 +0000 (13:58 -0700)]
tg3: Update version to 3.133

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:39 +0000 (13:58 -0700)]
tg3: Fix UDP fragments treated as RMCP

The 5762 devices sometimes incorrectly treat UDP fragments as RMCP
packets and route them to the APE. This patch sets the RX_MODE_IPV4_FRAG_FIX
bit for these devices, which enables the proper behaviour.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:38 +0000 (13:58 -0700)]
tg3: Enable support for timesync gpio output

The PTP_CAPABLE tg3 devices have a GPIO output that is toggled when the
free-running counter matches a watchdog value. This patch adds support
for setting the watchdog and enabling this feature.

Since the output is controlled via bits in the EAV_REF_CLCK_CTL
register, we have to read-modify-write it when we stop/resume.
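
Read-modify-write here means changing only the bits that the stop/resume path owns and preserving the rest, including the GPIO output configuration. A hedged sketch with an invented bit layout: `CLK_STOP` and `GPIO_OUT_MASK` are hypothetical, since the actual EAV_REF_CLCK_CTL bit assignments are not given in this log.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define CLK_STOP      (1u << 1)  /* pauses the free-running counter   */
#define GPIO_OUT_MASK (3u << 16) /* watchdog-driven GPIO output setup */

/* Stop the clock without clobbering unrelated bits such as the GPIO ones. */
static uint32_t clock_stop(uint32_t reg)
{
	return reg | CLK_STOP;
}

/* Resume the clock, again preserving every other bit. */
static uint32_t clock_resume(uint32_t reg)
{
	return reg & ~CLK_STOP;
}

/* What a blind write would do: the GPIO configuration is lost. */
static uint32_t clock_stop_blind(void)
{
	return CLK_STOP;
}
```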

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:37 +0000 (13:58 -0700)]
tg3: Implement the shutdown handler

Also remove the call to tg3_power_down_prepare() in tg3_power_down()
since tg3_close() calls it.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:36 +0000 (13:58 -0700)]
tg3: Allow NVRAM programming when interface is down

Previously, when the interface was brought down, the driver would set
the power state to D3hot. In D3hot, we don't have access to the NVRAM.
This patch removes the call to set the power state to PCI_D3hot in
close. A following patch will implement the shutdown handler to properly
set the D3hot state when the system is going down.

Doing the above means that the TG3_PHYFLG_IS_LOW_POWER flag should no
longer be checked to validate access to the NVRAM.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:35 +0000 (13:58 -0700)]
tg3: Remove incorrect switch to aux power

During probe, the driver incorrectly switches the power to Vaux on
the 5717 and later devices. At this point, we are in the D0 state and
drawing maximum power, and Vmain is definitely available. It doesn't
make sense to switch to Vaux, since it has a lower maximum power draw
and we might go over the limit. On a new system, we observe that with
this call in place, not all ports are recognized in some of the slots.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 29 Jul 2013 02:04:00 +0000 (19:04 -0700)]
cnic: Update version to 2.5.17 and copyright year.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eddie Wai [Mon, 29 Jul 2013 02:03:59 +0000 (19:03 -0700)]
cnic: Add missing error checking for RAMROD_CMD_ID_CLOSE

The completion status field should also be checked for a non-zero error
condition.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eddie Wai [Mon, 29 Jul 2013 02:03:58 +0000 (19:03 -0700)]
cnic: Update TCP options setup for iSCSI.

Update TCP delayed ACK and timestamp options setup to match latest bnx2x
firmware.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eddie Wai [Mon, 29 Jul 2013 02:03:57 +0000 (19:03 -0700)]
cnic: Reset tcp_flags during cnic_cm_create().

Without resetting it, the bnx2i driver cannot use different options for
different iSCSI connections.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 29 Jul 2013 02:03:56 +0000 (19:03 -0700)]
cnic: Simplify cnic_release().

Since unregister_netdevice_notifier() will replay the NETDEV_DOWN and
NETDEV_UNREGISTER events, the cnic_dev_list will be cleaned up automatically.
The loop to clean up the cnic_dev_list can be removed from cnic_release().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 29 Jul 2013 02:03:55 +0000 (19:03 -0700)]
cnic: Simplify netdev events handling.

After this earlier commit to simplify probing:

    commit 4bd9b0fffb193d2e288f67f81821af32df8d4349
    cnic, bnx2x, bnx2: Simplify cnic probing.

we can now reliably receive netdev events and we can simplify the handling
of these events.  We now remove the logic that tries to handle missed
NETDEV_REGISTER events.

This change will allow cleanup to be simplified in the next patch.  We can
now rely on the replay of netdev events during
unregister_netdevice_notifier() to clean up the structures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Sun, 28 Jul 2013 15:54:21 +0000 (18:54 +0300)]
net/mlx4_core: Respond to operation request by firmware

This commit adds a new firmware command and a new firmware event.  The
firmware raises the MLX4_EVENT_TYPE_OP_REQUIRED event to signal the driver
that it needs to perform an administrative operation via the
MLX4_CMD_GET_OP_REQ command. At the moment, the supported operation is
adding/removing multicast entries, which the firmware uses for handling
NCSI traffic in B0 steering mode.

Also, the order of mlx4_init_mcg_table() and mlx4_init_eq_table() had to
be swapped to make sure that the driver gets events only after the
resources needed to handle them are initialized.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Thu, 25 Jul 2013 16:21:23 +0000 (19:21 +0300)]
net/mlx4_en: Fix BlueFlame race

Fix a race between the BlueFlame flow and stamping in the post-send flow.
Example:
SW: Build WQE 0 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 0 on the TX buffer
SW: Ring doorbell for WQE 0
SW: Build WQE 1 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 1 on the TX buffer
HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
HW: Produce CQEs for WQE 0 and WQE 1
SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
HW: CQE error with index 0xFFFF - the BF WQE's control segment is STAMPED,
so the BF index is 0xFFFF. Error: Invalid Opcode.
As a result, the QP enters the error state and no traffic can be sent.

Solution:
When stamping, do not stamp the last completed WQE.
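
The fix can be modelled on a toy ring. In the sketch below, completed WQEs are overwritten with a marker so they cannot be mistaken for valid work, except the last completed one, whose contents may still be copied to the BlueFlame register. The ring size, the `STAMP` value and `stamp_completed` are assumptions for illustration; the driver's real stamping operates on WQE memory, not on integers.

```c
#include <stdint.h>

#define RING_SIZE 8
#define STAMP 0xFFFFFFFFu /* hypothetical "invalid WQE" marker */

/*
 * Stamp the completed WQEs from index first up to, but not including,
 * index last (the last completed WQE), wrapping around the ring. The
 * last one is skipped because it may still be the source of a pending
 * BlueFlame copy.
 */
static void stamp_completed(uint32_t ring[RING_SIZE],
			    unsigned int first, unsigned int last)
{
	for (unsigned int i = first; i != last; i = (i + 1) % RING_SIZE)
		ring[i] = STAMP;
}
```

In the race above, WQE 0 and WQE 1 complete together, so only WQE 0 is stamped and WQE 1 stays intact for the BF copy.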

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Mon, 29 Jul 2013 06:21:26 +0000 (16:21 +1000)]
pktgen: add needed include file

Fixes this on PowerPC (at least):

net/core/pktgen.c: In function 'fill_packet_ipv6':
net/core/pktgen.c:2906:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
   udph->check = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, udplen, IPPROTO_UDP, 0);
   ^

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 28 Jul 2013 20:18:49 +0000 (13:18 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to e100 and e1000e.

The e100 patch from Andy simply updates the netif_printk() to use
%*ph to dump small buffers.

The changes to e1000e include a fix from Dean Nelson to resolve an
issue where a pci_clear_master() was accidentally dropped during a
conflict resolution. Wei Yang provides 2 patches: one removes an
assignment of the default ring size because it was a duplicate; the
second changes the packet split receive structure to use the
PS_PAGE_BUFFERS macro for the length so that problems won't occur
when the length is changed.

The remaining patches for e1000e are from Bruce Allan, who provides
a number of fixes and updates for I218.  In addition, he provides a
fix for the 82583, which can disappear off the PCIe bus; to resolve
this, ASPM L1 is disabled.  Bruce also provides a fix to a previous
commit (commit e60b22c5b7 e1000e: fix accessing to suspended device)
so that devices are only taken out of runtime power management for
those ethtool operations that must access device registers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Fri, 26 Jul 2013 15:05:16 +0000 (17:05 +0200)]
ipv4, ipv6: send igmpv3/mld packets with TC_PRIO_CONTROL

v2:
a) Also send ipv4 igmp messages with TC_PRIO_CONTROL

Cc: William Manley <william.manley@youview.com>
Cc: Lukas Tribus <luky-37@hotmail.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Sat, 29 Jun 2013 07:42:39 +0000 (07:42 +0000)]
e1000e: fix I217/I218 PHY initialization flow

The initialization of the PHY on I217/I218, while similar to 82579, must
also check whether the MAC and PHY are in the same mode (PCIe vs. SMBus);
otherwise the PHY will be inaccessible to the MAC.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 29 Jun 2013 07:42:25 +0000 (07:42 +0000)]
e1000e: do not resume device from RPM suspend to read PHY status registers

When the device is runtime suspended (e.g. when there is no link), do not
wake it from D3 to read the PHY status; just set the values to typical
power-on defaults as is done when runtime PM is not enabled and there is no
link.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 29 Jun 2013 01:15:16 +0000 (01:15 +0000)]
e1000e: enable support for new device IDs

The device IDs 0x15a0 and 0x15a1 are new SKUs that contain the same MAC as
I217 and the same PHY as I218.

The device IDs 0x15a2 and 0x15a3 are the same as existing I218 SKUs.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Thu, 27 Jun 2013 02:44:44 +0000 (02:44 +0000)]
e1000e: ethtool unnecessarily takes device out of RPM suspend

A previous patch (commit e60b22c5b7 e1000e: fix accessing to suspended
device) added .begin and .complete ethtool driver callbacks so that the
device was resumed from Runtime Power Management (RPM) suspend state for
all ethtool operations.  This is overkill for operations which do not need
to access any registers in the device.  This patch takes the device out
of RPM suspend only for those ethtool operations that must access device
registers.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:13 +0000 (09:07 +0000)]
e1000e: Tx hang on I218 when linked at 100Half and slow response at 10Mbps

Tx hang is an unintended consequence of another workaround that is in the
EEPROM for an issue with the firmware at 10Mbps when K1 (a power mode of
the MAC-PHY interconnect) is enabled.  The issue is resolved by setting
appropriate Tx re-transmission timeouts in the PHY and associated K1 entry
times in the MAC to allow enough transmissions to occur without triggering
a Tx hang.  A similar change is needed when linked at 10Mbps to improve
latency.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:07 +0000 (09:07 +0000)]
e1000e: low throughput using 4K jumbos on I218

Alter the packet buffer allocation accordingly.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:02 +0000 (09:07 +0000)]
e1000e: iAMT connections drop on driver unload when jumbo frames enabled

The jumbo frame configuration in the MAC/PHY should be reverted on 82579
and newer parts when the interface is brought down (not just when the MTU
is changed back to standard frame size); otherwise iAMT connections (e.g.
SoL, IDE-R) will be dropped and cannot be re-acquired until the MTU is
changed again.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:18 +0000 (09:07 +0000)]
e1000e: disable ASPM L1 on 82583

The 82583 can disappear off the PCIe bus.  This device is a modified 82574,
which had the same problem and was fixed by disabling ASPM L1; disabling
ASPM L1 on the 82583 fixes the issue on this device as well.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Wei Yang [Sat, 25 May 2013 06:23:45 +0000 (06:23 +0000)]
e1000e: Use marco instead of digit for defining e1000_rx_desc_packet_split

In structure e1000_rx_desc_packet_split, the size of wb.upper.length is
defined by a literal number. This may introduce problems when the length
is changed.

This patch uses the macro PS_PAGE_BUFFERS for the definition and moves the
definition to hw.h.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Wei Yang [Mon, 20 May 2013 17:31:09 +0000 (17:31 +0000)]
e1000e: Remove duplicate assignment of default rx/tx ring size

The tx_ring/rx_ring size is assigned in e1000_alloc_queues(), which is
called by e1000_sw_init() in the early stage of e1000_probe().

This patch just removes the duplicate assignment of this default ring size
value.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Da Yu Qiu <qiudayu@cn.ibm.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dean Nelson [Thu, 13 Jun 2013 03:55:44 +0000 (03:55 +0000)]
e1000e: restore call to pci_clear_master()

In attempting to resolve a minor merge conflict, commit e5f2ef7ab4690d2e8faa
(Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) accidentally
dropped a call to pci_clear_master() that was intended to remain in place.

Commit 4e0855dff094b0d56d6b (e1000e: fix pci-device enable-counter balance)
replaced a call to pci_disable_device() by one to pci_clear_master(). And then
commit 66148babe728f3e00e13 (e1000e: fix runtime power management transitions)
deleted a number of lines starting two lines following that call.

This patch restores the call to pci_clear_master() in __e1000_shutdown().

v2: added summary lines (enclosed in parens) following commit IDs

Signed-off-by: Dean Nelson <dnelson@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andy Shevchenko [Wed, 29 May 2013 18:40:36 +0000 (18:40 +0000)]
e100: dump small buffers via %*ph

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
nikolay@redhat.com [Sat, 27 Jul 2013 17:10:10 +0000 (19:10 +0200)]
bonding: remove bond_resend_igmp_join_requests read_unlock leftover

After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event") we
have one read_unlock in bond_resend_igmp_join_requests which isn't paired
with a read_lock, because that commit removed the read_lock.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 25 Jul 2013 12:08:04 +0000 (14:08 +0200)]
pktgen: Use ip_send_check() to compute checksum

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 25 Jul 2013 16:12:18 +0000 (18:12 +0200)]
pktgen: Add UDPCSUM flag to support UDP checksums

UDP checksums are optional, hence pktgen has been omitting them in
favour of performance. The optional flag UDPCSUM enables UDP
checksumming. If the output device supports hardware checksumming
the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the
checksum is generated in software.
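
In the software case, the value computed is the standard UDP checksum (RFC 768): the ones' complement of the ones' complement sum over an IPv4 pseudo-header (source, destination, protocol, UDP length) and the UDP header plus payload, with a computed 0 sent as 0xFFFF because 0 means "no checksum". A self-contained sketch of that arithmetic; it is not pktgen's code, which relies on the kernel's csum helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to a 32-bit accumulator. */
static uint32_t sum16(uint32_t acc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		acc += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		acc += (uint32_t)buf[len - 1] << 8; /* pad odd byte with zero */
	return acc;
}

/* Fold carries back into 16 bits (ones' complement addition). */
static uint16_t fold(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xFFFF) + (acc >> 16);
	return (uint16_t)acc;
}

/*
 * UDP checksum over the IPv4 pseudo-header plus the UDP segment.
 * saddr/daddr are host-order IPv4 addresses; the checksum field
 * (bytes 6-7 of the UDP header) must be zero when this is called.
 */
static uint16_t udp4_checksum(uint32_t saddr, uint32_t daddr,
			      const uint8_t *udp, uint16_t len)
{
	uint32_t acc = 0;

	acc += saddr >> 16;
	acc += saddr & 0xFFFF;
	acc += daddr >> 16;
	acc += daddr & 0xFFFF;
	acc += 17;  /* IPPROTO_UDP */
	acc += len; /* UDP length, header included */
	acc = sum16(acc, udp, len);

	uint16_t csum = (uint16_t)~fold(acc);
	return csum ? csum : 0xFFFF;
}
```

A receiver can check the result by summing the same data with the checksum in place: a correct packet folds to 0xFFFF.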

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Thu, 25 Jul 2013 09:39:34 +0000 (17:39 +0800)]
VSOCK: Move af_vsock.h and vsock_addr.h to include/net

This allows other VSOCK transports implemented outside the
net/vmw_vsock/ directory to use these headers.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 28 Jul 2013 03:22:56 +0000 (20:22 -0700)]
Merge branch 'minnow/net-next' of git://git.infradead.org/users/dvhart/linux-2.6 into minnow

Darren Hart says:

====================
Add support for the MinnowBoard in the pch_gbe driver. This was
originally sent to LKML as part of the MinnowBoard support series. That
is now partially merged and this version of the patch has been isolated
from those changes and is now completely self-contained.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Ming Lei [Thu, 25 Jul 2013 05:47:54 +0000 (13:47 +0800)]
USBNET: increase max rx/tx qlen for improving USB3 thoughtput

The default RX_QLEN()/TX_QLEN() did not take SuperSpeed USB
devices into account, so at most 4 URBs are scheduled at the same
time for tx/rx, and a USB3 NIC can't perform very well.

With this patch, both rx and tx throughput increase by more than
100Mbps when running an iperf test on an ax88179_178a USB 3.0 NIC.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ming Lei [Thu, 25 Jul 2013 05:47:53 +0000 (13:47 +0800)]
USBNET: centralize computing of max rx/tx qlen

This patch centralizes the computation of max rx/tx qlen, because:

- RX_QLEN()/TX_QLEN() is called in the hot path
- the computation depends on the device's USB speed; we now have ls/fs,
hs and ss, so more checks need to be involved
- in fact, max rx/tx qlen should depend not only on the device's USB
speed but also on the ethernet link speed, so we need to
consider that in the future
- if SG support is done, max tx qlen may need to change too

hard_mtu and rx_urb_size are generally changed in the bind(), reset()
and link_reset() callbacks and in the change-MTU network operation, so
this patch introduces the API usbnet_update_max_qlen() and calls
it in the above paths.
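
A sketch of the kind of centralized helper described above: the queue lengths are recomputed once, whenever `hard_mtu` or `rx_urb_size` changes, so the hot path only reads cached values. The speed-dependent policy shown (a fixed memory budget divided by the buffer size at high speed, a larger budget for SuperSpeed, and 4 URBs otherwise), the `QUEUE_BUDGET` constant and all names are assumptions for illustration, not values taken from this log.

```c
/* Hypothetical model of a usbnet device's queue-sizing inputs. */
enum usb_speed_sketch { SPEED_FULL, SPEED_HIGH, SPEED_SUPER };

#define QUEUE_BUDGET (60 * 1518) /* assumed bytes of buffering per queue */

struct usbnet_sketch {
	enum usb_speed_sketch speed;
	unsigned int rx_urb_size; /* assumed nonzero */
	unsigned int hard_mtu;    /* assumed nonzero */
	unsigned int rx_qlen;     /* cached: read on the hot path */
	unsigned int tx_qlen;
};

/*
 * Recompute the cached queue lengths. Call this from every path that
 * changes rx_urb_size or hard_mtu (bind, reset, link_reset, MTU change).
 */
static void update_max_qlen(struct usbnet_sketch *dev)
{
	switch (dev->speed) {
	case SPEED_HIGH:
		dev->rx_qlen = QUEUE_BUDGET / dev->rx_urb_size;
		dev->tx_qlen = QUEUE_BUDGET / dev->hard_mtu;
		break;
	case SPEED_SUPER:
		/* Deeper queues so a USB3 NIC is not starved of URBs. */
		dev->rx_qlen = 5 * QUEUE_BUDGET / dev->rx_urb_size;
		dev->tx_qlen = 5 * QUEUE_BUDGET / dev->hard_mtu;
		break;
	default:
		dev->rx_qlen = dev->tx_qlen = 4;
		break;
	}
}
```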

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 25 Jul 2013 05:00:33 +0000 (13:00 +0800)]
tuntap: hardware vlan tx support

Inspired by commit f09e2249c4f5c7c13261ec73f5a7807076af0c8e (macvtap: restore
vlan header on user read), this patch adds hardware VLAN tx support for
tuntap. This is done by copying the VLAN header directly into userspace in
tun_put_user() instead of inserting it through __vlan_put_tag() in
dev_hard_start_xmit(). This eliminates one unnecessary memmove() in
vlan_insert_tag() for 802.1ad and 802.1q traffic.
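
Roughly, instead of inserting the tag into the packet (which shifts the MAC addresses with a memmove()), the header can be emitted piecewise while copying to the user buffer: the 12 bytes of MAC addresses, then the 4-byte 802.1Q tag, then the rest of the frame. A userspace sketch of that copy, with hypothetical names and a plain byte buffer standing in for the user iovec; for 802.1ad the TPID would be 0x88A8 instead of 0x8100.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12 /* destination + source MAC */
#define VLAN_TAG_LEN  4
#define TPID_8021Q    0x8100

/*
 * Copy an untagged frame to out, inserting an 802.1Q tag carrying tci
 * after the MAC addresses. The source frame is never moved.
 * Returns bytes written, or 0 if out is too small or the frame too short.
 */
static size_t put_user_with_vlan(uint8_t *out, size_t out_len,
				 const uint8_t *frame, size_t frame_len,
				 uint16_t tci)
{
	if (frame_len < ETH_ADDRS_LEN || out_len < frame_len + VLAN_TAG_LEN)
		return 0;

	memcpy(out, frame, ETH_ADDRS_LEN);
	out[ETH_ADDRS_LEN]     = TPID_8021Q >> 8;
	out[ETH_ADDRS_LEN + 1] = TPID_8021Q & 0xFF;
	out[ETH_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
	out[ETH_ADDRS_LEN + 3] = (uint8_t)(tci & 0xFF);
	memcpy(out + ETH_ADDRS_LEN + VLAN_TAG_LEN, frame + ETH_ADDRS_LEN,
	       frame_len - ETH_ADDRS_LEN);
	return frame_len + VLAN_TAG_LEN;
}
```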

A pktgen test shows about a 20% improvement for 802.1q traffic:

Before:
  662149pps 317Mb/sec (317831520bps) errors: 0
After:
  801033pps 384Mb/sec (384495840bps) errors: 0

Cc: Basil Gor <basil.gor@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Thu, 25 Jul 2013 01:52:05 +0000 (10:52 +0900)]
net/sctp: Refactor SCTP skb checksum computation

This patch consolidates the SCTP checksum calculation code from various
places to a single new function, sctp_compute_cksum(skb, offset).
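
For reference, the SCTP checksum is CRC32c (Castagnoli) computed over the whole packet with the 4-byte checksum field, at offset 8 of the common header, treated as zero (RFC 4960). A bit-at-a-time sketch of that computation; `crc32c` and `sctp_cksum_sketch` are hypothetical names, not the new kernel function, and a real implementation would use a faster table-driven or hardware-assisted CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC32c update, reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Plain CRC32c: initial value and final XOR are both 0xFFFFFFFF. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	return ~crc32c_update(0xFFFFFFFFu, buf, len);
}

/* SCTP checksum: CRC32c over the packet with bytes 8..11 taken as zero. */
static uint32_t sctp_cksum_sketch(const uint8_t *pkt, size_t len)
{
	static const uint8_t zeros[4];
	uint32_t crc = 0xFFFFFFFFu;

	if (len < 12)
		return 0; /* shorter than the SCTP common header */
	crc = crc32c_update(crc, pkt, 8);
	crc = crc32c_update(crc, zeros, 4);
	crc = crc32c_update(crc, pkt + 12, len - 12);
	return ~crc;
}
```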

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 25 Jul 2013 00:50:23 +0000 (10:20 +0930)]
virtio-net: put virtio net header inline with data

For small packets we can simplify xmit processing
by linearizing buffers with the header:
most packets seem to have enough headroom
that we can use for this purpose.
Since existing hypervisors require that the header
be the first s/g element, we need a feature bit
for this.
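
The transmit decision can be sketched as follows, under stated assumptions: a packet is modelled only by its headroom and fragment count, alignment constraints are ignored, and `xmit_sg_count` and its `feature_any_layout` flag are hypothetical stand-ins for the negotiated feature bit. When the feature is present and there is enough headroom, the header is pushed in front of the data and shares its s/g element; otherwise the header keeps its own first s/g entry, as existing hypervisors require.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Number of s/g elements needed for the virtio-net header plus a packet
 * made of data_frags elements. feature_any_layout stands in for the
 * negotiated feature bit.
 */
static unsigned int xmit_sg_count(bool feature_any_layout, size_t headroom,
				  size_t hdr_len, unsigned int data_frags)
{
	bool inline_hdr = feature_any_layout && headroom >= hdr_len;

	/* Inline: the header shares the first data element. */
	return inline_hdr ? data_frags : data_frags + 1;
}
```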

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 24 Jul 2013 18:53:57 +0000 (11:53 -0700)]
bond: cleanup netpoll code

This started out with fixing a sparse warning, then I realized that
the wrapper function bond_netpoll_info could just be removed
by rolling it into the enable code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 24 Jul 2013 18:52:44 +0000 (11:52 -0700)]
team: cleanup netpoll clode

This started out with fixing a sparse warning, then I realized that
the wrapper function team_netpoll_info could just be collapsed away
by rolling it into the enable code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 24 Jul 2013 18:51:41 +0000 (11:51 -0700)]
bridge: cleanup netpoll code

This started out with fixing a sparse warning, then I realized that
the wrapper function br_netpoll_info could just be collapsed away
by rolling it into the enable code.

Also, eliminate unnecessary gotos.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Sheng-Hui [Wed, 24 Jul 2013 06:53:26 +0000 (14:53 +0800)]
bonding: use pre-defined macro in bond_mode_name instead of magic number 0

We have BOND_MODE_ROUNDROBIN pre-defined as 0, and it's the lowest
mode number. Use it instead of the magic number 0 to check the lower
bound of the argument in bond_mode_name.
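
The pattern can be sketched like this. The enum and the name strings below are illustrative and may not match bonding's exact definitions; the point is that both bounds of the table lookup are expressed with the named constants rather than literal numbers.

```c
/* Illustrative list of bonding modes; ROUNDROBIN is the lowest. */
enum {
	BOND_MODE_ROUNDROBIN = 0,
	BOND_MODE_ACTIVEBACKUP,
	BOND_MODE_XOR,
	BOND_MODE_BROADCAST,
	BOND_MODE_8023AD,
	BOND_MODE_TLB,
	BOND_MODE_ALB,
};

static const char *bond_mode_name_sketch(int mode)
{
	static const char *const names[] = {
		[BOND_MODE_ROUNDROBIN]   = "load balancing (round-robin)",
		[BOND_MODE_ACTIVEBACKUP] = "fault-tolerance (active-backup)",
		[BOND_MODE_XOR]          = "load balancing (xor)",
		[BOND_MODE_BROADCAST]    = "fault-tolerance (broadcast)",
		[BOND_MODE_8023AD]       = "IEEE 802.3ad Dynamic link aggregation",
		[BOND_MODE_TLB]          = "transmit load balancing",
		[BOND_MODE_ALB]          = "adaptive load balancing",
	};

	/* Lower bound via the named constant instead of a magic 0. */
	if (mode < BOND_MODE_ROUNDROBIN || mode > BOND_MODE_ALB)
		return "unknown";
	return names[mode];
}
```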

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Darren Hart [Sat, 18 May 2013 21:46:00 +0000 (14:46 -0700)]
pch_gbe: Add MinnowBoard support

The MinnowBoard uses an AR803x PHY with the PCH GBE which requires
special handling. Use the MinnowBoard PCI Subsystem ID to detect this
and add a pci_device_id.driver_data structure and functions to handle
platform setup.

The AR803x does not implement the RGMII 2ns TX clock delay in the trace
routing nor via strapping. Add a detection method for the board and the
PHY and enable the TX clock delay via the registers.

This PHY hibernates after 10 seconds without link. Ensure the PHY is
awake for probe and then disable hibernation. A future improvement would
be to convert pch_gbe to use PHYLIB and make sure we can wake the PHY
at the necessary times rather than permanently disabling hibernation.
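The AR803x exposes these settings through indirect "debug" registers: write a debug offset to one MII register, then read or write the value through another. The sketch below simulates that access against an in-memory PHY. The offsets and bits (address/data at MII registers 0x1D/0x1E, TX clock delay at bit 8 of debug register 0x05, hibernation enable at bit 15 of debug register 0x0B) follow Linux's at803x PHY driver as I recall it and should be checked against the AR803x datasheet; the mock_phy type and helper names are hypothetical. Waking the PHY before probe is not modeled:

```c
#include <stdint.h>

/* Indirect debug-register access (values assumed from the at803x driver). */
#define AR803X_DEBUG_ADDR        0x1D
#define AR803X_DEBUG_DATA        0x1E
#define AR803X_DBG_SYSTEM_MODE   0x05
#define AR803X_DBG_TX_CLK_DLY_EN (1u << 8)
#define AR803X_DBG_HIB_CTRL      0x0B
#define AR803X_DBG_PS_HIB_EN     (1u << 15)

/* Hypothetical simulated PHY: 32 MII registers plus a debug bank. */
struct mock_phy {
	uint16_t mii[32];
	uint16_t dbg[32];
};

static uint16_t mdio_read(struct mock_phy *phy, int reg)
{
	if (reg == AR803X_DEBUG_DATA)
		return phy->dbg[phy->mii[AR803X_DEBUG_ADDR] & 31];
	return phy->mii[reg];
}

static void mdio_write(struct mock_phy *phy, int reg, uint16_t val)
{
	if (reg == AR803X_DEBUG_DATA)
		phy->dbg[phy->mii[AR803X_DEBUG_ADDR] & 31] = val;
	else
		phy->mii[reg] = val;
}

/* Read-modify-write one debug register: clear `clr`, then set `set`. */
static void dbg_rmw(struct mock_phy *phy, uint16_t dbg_reg,
		    uint16_t clr, uint16_t set)
{
	uint16_t val;

	mdio_write(phy, AR803X_DEBUG_ADDR, dbg_reg);
	val = mdio_read(phy, AR803X_DEBUG_DATA);
	val = (uint16_t)((val & ~clr) | set);
	mdio_write(phy, AR803X_DEBUG_DATA, val);
}

/* Add the 2ns RGMII TX clock delay that the board does not provide. */
static void phy_enable_tx_clk_delay(struct mock_phy *phy)
{
	dbg_rmw(phy, AR803X_DBG_SYSTEM_MODE, 0, AR803X_DBG_TX_CLK_DLY_EN);
}

/* Stop the PHY from hibernating when there is no link. */
static void phy_disable_hibernate(struct mock_phy *phy)
{
	dbg_rmw(phy, AR803X_DBG_HIB_CTRL, AR803X_DBG_PS_HIB_EN, 0);
}
```

Using read-modify-write keeps unrelated bits in each debug register intact.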

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: netdev@vger.kernel.org
Wolfram Sang [Tue, 23 Jul 2013 18:01:45 +0000 (20:01 +0200)]
drivers/net/ethernet/stmicro/stmmac: don't check resource with devm_ioremap_resource

devm_ioremap_resource already performs sanity checks on the given
resource, so there is no need to duplicate them in the driver.
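The reasoning can be shown with a userspace mock. In the kernel, devm_ioremap_resource() rejects, among other things, a NULL or non-memory resource with -EINVAL before mapping it, so a probe routine can pass the result of the resource lookup straight in. Everything below (mock_resource, MOCK_RES_MEM, the mock functions) is a hypothetical stand-in, not the kernel API:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_RES_MEM 0x1u /* hypothetical "memory resource" flag */

struct mock_resource {
	uintptr_t start;
	uintptr_t end;
	unsigned int flags;
};

/* Mock of the helper's up-front validation; "mapping" just returns start. */
static int mock_ioremap_resource(const struct mock_resource *res, void **base)
{
	if (!res || !(res->flags & MOCK_RES_MEM))
		return -EINVAL;
	*base = (void *)res->start;
	return 0;
}

/* Probe path: no separate NULL check on res, the helper covers it. */
static int mock_probe(const struct mock_resource *res, void **base)
{
	int err = mock_ioremap_resource(res, base);

	if (err)
		return err; /* propagate the helper's error unchanged */
	return 0;
}
```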

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Darren Hart [Sat, 18 May 2013 21:45:55 +0000 (14:45 -0700)]
pch_gbe: Use PCH_GBE_PHY_REGS_LEN instead of 32

Avoid using magic numbers when we have perfectly good defines just lying
around.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: netdev@vger.kernel.org
Thomas Gleixner [Tue, 23 Jul 2013 14:13:17 +0000 (16:13 +0200)]
net: Make devnet_rename_seq static

No users outside net/core/dev.c.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 23 Jul 2013 03:27:07 +0000 (20:27 -0700)]
tcp: TCP_NOTSENT_LOWAT socket option

The idea of this patch is to add an optional limit on the number of
unsent bytes in TCP sockets, to reduce kernel memory usage.

A TCP receiver might announce a big window, and TCP sender autotuning
might allow a large number of bytes in the write queue, but this brings
little performance benefit if a large part of this buffering is wasted:

The write queue needs to be large only to deal with a large BDP
(bandwidth-delay product), not necessarily to cope with scheduling
delays (incoming ACKs make room for the application to queue more bytes).
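For a sense of scale, the bandwidth-delay product is the link rate multiplied by the round-trip time. The link rate and RTT below are illustrative assumptions, not figures from this patch:

```latex
\mathrm{BDP} = \text{bandwidth} \times \mathrm{RTT}
  = 1\ \text{Gbit/s} \times 10\ \text{ms}
  = 10^{7}\ \text{bit} = 1.25\ \text{MB}
```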

For most workloads, a value of 128 KB or less still gives applications
enough time to react to POLLOUT events (or to being woken up in a
blocking sendmsg()).

This patch adds two ways to set the limit:

1) A per-socket option, TCP_NOTSENT_LOWAT

2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using the TCP_NOTSENT_LOWAT socket option (or setting it to zero).
The default value is UINT_MAX (0xFFFFFFFF), meaning the limit has no effect.
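A minimal userspace sketch of the per-socket option. The fallback value 25 is the TCP_NOTSENT_LOWAT constant from the Linux uapi <linux/tcp.h>, used only when the C library headers predate the option; the helper names are hypothetical. The system-wide sysctl is set by writing to /proc/sys/net/ipv4/tcp_notsent_lowat, as the test transcript does with echo:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25 /* Linux value, for older libc headers */
#endif

/* Way 1: per-socket limit. Returns 0 on success, -1 with errno set. */
static int set_notsent_lowat(int fd, unsigned int bytes)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
			  &bytes, sizeof(bytes));
}

/* Read the per-socket value back; 0 means "use the sysctl instead". */
static long get_notsent_lowat(int fd)
{
	unsigned int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &val, &len) < 0)
		return -1;
	return (long)val;
}

/* Create a TCP socket with the given not-sent low-water mark, or -1. */
static int tcp_socket_with_notsent_lowat(unsigned int bytes)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (set_notsent_lowat(fd, bytes) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

For example, tcp_socket_with_notsent_lowat(131072) gives a socket limited to the 128 KB suggested above, independently of the sysctl.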

This changes poll()/select()/epoll() to report POLLOUT
only if the number of unsent bytes is below tp->notsent_lowat.
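The writability rule can be modeled as a pure function, assuming "unsent bytes" means write_seq - snd_nxt (queued by the application but not yet transmitted) and that a zero per-socket value falls back to the sysctl. The struct and function names are hypothetical, and the real poll path also keeps its usual send-buffer space check, which this sketch omits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the relevant socket fields. */
struct tcp_sock_sketch {
	uint32_t write_seq;     /* sequence after the last byte queued by the app */
	uint32_t snd_nxt;       /* next sequence number to transmit */
	uint32_t notsent_lowat; /* per-socket option; 0 means "use the sysctl" */
};

static uint32_t sysctl_tcp_notsent_lowat = UINT32_MAX; /* default: no limit */

static uint32_t effective_notsent_lowat(const struct tcp_sock_sketch *tp)
{
	return tp->notsent_lowat ? tp->notsent_lowat : sysctl_tcp_notsent_lowat;
}

/* POLLOUT only while the unsent backlog is below the limit. Unsigned
 * subtraction stays correct across sequence-number wraparound. */
static bool stream_writable(const struct tcp_sock_sketch *tp)
{
	uint32_t notsent_bytes = tp->write_seq - tp->snd_nxt;

	return notsent_bytes < effective_notsent_lowat(tp);
}
```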

Note this might increase the number of sendmsg()/sendfile() calls
when using non-blocking sockets, and the number of context switches
for blocking sockets.

Note this is not related to SO_SNDLOWAT, which is
defined as:
 Specify the minimum number of bytes in the buffer until
 the socket layer will pass the data to the protocol

Tested:

netperf sessions, watching the "memory" column of /proc/net/protocols for TCP
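The "memory" value is the fourth column of /proc/net/protocols (after protocol, size and sockets) and is counted in pages. A small parser for that column, with hypothetical function names:

```c
#include <stdio.h>
#include <string.h>

/* Return the "memory" column (pages) if `line` is for `proto`, else -1.
 * Leading columns: protocol, size, sockets, memory, ... */
static long protocols_memory_pages(const char *line, const char *proto)
{
	char name[32];
	unsigned long size, sockets;
	long memory;

	if (sscanf(line, "%31s %lu %lu %ld", name, &size, &sockets, &memory) != 4)
		return -1; /* header line or malformed input */
	if (strcmp(name, proto) != 0)
		return -1;
	return memory;
}

/* Scan the live file for `proto` (e.g. "TCP"); -1 if unavailable. */
static long read_protocol_memory(const char *proto)
{
	char line[512];
	long pages = -1;
	FILE *f = fopen("/proc/net/protocols", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if ((pages = protocols_memory_pages(line, proto)) >= 0)
			break;
	fclose(f);
	return pages;
}
```

TCP and TCPv6 report the same number because IPv4 and IPv6 TCP sockets share one memory accounting pool.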

With 200 concurrent netperf -t TCP_STREAM sessions, the amount of kernel memory
used by TCP buffers shrinks by ~55% (20567 pages instead of 45458).
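The ~55% figure follows directly from the two page counts. The byte values additionally assume 4 KiB pages, which the test output does not state:

```latex
1 - \frac{20567}{45458} \approx 1 - 0.452 = 0.548 \approx 55\,\%
\qquad
45458 \times 4\ \text{KiB} \approx 177.6\ \text{MiB}
\;\rightarrow\;
20567 \times 4\ \text{KiB} \approx 80.3\ \text{MiB}
```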

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

Using 128 KB has no adverse effect on the throughput or CPU usage
of a single flow, although the number of context switches increases.

A bonus is that we hold the socket lock for shorter periods,
which should improve the latency of ACK processing.

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB

 Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

           412,514 context-switches

     200.034645535 seconds time elapsed

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB

 Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

         2,675,818 context-switches

     200.029651391 seconds time elapsed
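The two runs compare as follows; these ratios are computed from the numbers above and are not part of the original measurements. The throughput difference is smaller than the +/-2.5% confidence interval requested from netperf:

```latex
\frac{2\,675\,818}{412\,514} \approx 6.49
\qquad
\frac{17321.16 - 17447.90}{17447.90} \approx -0.73\,\%
```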

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>