Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

tun: orphan frags on xmit

tun xmit is actually receive of the internal tun
socket. Orphan the frags same as we do for normal rx path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skbuff: convert to skb_orphan_frags

Reduce code duplication a bit using the new helper.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skbuff: add an api to orphan frags

Many places do
if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
skb_copy_ubufs(skb, gfp_mask);
to copy and invoke frag destructors if necessary.
Add an inline helper for this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix build with PCI_IOV enabled.

Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: advertise transmit time stamping

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: advertise transmit time stamping

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: advertise transmit time stamping

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: advertise transmit time stamping

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnl: Add #ifdef CONFIG_RPS around num_rx_queues reference

Commit 76ff5cc91935c51fcf1a6a99ffa28b97a6e7a884
(rtnl: allow to specify number of rx and tx queues
on device creation) added a reference to the net_device
structure's 'num_rx_queues' member in

net/core/rtnetlink.c:rtnl_fill_ifinfo()

However, the definition for 'num_rx_queues' is surrounded
by an '#ifdef CONFIG_RPS' while the new reference to it is
not. This causes a compile error when CONFIG_RPS is not
defined.

Fix the compile error by surrounding the new reference to
'num_rx_queues' by an '#ifdef CONFIG_RPS'.

CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

--------------------
This series contains updates to ixgbe and ixgbevf.
...
Akeem G. Abodunrin (1):
  igb: reset PHY in the link_up process to recover PHY setting after
    power down.

Alexander Duyck (8):
  ixgbe: Drop probe_vf and merge functionality into ixgbe_enable_sriov
  ixgbe: Change how we check for pre-existing and assigned VFs
  ixgbevf: Add lock around mailbox ops to prevent simultaneous access
  ixgbevf: Add support for PCI error handling
  ixgbe: Fix handling of FDIR_HASH flag
  ixgbe: Reduce Rx header size to what is actually used
  ixgbe: Use num_tcs.pg_tcs as upper limit for TC when checking based
    on UP
  ixgbe: Use 1TC DCB instead of disabling DCB for MSI and legacy
    interrupts

Don Skidmore (1):
  ixgbe: add support for new 82599 device

Greg Rose (1):
  ixgbevf: Fix namespace issue with ixgbe_write_eitr

John Fastabend (2):
  ixgbe: fix RAR entry counting for generic and fdb_add()
  ixgbe: remove extra unused queues in DCB + FCoE case
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vhost-net-next' of git://git./linux/kernel/git/mst/vhost

wimax: fix printk format warnings

Fix printk format warnings in drivers/net/wimax/i2400m:

drivers/net/wimax/i2400m/control.c: warning: format '%zu' expects argument of type 'size_t', but argument 4 has type 'ssize_t' [-Wformat]
drivers/net/wimax/i2400m/control.c: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'ssize_t' [-Wformat]
drivers/net/wimax/i2400m/usb-fw.c: warning: format '%zu' expects argument of type 'size_t', but argument 4 has type 'ssize_t' [-Wformat]

I don't see these warnings on x86. The warnings that are quoted above
are from Geert's kernel build reports.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
Cc: wimax@linuxwimax.org
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Implement quick failover draft from tsvwg

I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters. While its possible to
implement a faster failover on multihomed sctp associations, its not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, lets implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and avoid using them quickly until their reliability can be
re-established. I've tested this out on two virt guests connected via multiple
isolated virt networks and believe its in compliance with the above draft and
works well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
CC: joe@perches.com
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix race condition in several drivers when reading stats

Fix race condition in several network drivers when reading stats on 32bit
UP architectures. These drivers update their stats in a BH context and
therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
stats.

Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: tcp: set unicast_sock uc_ttl to -1

Set unicast_sock uc_ttl to -1 so that we select the right ttl,
instead of sending packets with a 0 ttl.

Bug added in commit be9f4a44e7d4 (ipv4: tcp: remove per net tcp_sock)

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: reset PHY in the link_up process to recover PHY setting after power down.

There was a previous patch to resolve issue with 82576 losing PHY setting
after PHY power down. However that previous implementation triggered speed
mismatch and occasional link lost. Now, this patch resolves both initial
PHY setting and speed mismatch issues.

Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use 1TC DCB instead of disabling DCB for MSI and legacy interrupts

This change makes it so that we can use 1TC DCB in the case of MSI and
legacy interrupts. The advantage to this is that it allows us to fully
support FCoE w/ DCB instead of having to drop to link flow control only
when using these interrupt modes.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add support for new 82599 device

This patch adds support for a new 82599 device that supports WoL.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: remove extra unused queues in DCB + FCoE case

With DCB and FCoE configured extra queues may be allocated and
never used. After this patch we calculate the max correctly.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix RAR entry counting for generic and fdb_add()

Do RAR entry accounting correctly so that errors are reported and
promisc mode is set correctly when the number of entries exceeds
the hardware limits.

This can happen with many macvlan devices attached to the PF or
by adding many fdb entries in SR-IOV modes.

Also this includes a small refactor to fdb_add() to avoid having so
many nested if/else statements after adding a check for the number
or RAR entries.

The max entries for the PF is currently 16 we allow 15 additional
entries to account for the defined MAC.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use num_tcs.pg_tcs as upper limit for TC when checking based on UP

This change makes it so the function ixgbe_dcb_get_tc_from_up will use the
num_tcs.pg_tcs to determine the starting value for determining a traffic
class based on a user priority. The main motivation for this change is to
address possible bad configurations in which more TCs worth of data are
populated then there are actual TCs. By limiting this value we can at
least make certain we are not providing a map with values that are out of
range.

As a result any user priorities that are setup in the configuration with a
traffic class mapping higher than what the hardware supports will be
reported as being on TC 0.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Reduce Rx header size to what is actually used

The recent changes to netdev_alloc_skb actually make it so that the size of
the buffer now actually has a more direct input on the truesize. So in
order to make best use of the piece of a page we are allocated I am
reducing the IXGBE_RX_HDR_SIZE to 256 so that our truesize will be reduced
by 256 bytes as well.

This should result in performance improvements since the number of uses per
page should increase from 4 to 6 in the case of a 4K page. In addition we
should see socket performance improvements due to the truesize dropping
to less than 1K for buffers less than 256 bytes.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Fix namespace issue with ixgbe_write_eitr

Make the function static to cleanup namespace.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <Sibai.li@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Fix handling of FDIR_HASH flag

This change makes it so that we can use the atr_sample_rate to determine if
we are capable of supporting ATR. The advantage to this approach is that it
allows us to now determine the setting of the IXGBE_FLAG_FDIR_HASH_CAPABLE
based on the queueing scheme, instead of the queueing scheme being based on
the flag.

Using this approach there are essentially 5 conditions that must be checked
prior to trying to enable ATR:
1.  Is SR-IOV disabled?
2.  Are the number of TCs <= 1?
3.  Is RSS queueing limit greater than 1?
4.  Is atr_sample_rate set?
5.  Is Flow Director perfect filtering disabled?

If any of these conditions are enabled they should disable ATR filtering.
Note that in the case of conditions 1 through 4 being met we will set
things up for ATR queueing, however if test 5 fails we will still leave the
queues allocated for use by perfect filters.  The reason for this is to
allow for us to switch back and forth between ntuple and ATR without
needing to reallocate the descriptor rings.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Add support for PCI error handling

This change adds support for handling IO errors and slot resets.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Add lock around mailbox ops to prevent simultaneous access

This change adds a spinlock around the mailbox accesses to prevent
simultaneous access to the mailboxes.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Change how we check for pre-existing and assigned VFs

This patch does two things.  First it drops the unnecessary work of
searching for enabled VFs when we first bring up the adapter and instead
just uses pci_num_vf to determine how many VFs are enabled on the adapter.

The second thing it does is drop the use of vfdev from the vf_data_storage
structure.  Instead we just search the entire system for a VF that has us
as it's PF, and then if that VF is assigned we indicate that the VFs are
assigned.  This allows us to still check for assigned VFs even if the
vfinfo allocation has failed, or vfinfo has been freed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Drop probe_vf and merge functionality into ixgbe_enable_sriov

This is meant to fix a bug in which we were not checking for pre-existing
VFs if we were not setting the max_vfs value at driver load. What happens
now is that we always call ixgbe_enable_sriov and this checks for
pre-existing VFs ore requested VFs prior to deciding on no SR-IOV.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

vhost: make vhost work queue visible

The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm. Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.

Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions. However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: Separate vhost-net features from vhost features

In order for other vhost devices to use the VHOST_FEATURES bits the
vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
constant.

(Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES)

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Asias He <asias@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

forcedeth: spin_unlock_irq in interrupt handler fix

The replacement of spin_lock_irq/spin_unlock_irq pair in interrupt
handler by spin_lock_irqsave/spin_lock_irqrestore pair.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jesse/openvswitch

Jesse Gross says:

====================
A few bug fixes and small enhancements for net-next/3.6.
...
Ansis Atteka (1):
      openvswitch: Do not send notification if ovs_vport_set_options() failed

Ben Pfaff (1):
      openvswitch: Check gso_type for correct sk_buff in queue_gso_packets().

Jesse Gross (2):
      openvswitch: Enable retrieval of TCP flags from IPv6 traffic.
      openvswitch: Reset upper layer protocol info on internal devices.

Leo Alterman (1):
      openvswitch: Fix typo in documentation.

Pravin B Shelar (1):
      openvswitch: Check currect return value from skb_gso_segment()

Raju Subramanian (1):
      openvswitch: Replace Nicira Networks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix neigh lookup keying over loopback/point-to-point devices.

We were using a special key "0" for all loopback and point-to-point
device neigh lookups under ipv4, but we wouldn't use that special
key for the neigh creation.

So basically we'd make a new neigh at each and every lookup :-)

This special case to use only one neigh for these device types
is of dubious value, so just remove it entirely.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Fix typo in documentation.

Signed-off-by: Leo Alterman <lalterman@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

openvswitch: Check gso_type for correct sk_buff in queue_gso_packets().

At the point where it was used, skb_shinfo(skb)->gso_type referred to a
post-GSO sk_buff. Thus, it would always be 0. We want to know the pre-GSO
gso_type, so we need to obtain it before segmenting.

Before this change, the kernel would pass inconsistent data to userspace:
packets for UDP fragments with nonzero offset would be passed along with
flow keys that indicate a zero offset (that is, the flow key for "later"
fragments claimed to be "first" fragments). This inconsistency tended
to confuse Open vSwitch userspace, causing it to log messages about
"failed to flow_del" the flows with "later" fragments.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

openvswitch: Check currect return value from skb_gso_segment()

Fix return check typo.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

atl1c: fix issue of io access mode for AR8152 v2.1

When io access mode is enabled by BOOTROM or BIOS for AR8152 v2.1,
the register can't be read/write by memory access mode.
Clearing Bit 8 of Register 0x21c could fixed the issue.

Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: fix a crash bug and a memory leak

This patch fixes a crash
tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
sock_release -> iput(SOCK_INODE(sock))
introduced by commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d

The problem is that this socket is embedded in struct tun_struct, it has
no inode, iput is called on invalid inode, which modifies invalid memory
and optionally causes a crash.

sock_release also decrements sockets_in_use, this causes a bug that
"sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
creating and closing tun devices.

This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
sock_release to not free the inode and not decrement sockets_in_use,
fixing both memory corruption and sockets_in_use underflow.

It should be backported to 3.3 an 3.4 stabke.

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: show pmtu in route list

Override the metrics with rt_pmtu

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jerr Kirsher says:

====================
This series contains updates to ixgbe.
...
Alexander Duyck (9):
  ixgbe: Use VMDq offset to indicate the default pool
  ixgbe: Fix memory leak when SR-IOV VFs are direct assigned
  ixgbe: Drop references to deprecated pci_ DMA api and instead use
    dma_ API
  ixgbe: Cleanup configuration of FCoE registers
  ixgbe: Merge all FCoE percpu values into a single structure
  ixgbe: Make FCoE allocation and configuration closer to how rings
    work
  ixgbe: Correctly set SAN MAC RAR pool to default pool of PF
  ixgbe: Only enable anti-spoof on VF pools
  ixgbe: Enable FCoE FSO and CRC offloads based on CAPABLE instead of
    ENABLED flag
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'team_multiq'

Jiri Pirko says:

====================
This patchset represents the way I walked when I was adding multiqueue
support for team driver.

Jiri Pirko (6):
  net: honour netif_set_real_num_tx_queues() retval
  rtnl: allow to specify different num for rx and tx queue count
  rtnl: allow to specify number of rx and tx queues on device creation
  net: rename bond_queue_mapping to slave_dev_queue_mapping
  bond_sysfs: use ream_num_tx_queues rather than params.tx_queue
  team: add multiqueue support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

team: add multiqueue support

Largely copied from bonding code.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

bond_sysfs: use real_num_tx_queues rather than params.tx_queue

Since now number of tx queues can be specified during bond instance
creation and therefore it may differ from params.tx_queues, use rather
real_num_tx_queues for boundary check.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rename bond_queue_mapping to slave_dev_queue_mapping

As this is going to be used not only by bonding.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnl: allow to specify number of rx and tx queues on device creation

This patch introduces IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES by
which userspace can set number of rx and/or tx queues to be allocated
for newly created netdevice.
This overrides ops->get_num_[tr]x_queues()

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnl: allow to specify different num for rx and tx queue count

Also cut out unused function parameters and possible err in return
value.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: honour netif_set_real_num_tx_queues() retval

In netif_copy_real_num_queues() the return value of
netif_set_real_num_tx_queues() should be checked.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: improve latencies of timer triggered events

Modern TCP stack highly depends on tcp_write_timer() having a small
latency, but current implementation doesn't exactly meet the
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay hoping next run will be more
successful.

tcp_write_timer() for example uses a 50ms delay for next try, and it
defeats many attempts to get predictable TCP behavior in term of
latencies.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
RTO timer to be transmitted, while cwnd should allow us to send them
sooner)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix ABC in tcp_slow_start()

When/if sysctl_tcp_abc > 1, we expect to increase cwnd by 2 if the
received ACK acknowledges more than 2*MSS bytes, in tcp_slow_start()

Problem is this RFC 3465 statement is not correctly coded, as
the while () loop increases snd_cwnd one by one.

Add a new variable to avoid this off-by one error.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use hash_32() in tcp_metrics

Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not
guaranteed to be a power of two.

Uses hash_32() instead of custom hash.

tcpmhash_entries should be an unsigned int.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Return bool instead of int where appropriate

Applied to a set of static inline functions in tcp_input.c

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: use PCI_VENDOR_ID_INTEL

Use PCI_VENDOR_ID_INTEL from pci_ids.h instead of creating its own
vendor ID #define.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgb: use PCI_VENDOR_ID_INTEL

Use PCI_VENDOR_ID_INTEL from pci_ids.h instead of creating its own
vendor ID #define.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

myri10ge: update MAINTAINERS

Remove myself from myri10ge MAINTAINERS list

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
the fifth pull request for upcoming v3.6 net-next cleans up and
improves the janz-ican3 driver (6 patches by Ira W. Snyder, one by me).
A patch by Steffen Trumtrar adds imx53 support to the flexcan driver.
And another patch by me, which marks the bit timing constant in the CAN
drivers as "const".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem

can: janz-ican3: add support for one shot mode

The Janz VMOD-ICAN3 hardware has support for one shot packet
transmission. This means that a packet will be attempted to be sent
once, with no automatic retries.

The SocketCAN core has a controller-wide setting for this mode:
CAN_CTRLMODE_ONE_SHOT. The Janz VMOD-ICAN3 hardware supports this flag
on a per-packet level, but the SocketCAN core does not.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: avoid firmware lockup caused by infinite bus error quota

If the bus error quota is set to infinite and the host CPU cannot keep
up, the Janz VMOD-ICAN3 firmware will stop responding to control
messages until the controller is reset.

The firmware will automatically stop sending bus error messages when the
quota is reached, and will only resume sending bus error messages when
the quota is re-set to a positive value.

This limitation is worked around by setting the bus error quota to one
message, and then re-setting the quota to one message every time a bus
error message is received. By doing this, the firmware never stops
responding to control messages. The CAN bus can be reset without a
hard-reset of the controller card.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: fix support for CAN_RAW_RECV_OWN_MSGS

The Janz VMOD-ICAN3 firmware does not support any sort of TX-done
notification or interrupt. The driver previously used the hardware
loopback to attempt to work around this deficiency, but this caused all
sockets to receive all messages, even if CAN_RAW_RECV_OWN_MSGS is off.

Using the new function ican3_cmp_echo_skb(), we can drop the loopback
messages and return the original skbs. This fixes the issues with
CAN_RAW_RECV_OWN_MSGS.

A private skb queue is used to store the echo skbs. This avoids the need
for any index management.

Due to a lack of TX-error interrupts, bus errors are permanently
enabled, and are used as a TX-error notification. This is used to drop
an echo skb when transmission fails. Bus error packets are not generated
if the user has not enabled bus error reporting.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: fix error and byte counters

The error and byte counter statistics were being incremented
incorrectly. For example, a TX error would be counted both in tx_errors
and rx_errors.

This corrects the problem so that tx_errors and rx_errors are only
incremented for errors caused by packets sent to the bus. Error packets
generated by the driver are not counted.

The byte counters are only increased for packets which are actually
transmitted or received from the bus. Error packets generated by the
driver are not counted.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: cleanup of ican3_to_can_frame and can_frame_to_ican3

This patch cleans up the ICAN3 to Linux CAN frame and vice versa
conversion functions:

- RX: Use get_can_dlc() to limit the dlc value.
- RX+TX: Don't copy the whole frame, only copy the amount of bytes
specified in cf->can_dlc.

Acked-by: Ira W. Snyder <iws@ovro.caltech.edu>
Tested-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: drop invalid skbs

The commit which added the janz-ican3 driver and commit
3ccd4c61 "can: Unify droping of invalid tx skbs and netdev stats" were
committed into mainline Linux during the same merge window.

Therefore, the addition of this code to the janz-ican3 driver was
forgotten. This patch adds the expected code.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: remove dead code

The code which used this variable was removed during review, before the
driver was added to mainline Linux. It is now dead code, and can be
removed.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: flexcan: add 2nd clock to support imx53 and newer

This patch adds support for a second clock to the flexcan driver. On
modern freescale ARM cores like the imx53 and imx6q two clocks ("ipg"
and "per") must be enabled in order to access the CAN core.

In the original driver, the clock was requested without specifying the
connection id, further all mainline ARM archs with flexcan support
(imx28, imx25, imx35) register their flexcan clock without a
connection id, too.

This patch first renames the existing clk variable to clk_ipg and
converts it to devm for easier error handling. The connection id "ipg"
is added to the devm_clk_get() call. Then a second clock "per" is
requested. As all archs don't specify a connection id, both clk_get
return the same clock. This ensures compatibility to existing flexcan
support and adds support for imx53 at the same time.

After this patch hits mainline, the archs may give their existing
flexcan clock the "ipg" connection id and implement a dummy "per"
clock.

This patch has been tested on imx28 (unmodified clk tree) and on imx53
with a seperate "ipg" and "per" clock.

Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Acked-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mark bittiming_const pointer in struct can_priv as const

This patch marks the bittiming_const pointer as in the struct can_pric as
"const". This allows us to mark the struct can_bittiming_const in the CAN
drivers as "const", too.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

ixgbe: Enable FCoE FSO and CRC offloads based on CAPABLE instead of ENABLED flag

Instead of only setting the FCOE segmentation offload and CRC offload flags
if we enable FCoE, we could just set them always since there are no
modifications needed to the hardware or adapter FCoE structure in order to
use these features.

The advantage to this is that if FCoE enablement fails, for example because
SR-IOV was enabled on 82599, we will still have use of the FCoE
segmentation offload and Tx/Rx CRC offloads which should still help to
improve the FCoE performance.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Only enable anti-spoof on VF pools

The current logic is enabling anti-spoof on all pools and then clearing
anti-spoof on just the first PF pool. The correct approach is to only set
anti-spoof on the VF pools and to leave all of the PF pools unchecked.

This allows for items such as FCoE to use adjacent pools within the PF for
transmit and receive queues without the traffic being blocked by this
security feature.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Correctly set SAN MAC RAR pool to default pool of PF

This change corrects an issue in which an FCoE enabled adapter was always
setting the FCoE SAN MAC MPSAR register to 0x1. This results in the first
VF being assigned the SAN MAC address in the case of SR-IOV and as such is
incorrect. To resolve this I am adding a new function that will update the
SAN MAC pool address after reset.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Make FCoE allocation and configuration closer to how rings work

This patch changes the behavior of the FCoE configuration so that it is
much closer to how the main body of the ixgbe driver works for ring
allocation.

The first piece is the ixgbe_fcoe_ddp_enable/disable calls.  These allocate
the percpu values and if successful set the fcoe_ddp_xid value indicating
that we can support DDP.

The next piece is the ixgbe_setup/free_ddp_resources calls.  These are
called on open/close and will allocate and free the DMA pools.

Finally ixgbe_configure_fcoe is now just register configuration.  It can go
through and enable the registers for the FCoE redirection offload, and FIP
configuration without any interference from the DDP pool allocation.

The net result of all this is two fold.  First it adds a certain amount of
exception handling.  So for example if ixgbe_setup_fcoe_resources fails we
will actually generate an error in open and refuse to bring up the
interface.

Secondly it provides a much more graceful failure case than the previous
model which would skip setting up the registers for FCoE on failure to
allocate DDP resources leaving no Rx functionality enabled instead of just
disabling DDP.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Merge all FCoE percpu values into a single structure

This change merges the 2 statistics values for noddp and noddp_ext_buff
and the dma_pool into a single structure that can be allocated per CPU.

The advantages to this are several fold. First we only need to do one
alloc_percpu call now instead of 3, so that means less overhead for
handling memory allocation failures. Secondly in the case of
ixgbe_fcoe_ddp_setup we only need to call get_cpu once which makes things a
bit cleaner since we can drop a put_cpu() from the exception path.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Cleanup configuration of FCoE registers

This change makes it so we always use the FCoE redirection table. We just
set all 8 entries to the same value in the case of only having one queue
for FCoE.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Drop references to deprecated pci_ DMA api and instead use dma_ API

The networking side of the code had already been updated to use dma_ calls
instead of the old pci_ calls. However it looks like the FCoE code was
never updated. This change goes through and moves everything from the pci
APIs to the dma APIs.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Fix memory leak when SR-IOV VFs are direct assigned

The VF driver had a memory leak that would occur if VFs were assigned to a
guest.  The amount of leak would vary with the number of VFs but could max
out at about 14K per PF.  To reproduce the leak all you would need to do is
enable all the VFs on the first PF.  Then start a loop of loading and
unloading the driver with max_vfs=63 for the first port.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use VMDq offset to indicate the default pool

This change makes it so that we can use the VMDq ring feature offset value
to determine the default pool instead of using num_vfs. The reason for
this change is to avoid issues should we fail to allocate vfinfo but have
pre-existing VFs. What should happen in this case is that num_vfs will go
to 0, but the VMDq offset will contain the location of the first PF pool.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <Sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'net' of git://git./linux/kernel/git/cmetcalf/linux-tile

ipv4: Fix again the time difference calculation

Fix again the diff value in rt_bind_exception
after collision of two latest patches, my original commit
actually fixed the same problem.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c

net-tcp: Fast Open client - cookie-less mode

In trusted networks, e.g., intranet, data-center, the client does not
need to use Fast Open cookie to mitigate DoS attacks. In cookie-less
mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless
of cookie availability.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tcp: Fast Open client - detecting SYN-data drops

On paths with firewalls dropping SYN with data or experimental TCP options,
Fast Open connections will have experience SYN timeout and bad performance.
The solution is to track such incidents in the cookie cache and disables
Fast Open temporarily.

Since only the original SYN includes data and/or Fast Open option, the
SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect
such drops. If a path has recurring Fast Open SYN drops, Fast Open is
disabled for 2^(recurring_losses) minutes starting from four minutes up to
roughly one and half day. sendmsg with MSG_FASTOPEN flag will succeed but
it behaves as connect() then write().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)

sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
and write(2). The application should replace connect() with it to
send data in the opening SYN packet.

For blocking socket, sendmsg() blocks until all the data are buffered
locally and the handshake is completed like connect() call. It
returns similar errno like connect() if the TCP handshake fails.

For non-blocking socket, it returns the number of bytes queued (and
transmitted in the SYN-data packet) if cookie is available. If cookie
is not available, it transmits a data-less SYN packet with Fast Open
cookie request option and returns -EINPROGRESS like connect().

Using MSG_FASTOPEN on connecting or connected socket will result in
simlar errno like repeating connect() calls. Therefore the application
should only use this flag on new sockets.

The buffer size of sendmsg() is independent of the MSS of the connection.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tcp: Fast Open client - receiving SYN-ACK

On receiving the SYN-ACK after SYN-data, the client needs to
a) update the cached MSS and cookie (if included in SYN-ACK)
b) retransmit the data not yet acknowledged by the SYN-ACK in the final ACK of
the handshake.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tcp: Fast Open client - sending SYN-data

This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len==0), the host sends
data-less SYN with Fast Open cookie request option to solicit a cookie
from the remote. If cookie is not available (len > 0), the host sends
a SYN-data with Fast Open cookie option. If cookie length is negative,
the SYN will not include any Fast Open option (for fall back operations).

To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tcp: Fast Open client - cookie cache

With help from Eric Dumazet, add Fast Open metrics in tcp metrics cache.
The basic ones are MSS and the cookies. Later patch will cache more to
handle unfriendly middleboxes.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tcp: Fast Open base

This patch impelements the common code for both the client and server.

1. TCP Fast Open option processing. Since Fast Open does not have an
   option number assigned by IANA yet, it shares the experiment option
   code 254 by implementing draft-ietf-tcpm-experimental-options
   with a 16 bits magic number 0xF989. This enables global experiments
   without clashing the scarce(2) experimental options available for TCP.

   When the draft status becomes standard (maybe), the client should
   switch to the new option number assigned while the server supports
   both numbers for transistion.

2. The new sysctl tcp_fastopen

3. A place holder init function

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: map entire pages to increase throughput

In its receive path, mlx4_en driver maps each page chunk that it pushes
to the hardware and unmaps it when pushing it up the stack. This limits
throughput to about 3Gbps on a Power7 8-core machine.

One solution is to map the entire allocated page at once. However, this
requires that we keep track of every page fragment we give to a
descriptor. We also need to work with the discipline that all fragments will
be released (in the sense that it will not be reused by the driver
anymore) in the order they are allocated to the driver.

This requires that we don't reuse any fragments, every single one of
them must be reallocated. We do that by releasing all the fragments that
are processed and only after finished processing the descriptors, we
start the refill.

We also must somehow guarantee that we either refill all fragments in a
descriptor or none at all, without resorting to giving up a page
fragment that we would have already given. Otherwise, we would break the
discipline of only releasing the fragments in the order they were
allocated.

This has passed page allocation fault injections (restricted to the
driver by using required-start and required-end) and device hotplug
while 16 TCP streams were able to deliver more than 9Gbps.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: initialize dynamic sysfs attributes for lockdep

Dynamically allocated sysfs attributes must be initialized using
sysfs_attr_init(), otherwise lockdep complains:
BUG: key <address> not in .data!

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: update documentation references

Update the references to bridge utilities and web pages
to current locations

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: e100: ucode is optional in some cases

commit 9ac32e1b firmware: convert e100 driver to request_firmware()

did a straight conversion of the in-driver ucode to external
files.  This introduced the possibility of the driver failing
to enable an interface due to missing ucode. There was no
evaluation of the importance of the ucode at the time.

Based on comments in earlier versions of this driver, and in
the source code for the FreeBSD fxp driver, we can assume that
the ucode implements the "CPU Cycle Saver" feature on supported
adapters.  Although generally wanted, this is an optional
feature. The ucode source is not available, preventing it from
being included in free distributions. This creates unnecessary
problems for the end users. Doing a network install based on a
free distribution installer requires the user to download and
insert the ucode into the installer.

Making the ucode optional when possible improves the user
experience and driver usability.

The ucode for some adapters include a bugfix, making it
essential.  We continue to fail for these adapters unless the
ucode is available.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

asix: AX88172A driver depends on phylib

Since commit 16626b0cc3d5afe250850f96759b241f8a403b52 the asix
driver depends on the phylib. Select phylib when the asix driver is
selected.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: kernel-janitors@vger.kernel.org
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

asix: Add support for programming the EEPROM

This patch adds the asix_set_eeprom() function to provide support for
programming the configuration EEPROM via ethtool.

Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

asix: Rework reading from EEPROM

The current code for reading the EEPROM via ethtool in the asix
driver has a few issues. It cannot handle odd length values
(accesses must be aligned at 16 bit boundaries) and interprets the
offset provided by ethtool as 16 bit word offset instead as byte offset.

The new code for asix_get_eeprom() introduced by this patch is
modeled after the code in
drivers/net/ethernet/atheros/atl1e/atl1e_ethtool.c
and provides read access to the entire EEPROM with arbitrary
offsets and lengths.

Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Add ip version to dts bindings

Because there are multiple variants to the stmmac/dwmac driver, the
dts bindings should be updated to include version of the IP used.

Signed-off-by: Dinh Nguyen <dinguyen@altera.com>
Acked-by: Stefan Roese <sr@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb3: Set vlan_feature on net_device

cxgb3 interface has a bad performance when VLAN is set. On my current
setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM
(8 instances, 4k message).
With this patch, I am able to reach 9.5 Gbps.

Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipx: move peII functions

The Ethernet II wrapper is only used by IPX protocol, may have once
been used by Appletalk but not currently. Therefore it makes sense to
move it to the IPX dust bin and drop the exports.

Build tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix warnings in dst_ops.h

include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: tcp: remove per net tcp_sock

tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.

To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)

Additional features :

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()

3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)

[1] These packets are only generated when no socket was matched for
the incoming packet.

Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: use seqlock for nh_exceptions

Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

libertas: firmware.c: remove duplicated include

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>

Merge branch 'for-john' of git://git./linux/kernel/git/jberg/mac80211-next

ipv4: Fix time difference calculation in rt_bind_exception().

Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>