Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

cxgb4i : remove spurious use of rcu

As pointed out by the intel guys, there is no need to hold rcu read lock in
cxgbi_inet6addr_handler(), this patch removes it.

Fixes: 759a0cc5a3e1 ("cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'concurrent_hash_tables'

Thomas Graf says:

====================
Lockless netlink_lookup() with new concurrent hash table

Netlink sockets are maintained in a hash table to allow efficient lookup
via the port ID for unicast messages. However, lookups currently require
a read lock to be taken. This series adds a new generic, resizable,
scalable, concurrent hash table based on the paper referenced in the first
patch. It then makes use of the new data type to implement lockless
netlink_lookup().

Patch 3/3 to convert nft_hash is included for reference but should be
merged via the netfilter tree. Inclusion in this series is to provide
context for the suggested API.

Against net-next since the initial user of the new hash table is in net/

Changes:
v4-v5:
- use GFP_KERNEL to alloc Netlink buckets as suggested by Nikolay
   Aleksandrov
- free nft hash element on removal as spotted by Nikolay Aleksandrov
   and Patrick McHardy
v3-v4:
- fixed wrong shift assignment placement as spotted by Nikolay Aleksandrov
- reverted default size of nft_hash to 4 as requested by Patrick McHardy,
   default size for other hash tables remains at 64 if no hint is given
- fixed copyright as requested by Patrick McHardy
v2-v3:
- fixed typo in nft_hash_destroy() when passing rhashtable handle
v1-v2:
- fixed traversal off-by-one as spotted by Tobias Klauser
- removed unlikely() from BUG_ON() as spotted by Josh Triplett
- new 3rd patch to convert nft_hash to rhashtable
- make rhashtable_insert() return void
- nl_sk_hash_lock must be a mutex
- fixed wrong name of rht_shrink_below_30()
- exported symbols rht_grow_above_75() and rht_shrink_below_30()
- allow table freeing with RCU callback
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nftables: Convert nft_hash to use generic rhashtable

The sizing of the hash table and the practice of requiring a lookup
to retrieve the pprev to be stored in the element cookie before the
deletion of an entry is left intact.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Patrick McHardy <kaber@trash.net>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Convert netlink_lookup() to use RCU protected hash table

Heavy Netlink users such as Open vSwitch spend a considerable amount of
time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
RCU relieves the lock contention.

Makes use of the new resizable hash table to avoid locking on the
lookup.

The hash table will grow if entries exceeds 75% of table size up to a
total table size of 64K. It will automatically shrink if usage falls
below 30%.

Also splits nl_table_lock into a separate mutex to protect hash table
mutations and allow synchronize_rcu() to sleep while waiting for readers
during expansion and shrinking.

Before:
   9.16%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
   6.42%  kpktgend_0  [pktgen]           [k] mod_cur_headers
   6.26%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
   6.23%  kpktgend_0  [kernel.kallsyms]  [k] memset
   4.79%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
   4.37%  kpktgend_0  [kernel.kallsyms]  [k] memcpy
   3.60%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
   2.69%  kpktgend_0  [kernel.kallsyms]  [k] jhash2

After:
  15.26%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
   8.12%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
   7.92%  kpktgend_0  [pktgen]           [k] mod_cur_headers
   5.11%  kpktgend_0  [kernel.kallsyms]  [k] memset
   4.11%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
   4.06%  kpktgend_0  [kernel.kallsyms]  [k] _raw_spin_lock
   3.90%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
   [...]
   0.67%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lib: Resizable, Scalable, Concurrent Hash Table

Generic implementation of a resizable, scalable, concurrent hash table
based on [0]. The implementation supports both, fixed size keys specified
via an offset and length, or arbitrary keys via own hash and compare
functions.

Lookups are lockless and protected as RCU read side critical sections.
Automatic growing/shrinking based on user configurable watermarks is
available while allowing concurrent lookups to take place.

Objects to be hashed must include a struct rhash_head. The reason for not
using the existing struct hlist_head is that the expansion and shrinking
will have two buckets point to a single entry which would lead in obscure
reverse chaining behaviour.

Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined.

[0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'intel-next'

Aaron Brown says:

====================
Intel Wired LAN Driver Updates

This series contains updates to the i40e and i40evf drivers.

Vasu adds FCOE support, build options and a documentation pointer to i40e.

Shannon exposes a Firmware API request used to do register writes on the
driver's behalf and disables local loopback on VMDQ VSI in order to stop the
VEB from echoing the VMDQ packets back at the VSI.

Ashish corrects the vf_id offset for virtchnl messages in the case of multiple
PFs, removes support for vf unicast promiscuos mode to disallow VFs from
receiving traffic intended for another VF, updates the vfr_stat state check to
handle the existing and future mechanism and adds an adapter state check to
prevent re-arming the watchdog timer after i40evf_remove has been called and
the timer has been deleted.

Serey fixes an issue where a guest OS would panic when removing the vf driver
while the device is being reset due to an attempt to clean a non initialized
mac_filter_list.

Akeem makes a minor comment change.

Jessie changes an instance of sprintf to snprintf that was missed when the
driver was converted to use snprintf everywhere.

Mitch plugs a few memory leaks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: Fixed guest OS panic when removing vf driver

Removing VF driver during device still in reset caused guest OS panic.

in the i40evf_remove(), we're trying to clean mac_filter_list which has
not been initialized since the device is still stuck at the reset.
The change is to initialize the filter_list before setting any task.

Change-ID: I8b59df7384416c7e6f2d264b598f447e1c2c92b0
Signed-off-by: Serey Kong <serey.kong@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: fix memory leak on unused interfaces

If the driver is loaded and then unloaded before the interface is
brought up, then it will allocate a MAC filter entry and never free it.
To fix this, on unload, run through the mac filter list and free all the
entries. We also do this during reset recovery when the driver cannot
contact the PF and needs to shut down completely.

Change-ID: I15fabd67eb4a1bfc57605a7db60d0b5d819839db
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: don't leak queue vectors

Fix a memory leak. Driver was allocating memory for queue vectors on
init but not freeing them on shutdown. These need to be freed at two
different times: during module unload, and during reset recovery when
the driver cannot contact the PF driver and needs to give up.

Change-ID: I7c1d0157a776e960d4da432dfe309035aad7c670
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: do not re-arm watchdog after remove

Add in an adapter state check to prevent re-arming watchdog timer after
i40evf_remove has been called and timer has been deleted.

Change-ID: I636ba7c6322be8cbf053231959f90c0a2d8d803a
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: future-proof vfr_stat state check

Previously defined state I40E_VFR_VFACTIVE uses bit 1 which is now set to
"reserved." Update the state checks to also include I40E_VFR_COMPLETED.
This change will allow the VF to work with both existing and future PFs.

Change-ID: Ifd1d34f79f3b0ffd6d2550ee4dadc55825ff52f8
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: remove support for vf unicast promiscuous mode

Remove the ability of a VF to set unicast promiscuous mode.
Considered to be a security risk to allow VFs to receive traffic
intended for other VFs so don't allow it, simply ignore the flag.

Also fix it to send the correct seid to aq for multicast promiscuous set.

Change-ID: Icb9c49a281a8e9d3aeebf991ef1533ac82b84b14
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Minor comment changes

Fixes comment for reset reason

Change-ID: I6fda4fa292255e6eb0f874502b4d38d722149b10
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: fix scan warning on sprintf

The driver was converted to use snprintf everywhere but this one function.
Just use snprintf, instead of sprintf.

Also a small spelling correction in a comment.

Change-ID: I59d45f94a52754c7b4cd6034df9a61d8132b7f77
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: disable local loopback on vmdq vsi

The local loopback should only be enabled for VSIs that are supporting
cascaded VEBs or VEPA setups. This is not the case here, and we need
to stop the VEB from echoing the VMDQ VSI packets back at the VSI.

Change-ID: I9dfb6ac79db24d04360d7efde62d81e20abc5090
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: use correct vf_id offset for virtchnl message

The vf_id needs to be offset by the vf_base_id from hw function capabilities
for the case of multiple PFs.

Change-ID: I20ca8621f98e9cdf98649380b8eeaa35db52677c
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: expose debug_write_register request

Now that the HW registers are no longer in debug mode and many are
locked down for writes, we need to expose the Firmware API request
used to do writes on the driver's behalf.

Change-ID: I09a05c4dc9ea0b24c00193faac34d7799eaa8496
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <jamesx.m.young>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: adds FCoE to build and updates its documentation

Adds newly added FCoE files to the build but only if FCoE module is configured.

Also, updates i40e document for added FCoE support.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Adds FCoE related code to i40e core driver

Adds FCoE specific code to existing i40e core driver to:-

1. have separate FCoE VSI with additional FCoE queues pairs.
2. have FCoE related hash defines.
3. have additional FCoE related stats code.
4. export and then re-use existing functions required by FCoE build.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: adds FCoE code to the i40e driver

This patch adds FCoE ( Fibre Channel Over Ethernet ) code for
Intel XL710 adapters. This patch is limited to only new FCoE
offloads code in newly added files by this patch and then
following patches in the series modifies rest of the existing
driver to enable FCoE with i40e driver.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'amd-xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver update 2014-08-01

The following series of patches includes minor fixes/updates to the
driver.

- Remove some uses of spinlock around ethtool/phylib areas
- Update Rx/Tx ready check logic
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: Allow more time for Rx/Tx to become ready

The current time range waiting for Rx/Tx to become ready can sometimes
be too short if a connection is not present. Increase the number of
retries and the sleep to give a bit more time. Also, change level of
the message issued from _err to _dbg if Rx/Tx do not become ready
since the underlying logic will function as if no link is established
and retry eventually.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Remove unnecessary spinlocks

Remove the spinlocks around the ethtool get and set settings
functions and within the link adjustment callback routine.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dnet: Use managed interfaces

This patch introduces the use of managed interfaces like
devm_ioremap_resource and does away with the calls to free the
allocated memory in the probe and remove functions. Also, some
labels and variable are done away with. This fixes a bug as there
was a missing release_mem_region in the remove function.

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ks8851-ml: Use devm_ioremap_resource

This patch introduces the use of devm_ioremap_resource, devm_kmalloc and
does away with the functions to free the allocated memory in the probe
and remove functions. Also, some labels are done away with. A bug is
fixed as two regions are allocated in the probe function, but only one
is freed in the remove function.

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

cirrus: cs89x0: Use managed interfaces

This patch introduces the use of managed interfaces like
devm_ioremap_resource and does away with the functions to free the
allocated memory in the probe and remove functions. Also, many labels
are done away with. The field size in no longer needed and is hence
removed from the struct net_local.

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sh_eth: Add r8a7794 support

Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com>
[uli: added bindings documentation]
Signed-off-by: Ulrich Hecht <ulrich.hecht+renesas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net-next'

Sathya Perla says:

====================
be2net: patch set

Patch 1 fixes a regression caused by a previous commit on net-next.
Old versions of BE3 FW may not support cmds to re-provision (and hence
optimize) resources/queues in SR-IOV config. Do not treat this FW cmd
failure as fatal and fail the function initialization. Instead, just
enable SR-IOV with the resources provided by the FW.

Patch 2 ignores a VF mac address setting if the new mac is already active
on the VF.

Patch 3 adds support to delete a FW-dump via ethtool on Lancer adapters.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: support deleting FW dump via ethtool (only for Lancer)

This patch adds support to delete an existing FW-dump in Lancer via ethtool.
Initiating a new dump is not allowed if a FW dump is already present in the
adapter. The existing dump has to be first explicitly deleted.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: ignore VF mac address setting for the same mac

ndo_set_vf_mac() call may be issued for a mac-addr that is already
active on a VF. If so, silently ignore the request.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: ignore get/set profile FW cmd failures

Old versions of BE3 FW may not support cmds to re-provision (and hence
optimize) resources/queues in SR-IOV config. Do not treat this FW cmd
failure as fatal and fail the function initialization. Instead, just
enable SR-IOV with the resources provided by the FW.

Prior to the "create optimal number of queues on SR-IOV config" patch
such failures were ignored.
Fixes: bec84e6b2 ("create optimal number of queues on SR-IOV config")
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'inet-frags-next'

Nikolay Aleksandrov says:

====================
inet: frags: cleanup and kmem_cache use

This patchset does a couple of small cleanups in patches 1-5 and then in
patch 06 it introduces the use of kmem_cache for allocation/freeing of
inet_frag_queue+header objects.

v2: Broke up patch 02 into 3 patches as David suggested

Here are the results of a couple of netperf runs:
netperf options: -l 30 -I95,5 -i 15,10 -m 64k

- 10 gig before the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.155.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   64000   30.00      442466      0    7551.39
212992           30.00      439130           7494.45

- 10 gig after the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.155.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   64000   30.00      458846      0    7830.94
212992           30.00      457575           7809.25

- Virtio before the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.144.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   64000   30.00      735000      0    12543.96
212992           30.00      560322           9562.79

- Virtio after the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.144.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   64000   30.00      731729      0    12488.14
212992           30.00      647241           11046.21
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: use kmem_cache for inet_frag_queue

Use kmem_cache to allocate/free inet_frag_queue objects since they're
all the same size per inet_frags user and are alloced/freed in high volumes
thus making it a perfect case for kmem_cache.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: use INET_FRAG_EVICTED to prevent icmp messages

Now that we have INET_FRAG_EVICTED we might as well use it to stop
sending icmp messages in the "frag_expire" functions instead of
stripping INET_FRAG_FIRST_IN from their flags when evicting.
Also fix the comment style in ip6_expire_frag_queue().

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: fix function declaration alignments in inet_fragment

Fix a couple of functions' declaration alignments.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: enum the flag definitions and add descriptions

Move the flags to an enum definion, swap FIRST_IN/LAST_IN to be in increasing
order and add comments explaining each flag and the inet_frag_queue struct
members.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: rename last_in to flags

The last_in field has been used to store various flags different from
first/last frag in so give it a more descriptive name: flags.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: use INC_STATS_BH in the ipv6 reassembly code

Softirqs are already disabled so no need to do it again, thus let's be
consistent and use the IP6_INC_STATS_BH variant.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dm9000: NULL dereferences on error in probe()

The dm9000_release_board() function is called with NULL ->data_req and
->addr_req pointers if dm9000_probe() fails.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: remove nested rcu_read_lock/unlock

ip_local_deliver_finish() already have a rcu_read_lock/unlock, so
the rcu_read_lock/unlock is unnecessary.

See the stack below:
ip_local_deliver_finish
|
|
->icmp_rcv
|
|
->icmp_socket_deliver

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'filter-next'

Alexei Starovoitov says:

====================
net: filter: split sk_filter into socket and bpf, cleanup names

The main goal of the series is to split 'struct sk_filter' into socket and
bpf parts and cleanup names in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix

split 'struct sk_filter' into
struct sk_filter {
atomic_t        refcnt;
struct rcu_head rcu;
struct bpf_prog *prog;
};
and
struct bpf_prog {
        u32                     jited:1,
                                len:31;
        struct sock_fprog_kern  *orig_prog;
        unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                            const struct bpf_insn *filter);
        union {
                struct sock_filter      insns[0];
                struct bpf_insn         insnsi[0];
                struct work_struct      work;
        };
};
so that 'struct bpf_prog' can be used independent of sockets and cleans up
'unattached' bpf use cases:
isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
which don't need refcnt/rcu fields.

It's a follow up to the rcu cleanup started by Pablo in
commit 34c5bd66e5 ("net: filter: don't release unattached filter through call_rcu()")

Patch 1 - cleans up socket memory charging and makes it possible for functions
  sk(bpf)_migrate_filter(), sk(bpf)_prepare_filter() to be socket independent
Patches 2-4 - trivial renames
Patch 5 - sk_filter split and renames of related sk_*() functions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: split 'struct sk_filter' into socket and bpf parts

clean up names related to socket filtering and bpf in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix

split 'struct sk_filter' into
struct sk_filter {
atomic_t        refcnt;
struct rcu_head rcu;
struct bpf_prog *prog;
};
and
struct bpf_prog {
        u32                     jited:1,
                                len:31;
        struct sock_fprog_kern  *orig_prog;
        unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                            const struct bpf_insn *filter);
        union {
                struct sock_filter      insns[0];
                struct bpf_insn         insnsi[0];
                struct work_struct      work;
        };
};
so that 'struct bpf_prog' can be used independent of sockets and cleans up
'unattached' bpf use cases

split SK_RUN_FILTER macro into:
    SK_RUN_FILTER to be used with 'struct sk_filter *' and
    BPF_PROG_RUN to be used with 'struct bpf_prog *'

__sk_filter_release(struct sk_filter *) gains
__bpf_prog_release(struct bpf_prog *) helper function

also perform related renames for the functions that work
with 'struct bpf_prog *', since they're on the same lines:

sk_filter_size -> bpf_prog_size
sk_filter_select_runtime -> bpf_prog_select_runtime
sk_filter_free -> bpf_prog_free
sk_unattached_filter_create -> bpf_prog_create
sk_unattached_filter_destroy -> bpf_prog_destroy
sk_store_orig_filter -> bpf_prog_store_orig_filter
sk_release_orig_filter -> bpf_release_orig_filter
__sk_migrate_filter -> bpf_migrate_filter
__sk_prepare_filter -> bpf_prepare_filter

API for attaching classic BPF to a socket stays the same:
sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
which is used by sockets, tun, af_packet

API for 'unattached' BPF programs becomes:
bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: rename sk_convert_filter() -> bpf_convert_filter()

to indicate that this function is converting classic BPF into eBPF
and not related to sockets

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: rename sk_chk_filter() -> bpf_check_classic()

trivial rename to indicate that this functions performs classic BPF checking

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: rename sk_filter_proglen -> bpf_classic_proglen

trivial rename to better match semantics of macro

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: simplify socket charging

attaching bpf program to a socket involves multiple socket memory arithmetic,
since size of 'sk_filter' is changing when classic BPF is converted to eBPF.
Also common path of program creation has to deal with two ways of freeing
the memory.

Simplify the code by delaying socket charging until program is ready and
its size is known

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use inet6_iif instead of IP6CB()->iif

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ISDN: pcbit: off by one bugs in pcbit_set_msn()

1) We don't allocate enough space for the NUL terminator so we end up
   corrupting one character beyond the end of the buffer.

2) The "len - 1" should just be "len".  The code is trying to copy a
   word from a buffer up to a comma or the last word in the buffer.
   Say you have the buffer, "foo,bar,baz", then this code truncates the
   last letter off each word so you get "fo", "ba", and "ba".  You would
   hope this kind of bug would get noticed in testing...

   I'm not very familiar with this code and I can't test it, but I think
   we should copy the final character.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

AX88179_178A: Add ethtool ops for EEE support

Add functions to support ethtool EEE manipulating, and the EEE
is disabled in default setting to enhance the compatibility
with certain switch.

Signed-off-by: Freddy Xin <freddy@asix.com.tw>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Use PAGE_ALIGNED macro

Use PAGE_ALIGNED(...) instead of IS_ALIGNED(..., PAGE_SIZE).

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS

When dealing with ICMPv[46] Error Message, function icmp_socket_deliver()
and icmpv6_notify() do some valid checks on packet's length, but then some
protocols check packet's length redaudantly. So remove those duplicated
statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in
function icmp_socket_deliver() and icmpv6_notify() respectively.

In addition, add missed counter in udp6/udplite6 when socket is NULL.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Fixup v4mapped behaviour to comply with Sock API

The SCTP socket extensions API document describes the v4mapping option as
follows:

8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)

   This socket option is a Boolean flag which turns on or off the
   mapping of IPv4 addresses.  If this option is turned on, then IPv4
   addresses will be mapped to V6 representation.  If this option is
   turned off, then no mapping will be done of V4 addresses and a user
   will receive both PF_INET6 and PF_INET type addresses on the socket.
   See [RFC3542] for more details on mapped V6 addresses.

This description isn't really in line with what the code does though.

Introduce addr_to_user (renamed addr_v4map), which should be called
before any sockaddr is passed back to user space. The new function
places the sockaddr into the correct format depending on the
SCTP_I_WANT_MAPPED_V4_ADDR option.

Audit all places that touched v4mapped and either sanely construct
a v4 or v6 address then call addr_to_user, or drop the
unnecessary v4mapped check entirely.

Audit all places that call addr_to_user and verify they are on a sycall
return path.

Add a custom getname that formats the address properly.

Several bugs are addressed:
- SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
   addresses to user space
- The addr_len returned from recvmsg was not correct when
   returning AF_INET on a v6 socket
- flowlabel and scope_id were not zerod when promoting
   a v4 to v6
- Some syscalls like bind and connect behaved differently
   depending on v4mapped

Tested bind, getpeername, getsockname, connect, and recvmsg for proper
behaviour in v4mapped = 1 and 0 cases.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: kernel-doc compliant documentation for net_device

Net_device is a vast and important structure, but it has no kernel-doc
compliant documentation. This patch extracts the comments from the structure
to clean it up, and let the scripts extract documentation from it. I know that
the patch is big, but it's just reordering of comments into the appropriate
form, and adding a few more, for the missing members.

Signed-off-by: Karoly Kemeny <karoly.kemeny@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-next'

Vince Bridgers says:

====================
net: stmmac: Improve mcast/ucast filter for snps

This patch series adds Synopsys specific bindings for the Synopsys EMAC
filter characteristics since those are implementation dependent. The
multicast and unicast filtering code was improved to handle different
configuration variations based on device tree settings.

I verified the operation of the multicast and unicast filters through
Synopsys support as requested during the V1 review, and tested the GMAC
configuration on an Altera Cyclone 5 SOC (which supports 256 multicast
bins and 128 Unicast addresses). The 10/100 variant of this driver
modification was not tested, although it was compile tested. I shared
the email thread results of the investigation through Synopsys with the
stmmac maintainer.

V4: Remove patch from series that addressed a sparse issue from a
    down rev'd version of sparse that does not show up in the
    latest version of sparse.
V3: Break up the patch into interface and functional change patches
    per review comments
V2: Confirm with Synopsys methods to determine number of Multicast bins
    and Unicast address filter entries per first round review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Support devicetree configs for mcast and ucast filter entries

This patch adds and modifies code to support multiple Multicast and Unicast
Synopsys MAC filter configurations. The default configuration is defined to
support legacy driver behavior, which is 64 Multicast bins. The Unicast
filter code previously assumed all controllers support 32 or 16 Unicast
addresses based on controller version number, but this has been corrected
to support a default of 1 Unicast address. The filter configuration may
be specified through the devicetree using a Synopsys specific device tree
entry. This information was verified with Synopsys through
Synopsys Support Case #8000684337 and shared with the maintainer.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: socfpga: Add socfpga Ethernet filter attributes entries

This patch adds socfpga Ethernet filter attributes for multicast
and unicast filters per Synopsys Ethernet IP configuration chosen
by Altera for the Cyclone 5 and Arria SOC FPGAs.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dts: Add bindings for multicast hash bins and perfect filter entries

This change adds bindings for the number of multicast hash bins and perfect
filter entries supported by the Synopsys EMAC. The Synopsys EMAC core is
configurable at device creation time, and can be configured for a different
number of multicast hash bins and a different number of perfect filter
entries. The device does not provide a way to query these parameters,
therefore parameters are required. The Altera Cyclone V SOC has support for
256 multicast hash bins and 128 perfect filter entries, and is different
than what's currently provided in the stmmac driver.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Correct set_filter for multicast and unicast cases

This patch removes the check for the number of mulitcast addresses
when using hash based filtering since it's not necessary. If the number
of multicast addresses in the list exceeds the number of multicast hash
bins, the bins will "fold" over into one of the bins configured and
enabled for the particular component instance.

The default number of maximum unicast addresses was changed from 32 to 1
since this number is not dependent on the component revision. The maximum
number of multicast and unicast addresses is dependent on the configuration
of the Synopsys EMAC configured by the SOC architect at the time the
features were selected and configured for a particular component. Sadly,
Synopsys does not provide a way to query the precise number supported
by a particular component, so we must fall back on a devicetree entry.
This configuration could vary from vendor to vendor (such as STMicro,
Altera, etc).

The multicast bins are set for every possible filtering case (including
no entries) - previously the bits were set only if multicast filter entries
were present.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Change MAC interface to support multiple filter configurations

The synopsys EMAC can be configured for different numbers of multicast hash
bins and perfect filter entries at device creation time and there's no way
to query this configuration information at runtime. As a result, a devicetree
parameter is required in order for the driver to program these filters
correctly for a particular device instance. This patch modifies the
10/100/1000 MAC software interface such that these configuration parameters
can be set at initialization time.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains netfilter updates for net-next, they are:

1) Add the reject expression for the nf_tables bridge family, this
   allows us to send explicit reject (TCP RST / ICMP dest unrech) to
   the packets matching a rule.

2) Simplify and consolidate the nf_tables set dumping logic. This uses
   netlink control->data to filter out depending on the request.

3) Perform garbage collection in xt_hashlimit using a workqueue instead
   of a timer, which is problematic when many entries are in place in
   the tables, from Eric Dumazet.

4) Remove leftover code from the removed ulog target support, from
   Paul Bolle.

5) Dump unmodified flags in the netfilter packet accounting when resetting
   counters, so userspace knows that a counter was in overquota situation,
   from Alexey Perevalov.

6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from
   Alexey.

7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST
   attribute.

This patchset also includes a couple of cleanups for xt_LED from
Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from
Himangi Saraogi.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: don't require root to read tcp_metrics

commit d23ff7016 (tcp: add generic netlink support for tcp_metrics) introduced
netlink support for the new tcp_metrics, however it restricted getting of
tcp_metrics to root user only. This is a change from how these values could
have been fetched when in the old route cache. Unless there's a legitimate
reason to restrict the reading of these values it would be better if normal
users could fetch them.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: fix releasing uninitialized pointer to BPF prog

Commit 34c5bd66e5ed introduced the possibility that an
uninitialized pointer on the stack (orig_fp) can call into
sk_unattached_filter_destroy() when its value is non NULL.

Before that commit orig_fp was only destroyed in the same
block where it was assigned a valid BPF prog before. Fix it
up by initializing it to NULL.

Fixes: 34c5bd66e5ed ("net: filter: don't release unattached filter through call_rcu()")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_tables: check for unset NFTA_SET_ELEM_LIST_ELEMENTS attribute

Otherwise, the kernel oopses in nla_for_each_nested when iterating over
the unset attribute NFTA_SET_ELEM_LIST_ELEMENTS in the
nf_tables_{new,del}setelem() path.

netlink: 65524 bytes leftover after parsing attributes in process `nft'.
[...]
Oops: 0000 [#1] SMP
[...]
CPU: 2 PID: 6287 Comm: nft Not tainted 3.16.0-rc2+ #169
RIP: 0010:[<ffffffffa0526e61>] [<ffffffffa0526e61>] nf_tables_newsetelem+0x82/0xec [nf_tables]
[...]
Call Trace:
[<ffffffffa05178c4>] nfnetlink_rcv+0x2e7/0x3d7 [nfnetlink]
[<ffffffffa0517939>] ? nfnetlink_rcv+0x35c/0x3d7 [nfnetlink]
[<ffffffff8137d300>] netlink_unicast+0xf8/0x17a
[<ffffffff8137d6a5>] netlink_sendmsg+0x323/0x351
[...]

Fix this by returning -EINVAL if this attribute is not set, which
doesn't make sense at all since those commands are there to add and to
delete elements from the set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_acct: avoid using NFACCT_F_OVERQUOTA with bit helper functions

Bit helper functions were used for manipulation with NFACCT_F_OVERQUOTA,
but they are accepting pit position, but not a bit mask. As a result
not a third bit for NFACCT_F_OVERQUOTA was set, but forth. Such
behaviour was dangarous and could lead to unexpected overquota report
result.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-07-30

This is the last pull request for ipsec-next before I'll be
off for two weeks starting on friday. David, can you please
take urgent ipsec patches directly into net/net-next during
this time?

1) Error handling simplifications for vti and vti6.
From Mathias Krause.

2) Remove a duplicate semicolon after a return statement.
From Christoph Paasch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: correct spelling

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'libphy_mmd'

Vince Bridgers says:

====================
net: libphy: Add phy specific functions to access mmd regs

This set of patches addresses a problem found with the Micrel ksz9021 phy and
libphy, where the ksz9021 phy does not support mmd extended register access
per the IEEE specification as assumed by libphy. The first patch adds a
framework for phy specific support to specify their own function to access
extended phy registers, return a failure code if not supported, or to default
to libphy's IEEE defined method for accessing the mmd extended phy registers.

This issue was found by using the Synopsys EMAC and a Micrel ksz9021 phy on the
Altera Cyclone 5 SOC development kit. This patch was tested on the same system
in both positive and negative test cases.

V5: Revert name of mmd register access functions, check for phy specific
    driver override functions in mmd register access functions per
    Florian's comments to minimize source code changes
V4: Correct error when formatting V3 patch - erroneous text cut from code
V3: Correct formatting of function arguments, remove return statement from
    NULL functions, and add patch for PHY driver documentation per review
    comments.
V2: Split the original patch submission into seperate patches for the libphy
    framework required for the modification and for the Micrel Phy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: networking: phy.txt: Update text for indirect MMD access

Update the PHY library documentation to describe how a specific PHY
driver can use the PAL MMD register access routines or override those
routines with it's own in the event the PHY does not support the IEEE
standard for reading and writing MMD phy registers.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: libphy: Add stubs to hook IEEE MMD Register reads and writes

The Micrel ksz9021 PHY does not support standard IEEE standard MMD
extended register access, therefore requires stubs to fail the read
register method and do nothing for the write register method when
libphy attempts to read and/or configure Energy Efficient Ethernet
features in PHYS that do support those features. This problem
was observed on an Altera Cyclone V SOC development kit that
uses the Synopsys EMAC and the Micrel ksz9021 PHY. This patch
was tested on the same board, and Energy Efficient Ethernet is
now disabled as expected since the Micrel PHY does not support that
feature.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: libphy: Add phy specific function to access mmd phy registers

libphy was originally written assuming all phy devices support clause 45
access extensions to the mmd registers through the indirection registers
located within the first 16 phy registers. This assumption is not true
in all cases, and one specific example is the Micrel ksz9021 10/100/1000
Mbps phy. Using the stmmac driver, accessing the mmd registers to query
and configure energy efficient Ethernet (EEE) features yielded unexpected
behavior.

This patch adds mmd access functions to the phy driver that can be
overriden by the phy specific driver if the phy does not support this
mechanism or uses it's own non-standard access mechanism. By default,
the IEEE Compatible clause 45 access mechanism described in clause 22
is used. With this patch, EEE query/configure functions as expected
using the stmmac and the Micrel ksz9021 phy.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl: Add format length modifier to avoid negative values

Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl: fix misspelled word

Fix one misspelled word reported by codespell.

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: don't release unattached filter through call_rcu()

sk_unattached_filter_destroy() does not always need to release the
filter object via rcu. Since this filter is never attached to the
socket, the caller should be responsible for releasing the filter
in a safe way, which may not necessarily imply rcu.

This is a short summary of clients of this function:

1) xt_bpf.c and cls_bpf.c use the bpf matchers from rules, these rules
   are removed from the packet path before the filter is released. Thus,
   the framework makes sure the filter is safely removed.

2) In the ppp driver, the ppp_lock ensures serialization between the
   xmit and filter attachment/detachment path. This doesn't use rcu
   so deferred release via rcu makes no sense.

3) In the isdn/ppp driver, it is called from isdn_ppp_release()
   the isdn_ppp_ioctl(). This driver uses mutex and spinlocks, no rcu.
   Thus, deferred rcu makes no sense to me either, the deferred releases
   may be just masking the effects of wrong locking strategy, which
   should be fixed in the driver itself.

4) In the team driver, this is the only place where the rcu
   synchronization with unattached filter is used. Therefore, this
   patch introduces synchronize_rcu() which is called from the
   genetlink path to make sure the filter doesn't go away while packets
   are still walking over it. I think we can revisit this once struct
   bpf_prog (that only wraps specific bpf code bits) is in place, then
   add some specific struct rcu_head in the scope of the team driver if
   Jiri thinks this is needed.

Deferred rcu release for unattached filters was originally introduced
in 302d663 ("filter: Allow to create sk-unattached filters").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: xt_bpf: add mising opaque struct sk_filter definition

This structure is not exposed to userspace, so fix this by defining
struct sk_filter; so we skip the casting in kernelspace. This is safe
since userspace has no way to lurk with that internal pointer.

Fixes: e6f30c7 ("netfilter: x_tables: add xt_bpf match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

CAPI: use correct structure type name in sizeof

Correct typo in the name of the type given to sizeof. Because it is the
size of a pointer that is wanted, the typo has no impact on compilation or
execution.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/). The
semantic patch used can be found in message 0 of this patch series.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'amd-xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver update 2014-07-25

This patch series is dependent on the following patch that was
applied to the net tree and needs to be applied to the net-next
tree:
  332cfc823d18 - amd-xgbe: Fix error return code in xgbe_probe()

The following series of patches includes fixes and new support in the
driver.

- Device bindings documentation update
- Hardware timestamp support
- 2.5GbE support changes
- Fifo sizes based on active queues/rings
- Phylib driver updates for:
  - Rate change completion check
  - KR training initiation
  - Auto-negotiation results
- Traffic class support, including DCB support

This patch series is based on net-next.

Changes in V2:
  - Remove DBGPR(...., __func__) calls
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add traffic class support

This patch adds support for traffic classes as well as support
for Data Center Bridging interfaces related to traffic classes
and priority flow control.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: Print out the auto-negotiation method used

Add a netdev_info statement detailing whether auto-negotiation was
completed through parallel detection or through the auto-negotiation
protocol.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: Updates to KR training initiation

As part of changing rates to KR mode, KR training is initiated. If
the KR training is restarted it is possible to enter an invalid logic
state. This can be avoided by asserting a training reset bit before
initiating the KR training and then clearing the training reset bit.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: Updates to rate change complete check

Currently, the logic will loop endlessly waiting for a rate change
to complete. Add a counter so that if the rate change signals
never indicate complete the loop will eventually exit.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Base queue fifo size and enablement on ring count

When setting the fifo sizes for the queues and enabling the queues
use the number of active Tx and Rx queues that have been enabled
not the maximum number available.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Update/fix 2.5GbE support

Update the amd-xgbe driver and phylib driver to better support
the 2.5GbE mode for the hardware. In order to be able establish
2.5GbE using clause 73 auto negotiation the device will support
speed sets of 1GbE/10GbE and 2.5GbE/10GbE.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add hardware timestamp support

This patch adds support for Tx and Rx hardware timestamping.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add dma-coherent to device bindings documentation

An earlier patch added support for the "dma-coherent" device property.
This patch adds this optional property to the amd-xgbe device bindings
documentation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove unlikely() for WARN_ON() conditions

No need for the unlikely(), WARN_ON() and BUG_ON() internally use
unlikely() on the condition.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcbnl : Fix misleading dcb_app->priority explanation

Current explanation of dcb_app->priority is wrong. It says priority is
expected to be a 3-bit unsigned integer which is only true when working with
DCBx-IEEE. Use of dcb_app->priority by DCBx-CEE expects it to be 802.1p user
priority bitmap. Updated accordingly

This affects the cxgb4 driver, but I will post those changes as part of a
larger changeset shortly.

Fixes: 3e29027af4372 ("dcbnl: add support for ieee8021Qaz attributes")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: add platform init/exit for Altera's ARM socfpga

This patch adds platform init/exit functions and modifications to support
suspend/resume for the Altera Cyclone 5 SOC Ethernet controller. The platform
exit function puts the controller into reset using the socfpga reset
controller driver. The platform init function sets up the Synopsys mac by
first making sure the Ethernet controller is held in reset, programming the
phy mode through external support logic, then deasserts reset through
the socfpga reset manager driver.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-next'

Eli Cohen says:

====================
mlx5 driver changes related to PCI handling ***

The first of these patches is changing the pci device driver from mlx5_ib to
mlx5_core in a similar manner it is done in mlx4. This set the grounds for us
to introduce Ethernet driver for HW which uses mlx5.

The other two patches contain minor fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: Adjust events to use unsigned long param instead of void *

In the event flow, we currently pass only a port number in the
void *data argument. Rather than pass a pointer to the event handlers,
we should use an "unsigned long" parameter, and pass the port number
value directly.

In the future, if necessary for some events, we can use the unsigned long
parameter to pass a pointer.

Based on a patch by Eli Cohen <eli@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: minor fixes (mainly avoidance of hidden casts)

There were many places where parameters which should be u8/u16 were
integer type.

Additionally, in 2 places, a check for a non-null pointer was added
before dereferencing the pointer (this is actually a bug fix).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: Move pci device handling from mlx5_ib to mlx5_core

In preparation for a new mlx5 device which is VPI (i.e., ports can be
either IB or ETH), move the pci device functionality from mlx5_ib
to mlx5_core.

This involves the following changes:
1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev
   is now an independent structure maintained by mlx5_core.
   mlx5_ib_dev now has a pointer to that struct.
   This requires changing a lot of places where the core_dev
   struct was accessed via mlx5_ib_dev (now, this needs to
   be a pointer dereference).
2. All PCI initializations are now done in mlx5_core. Thus,
   it is now mlx5_core which does pci_register_device (and not
   mlx5_ib, as was previously).
3. mlx5_ib now registers itself with mlx5_core as an "interface"
   driver. This is very similar to the mechanism employed for
   the mlx4 (ConnectX) driver. Once the HCA is initialized
   (by mlx5_core), it invokes the interface drivers to do
   their initializations.
4. There is a new event handler which the core registers:
   mlx5_core_event(). This event handler invokes the
   event handlers registered by the interfaces.

Based on a patch by Eli Cohen <eli@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

random32: mix in entropy from core to late initcall

Currently, we have a 3-stage seeding process in prandom():

Phase 1 is from the early actual initialization of prandom()
subsystem which happens during core_initcall() and remains
most likely until the beginning of late_initcall() phase.
Here, the system might not have enough entropy available
for seeding with strong randomness from the random driver.
That means, we currently have a 32bit weak LCG() seeding
the PRNG status register 1 and mixing that successively
into the other 3 registers just to get it up and running.

Phase 2 starts with late_initcall() phase resp. when the
random driver has initialized its non-blocking pool with
enough entropy. At that time, we throw away *all* inner
state from its 4 registers and do a full reseed with strong
randomness.

Phase 3 starts right after that and does a periodic reseed
with random slack of status register 1 by a strong random
source again.

A problem in phase 1 is that during bootup data structures
can be initialized, e.g. on module load time, and thus access
a weakly seeded prandom and are never changed for the rest
of their live-time, thus carrying along the results from a
week seed. Lets make sure that current but also future users
access a possibly better early seeded prandom.

This patch therefore improves phase 1 by trying to make it
more 'unpredictable' through mixing in seed from a possible
hardware source. Now, the mix-in xors inner state with the
outcome of either of the two functions arch_get_random_{,seed}_int(),
preferably arch_get_random_seed_int() as it likely represents
a non-deterministic random bit generator in hw rather than
a cryptographically secure PRNG in hw. However, not all might
have the first one, so we use the PRNG as a fallback if
available. As we xor the seed into the current state, the
worst case would be that a hardware source could be unverifiable
compromised or backdoored. In that case nevertheless it
would be as good as our original early seeding function
prandom_seed_very_weak() since we mix through xor which is
entropy preserving.

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nfnetlink_acct: dump unmodified nfacct flags

NFNL_MSG_ACCT_GET_CTRZERO modifies dumped flags, in this case
client see unmodified (uncleared) counter value and cleared
overquota state - end user doesn't know anything about overquota state,
unless end user subscribed on overquota report.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux

Pull Exynos platform DT fix from Grant Likely:
"Device tree Exynos bug fix for v3.16-rc7

  This bug fix has been brewing for a while.  I hate sending it to you
  so late, but I only got confirmation that it solves the problem this
  past weekend.  The diff looks big for a bug fix, but the majority of
  it is only executed in the Exynos quirk case.  Unfortunately it
  required splitting early_init_dt_scan() in two and adding quirk
  handling in the middle of it on ARM.

  Exynos has buggy firmware that puts bad data into the memory node.
  Commit 1c2f87c22566 ("ARM: Get rid of meminfo") exposed the bug by
  dropping the artificial upper bound on the number of memory banks that
  can be added.  Exynos fails to boot after that commit.  This branch
  fixes it by splitting the early DT parse function and inserting a
  fixup hook.  Exynos uses the hook to correct the DT before parsing
  memory regions"

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  arm: Add devicetree fixup machine function
  of: Add memory limiting function for flattened devicetrees
  of: Split early_init_dt_scan into two parts

Merge tag 'stable/for-linus-3.16-rc7-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen fix from David Vrabel:
"Fix BUG when trying to expand the grant table.  This seems to occur
  often during boot with Ubuntu 14.04 PV guests"

* tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: safely map and unmap grant frames when in atomic context

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fix from Paolo Bonzini:
"Fix a bug which allows KVM guests to bring down the entire system on
some 64K enabled ARM64 hosts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

Revert "cdc_subset: deal with a device that needs reset for timeout"

This reverts commit 20fbe3ae990fd54fc7d1f889d61958bc8b38f254.

As reported by Stephen Rothwell, it causes compile failures in certain
configurations:

  drivers/net/usb/cdc_subset.c:360:15: error: 'dummy_prereset' undeclared here (not in a function)
    .pre_reset = dummy_prereset,
                 ^
  drivers/net/usb/cdc_subset.c:361:16: error: 'dummy_postreset' undeclared here (not in a function)
    .post_reset = dummy_postreset,
                  ^

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David Miller <davem@davemloft.net>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Make fragmentation IDs less predictable, from Eric Dumazet.

2) TSO tunneling can crash in bnx2x driver, fix from Dmitry Kravkov.

3) Don't allow NULL msg->msg_name just because msg->msg_namelen is
    non-zero, from Andrey Ryabinin.

4) ndm->ndm_type set using wrong macros, from Jun Zhao.

5) cdc-ether devices can come up with entries in their address filter,
    so explicitly clear the filter after the device initializes.  From
    Oliver Neukum.

6) Forgotten refcount bump in xfrm_lookup(), from Steffen Klassert.

7) Short packets not padded properly, exposing random data, in bcmgenet
    driver.  Fix from Florian Fainelli.

8) xgbe_probe() doesn't return an error code, but rather zero, when
    netif_set_real_num_tx_queues() fails.  Fix from Wei Yongjun.

9) USB speed not probed properly in r8152 driver, from Hayes Wang.

10) Transmit logic choosing the outgoing port in the sunvnet driver
    needs to consider a) is the port actually up and b) whether it is a
    switch port.  Fix from David L Stevens.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  net: phy: re-apply PHY fixups during phy_register_device
  cdc-ether: clean packet filter upon probe
  cdc_subset: deal with a device that needs reset for timeout
  net: sendmsg: fix NULL pointer dereference
  isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
  ip: make IP identifiers less predictable
  neighbour : fix ndm_type type error issue
  sunvnet: only use connected ports when sending
  can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource()
  bnx2x: fix crash during TSO tunneling
  r8152: fix the checking of the usb speed
  net: phy: Ensure the MDIO bus module is held
  net: phy: Set the driver when registering an MDIO bus device
  bnx2x: fix set_setting for some PHYs
  hyperv: Fix error return code in netvsc_init_buf()
  amd-xgbe: Fix error return code in xgbe_probe()
  ath9k: fix aggregation session lockup
  net: bcmgenet: correctly pad short packets
  net: sctp: inherit auth_capable on INIT collisions
  mac80211: fix crash on getting sta info with uninitialized rate control
  ...

x86/xen: safely map and unmap grant frames when in atomic context

arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
atomic context but were calling alloc_vm_area() which might sleep.

Also, if a driver attempts to allocate a grant ref from an interrupt
and the table needs expanding, then the CPU may already by in lazy MMU
mode and apply_to_page_range() will BUG when it tries to re-enable
lazy MMU mode.

These two functions are only used in PV guests.

Introduce arch_gnttab_init() to allocates the virtual address space in
advance.

Avoid the use of apply_to_page_range() by using saving and using the
array of PTE addresses from the alloc_vm_area() call (which ensures
that the required page tables are pre-allocated).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>