Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

bonding: don't validate arp if we don't have to

Currently, we validate all the incoming arps if arp_validate not 0.
However, we don't have to validate backup slaves if arp_validate == active
and vice versa, so return early in bond_arp_rcv() in these cases.

It works correctly now because we verify arp_validate in slave_last_rx(),
however we're just doing useless work in bond_arp_rcv().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: don't add duplicate targets to arp_ip_target

Print a warning and skip them.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add helper function bond_get_targets_ip(targets, ip)

Add function bond_get_targets_ip(targets, ip) which searches through
targets array of ips (arp_targets) and returns the position of first
match. If ip == 0, returns the first free slot. On failure to find the
ip or free slot, return -1.

Use it to verify if the arp we've received is valid and in sysfs.

v1->v2:
Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)",
per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might
update 'null' arp_ip_targets' last_rx. Also, address style.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davinci_mdio: gaurd the DT code with IS_ENABLED(CONFIG_OF)

guard the davinci_mdio_of_mtable table and davinci_mdio_probe_dt()
with CONFIG_OF.

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davinci_emac: simplify the OF parser code

This patch cleans up the OF parser code, removes unnecessary checks
on of_property_read_*() and guards davinci_emac_of_match table with
CONFIG_OF.

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davinci: emac: Convert to devm_* api

Use devm_ioremap_resource instead of devm_request_mem_region()/devm_ioremap()
and devm_request_irq() instead of request_irq().

This ensures more consistent error values and simplifies error paths.

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc: fix some syntax errors in netlink mmap sample code

Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Perform GSO on forwarding path.

When macvtap forwards skb to its tap, it needs to check
if GSO needs to be performed. This is sometimes necessary
when the HW device performed GRO, but the guest reading
from the tap does not support it (ex: Windows 7).

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Let TUNSETOFFLOAD actually controll offload features.

When the user issues TUNSETOFFLOAD ioctl, macvtap does not do
anything other then to verify arguments. This patch adds
functionality to allow users to actually control offload features.
NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the
features can be controlled.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Consistently use rcu functions

Currently macvtap uses rcu_bh functions in its
user facing fuction macvtap_get_user() and macvtap_put_user().
However, its packet handlers use normal rcu as the rcu_read_lock()
is taken in netif_receive_skb(). We can safely discontinue
the usage or rcu with bh disabled.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Convert to using rtnl lock

Macvtap uses a private lock to protect the relationship between
macvtap_queue and macvlan_dev. The private lock is not needed
since the relationship is managed by user via open(), release(),
and dellink() calls. dellink() already happens under rtnl, so
we can safely convert open() and release(), and use it in ioctl()
as well.

Suggested by Eric Dumazet.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: poll/select low latency socket support

select/poll busy-poll support.

Split sysctl value into two separate ones, one for read and one for poll.
updated Documentation/sysctl/net.txt

Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.

When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.

Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/arc/arc_emac - Add new driver

Driver for non-standard on-chip ethernet device ARC EMAC 10/100,
instantiated in some legacy ARC (Synopsys) FPGA Boards such as
ARCAngel4/ML50x.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Joe Perches <joe@perches.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Mischa Jonker <mjonker@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Cc: Florian Fainelli <florian@openwrt.org>
Cc: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: simplify sctp_get_port

No need to have an extra ret variable when we directly can return
the value of sctp_get_port_local().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: decouple cleaning some socket data from endpoint

Rather instead of having the endpoint clean the garbage from the
socket, use a sk_destruct handler sctp_destruct_sock(), that does
the job for that when there are no more references on the socket.
At least do this for our crypto transform through crypto_free_hash()
that is allocated when in listening state.

Also, perform sctp_put_port() only when sk is valid. At a later
point in time we can still determine if there's an option of
placing this into sk_prot->unhash() or sctp_endpoint_free() without
any races. For now, leave it in sctp_endpoint_destroy() though.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: minor: sctp_seq_dump_local_addrs add missing newline

A trailing newline has been forgotten to add into the WARN().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: migrate cookie life from timeval to ktime

Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.

We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to a simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions (other archs); code becomes more readable; macros can be
thrown out.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ktime: add ms_to_ktime() and ktime_add_ms() helpers

Add two ktime helper functions that i) convert a given msec value to
a ktime structure and ii) that adds a msec value to a ktime structure.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: remove TEST_FRAME ifdef

We do neither ship a test_frame.h, nor will this be compatible with
the 2.5 out-of-tree lksctp kernel test suite anyway. So remove this
artefact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fail device init if num_vfs is negative

Should not allow negative num_vfs

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Add warning in case of command timeouts

Warning prints when there are command timeout to help debugging future
failures.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Replace sscanf() with kstrtoint()

It is not safe to use sscanf.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Remove an unnecessary test

Since this variable is now part of a structure and not allocated dynamically,
this test is irrelevant now.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Add prints when TX timeout occurs

Print a warning when a TX timeout is detected

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix a race between napi poll function and RX ring cleanup

The RX rings were cleaned while there was still possible RX traffic completion
handling.
Change the sequance of events so that the port is closed and the QPs are being
stopped before RX cleanup.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Change log level from error to debug for vlan related messages

The port vlan table size is 126 (used for IBoE) so after 126 we will
not have space and the user need to see it only in debug print and not
error.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Move register_netdev() to the end of initialization function

To avoid a race between the open function and everything that happens after
register_netdev() move it to be the last operation called.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Do not query stats when device port is down

There are no counters allocated to the eth device when the port is down, so
this query is meaningless at that time.

It also leads to querying incorrect counters (since the counter_index is not
valid when the device port is down).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix resource leak in error flow

Wrong condition was used when calling iounmap.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove old token ipv6 address as soon as possible

If the tokenized ip address is re-set on an interface we depend on the
arrival of a new router advertisment to call addrconf_verify to clean
up the old address (which valid_lft is now set to 0). Old addresses can
linger around for a longer time if e.g. the source of router advertisments
vanishes.

So, call addrconf_verify immediately after setting the new tokenized
address to get rid of the old tokenized addresses.

Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: don't disable interface if last ipv6 address is removed

The reason behind this change is that as soon as we delete
the last ipv6 address of an interface we also lose the
/proc/sys/net/ipv6/conf/<interface> directory. This seems to be a
usability problem for me.

I don't see any reason why we should shutdown ipv6 on that interface in
such cases.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: split duplicate address detection and router solicitation timer

This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.

The reason behind this patch is to reduce the number of unneeded router
solicitations send out by the host if additional link-local addresses
are created. Currently we send out RS for every link-local address on
an interface.

If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.

Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: allow order-0 memory allocations in RX path

Signed-off-by: Eric Dumazet <edumazet@google.com>
mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.

We therefore drop frames more than needed.

This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.

By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.

Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
1. Make EEH recovery work when using legacy interrupts, from Alexandre
   Rames.

2. Enable accelerated RFS for VLAN-tagged flows, from Andy Lutomirski.

3. Improve performance for non-TCP (and particularly UDP) traffic, which
   regressed in 3.10 when we switched to always allocating paged RX
   buffers.  Partly by Jon Cooper.

4. Some minor bug fixes to IOMMU detection, timestamping capabilities,
   and IRQ cleanup on the probe failure path.

I've dropped the RX skb cache, which improved some benchmarks but
perhaps needs some reworking to be more generally useful.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qeth: use default napi weight

Since commit 82dc3c63c692b1e1d59378ecee948ac88e034aad
"net: introduce NAPI_POLL_WEIGHT" network drivers receive a warning
when they use napi weight higher than NAPI_POLL_WEIGHT. This patch
reduces QETH_NAPI_WEIGHT from 128 to 64 (NAPI_POLL_WEIGHT).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qeth: Fix crash on initial MTU size change

When the initial MTU size is changed prior to any activity on the device
(e.g. by attaching a z/VM vNIC already configured in Linux to a guestLAN),
we call dev_kfree_skb_irq(NULL) which results in a kernel panic.
Adding a proper check for NULL pointers to address this issue.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qeth: change default standard blkt settings for OSA

blkt settings (or LAN idle settings) for an OSA Express card
determine when and how often an OSA Express card tells the
operating system about new incoming packets. The semantic of
these settings has changed starting with OSA Express3. Currently
the qeth standard settings apply to OSA Express2 and older
generations of OSA Express cards, while new generations of OSA
Express cards require extra coding of their reasonable default.

To cover future OSA Express generations the qeth default standard
blkt setting is now the desired setting for OSA generations
starting with OSA Express3, while the fixed set of older OSA
Express cards receives its blkt settings explicitly.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qeth: Increase default MTU for OSA devices

Increase the default MTU for real OSA devices in layer 2 mode
to 1500 Bytes for increased compatibility.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netiucv: remove unused macro

If someone is interested to dump something they may consider to use
print_hex_dump() or print_hex_dump_bytes() kernel helpers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Remove sparse and coccinelle warnings

This patch solves several sparse issues as well as an unneeded semicolon
found via coccinelle.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add include file to suppress sparse warnings

commit f88c91ddba95 ("ipv6: statically link
register_inet6addr_notifier()" added following sparse warnings :

net/ipv6/addrconf_core.c:83:5: warning: symbol
'register_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:89:5: warning: symbol
'unregister_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:95:5: warning: symbol
'inet6addr_notifier_call_chain' was not declared. Should it be static?

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove invalid __rcu annotation

struct tcp_fastopen_context has a field named tfm, which is a pointer
to a crypto_cipher structure.

It currently has a __rcu annotation, which is not needed at all.

tcp_fastopen_ctx is the pointer fetched by rcu_dereference(), but once
we have a pointer to current tcp_fastopen_context, we do not use/need
rcu_dereference() to access tfm.

This fixes a lot of sparse errors like the following :

net/ipv4/tcp_fastopen.c:21:31: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_fastopen.c:21:31: expected struct crypto_cipher *tfm
net/ipv4/tcp_fastopen.c:21:31: got struct crypto_cipher [noderef] <asn:4>*tfm

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: nlmon: virtual netlink monitoring device for packet sockets

Currently, there is no good possibility to debug netlink traffic that
is being exchanged between kernel and user space. Therefore, this patch
implements a netlink virtual device, so that netlink messages will be
made visible to PF_PACKET sockets. Once there was an approach with a
similar idea [1], but it got forgotten somehow.

I think it makes most sense to accept the "overhead" of an extra netlink
net device over implementing the same functionality from PF_PACKET
sockets once again into netlink sockets. We have BPF filters that can
already be easily applied which even have netlink extensions, we have
RX_RING zero-copy between kernel- and user space that can be reused,
and much more features. So instead of re-implementing all of this, we
simply pass the skb to a given PF_PACKET socket for further analysis.

Another nice benefit that comes from that is that no code needs to be
changed in user space packet analyzers (maybe adding a dissector, but
not more), thus out of the box, we can already capture pcap files of
netlink traffic to debug/troubleshoot netlink problems.

Also thanks goes to Thomas Graf, Flavio Leitner, Jesper Dangaard Brouer.

[1] http://marc.info/?l=linux-netdev&m=113813401516110

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink: virtual tap device management

Similarly to the networking receive path with ptype_all taps, we add
the possibility to register netdevices that are for ARPHRD_NETLINK to
the netlink subsystem, so that those can be used for netlink analyzers
resp. debuggers. We do not offer a direct callback function as out-of-tree
modules could do crap with it. Instead, a netdevice must be registered
properly and only receives a clone, managed by the netlink layer. Symbols
are exported as GPL-only.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: if_arp: add ARPHRD_NETLINK type

This small patch adds the definition of ARPHRD_NETLINK which can for
example be used by netlink monitoring devices as device type. So that
sockaddr_ll can pick it up and based on that choose the correct packet
dissector.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Restore unintentional reverts.

This restores commits:

c573972c111eb4c6b3f3250ad71e7c75cc799833
1a5904342c7380ceddd61c0b37544d752d0b1433
da2e2c214953f37c2a6be20226537ca5a329724c

which initially accidently went into 'net', were
reverted there, and then properly placed into 'net-next'.
But the next net --> net-next merge accidently wiped them
out again.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Improve test for IOMMU in use

The device::iommu_group field may be set even if no IOMMU is in use.
iommu_present() is still a better indicator, although it doesn't tell
us whether *our* device is affected.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Fix IRQ cleanup in case of a probe failure

The lifetime of an irq_cpu_rmap is odd: we have to allocate it before
installing IRQ handlers and free it before removing the IRQ handlers.
As a result of this asymmetry, it was omitted from some failure paths.

On another failure path, we could try to remove IRQ handlers we
had not yet installed.

Move the irq_cpu_rmap allocation and freeing alongside IRQ handler
installation and removal, in efx_nic_{init,fini}_interrupts().
Count the number of IRQ handlers successfully installed and only
remove those on the failure path.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Do not pass non-TCP packets into GRO code

GRO can handle non-TCP packets and pass them up without coalescing,
but it has to do some extra work to parse the packet which we can
bypass using the hardware parse result. (This condition yields a
false negative for TCP/IPv6 packets received by Falcon, but its
performance is already poor in that case due to lack of checksum
offload.)

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Define and set RX buffer flag for packets parsed as TCP

This will be useful for shortcutting some software packet parsing.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Enable accelerated RFS on vlans

As far as I know, the hardware doesn't support matching on both IP
fields and vlan tag, but it can at least match on the IP fields.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Report software timestamping capabilities

The kernel can generate software receive timestamps and we should
report those for all ports regardless of hardware capabilities.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Increase size of RX SKB header area

This allows the SKB to hold the headers without reallocation more often.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Enable RX checksum offload for packets not handled by GRO

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Fix EEH with legacy interrupts.

PCI legacy interrupts are level-triggered, and we cannot mask them up
on an isolated device. Instead, disable the IRQ at the controller
until we have recovered.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

bnx2x: Fix compilation with no IOV support

This fixes an issue caused by submit 78c3bcc5d1af64f51d9f30b0f5a2d1985bf69734
`bnx2x: Improve PF behaviour toward VF', which made the bnx2x driver fail
compilation when PCI_IOV is not set.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Unmap fragment page once iterator is done

Callers of skb_seq_read() are currently forced to call skb_abort_seq_read()
even when consuming all the data because the last call to skb_seq_read (the
one that returns 0 to indicate the end) fails to unmap the last fragment page.

With this patch callers will be allowed to traverse the SKB data by calling
skb_prepare_seq_read() once and repeatedly calling skb_seq_read() as originally
intended (and documented in the original commit 677e90eda), that is, only call
skb_abort_seq_read() if the sequential read is actually aborted.

Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next

John W. Linville says:

====================
I would guess that this is the last big wireless pull request before
the 3.11 merge window...

Regarding the mac80211 bits, Johannes says:

"I have a number of mesh fixes and improvements from Colleen, Jacob,
Ashok and Thomas, powersave fixes in mac80211 from Alex, improved
management-TX from Antonio, and a few various things, including locking
fixes, from others and myself. Overall though, nothing really stands
out."

As for the iwlwifi bits, Johannes says:

"Emmanuel contributed two AP mode fixes, removed an unused field, fixed a
comment and added a warning for something that shouldn't happen in
practice, and I removed the declaration of a function that doesn't even
exist and cleaned up a small include."

"This time I have a number of cleanups, a small fix from Emmanuel and two
performance improvements that combined reduce our driver's CPU
utilisation as much as 75% in high TX-throughput scenarios."

"These two patches fix two issues with using rfkill randomly during
traffic, which would then cause our driver to stop working and not be
able to recover at all."

Regarding the ath6kl bits, Kalle says:

"Here are few simple patches for ath6kl. We have a suspend crash fix for
USB from Shafi, use of mac_pton(), a compiler warning fix and a fix for
module initialisation error path."

Kalle also sends the biggest single item of note, the new ath10k
driver for Qualcomm Atheros 802.11ac CQA98xx devices.

Included is an NFC pull, of which Samuel says:

"These are the pending NFC patches for the 3.11 merge window.

It contains the pending fixes that were on nfc-fixes (nfc-fixes-3.10-2),
along with a few more for the pn544 and pn533 drivers, the LLCP
disconnection path and an LLCP memory leak.

Highlights for this one are:

- An initial secure element API. NFC chipsets can carry an embedded
  secure element or get access to the SIM one. In both cases they
  control the secure elements and this API provides a way to discover,
  enable and disable the available SEs. It also exports that to
  userspace in order for SE focused middleware to actually do something
  with them (e.g. payments).

- NCI over SPI support. SPI is the most complex NCI specified transport
  layer and we now have support for it in the kernel. The next step will
  be to implement drivers for NCI chipsets using this transport like
  e.g. bcm2079x.

- NFC p2p hardware simulation driver. We now have an nfcsim driver that
  is mostly a loopback device between 2 NFC interfaces. It also
  implements the rest of the NFC core API like polling and target
  detection. This driver, with neard running on top of it, allows us to
  completely test the LLCP, SNEP and Handover implementation without
  physical hardware.

- A Firmware update netlink API. Most (All ?) HCI chipsets have a
  special firmware update mode where applications can push a new
  firmware that will be flashed. We now have a netlink API for providing
  that mode to e.g. nfctool."

On top of all that, there are a variety of updates to brcmfmac,
iwlegacy, rtlwifi, wil6210, and the TI wl12xx drivers.  As usual,
the bcma and ssb busses get a little love as well, as do a handful
of others here and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: double free on unload

There is a typo here, "i" vs "j", so we would crash on module_exit().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_tunnel: Protect tunnel functions with CONFIG_INET guard.

Tunnel constants can be used in generic code but in these cases
the inline functions in ip_tunnels.h cause compilation problems
if CONFIG_INET is not set.

CC: Pravin Shelar <pshelar@nicira.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Use correct config guard.

This bug was introduced by commit aa310701e787087
(openvswitch: Add gre tunnel support.)

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix a typo in comments

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow large number of tx queues

netif_alloc_netdev_queues() uses kcalloc() to allocate memory
for the "struct netdev_queue *_tx" array.

For large number of tx queues, kcalloc() might fail, so this
patch does a fallback to vzalloc().

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnx2x'

Yuval Mintz says:

====================
This patch series mostly revolves around improving SR-IOV implementation
(Better PF-VF relation, sanity checks and timings), as well as including
a patch correcting the (outward) advertisement of 20G capabilities.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix 20G KR2 support claims

Don't claim 20G is supported if the speed is unsupported by the phys
(reflected by various ethtools and ndos).

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: improve VF timings

Wait 100ms for FLR to complete in parallel over all VFs instead of serializing
the waits (which can amount to several seconds with 64 VFs).

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: VF ndo sanity

If iproute2 VF callbacks are invoked before PF is loaded,
abort gracefully.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Improve PF behaviour toward VF

If PF is unloaded with loaded VFs, signal towards VFs so they can detect
this gracefully.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
----
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h       |  2 ++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |  3 +++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 23 +++++++++++++++++++---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 24 ++++++++++++++++++++---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h |  2 ++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c  | 12 +++++++++++-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h  |  5 ++++-
7 files changed, 63 insertions(+), 8 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Fix VSOCK_HASH and VSOCK_CONN_HASH

If we mod with VSOCK_HASH_SIZE -1, we get 0, 1, .... 249. Actually, we
have vsock_bind_table[0 ... 250] and vsock_connected_table[0 .. 250].
In this case the last entry will never be used.

We should mod with VSOCK_HASH_SIZE instead.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Remove unnecessary label

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Return VMCI_ERROR_NO_MEM when fails to allocate skb

vmci_transport_recv_dgram_cb always return VMCI_SUCESS even if we fail
to allocate skb, return VMCI_ERROR_NO_MEM instead.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Introduce vsock_auto_bind helper

This peace of code is called three times, let's have a helper for it.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove a useless pr_info() in addrconf_gre_config()

This is debug info, should at least be pr_debug(), but given
that this code is in upstream for two years, there is no
need to keep this debugging printk any more, so just remove it.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Update version to 5.2.44

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Add support for 83xx suspend and resume.

o Implement shutdown and resume handlers for 83xx.
o Refactor 82xx shutdown and resume handlers.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Add support for 'set driver version' in 83XX

Issue 'set driver version' during driver load and after reset recovery
to notify the driver version to the firmware.

Signed-off-by: Pratik Pujar <pratik.pujar@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Cleanup of structure qlcnic_hardware_context

Signed-off-by: Pratik Pujar <pratik.pujar@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Add support for PEX DMA method to read memory section of adapter dump

This patch adds support to read memory section of adapter
dump using PEX DMA method. This method significantly improves
total adapter dump collection time.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Minimize sleep duration within loopback diagnostic test.

o Minimize sleep duration and check for adapter status.
o Exit from loopback test if adapter reset is detected.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Secondary unicast MAC address support.

Add support for configuring secondary unicast address which
will use existing HW filters to store all the unicast MAC
addresses and prevent device going into promiscuous mode.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Handle qlcnic_alloc_mbx_args() failure

qlcnic_alloc_mbx_args() may fail due to failure in memory allocation.
This patch checks for failure of qlcnic_alloc_mbx_args() to avoid
potential invalid memory access.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem

Conflicts:
net/wireless/nl80211.c

ndisc: Convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frag , remove an empty ifdef.

This patch removes an empty ifdef from inet_frag_intern()
in net/ipv4/inet_fragment.c.

commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a
(hlist: drop the node parameter from iterators) removed hlist from
net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command,
which is now empty.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

htb: refactor struct htb_sched fields for performance

htb_sched structures are big, and source of false sharing on SMP.

Every time a packet is queued or dequeue, many cache lines must be
touched because structures are not lay out properly.

By carefully splitting htb_sched in two parts, and define sub structures
to increase data locality, we can improve performance dramatically on
SMP.

New htb_prio structure can also be used in htb_class to increase data
locality.

I got 26 % performance increase on a 24 threads machine, with 200
concurrent netperf in TCP_RR mode, using a HTB hierarchy of 4 classes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: introduce a per-route knob for quick ack

In previous discussions, I tried to find some reasonable heuristics
for delayed ACK, however this seems not possible, according to Eric:

"ACKS might also be delayed because of bidirectional
traffic, and is more controlled by the application
response time. TCP stack can not easily estimate it."

"ACK can be incredibly useful to recover from losses in
a short time.

The vast majority of TCP sessions are small lived, and we
send one ACK per received segment anyway at beginning or
retransmits to let the sender smoothly increase its cwnd,
so an auto-tuning facility wont help them that much."

and according to David:

"ACKs are the only information we have to detect loss.

And, for the same reasons that TCP VEGAS is fundamentally
broken, we cannot measure the pipe or some other
receiver-side-visible piece of information to determine
when it's "safe" to stretch ACK.

And even if it's "safe", we should not do it so that losses are
accurately detected and we don't spuriously retransmit.

The only way to know when the bandwidth increases is to
"test" it, by sending more and more packets until drops happen.
That's why all successful congestion control algorithms must
operate on explicited tested pieces of information.

Similarly, it's not really possible to universally know if
it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode like
what TCP_QUICK_ACK does.

Similar to TCP_QUICK_ACK option, but for people who can't
modify the source code and still wants to control
TCP delayed ACK behavior. As David suggested, this should belong
to per-path scope, since different pathes may want different
behaviors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Convert __list_for_each use to list_for_each

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)

Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init()
in init path. So we can use pdev->pm_cap instead of using
pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

amd8111e: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)

Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init()
in init path. So we can use pdev->pm_cap instead of using
pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
Signed-off-by: David S. Miller <davem@davemloft.net>

Bnx2x: remove redundant D0 power state set

Pci_enable_device() will set device power state to D0,
so it's no need to do it again in bnx2x_init_dev().
Also remove redundant PM Cap find code, because pci core
has been saved the pci device pm cap value.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add missing dependencies on NETDEVICES

ETRAX_ETHERNET selects ETHERNET and MII, which depend on NETDEVICES.
I don't think anything should select NETDEVICES, so make it a
dependency. It also doesn't need to select or depend on ETHERNET,
which has nothing to do with the Ethernet library functions.

BPCTL selects MII, which depends on NETDEVICES. But everything in the
drivers/staging/silicom directory is related to net devices, so make
NET_VENDOR_SILICOM depend on NETDEVICES and remove the now-redundant
dependencies on NET.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

at91_ether: Do not select NET_CORE

This has no dependency on any of the drivers under NET_CORE.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move MII out from under NET_CORE and hide it

All drivers that select MII also need to select NET_CORE because MII
depends on it. This is a bit ridiculous because NET_CORE is just a
menu option that doesn't enable any code by itself.

There is also no need for it to be a visible option, since its users
all select it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp:typo unset should be unsent

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: trivial: make alb use bond_slave_has_mac()

Also, cleanup bond_alb_handle_active_change() from 2 identical ifs.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: use pci_vfs_assigned()/pci_num_vf() instead of be_find_vfs()

be_find_vfs() is no longer needed as the common PCI calls provide the same
functionality.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: fix an oops when IFLA_IPTUN_PROTO is not set

The use of this attribute has been added in 32b8a8e59c9c (sit: add IPv4 over
IPv4 support). It is optional, by default proto is IPPROTO_IPV6.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF

The current situation is that SOCK_MIN_RCVBUF is 2048 + sizeof(struct sk_buff))
while SOCK_MIN_SNDBUF is 2048. Since in both cases, skb->truesize is used for
sk_{r,w}mem_alloc accounting, we should have both sizes adjusted via defining a
TCP_SKB_MIN_TRUESIZE.

Further, as Eric Dumazet points out, the minimal skb truesize in transmit path is
SKB_TRUESIZE(2048) after commit f07d960df33c5 ("tcp: avoid frag allocation for
small frames"), and tcp_sendmsg() tries to limit skb size to half the congestion
window, meaning we try to build two skbs at minimum. Thus, having SOCK_MIN_SNDBUF
as 2048 can hit a small regression for some applications setting to low
SO_SNDBUF / SO_RCVBUF. Note that we define a TCP_SKB_MIN_TRUESIZE, because
SKB_TRUESIZE(2048) adds SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), but in
case of TCP skbs, the skb_shared_info is part of the 2048 bytes allocation for
skb->head.

The minor adaption in sk_stream_moderate_sndbuf() is to silence a warning by
using a typed max macro, as similarly done in SOCK_MIN_RCVBUF occurences, that
would appear otherwise.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: disallow un-init_net to change thresh of neigh

thresh and interval are global resources,
only init net can change them.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>