Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

net: sctp: fix incorrect type in gfp initializer

This fixes the following sparse warning:

  net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
  net/sctp/associola.c:1556:29:    expected bool [unsigned] [usertype] preload
  net/sctp/associola.c:1556:29:    got restricted gfp_t

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: improve sctp_select_active_and_retran_path selection

In function sctp_select_active_and_retran_path(), we walk the
transport list in order to look for the two most recently used
ACTIVE transports (trans_pri, trans_sec). In case we didn't find
anything ACTIVE, we currently just camp on a possibly PF or
INACTIVE transport that is primary path; this behavior actually
dates back to linux-history tree of the very early days of
lksctp, and can yield a behavior that chooses suboptimal
transport paths.

Instead, be a bit more clever by reusing and extending the
recently introduced sctp_trans_elect_best() handler. In case
both transports are evaluated to have the same score resulting
from their states, break the tie by looking at: 1) transport
patch error count 2) last_time_heard value from each transport.

This is analogous to Nishida's Quick Failover draft [1],
section 5.1, 3:

  The sender SHOULD avoid data transmission to PF destinations.
  When all destinations are in either PF or Inactive state,
  the sender MAY either move the destination from PF to active
  state (and transmit data to the active destination) or the
  sender MAY transmit data to a PF destination. In the former
  scenario, (i) the sender MUST NOT notify the ULP about the
  state transition, and (ii) MUST NOT clear the destination's
  error counter. It is recommended that the sender picks the
  PF destination with least error count (fewest consecutive
  timeouts) for data transmission. In case of a tie (multiple PF
  destinations with same error count), the sender MAY choose the
  last active destination.

Thus for sctp_select_active_and_retran_path(), we keep track of
the best, if any, transport that is in PF state and in case no
ACTIVE transport has been found (hence trans_{pri,sec} is NULL),
we select the best out of the three: current primary_path and
retran_path as well as a possible PF transport.

The secondary may still camp on the original primary_path as
before. The change in sctp_trans_elect_best() with a more fine
grained tie selection also improves at the same time path selection
for sctp_assoc_update_retran_path() in case of non-ACTIVE states.

  [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: migrate most recently used transport to ktime

Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: refactor active path selection

This patch just refactors and moves the code for the active
path selection into its own helper function outside of
sctp_assoc_control_transport() which is already big enough.
No functional changes here.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ktime: add ktime_after and ktime_before helper

Add two minimal helper functions analogous to time_before() and
time_after() that will later on both be needed by SCTP code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

target: Report correct response length for some commands

When an initiator sends an allocation length bigger than what its
command consumes, the target should only return the actual response data
and set the residual length to the unused part of the allocation length.

Add a helper function that command handlers (INQUIRY, READ CAPACITY,
etc) can use to do this correctly, and use this code to get the correct
residual for commands that don't use the full initiator allocation in the
handlers for READ CAPACITY, READ CAPACITY(16), INQUIRY, MODE SENSE and
REPORT LUNS.

This addresses a handful of failures as reported by Christophe with
the Windows Certification Kit:

http://permalink.gmane.org/gmane.linux.scsi.target.devel/6515

Signed-off-by: Roland Dreier <roland@purestorage.com>
Tested-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Merge branch 'mac802154'

Phoebe Buckheister says:

====================
Recent llsec code introduced a memory leak on decryption failures during rx.
This fixes said leak, and optimizes the receive loops for monitor and wpan
devices to only deliver skbs to devices that are actually up. Also changes a
dev_kfree_skb to kfree_skb when an invalid packet is dropped before being
pushed into the stack.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: don't deliver packets to devices that are down

Only one WPAN devices can be active at any given time, so only deliver
packets to that one interface that is actually up. Multiple monitors may
be up at any given time, but we don't have to deliver to monitors that
are down either.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: properly free incoming skbs on decryption failure

mac802154 RX did not free skbs on decryption failure, assuming that the
caller would when the local rx handler returned _DROP. This was false.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges

After relatively recent changes in the ACPI-based PCI hotplug
(ACPIPHP) code, the acpiphp_check_host_bridge() executed for PCI
host bridges via acpi_pci_root_scan_dependent() doesn't do anything
useful, because those bridges do not have hotplug contexts. That
happens by mistake, so fix it by making acpiphp_enumerate_slots()
add hotplug contexts to PCI host bridges too and modify
acpiphp_remove_slots() to drop those contexts for host bridges
as appropriate.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=76901
Fixes: 2d8b1d566a5f (ACPI / hotplug / PCI: Get rid of check_sub_bridges())
Reported-and-tested-by: Gavin Guo <gavin.guo@canonical.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

target/sbc: Check that the LBA and number of blocks are correct in VERIFY

This patch extracts LBA + sectors for VERIFY, and adds a goto check_lba
to perform the end-of-device checking.

(Update patch to drop lba_check usage - nab)

Signed-off-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target/sbc: Remove sbc_check_valid_sectors()

A similar check is performed at the end of sbc_parse_cdb() and is now
enforced if the SYNCHRONIZE CACHE command's backend supports
->execute_sync_cache().

(Add check_lba goto to avoid *_max_sectors checks - nab)

Signed-off-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Target/iscsi: Fix sendtargets response pdu for iser transport

In case the transport is iser we should not include the
iscsi target info in the sendtargets text response pdu.
This causes sendtargets response to include the target
info twice.

Modify iscsit_build_sendtargets_response to filter
transport types that don't match.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Target/iser: Fix a wrong dereference in case discovery session is over iser

In case the discovery session is carried over iser, we can't
access the assumed network portal since the default portal is
used. In this case we don't really need to allocate the fastreg
pool, just prepare to the text pdu that will follow.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Alex Tabachnik <alext@mellanox.com>
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Merge tag 'soc2-for-3.16' of git://git./linux/kernel/git/arm/arm-soc

Pull part two of ARM SoC updates from Arnd Bergmann:
"This is a small follow-up to the larger ARM SoC updates merged last
  week, almost entirely for the keystone platform.

  The main change here is to use the new dma-ranges parsing code that
  came in through Russell's ARM tree.  This allows the keystone platform
  to do cache-coherent DMA and to finally support all the available
  physical memory when LPAE is enabled.

  Aside from this, the keystone reset driver has been rewritten, and
  there is a small bug fix to allow building the orion5x platform again"

* tag 'soc2-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: keystone: Drop use of meminfo since its not available anymore
  ARM: orion5x: fix mvebu_mbus_dt_init call
  ARM: configs: keystone: enable reset driver support
  ARM: dts: keystone: update reset node to work with reset driver
  ARM: keystone: remove redundant reset stuff
  ARM: keystone: Update the dma offset for non-dt platform devices
  ARM: keystone: Switch over to coherent memory address space
  ARM: configs: keystone: add MTD_SPI_NOR (new dependency for M25P80)
  ARM: configs: keystone: drop CONFIG_COMMON_CLK_DEBUG

Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs

Pull reiserfs and ext3 changes from Jan Kara:
"Big reiserfs cleanup from Jeff, an ext3 deadlock fix, and some small
  cleanups"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits)
  reiserfs: Fix compilation breakage with CONFIG_REISERFS_CHECK
  ext3: Fix deadlock in data=journal mode when fs is frozen
  reiserfs: call truncate_setsize under tailpack mutex
  fs/jbd/revoke.c: replace shift loop by ilog2
  reiserfs: remove obsolete __constant_cpu_to_le32
  reiserfs: balance_leaf refactor, split up balance_leaf_when_delete
  reiserfs: balance_leaf refactor, format balance_leaf_finish_node
  reiserfs: balance_leaf refactor, format balance_leaf_new_nodes_paste
  reiserfs: balance_leaf refactor, format balance_leaf_paste_right
  reiserfs: balance_leaf refactor, format balance_leaf_insert_right
  reiserfs: balance_leaf refactor, format balance_leaf_paste_left
  reiserfs: balance_leaf refactor, format balance_leaf_insert_left
  reiserfs: balance_leaf refactor, pull out balance_leaf{left, right, new_nodes, finish_node}
  reiserfs: balance_leaf refactor, pull out balance_leaf_finish_node_paste
  reiserfs: balance_leaf refactor pull out balance_leaf_finish_node_insert
  reiserfs: balance_leaf refactor, pull out balance_leaf_new_nodes_paste
  reiserfs: balance_leaf refactor, pull out balance_leaf_new_nodes_insert
  reiserfs: balance_leaf refactor, pull out balance_leaf_paste_right
  reiserfs: balance_leaf refactor, pull out balance_leaf_insert_right
  reiserfs: balance_leaf refactor, pull out balance_leaf_paste_left
  ...

PCI/MSI: Fix memory leak in free_msi_irqs()

free_msi_irqs() is leaking memory, since list_for_each_entry(entry,
&dev->msi_list, list) {...} is never executed, because dev->msi_list is
made empty by the loop just above this one.

Fix it by relying on zero termination of attribute array like
populate_msi_sysfs() does.

Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: stable@vger.kernel.org # v3.14+

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs updates from Chris Mason:
"The biggest change here is Josef's rework of the btrfs quota
  accounting, which improves the in-memory tracking of delayed extent
  operations.

  I had been working on Btrfs stack usage for a while, mostly because it
  had become impossible to do long stress runs with slab, lockdep and
  pagealloc debugging turned on without blowing the stack.  Even though
  you upgraded us to a nice king sized stack, I kept most of the
  patches.

  We also have some very hard to find corruption fixes, an awesome sysfs
  use after free, and the usual assortment of optimizations, cleanups
  and other fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (80 commits)
  Btrfs: convert smp_mb__{before,after}_clear_bit
  Btrfs: fix scrub_print_warning to handle skinny metadata extents
  Btrfs: make fsync work after cloning into a file
  Btrfs: use right type to get real comparison
  Btrfs: don't check nodes for extent items
  Btrfs: don't release invalid page in btrfs_page_exists_in_range()
  Btrfs: make sure we retry if page is a retriable exception
  Btrfs: make sure we retry if we couldn't get the page
  btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56
  trivial: fs/btrfs/ioctl.c: fix typo s/substract/subtract/
  Btrfs: fix leaf corruption after __btrfs_drop_extents
  Btrfs: ensure btrfs_prev_leaf doesn't miss 1 item
  Btrfs: fix clone to deal with holes when NO_HOLES feature is enabled
  btrfs: free delayed node outside of root->inode_lock
  btrfs: replace EINVAL with ERANGE for resize when ULLONG_MAX
  Btrfs: fix transaction leak during fsync call
  btrfs: Avoid trucating page or punching hole in a already existed hole.
  Btrfs: update commit root on snapshot creation after orphan cleanup
  Btrfs: ioctl, don't re-lock extent range when not necessary
  Btrfs: avoid visiting all extent items when cloning a range
  ...

Merge tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs updates from Dave Chinner:
"This update contains:
   - cleanup removing unused function args
   - rework of the filestreams allocator to use dentry cache parent
     lookups
   - new on-disk free inode btree and optimised inode allocator
   - various bug fixes
   - rework of internal attribute API
   - cleanup of superblock feature bit support to remove historic cruft
   - more fixes and minor cleanups
   - added a new directory/attribute geometry abstraction
   - yet more fixes and minor cleanups"

* tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfs: (86 commits)
  xfs: fix xfs_da_args sparse warning in xfs_readdir
  xfs: Fix rounding in xfs_alloc_fix_len()
  xfs: tone down writepage/releasepage WARN_ONs
  xfs: small cleanup in xfs_lowbit64()
  xfs: kill xfs_buf_geterror()
  xfs: xfs_readsb needs to check for magic numbers
  xfs: block allocation work needs to be kswapd aware
  xfs: remove redundant geometry information from xfs_da_state
  xfs: replace attr LBSIZE with xfs_da_geometry
  xfs: pass xfs_da_args to xfs_attr_leaf_newentsize
  xfs: use xfs_da_geometry for block size in attr code
  xfs: remove mp->m_dir_geo from directory logging
  xfs: reduce direct usage of mp->m_dir_geo
  xfs: move node entry counts to xfs_da_geometry
  xfs: convert dir/attr btree threshold to xfs_da_geometry
  xfs: convert m_dirblksize to xfs_da_geometry
  xfs: convert m_dirblkfsbs to xfs_da_geometry
  xfs: convert directory segment limits to xfs_da_geometry
  xfs: convert directory db conversion to xfs_da_geometry
  xfs: convert directory dablk conversion to xfs_da_geometry
  ...

i40e/i40evf: Bump i40e to version 0.4.10 and i40evf to 0.9.34

Bump versions.

Change-ID: Ic4a84354955061ca18321b1e97c9c30fe1563b5c
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: use stored base_queue value

No need to read the PCI register for the PF's base queue on every single Tx
queue enable and disable as we already have the value stored from reading
the capability features at startup.

Change-ID: Ic02fb622757742f43cb8269369c3d972d4f66555
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix a bug in ethtool for FD drop packet filter action

A drop action comes down as a ring_cookie value, so allow it as
a special value that can be used to configure destination control.

Also fix the output to filter read command accordingly.

Change-ID: I9956723cee42f3194885403317dd21ed4a151144
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Add Flow director stats to PF stats

Add members to stat struct to keep track of Flow director ATR and
SideBand filter packet matches.

Change-ID: Ibbb31a53c7adcc2bb96991dd80565442a2f2513c
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: remove FTYPE

This change drops the FTYPE field from the Rx descriptor, to
match the hardware implementation.

Change-ID: I66d31d2b43861da45e8ace4fb03df033abe88bab
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: check admin queue error bits

FW can indicate any admin queue error states to the driver via some bits
in the length registers. Each time we process an admin queue message,
check these bits and log any errors we find. Since the VF really can't
do much, we just print the message and depend on the PF driver to clear
things up on our behalf.

Change-ID: I92bc6c53ce3b4400544e0ca19c5de2d27490bd0d
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: User ether_addr_copy instead of memcpy

Linux gives us a function to copy Ethernet MAC addresses, let's use it.

Change-ID: I0c861900029ca5ea65a53ca39565852fb633f6fd
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Do not accept tagged packets by default

Remove the filter created by the firmware with the default MAC address it
reads out of the NVM storage and a promiscuous VLAN tag and replace it
with a filter that will not accept tagged packets by default. The system
must request a VLAN tag packet filter to get packets with that tag.

Change-ID: I119e6c3603a039bd68282ba31bf26f33a575490a
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Separate out DCB capability and enabled flags

Currently if the firmware reports DCB capability the driver enables
I40E_FLAG_DCB_ENABLED flag. When this flag is enabled the driver
inserts a tag when transmitting a packet from the port even if there
are no DCB traffic classes configured at the port.

This patch adds a new flag I40E_FLAG_DCB_CAPABLE that will be set
when the DCB capability is present and the existing flag
I40E_FLAG_DCB_ENABLED will be set only if there are more than one
traffic classes configured at the port.

Change-ID: I24ccbf53ef293db2eba80c8a9772acf729795bd5
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: don't go further down

If the device is down, there's no place to go but up, so don't try to go
down even more. This prevents a CPU soft lock in napi_disable().

Change-ID: I8b058b9ee974dfa01c212fae2597f4f54b333314
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Change the notion of src and dst for FD_SB in ethtool

In XL710 devices we program FD filter's fields from Tx perspective of the flow.
However the user interface exposed in ethtool should be compliant with the
previous generation of drivers where a filter src and dst field are from
the RX perspective. This patch changes the ethtool interface in this regard
to match the other drivers.

Change-ID: Iec6ccddd87357c4fb53ccf33aa0fae699faf70cf
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: AdminQ API update for new FW

Add set_pf_context, replace set_phy_reset with set_phy_debug, add
nvm_config_read/write, remove nvm_read/write_reg_se and add some
PHY types.

With these changes we bump the API version to 1.2.

Change-ID: I4dc3aec175c2316f66fc9b726b3f7d594699d84e
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: set headwb Tx context flags and use them

Set appropriate fields in Tx queue configuration virtchnl message
to pf to enable headwb and setup headwb addr.
Then use that info from the VF to set headwb and headwb_addr instead of
always enabling them.

Change-ID: I7d393d1b2b07f0f3355b3a4f7c2d3c6ee3b0d622
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: separate hardware setting from the set_ts_config ioctl

This patch separates the hardware logic from the set function, so that
we can re-use it during a ptp_reset. This enables the reset to return
functionality to the last known timestamp mode, rather than resetting
the value. We initialize the mode to off during the ptp_init cycle.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: unhide invariant returns

Return a 0 directly rather than a constant.

Reported-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:
"Final small batch of fixes to be included before -rc1.  Some general
  cleanups in here as well, but some of the blk-mq fixes we need for the
  NVMe conversion and/or scsi-mq.  The pull request contains:

   - Support for not merging across a specified "chunk size", if set by
     the driver.  Some NVMe devices perform poorly for IO that crosses
     such a chunk, so we need to support it generically as part of
     request merging avoid having to do complicated split logic.  From
     me.

   - Bump max tag depth to 10Ki tags.  Some scsi devices have a huge
     shared tag space.  Before we failed with EINVAL if a too large tag
     depth was specified, now we truncate it and pass back the actual
     value.  From me.

   - Various blk-mq rq init fixes from me and others.

   - A fix for enter on a dying queue for blk-mq from Keith.  This is
     needed to prevent oopsing on hot device removal.

   - Fixup for blk-mq timer addition from Ming Lei.

   - Small round of performance fixes for mtip32xx from Sam Bradshaw.

   - Minor stack leak fix from Rickard Strandqvist.

   - Two __init annotations from Fabian Frederick"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: add __init to blkcg_policy_register
  block: add __init to elv_register
  block: ensure that bio_add_page() always accepts a page for an empty bio
  blk-mq: add timer in blk_mq_start_request
  blk-mq: always initialize request->start_time
  block: blk-exec.c: Cleaning up local variable address returnd
  mtip32xx: minor performance enhancements
  blk-mq: ->timeout should be cleared in blk_mq_rq_ctx_init()
  blk-mq: don't allow queue entering for a dying queue
  blk-mq: bump max tag depth to 10K tags
  block: add blk_rq_set_block_pc()
  block: add notion of a chunk size for request merging

Merge tag 'for-linus-20140610' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
- refactor m25p80.c driver for use as a general SPI NOR framework for
   other drivers which may speak to SPI NOR flash without providing full
   SPI support (i.e., not part of drivers/spi/)
- new Freescale QuadSPI driver (utilizing new SPI NOR framework)
- updates for the STMicro "FSM" SPI NOR driver
- fix sync/flush behavior on mtd_blkdevs
- fixup subpage write support on a few NAND drivers
- correct the MTD OOB test for odd-sized OOB areas
- add BCH-16 support for OMAP NAND
- fix warnings and trivial refactoring
- utilize new ECC DT bindings in pxa3xx NAND driver
- new LPDDR NVM driver
- address a few assorted bugs caught by Coverity
- add new imx6sx support for GPMI NAND
- use a bounce buffer for NAND when non-DMA-able buffers are used

* tag 'for-linus-20140610' of git://git.infradead.org/linux-mtd: (77 commits)
  mtd: gpmi: add gpmi support for imx6sx
  mtd: maps: remove check for CONFIG_MTD_SUPERH_RESERVE
  mtd: bf5xx_nand: use the managed version of kzalloc
  mtd: pxa3xx_nand: make the driver work on big-endian systems
  mtd: nand: omap: fix omap_calculate_ecc_bch() for-loop error
  mtd: nand: r852: correct write_buf loop bounds
  mtd: nand_bbt: handle error case for nand_create_badblock_pattern()
  mtd: nand_bbt: remove unused variable
  mtd: maps: sc520cdp: fix warnings
  mtd: slram: fix unused variable warning
  mtd: pfow: remove unused variable
  mtd: lpddr: fix Kconfig dependency, for I/O accessors
  mtd: nand: pxa3xx: Add supported ECC strength and step size to the DT binding
  mtd: nand: pxa3xx: Use ECC strength and step size devicetree binding
  mtd: nand: pxa3xx: Clean pxa_ecc_init() error handling
  mtd: nand: Warn the user if the selected ECC strength is too weak
  mtd: nand: omap: Documentation: How to select correct ECC scheme for your device ?
  mtd: nand: omap: add support for BCH16_ECC - NAND driver updates
  mtd: nand: omap: add support for BCH16_ECC - ELM driver updates
  mtd: nand: omap: add support for BCH16_ECC - GPMC driver updates
  ...

Merge tag 'md/3.16' of git://neil.brown.name/md

Pull md updates from Neil Brown:
"Assorted md fixes for 3.16

  Mostly performance improvements with a few corner-case bug fixes"

* tag 'md/3.16' of git://neil.brown.name/md:
  raid5: speedup sync_request processing
  md/raid5: deadlock between retry_aligned_read with barrier io
  raid5: add an option to avoid copy data from bio to stripe cache
  md/bitmap: remove confusing code from filemap_get_page.
  raid5: avoid release list until last reference of the stripe
  md: md_clear_badblocks should return an error code on failure.
  md/raid56: Don't perform reads to support writes until stripe is ready.
  md: refuse to change shape of array if it is active but read-only

reiserfs: Fix compilation breakage with CONFIG_REISERFS_CHECK

There was a bug in debug printout when CONFIG_REISERFS_CHECK was
enabled so one of the assertions in do_balan.c didn't compile. Fix it.

Fixes: 0080e9f9d3ac717537dbd6db1fc8ef72ce0b9cc1
Signed-off-by: Jan Kara <jack@suse.cz>

Merge tag 'sunxi-clk-for-3.16-2' of https://github.com/mripard/linux into clk-next

Rebase of Emilio's clk-sunxi-for-3.16 on top of clk-next

Fixed a few compilation warnings exposed by a patch introduced during the 3.16
merge window.

Original tag message:

Allwinner sunXi SoCs clock changes

This pull contains some new code to add support for A31 clocks by Maxime
and Boris. It also reworks the driver a bit to avoid having a huge
single file when we have a full folder for ourselves, and separating
different functional units makes sense.

powerpc/book3s: Fix guest MC delivery mechanism to avoid soft lockups in guest.

Currently we forward MCEs to guest which have been recovered by guest.
And for unhandled errors we do not deliver the MCE to guest. It looks like
with no support of FWNMI in qemu, guest just panics whenever we deliver the
recovered MCEs to guest. Also, the existig code used to return to host for
unhandled errors which was casuing guest to hang with soft lockups inside
guest and makes it difficult to recover guest instance.

This patch now forwards all fatal MCEs to guest causing guest to crash/panic.
And, for recovered errors we just go back to normal functioning of guest
instead of returning to host. This fixes soft lockup issues in guest.
This patch also fixes an issue where guest MCE events were not logged to
host console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/book3s: Increment the mce counter during machine_check_early call.

We don't see MCE counter getting increased in /proc/interrupts which gives
false impression of no MCE occurred even when there were MCE events.
The machine check early handling was added for PowerKVM and we missed to
increment the MCE count in the early handler.

We also increment mce counters in the machine_check_exception call, but
in most cases where we handle the error hypervisor never reaches there
unless its fatal and we want to crash. Only during fatal situation we may
see double increment of mce count. We need to fix that. But for
now it always good to have some count increased instead of zero.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/book3s: Add stack overflow check in machine check handler.

Currently machine check handler does not check for stack overflow for
nested machine check. If we hit another MCE while inside the machine check
handler repeatedly from same address then we get into risk of stack
overflow which can cause huge memory corruption. This patch limits the
nested MCE level to 4 and panic when we cross level 4.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/book3s: Fix machine check handling for unhandled errors

Current code does not check for unhandled/unrecovered errors and return from
interrupt if it is recoverable exception which in-turn triggers same machine
check exception in a loop causing hypervisor to be unresponsive.

This patch fixes this situation and forces hypervisor to panic for
unhandled/unrecovered errors.

This patch also fixes another issue where unrecoverable_exception routine
was called in real mode in case of unrecoverable exception (MSR_RI = 0).
This causes another exception vector 0x300 (data access) during system crash
leading to confusion while debugging cause of the system crash.

Also turn ME bit off while going down, so that when another MCE is hit during
panic path, system will checkstop and hypervisor will get restarted cleanly
by SP.

With the above fixes we now throw correct console messages (see below) while
crashing the system in case of unhandled/unrecoverable machine checks.

--------------
Severe Machine check interrupt [[Not recovered]
  Initiator: CPU
  Error type: UE [Instruction fetch]
    Effective address: 0000000030002864
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: bork(O) bridge stp llc kvm [last unloaded: bork]
CPU: 36 PID: 55162 Comm: bash Tainted: G           O 3.14.0mce #1
task: c000002d72d022d0 ti: c000000007ec0000 task.ti: c000002d72de4000
NIP: 0000000030002864 LR: 00000000300151a4 CTR: 000000003001518c
REGS: c000000007ec3d80 TRAP: 0200   Tainted: G           O  (3.14.0mce)
MSR: 9000000000041002 <SF,HV,ME,RI>  CR: 28222848  XER: 20000000
CFAR: 0000000030002838 DAR: d0000000004d0000 DSISR: 00000000 SOFTE: 1
GPR00: 000000003001512c 0000000031f92cb0 0000000030078af0 0000000030002864
GPR04: d0000000004d0000 0000000000000000 0000000030002864 ffffffffffffffc9
GPR08: 0000000000000024 0000000030008af0 000000000000002c c00000000150e728
GPR12: 9000000000041002 0000000031f90000 0000000010142550 0000000040000000
GPR16: 0000000010143cdc 0000000000000000 00000000101306fc 00000000101424dc
GPR20: 00000000101424e0 000000001013c6f0 0000000000000000 0000000000000000
GPR24: 0000000010143ce0 00000000100f6440 c000002d72de7e00 c000002d72860250
GPR28: c000002d72860240 c000002d72ac0038 0000000000000008 0000000000040000
NIP [0000000030002864] 0x30002864
LR [00000000300151a4] 0x300151a4
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 7285f0beac1e29d3 ]---

Sending IPI to other CPUs
IPI complete
OPAL V3 detected !
--------------

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/eeh: Dump PE location code

As Ben suggested, it's meaningful to dump PE's location code
for site engineers when hitting EEH errors. The patch introduces
function eeh_pe_loc_get() to retireve the location code from
dev-tree so that we can output it when hitting EEH errors.

If primary PE bus is root bus, the PHB's dev-node would be tried
prior to root port's dev-node. Otherwise, the upstream bridge's
dev-node of the primary PE bus will be check for the location code
directly.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

clk: sunxi: document PRCM clock compatible strings

Document new compatible strings for clock provided by the PRCM
(Power/Reset/Clock Management) unit.

Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support

The PRCM (Power/Reset/Clock Management) unit provides several clock
devices:
- AR100 clk: used to clock the Power Management co-processor
- AHB0 clk: used to clock the AHB0 bus
- APB0 clk and gates: used to clk peripherals connected to the APB0 bus

Add support for these clks in a separate driver so that they can be probed
as platform devices instead of registered during early init.
This is needed to be able to probe PRCM MFD subdevices.

Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sun6i: Protect SDRAM gating bit

Prevent the SDRAM controller from being gated by force-enabling it in the
machine code.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sun6i: Protect CPU clock

Right now, AHB is an indirect child clock of the CPU clock. If that
happens to change, since the CPU clock has no other consumers declared
in Linux, it would be shut down, which is not really a good idea.

Prevent this by forcing it enabled.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sunxi: Rework clock protection code

Since we start to have a lot of clocks to protect, some of them in a
few SoCs only, it becomes difficult to handle the clock protection
without having to add per machine exceptions.

Add per-SoC data to tell which clock to leave enabled.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sunxi: Move the GMAC clock to a file of its own

Since we have a folder of our own, we can actually make use of it by
splitting the huge clock file into several sub drivers.

The gmac clock is pretty easy to deal with, since it's pretty much
isolated and doesn't have any dependency on the other clocks.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sunxi: Move the 24M oscillator to a file of its own

Since we have a folder of our own, we can actually make use of it by
splitting the huge clock file into several sub drivers.

The main oscillator is pretty easy to deal with, since it's pretty much
isolated.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sunxi: Remove calls to clk_put

Callers of clk_put must disable the clock first. This also means that
as long as the clock is enabled the driver should hold a reference to
that clock. Hence, the call to clk_put here are bogus and should be
removed.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sunxi: document new A31 USB clock compatible

Support for the USB gates and resets on A31 has been recently added
using a new compatible, so let's document it here.

Signed-off-by: Emilio López <emilio@elopez.com.ar>

clk: sunxi: Implement A31 USB clock

The A31 USB clock slightly differ from its older counterparts, mostly
because it has a different gate for each PHY, while the older one had
a single gate for all the phy.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Emilio López <emilio@elopez.com.ar>

amd-xgbe: Rename MAX_DMA_CHANNELS to avoid powerpc conflict

MAX_DMA_CHANNELS is defined in asm/scatterlist.h of the powerpc
architecture. Rename this #define in xgbe.h to avoid the
redefined warning issued during compilation.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix UDP tunnel GSO of frag_list GRO packets

This patch fixes a kernel BUG_ON in skb_segment. It is hit when
testing two VMs on openvswitch with one VM acting as VXLAN gateway.

During VXLAN packet GSO, skb_segment is called with skb->data
pointing to inner TCP payload. skb_segment calls skb_network_protocol
to retrieve the inner protocol. skb_network_protocol actually expects
skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN.
This ends up pulling in ETH_HLEN data from header tail. As a result,
pskb_trim logic is skipped and BUG_ON is hit later.

Move skb_push in front of skb_network_protocol so that skb->data
lines up properly.

kernel BUG at net/core/skbuff.c:2999!
Call Trace:
[<ffffffff816ac412>] tcp_gso_segment+0x122/0x410
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390
[<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8109d742>] ? default_wake_function+0x12/0x20
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0
[<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550
[<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0
[<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0
[<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20
[<ffffffff81687edb>] ip_finish_output+0x66b/0x890
[<ffffffff81688a58>] ip_output+0x58/0x90
[<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350
[<ffffffff816881c9>] ip_local_out_sk+0x39/0x50
[<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130
[<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan]
[<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch]
[<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch]
[<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch]

Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: Fixed up ipsec packet be re-routing issue

Bug report on https://bugzilla.kernel.org/show_bug.cgi?id=75781

When a local output ipsec packet match the mangle table rule,
and be set mark value, the packet will be route again in
route_me_harder -> _session_decoder6

In this case, the nhoff in CB of skb was still the default
value 0. So the protocal match can't success and the packet can't match
correct SA rule,and then the packet be send out in plaintext.

To fixed up the issue. The CB->nhoff must be set.

Signed-off-by: Hui Zhang <huizhang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

farsync: Fix confusion about DMA address and buffer offset types

Use dma_addr_t for DMA address parameters and u32 for shared memory
offset parameters.

Do not assume that dma_addr_t is the same as unsigned long; it will
not be in PAE configurations. Truncate DMA addresses to 32 bits when
printing them. This is OK because the DMA mask for this device is
32-bit (per default).

Also rename the DMA address parameters from 'skb' to 'dma'.

Compile-tested only.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_tunnel: fix i_key matching in ip_tunnel_find

Some tunnels (though only vti as for now) can use i_key just for internal use:
for example vti uses it for fwmark'ing incoming packets. So raw i_key value
shouldn't be treated as a distinguisher for them. ip_tunnel_key_match exists for
cases when we want to compare two ip_tunnel_parms' i_keys.

Example bug:
ip link add type vti ikey 1 local 1.0.0.1 remote 2.0.0.2
ip link add type vti ikey 2 local 1.0.0.1 remote 2.0.0.2
spawned two tunnels, although it doesn't make sense.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4'

Or Gerlitz says:

====================
mlx4 SRIOV fixes

The patch from Wei Yang is a designed fix to a regression introduced by earlier commit
of him. Jack added a fix to the resource management which we got from IBM.

Let's get that into 3.16-rc1 1st and later see to what stable version/s this should go.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Keep only one driver entry release mlx4_priv

Following commit befdf89 "net/mlx4_core: Preserve pci_dev_data after
__mlx4_remove_one()", there are two mlx4 pci callbacks which will
attempt to release the mlx4_priv object -- .shutdown and .remove.

This leads to a use-after-free access to the already freed mlx4_priv
instance and trigger a "Kernel access of bad area" crash when both
.shutdown and .remove are called.

During reboot or kexec, .shutdown is called, with the VFs probed to
the host going through shutdown first and then the PF. Later, the PF
will trigger VFs' .remove since VFs still have driver attached.

Fix that by keeping only one driver entry which releases mlx4_priv.

Fixes: befdf89 ('net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()')
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix SRIOV free-pool management when enforcing resource quotas

The Hypervisor driver tracks free slots and reserved slots at the global level
and tracks allocated slots and guaranteed slots per VF.

Guaranteed slots are treated as reserved by the driver, so the total
reserved slots is the sum of all guaranteed slots over all the VFs.

As VFs allocate resources, free (global) is decremented and allocated (per VF)
is incremented for those resources. However, reserved (global) is never changed.

This means that effectively, when a VF allocates a resource from its
guaranteed pool, it is actually reducing that resource's free pool (since
the global reserved count was not also reduced).

The fix for this problem is the following: For each resource, as long as a
VF's allocated count is <= its guaranteed number, when allocating for that
VF, the reserved count (global) should be reduced by the allocation as well.

When the global reserved count reaches zero, the remaining global free count
is still accessible as the free pool for that resource.

When the VF frees resources, the reverse happens: the global reserved count
for a resource is incremented only once the VFs allocated number falls below
its guaranteed number.

This fix was developed by Rick Kready <kready@us.ibm.com>

Reported-by: Rick Kready <kready@us.ibm.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_vti: Fix 'ip tunnel add' with 'key' parameters

ip tunnel add remote 10.2.2.1 local 10.2.2.2 mode vti ikey 1 okey 2
translates to p->iflags = VTI_ISVTI|GRE_KEY and p->i_key = 1, but GRE_KEY !=
TUNNEL_KEY, so ip_tunnel_ioctl would set i_key to 0 (same story with o_key)
making us unable to create vti tunnels with [io]key via ip tunnel.

We cannot simply translate GRE_KEY to TUNNEL_KEY (as GRE module does) because
vti_tunnels with same local/remote addresses but different ikeys will be treated
as different then. So, imo the best option here is to move p->i_flags & *_KEY
check for vti tunnels from ip_tunnel.c to ip_vti.c and to think about [io]_mark
field for ip_tunnel_parm in the future.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wimax: i2400m: control.c: Cleaning up conjunction always evaluates to false

Logical conjunction always evaluates to false: minor < 2 && minor > 1
I guess what you wanted is rather: minor > 2 || minor < 1

This was partly found using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: toshiba: ps3_gelic_net.c: Cleaning up a check on a memory allocation

A check on a memory allocation is checked incorrectly.

This was partly found using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: fix unused variable compilation warning in phylib driver

Fix following compilation warning:
[...]
CC drivers/net/phy/amd-xgbe-phy.o
drivers/net/phy/amd-xgbe-phy.c:1353:30: warning:
‘amd_xgbe_phy_ids’ defined but not used [-Wunused-variable]
static struct mdio_device_id amd_xgbe_phy_ids[] = {
^
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: fix nlattr and nlattr_nest BPF tests

- 'struct nlattr' must be 2 byte aligned
- provide big-endian input data for nlattr/nlattr_nest tests

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: cleanup A/X name usage

The macro 'A' used in internal BPF interpreter:
#define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.

This patch is trying to clean up the naming and clarify its usage in the
following way:

- A and X are names of two classic BPF registers

- BPF_REG_A denotes internal BPF register R0 used to map classic register A
  in internal BPF programs generated from classic

- BPF_REG_X denotes internal BPF register R7 used to map classic register X
  in internal BPF programs generated from classic

- internal BPF instruction format:
struct sock_filter_int {
        __u8    code;           /* opcode */
        __u8    dst_reg:4;      /* dest register */
        __u8    src_reg:4;      /* source register */
        __s16   off;            /* signed offset */
        __s32   imm;            /* signed immediate constant */
};

- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
  BPF_X - means use register X as source operand
  BPF_K - means use 32-bit immediate as source operand
In internal:
  BPF_X - means use 'src_reg' register as source operand
  BPF_K - means use 32-bit immediate as source operand

Suggested-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dns_resolver: assure that dns_query() result is null-terminated

dns_query() credulously assumes that keys are null-terminated and
returns a copy of a memory block that is off by one.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

powerpc/powernv: Enable POWER8 doorbell IPIs

This patch enables POWER8 doorbell IPIs on powernv.

Since doorbells can only IPI within a core, we test to see when we can use
doorbells and if not we fall back to XICS. This also enables hypervisor
doorbells to wakeup us up from nap/sleep via the LPCR PECEDH bit.

Based on tests by Anton, the best case IPI latency between two threads dropped
from 894ns to 512ns.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/cpuidle: Only clear LPCR decrementer wakeup bit on fast sleep entry

Currently when entering fastsleep we clear all LPCR PECE bits.

This patch changes it to only clear the decrementer bit (ie. PECE1), which is
the only bit we really need to clear here. This is needed if we want to set
other wakeup causes like the PECEDH bit so we can use hypervisor doorbells on
powernv. Also we no longer clear the MER bit as it should never be set in the
host anyway.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Fix killed EEH event

On PowerNV platform, EEH errors are reported by IO accessors or poller
driven by interrupt. After the PE is isolated, we won't produce EEH
event for the PE. The current implementation has possibility of EEH
event lost in this way:

The interrupt handler queues one "special" event, which drives the poller.
EEH thread doesn't pick the special event yet. IO accessors kicks in, the
frozen PE is marked as "isolated" and EEH event is queued to the list.
EEH thread runs because of special event and purge all existing EEH events.
However, we never produce an other EEH event for the frozen PE. Eventually,
the PE is marked as "isolated" and we don't have EEH event to recover it.

The patch fixes the issue to keep EEH events for PEs that have been
marked as "isolated" with the help of additional "force" help to
eeh_remove_event().

Reported-by: Rolf Brudeseth <rolfb@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: fix typo 'CONFIG_PMAC'

Commit b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling
perf_event_do_pending") added a check for CONFIG_PMAC were a check for
CONFIG_PPC_PMAC was clearly intended.

Fixes: b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: fix typo 'CONFIG_PPC_CPU'

Commit cd64d1697cf0 ("powerpc: mtmsrd not defined") added a check for
CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended.

Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Don't escalate non-existing frozen PE

Commit cb5b242c ("powerpc/eeh: Escalate error on non-existing PE")
escalates the frozen state on non-existing PE to fenced PHB. It
was to improve kdump reliability. After that, commit 361f2a2a
("powrpc/powernv: Reset PHB in kdump kernel") was introduced to
issue complete reset on all PHBs to increase the reliability of
kdump kernel.

Commit cb5b242c becomes unuseful and it would be reverted.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/eeh: Report frozen parent PE prior to child PE

When we have the corner case of frozen parent and child PE at the
same time, we have to handle the frozen parent PE prior to the
child. Without clearning the frozen state on parent PE, the child
PE can't be recovered successfully.

The patch searches the EEH PE hierarchy tree and returns the toppest
frozen PE to be handled. It ensures the frozen parent PE will be
handled prior to child PE.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/eeh: Clear frozen state for child PE

Since commit cb523e09 ("powerpc/eeh: Avoid I/O access during PE
reset"), the PE is kept as frozen state on hardware level until
the PE reset is done completely. After that, we explicitly clear
the frozen state of the affected PE. However, there might have
frozen child PEs of the affected PE and we also need clear their
frozen state as well. Otherwise, the recovery is going to fail.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Reduce panic timeout from 180s to 10s

We've already dropped the default pseries timeout to 10s, do
the same for powernv.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/xmon: avoid format string leaking to printk

This makes sure format strings cannot leak into printk (the string has
already been correctly processed for format arguments).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

selftests/powerpc: Add tests of PMU EBBs

The Power8 Performance Monitor Unit (PMU) has a new feature called Event
Based Branches (EBB). This commit adds tests of the kernel API for using
EBBs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

selftests/powerpc: Add support for skipping tests

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

selftests/powerpc: Put the test in a separate process group

Allows us to kill the test and any children it has spawned.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

selftests/powerpc: Fix instruction loop for ABIv2 (LE)

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/perf: Ensure all EBB register state is cleared on fork()

In commit 330a1eb "Core EBB support for 64-bit book3s" I messed up
clear_task_ebb(). It clears some but not all of the task's Event Based
Branch (EBB) registers when we duplicate a task struct.

That allows a child task to observe the EBBHR & EBBRR of its parent,
which it should not be able to do.

Fix it by clearing EBBHR & EBBRR.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org [v3.11+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Fix reading of OPAL msglog

memory_return_from_buffer returns a signed value, so ret should be
ssize_t.

Fixes the following issue reported by David Binderman:

  [linux-3.15/arch/powerpc/platforms/powernv/opal-msglog.c:65]: (style)
  Checking if unsigned variable 'ret' is less than zero.
  [linux-3.15/arch/powerpc/platforms/powernv/opal-msglog.c:82]: (style)
  Checking if unsigned variable 'ret' is less than zero.

  Local variable "ret" is of type size_t. This is always unsigned,
  so it is pointless to check if it is less than zero.

  https://bugzilla.kernel.org/show_bug.cgi?id=77551

Fixing this exposes a real bug for the case where the entire count
bytes is successfully read from the POS_WRAP case. The second
memory_read_from_buffer will return EINVAL, causing the entire read to
return EINVAL to userspace, despite the data being copied correctly. The
fix is to test for the case where the data has been read and return
early.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/spufs: Remove duplicate SPUFS_CNTL_MAP_SIZE define

The SPUFS_CNTL_MAP_SIZE define is cut and pasted twice so we can delete
the second instance.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/cpm: Remove duplicate FCC_GFMR_TTX define

The FCC_GFMR_TTX define is cut and pasted twice so we can remove the
second instance.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Fix endianness problems in EEH

EEH information fetched from OPAL need fix before using in LE environment.
To be included in sparse's endian check, declare them as __beXX and
access them by accessors.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

crypto/nx: disable NX on little endian builds

The NX driver has endian issues so disable it for now.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powernv: Fix permissions on sysparam sysfs entries

Everyone can write to these files, which is not what we want.

Cc: stable@vger.kernel.org # 3.15
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv : Disable subcore for UP configs

Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’:
arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ undeclared (first use in this function)
arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as left operand of assignment

'setup_max_cpus' variable is relevant only on SMP, so there is no point
working around it for UP. Furthermore, subcore itself is relevant only
on SMP and hence the better solution is to exclude subcore.o and
subcore-asm.o for UP builds.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Include asm/smp.h to fix UP build failure

Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/setup.c: In function ‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of function ‘get_hard_smp_processor_id’
rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from
<asm/smp.h>. The file setup.c includes <linux/sched.h>, which in-turn
includes <linux/smp.h>. However, <linux/smp.h> includes <asm/smp.h>
only on SMP configs and hence UP builds fail.

Fix this by directly including <asm/smp.h> in setup.c unconditionally.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Don't setup CPUs with bad status

OPAL will mark a CPU that is guarded as "bad" in the status property of the CPU
node.

Unfortunatley Linux doesn't check this property and will put the bad CPU in the
present map. This has caused hangs on booting when we try to unsplit the core.

This patch checks the CPU is avaliable via this status property before putting
it in the present map.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Anton Blanchard <anton@samba.org>
cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Correct DSCR during TM context switch

Correct the DSCR SPR becoming temporarily corrupted if a task is
context switched during a transaction.

The problem occurs while suspending the task and is caused by saving
the DSCR to thread.dscr after it has already been set to the CPU's
default value:

__switch_to() calls __switch_to_tm()
which calls tm_reclaim_task()
which calls tm_reclaim_thread()
which calls tm_reclaim()
where the DSCR is set to the CPU's default
__switch_to() calls _switch()
where thread.dscr is set to the DSCR

When the task is resumed, it's transaction will be doomed (as usual)
and the DSCR SPR will be corrupted, although the checkpointed value
will be correct. Therefore the DSCR will be immediately corrected by
the transaction aborting, unless it has been suspended. In that case
the incorrect value can be seen by the task until it resumes the
transaction.

The fix is to treat the DSCR similarly to the TAR and save it early
in __switch_to().

A program exposing the problem is added to the kernel self tests as:
tools/testing/selftests/powerpc/tm/tm-resched-dscr.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
CC: <stable@vger.kernel.org> [v3.10+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Merge branch 'bridge_multicast_exports'

Linus Lüssing says:

====================
bridge: multicast snooping patches / exports

The first patch is simply a cosmetic patch. So far I (and maybe others
too?) have been regularly confusing these two structs, therefore I'd
suggest renaming them and therefore making the follow-up patches easier
to understand and nicer to fit in.

The second patch fixes a minor issue, but probably not worth for stable.

On the other hand the first two patches are also preparations for the
third and fourth patch:

These two patches are exporting functionality needed to marry the bridge
multicast snooping with the batman-adv multicast optimizations recently
added for the 3.15 kernel, allowing to use these optimzations in common
setups having a bridge on top of e.g. bat0, too. So far these bridged
setups would fall back to simple flooding through the batman-adv mesh
network for any multicast packet entering bat0.

More information about the batman-adv multicast optimizations currently
implemented can be found here:

http://www.open-mesh.org/projects/batman-adv/wiki/Basic-multicast-optimizations

The integration on the batman-adv side could afterwards look like this,
for instance:

http://git.open-mesh.org/batman-adv.git/commitdiff/576b59dd3e34737c702e548b21fa72059262f796?hp=f95ce7131746c65fbcdffcf2089cab59e2c2f7ac
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: memorize and export selected IGMP/MLD querier port

Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in IGMP/MLD queriers
to be able to reliably serve any multicast listener behind this same
bridge.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: add export of multicast database adjacent to net_dev

With this new, exported function br_multicast_list_adjacent(net_dev) a
list of IPv4/6 addresses is returned. This list contains all multicast
addresses sensed by the bridge multicast snooping feature on all bridge
ports of the bridge interface of net_dev, excluding addresses from the
specified net_device itself.

Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in multicast
listeners to be able to reliably serve them with multicast packets.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: adhere to querier election mechanism specified by RFCs

MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2
(RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the
querier with lowest source address shall become the selected
querier.

So far the bridge stopped its querier as soon as it heard another
querier regardless of its source address. This results in the "wrong"
querier potentially becoming the active querier or a potential,
unnecessary querying delay.

With this patch the bridge memorizes the source address of the currently
selected querier and ignores queries from queriers with a higher source
address than the currently selected one. This slight optimization is
supposed to make it more RFC compliant (but is rather uncritical and
therefore probably not necessary to be queued for stable kernels).

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: rename struct bridge_mcast_query/querier

The current naming of these two structs is very random, in that
reversing their naming would not make any semantical difference.

This patch tries to make the naming less confusing by giving them a more
specific, distinguishable naming.

This is also useful for the upcoming patches reintroducing the
"struct bridge_mcast_querier" but for storing information about the
selected querier (no matter if our own or a foreign querier).

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipip, sit: fix ipv4_{update_pmtu,redirect} calls

ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a
tunnel netdevice). It caused wrong route lookup and failure of pmtu update or
redirect. We should use the same ifindex that we use in ip_route_output_* in
*tunnel_xmit code. It is t->parms.link .

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>