firefly-linux-kernel-4.4.55.git
7 years agoubifs: Fix journal replay wrt. xattr nodes
Richard Weinberger [Tue, 10 Jan 2017 10:49:40 +0000 (11:49 +0100)]
ubifs: Fix journal replay wrt. xattr nodes

commit 1cb51a15b576ee325d527726afff40947218fd5e upstream.

When replaying the journal it can happen that a journal entry points to
a garbage collected node.
This is the case when a power-cut occurred between a garbage collect run
and a commit. In such a case nodes have to be read using the failable
read functions to detect whether the found node matches what we expect.

One corner case was forgotten, when the journal contains an entry to
remove an inode all xattrs have to be removed too. UBIFS models xattr
like directory entries, so the TNC code iterates over
all xattrs of the inode and removes them too. This code re-uses the
functions for walking directories and calls ubifs_tnc_next_ent().
ubifs_tnc_next_ent() expects to be used only after the journal and
aborts when a node does not match the expected result. This behavior can
render an UBIFS volume unmountable after a power-cut when xattrs are
used.

Fix this issue by using failable read functions in ubifs_tnc_next_ent()
too when replaying the journal.
Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
Reported-by: Rock Lee <rockdotlee@gmail.com>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoqla2xxx: Fix crash due to null pointer access
Quinn Tran [Sat, 24 Dec 2016 02:06:10 +0000 (18:06 -0800)]
qla2xxx: Fix crash due to null pointer access

commit fc1ffd6cb38a1c1af625b9833c41928039e733f5 upstream.

During code inspection, while investigating following stack trace
seen on one of the test setup, we found out there was possibility
of memory leak becuase driver was not unwinding the stack properly.

This issue has not been reproduced in a test environment or on a
customer setup.

Here's stack trace that was seen.

[1469877.797315] Call Trace:
[1469877.799940]  [<ffffffffa03ab6e9>] qla2x00_mem_alloc+0xb09/0x10c0 [qla2xxx]
[1469877.806980]  [<ffffffffa03ac50a>] qla2x00_probe_one+0x86a/0x1b50 [qla2xxx]
[1469877.814013]  [<ffffffff813b6d01>] ? __pm_runtime_resume+0x51/0xa0
[1469877.820265]  [<ffffffff8157c1f5>] ? _raw_spin_lock_irqsave+0x25/0x90
[1469877.826776]  [<ffffffff8157cd2d>] ? _raw_spin_unlock_irqrestore+0x6d/0x80
[1469877.833720]  [<ffffffff810741d1>] ? preempt_count_sub+0xb1/0x100
[1469877.839885]  [<ffffffff8157cd0c>] ? _raw_spin_unlock_irqrestore+0x4c/0x80
[1469877.846830]  [<ffffffff81319b9c>] local_pci_probe+0x4c/0xb0
[1469877.852562]  [<ffffffff810741d1>] ? preempt_count_sub+0xb1/0x100
[1469877.858727]  [<ffffffff81319c89>] pci_call_probe+0x89/0xb0

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[ bvanassche: Fixed spelling in patch description ]
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/ioapic: Restore IO-APIC irq_chip retrigger callback
Ruslan Ruslichenko [Tue, 17 Jan 2017 14:13:52 +0000 (16:13 +0200)]
x86/ioapic: Restore IO-APIC irq_chip retrigger callback

commit 020eb3daaba2857b32c4cf4c82f503d6a00a67de upstream.

commit d32932d02e18 removed the irq_retrigger callback from the IO-APIC
chip and did not add it to the new IO-APIC-IR irq chip.

Unfortunately the software resend fallback is not enabled on X86, so edge
interrupts which are received during the lazy disabled state of the
interrupt line are not retriggered and therefor lost.

Restore the callbacks.

[ tglx: Massaged changelog ]

Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Ruslan Ruslichenko <rruslich@cisco.com>
Cc: xe-linux-external@cisco.com
Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomtd: nand: xway: disable module support
Hauke Mehrtens [Mon, 5 Dec 2016 21:14:36 +0000 (22:14 +0100)]
mtd: nand: xway: disable module support

commit 73529c872a189c747bdb528ce9b85b67b0e28dec upstream.

The xway_nand driver accesses the ltq_ebu_membase symbol which is not
exported. This also should not get exported and we should handle the
EBU interface in a better way later. This quick fix just deactivated
support for building as module.

Fixes: 99f2b107924c ("mtd: lantiq: Add NAND support on Lantiq XWAY SoC.")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoieee802154: atusb: do not use the stack for buffers to make them DMA able
Stefan Schmidt [Thu, 15 Dec 2016 17:40:14 +0000 (18:40 +0100)]
ieee802154: atusb: do not use the stack for buffers to make them DMA able

commit 05a974efa4bdf6e2a150e3f27dc6fcf0a9ad5655 upstream.

From 4.9 we should really avoid using the stack here as this will not be DMA
able on various platforms. This changes the buffers already being present in
time of 4.9 being released. This should go into stable as well.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agommc: mxs-mmc: Fix additional cycles after transmission stop
Stefan Wahren [Thu, 5 Jan 2017 19:24:04 +0000 (19:24 +0000)]
mmc: mxs-mmc: Fix additional cycles after transmission stop

commit 01167c7b9cbf099c69fe411a228e4e9c7104e123 upstream.

According to the code the intention is to append 8 SCK cycles
instead of 4 at end of a MMC_STOP_TRANSMISSION command. But this
will never happened because it's an AC command not an ADTC command.
So fix this by moving the statement into the right function.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: e4243f13d10e (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoHID: corsair: fix control-transfer error handling
Johan Hovold [Thu, 12 Jan 2017 17:17:43 +0000 (18:17 +0100)]
HID: corsair: fix control-transfer error handling

commit 7a546af50eb78ab99840903083231eb635c8a566 upstream.

Make sure to check for short control transfers in order to avoid parsing
uninitialised buffer data and leaking it to user space.

Note that the backlight and macro-mode buffer constraints are kept as
loose as possible in order to avoid any regressions should the current
buffer sizes be larger than necessary.

Fixes: 6f78193ee9ea ("HID: corsair: Add Corsair Vengeance K90 driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoHID: corsair: fix DMA buffers on stack
Johan Hovold [Thu, 12 Jan 2017 17:17:42 +0000 (18:17 +0100)]
HID: corsair: fix DMA buffers on stack

commit 6d104af38b570d37aa32a5803b04c354f8ed513d upstream.

Not all platforms support DMA to the stack, and specifically since v4.9
this is no longer supported on x86 with VMAP_STACK either.

Note that the macro-mode buffer was larger than necessary.

Fixes: 6f78193ee9ea ("HID: corsair: Add Corsair Vengeance K90 driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoPCI: Enumerate switches below PCI-to-PCIe bridges
Bjorn Helgaas [Wed, 11 Jan 2017 15:11:53 +0000 (09:11 -0600)]
PCI: Enumerate switches below PCI-to-PCIe bridges

commit 51ebfc92b72b4f7dac1ab45683bf56741e454b8c upstream.

A PCI-to-PCIe bridge (a "reverse bridge") has a PCI or PCI-X primary
interface and a PCI Express secondary interface.  The PCIe interface is a
Downstream Port that originates a Link.  See the "PCI Express to PCI/PCI-X
Bridge Specification", rev 1.0, sections 1.2 and A.6.

The bug report below involves a PCI-to-PCIe bridge and a PCIe switch below
the bridge:

  00:1e.0 Intel 82801 PCI Bridge to [bus 01-0a]
  01:00.0 Pericom PI7C9X111SL PCIe-to-PCI Reversible Bridge to [bus 02-0a]
  02:00.0 Pericom Device 8608 [PCIe Upstream Port] to [bus 03-0a]
  03:01.0 Pericom Device 8608 [PCIe Downstream Port] to [bus 0a]

01:00.0 is configured as a PCI-to-PCIe bridge (despite the name printed by
lspci).  As we traverse a PCIe hierarchy, device connections alternate
between PCIe Links and internal Switch logic.  Previously we did not
recognize that 01:00.0 had a secondary link, so we thought the 02:00.0
Upstream Port *did* have a secondary link.  In fact, it's the other way
around: 01:00.0 has a secondary link, and 02:00.0 has internal Switch logic
on its secondary side.

When we thought 02:00.0 had a secondary link, the pci_scan_slot() ->
only_one_child() path assumed 02:00.0 could have only one child, so 03:00.0
was the only possible downstream device.  But 03:00.0 doesn't exist, so we
didn't look for any other devices on bus 03.

Booting with "pci=pcie_scan_all" is a workaround, but we don't want users
to have to do that.

Recognize that PCI-to-PCIe bridges originate links on their secondary
interfaces.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=189361
Fixes: d0751b98dfa3 ("PCI: Add dev->has_secondary_link to track downstream PCIe links")
Tested-by: Blake Moore <blake.moore@men.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofuse: clear FR_PENDING flag when moving requests out of pending queue
Tahsin Erdogan [Thu, 12 Jan 2017 20:04:04 +0000 (12:04 -0800)]
fuse: clear FR_PENDING flag when moving requests out of pending queue

commit a8a86d78d673b1c99fe9b0064739fde9e9774184 upstream.

fuse_abort_conn() moves requests from pending list to a temporary list
before canceling them. This operation races with request_wait_answer()
which also tries to remove the request after it gets a fatal signal. It
checks FR_PENDING flag to determine whether the request is still in the
pending list.

Make fuse_abort_conn() clear FR_PENDING flag so that request_wait_answer()
does not remove the request from temporary list.

This bug causes an Oops when trying to delete an already deleted list entry
in end_requests().

Fixes: ee314a870e40 ("fuse: abort: no fc->lock needed for request ending")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosvcrpc: don't leak contexts on PROC_DESTROY
J. Bruce Fields [Mon, 9 Jan 2017 22:15:18 +0000 (17:15 -0500)]
svcrpc: don't leak contexts on PROC_DESTROY

commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.

Context expiry times are in units of seconds since boot, not unix time.

The use of get_seconds() here therefore sets the expiry time decades in
the future.  This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests.  We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.

Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F
Bjorn Helgaas [Wed, 28 Dec 2016 20:55:16 +0000 (14:55 -0600)]
x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F

commit 89e9f7bcd8744ea25fcf0ac671b8d72c10d7d790 upstream.

Martin reported that the Supermicro X8DTH-i/6/iF/6F advertises incorrect
host bridge windows via _CRS:

  pci_root PNP0A08:00: host bridge window [io  0xf000-0xffff]
  pci_root PNP0A08:01: host bridge window [io  0xf000-0xffff]

Both bridges advertise the 0xf000-0xffff window, which cannot be correct.

Work around this by ignoring _CRS on this system.  The downside is that we
may not assign resources correctly to hot-added PCI devices (if they are
possible on this system).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=42606
Reported-by: Martin Burnicki <martin.burnicki@meinberg.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotmpfs: clear S_ISGID when setting posix ACLs
Gu Zheng [Mon, 9 Jan 2017 01:34:48 +0000 (09:34 +0800)]
tmpfs: clear S_ISGID when setting posix ACLs

commit 497de07d89c1410d76a15bec2bb41f24a2a89f31 upstream.

This change was missed the tmpfs modification in In CVE-2016-7097
commit 073931017b49 ("posix_acl: Clear SGID bit when setting
file permissions")
It can test by xfstest generic/375, which failed to clear
setgid bit in the following test case on tmpfs:

  touch $testfile
  chown 100:100 $testfile
  chmod 2755 $testfile
  _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile

Signed-off-by: Gu Zheng <guzheng1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: dts: imx31: fix AVIC base address
Vladimir Zapolskiy [Thu, 17 Nov 2016 01:30:51 +0000 (03:30 +0200)]
ARM: dts: imx31: fix AVIC base address

commit af92305e567b7f4c9cf48b9e46c1f48ec9ffb1fb upstream.

On i.MX31 AVIC interrupt controller base address is at 0x68000000.

The problem was shadowed by the AVIC driver, which takes the correct
base address from a SoC specific header file.

Fixes: d2a37b3d91f4 ("ARM i.MX31: Add devicetree support")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: dts: imx31: move CCM device node to AIPS2 bus devices
Vladimir Zapolskiy [Mon, 26 Sep 2016 00:03:41 +0000 (03:03 +0300)]
ARM: dts: imx31: move CCM device node to AIPS2 bus devices

commit 1f87aee6a2e55eda466a43ba6248a8b75eede153 upstream.

i.MX31 Clock Control Module controller is found on AIPS2 bus, move it
there from SPBA bus to avoid a conflict of device IO space mismatch.

Fixes: ef0e4a606fb6 ("ARM: mx31: Replace clk_register_clkdev with clock DT lookup")
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: dts: imx31: fix clock control module interrupts description
Vladimir Zapolskiy [Mon, 26 Sep 2016 00:03:40 +0000 (03:03 +0300)]
ARM: dts: imx31: fix clock control module interrupts description

commit 2e575cbc930901718cc18e084566ecbb9a4b5ebb upstream.

The type of AVIC interrupt controller found on i.MX31 is one-cell,
namely 31 for CCM DVFS and 53 for CCM, however for clock control
module its interrupts are specified as 3-cells, fix it.

Fixes: ef0e4a606fb6 ("ARM: mx31: Replace clk_register_clkdev with clock DT lookup")
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoperf scripting: Avoid leaking the scripting_context variable
Arnaldo Carvalho de Melo [Tue, 25 Oct 2016 20:20:47 +0000 (17:20 -0300)]
perf scripting: Avoid leaking the scripting_context variable

commit cf346d5bd4b9d61656df2f72565c9b354ef3ca0d upstream.

Both register_perl_scripting() and register_python_scripting() allocate
this variable, fix it by checking if it already was.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 7e4b21b84c43 ("perf/scripts: Add Python scripting engine")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoIB/IPoIB: Remove can't use GFP_NOIO warning
Kamal Heib [Thu, 10 Nov 2016 08:16:48 +0000 (10:16 +0200)]
IB/IPoIB: Remove can't use GFP_NOIO warning

commit 0b59970e7d96edcb3c7f651d9d48e1a59af3c3b0 upstream.

Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.

This print become more annoying when the IPoIB interface is configured
to work in connected mode.

Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoIB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs
Eran Ben Elisha [Thu, 10 Nov 2016 09:31:00 +0000 (11:31 +0200)]
IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs

commit 1f22e454df2eb99ba6b7ace3f594f6805cdf5cbc upstream.

According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is
supported only if dmfs_ipoib bit is set.

If it isn't set we want to ensure allocating NET_IF QPs fail. We do so
by filling out the allocation bitmap. By thus, the NET_IF QPs allocating
function won't find any free QP and will fail.

Fixes: c1c98501121e ('IB/mlx4: Add support for steerable IB UD QPs')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoIB/mlx4: Fix port query for 56Gb Ethernet links
Saeed Mahameed [Thu, 10 Nov 2016 09:30:59 +0000 (11:30 +0200)]
IB/mlx4: Fix port query for 56Gb Ethernet links

commit 6fa26208206c406fa529cd73f7ae6bf4181e270b upstream.

Report the correct speed in the port attributes when using a 56Gbps
ethernet link.  Without this change the field is incorrectly set to 10.

Fixes: a9c766bb75ee ('IB/mlx4: Fix info returned when querying IBoE ports')
Fixes: 2e96691c31ec ('IB: Use central enum for speed instead of hard-coded values')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoIB/mlx4: Fix out-of-range array index in destroy qp flow
Jack Morgenstein [Sun, 27 Nov 2016 13:18:19 +0000 (15:18 +0200)]
IB/mlx4: Fix out-of-range array index in destroy qp flow

commit c482af646d0809a8d5e1b7f4398cce3592589b98 upstream.

For non-special QPs, the port value becomes non-zero only at the
RESET-to-INIT transition. If the QP has not undergone that transition,
its port number value is still zero.

If such a QP is destroyed before being moved out of the RESET state,
subtracting one from the qp port number results in a negative value.
Using that negative value as an index into the qp1_proxy array
results in an out-of-bounds array reference.

Fix this by testing that the QP type is one that uses qp1_proxy before
using the port number. For special QPs of all types, the port number is
specified at QP creation time.

Fixes: 9433c188915c ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoIB/mlx4: Set traffic class in AH
Maor Gottlieb [Thu, 10 Nov 2016 09:30:53 +0000 (11:30 +0200)]
IB/mlx4: Set traffic class in AH

commit af4295c117b82a521b05d0daf39ce879d26e6cb1 upstream.

Set traffic class within sl_tclass_flowlabel when create iboe AH.
Without this the TOS value will be empty when running VLAN tagged
traffic, because the TOS value is taken from the traffic class in the
address handle attributes.

Fixes: 9106c4106974 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoIB/mlx5: Wait for all async command completions to complete
Eli Cohen [Thu, 27 Oct 2016 13:36:43 +0000 (16:36 +0300)]
IB/mlx5: Wait for all async command completions to complete

commit acbda523884dcf45613bf6818d8ead5180df35c2 upstream.

Wait before continuing unload till all pending mkey async creation requests
are done.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it
Steven Rostedt [Tue, 17 May 2016 03:00:35 +0000 (23:00 -0400)]
ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it

commit 8329e818f14926a6040df86b2668568bde342ebf upstream.

Matt Fleming reported seeing crashes when enabling and disabling
function profiling which uses function graph tracer. Later Namhyung Kim
hit a similar issue and he found that the issue was due to the jmp to
ftrace_stub in ftrace_graph_call was only two bytes, and when it was
changed to jump to the tracing code, it overwrote the ftrace_stub that
was after it.

Masami Hiramatsu bisected this down to a binutils change:

8dcea93252a9ea7dff57e85220a719e2a5e8ab41 is the first bad commit
commit 8dcea93252a9ea7dff57e85220a719e2a5e8ab41
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri May 15 03:17:31 2015 -0700

    Add -mshared option to x86 ELF assembler

    This patch adds -mshared option to x86 ELF assembler.  By default,
    assembler will optimize out non-PLT relocations against defined non-weak
    global branch targets with default visibility.  The -mshared option tells
    the assembler to generate code which may go into a shared library
    where all non-weak global branch targets with default visibility can
    be preempted.  The resulting code is slightly bigger.  This option
    only affects the handling of branch instructions.

Declaring ftrace_stub as a weak call prevents gas from using two byte
jumps to it, which would be converted to a jump to the function graph
code.

Link: http://lkml.kernel.org/r/20160516230035.1dbae571@gandalf.local.home
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.4.44
Greg Kroah-Hartman [Fri, 20 Jan 2017 09:56:50 +0000 (10:56 +0100)]
Linux 4.4.44

7 years agopinctrl: sh-pfc: Do not unconditionally support PIN_CONFIG_BIAS_DISABLE
Niklas Söderlund [Sat, 12 Nov 2016 16:04:24 +0000 (17:04 +0100)]
pinctrl: sh-pfc: Do not unconditionally support PIN_CONFIG_BIAS_DISABLE

commit 5d7400c4acbf7fe633a976a89ee845f7333de3e4 upstream.

Always stating PIN_CONFIG_BIAS_DISABLE is supported gives untrue output
when examining /sys/kernel/debug/pinctrl/e6060000.pfc/pinconf-pins if
the operation get_bias() is implemented but the pin is not handled by
the get_bias() implementation. In that case the output will state that
"input bias disabled" indicating that this pin has bias control
support.

Make support for PIN_CONFIG_BIAS_DISABLE depend on that the pin either
supports SH_PFC_PIN_CFG_PULL_UP or SH_PFC_PIN_CFG_PULL_DOWN. This also
solves the issue where SoC specific implementations print error messages
if their particular implementation of {set,get}_bias() is called with a
pin it does not know about.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/ibmebus: Fix device reference leaks in sysfs interface
Johan Hovold [Tue, 1 Nov 2016 15:26:00 +0000 (16:26 +0100)]
powerpc/ibmebus: Fix device reference leaks in sysfs interface

commit fe0f3168169f7c34c29b0cf0c489f126a7f29643 upstream.

Make sure to drop any reference taken by bus_find_device() in the sysfs
callbacks that are used to create and destroy devices based on
device-tree entries.

Fixes: 6bccf755ff53 ("[POWERPC] ibmebus: dynamic addition/removal of adapters, some code cleanup")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc/ibmebus: Fix further device reference leaks
Johan Hovold [Tue, 1 Nov 2016 15:26:01 +0000 (16:26 +0100)]
powerpc/ibmebus: Fix further device reference leaks

commit 815a7141c4d1b11610dccb7fcbb38633759824f2 upstream.

Make sure to drop any reference taken by bus_find_device() when creating
devices during init and driver registration.

Fixes: 55347cc9962f ("[POWERPC] ibmebus: Add device creation and bus probing based on of_device")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobus: vexpress-config: fix device reference leak
Johan Hovold [Wed, 16 Nov 2016 17:31:30 +0000 (17:31 +0000)]
bus: vexpress-config: fix device reference leak

commit c090959b9dd8c87703e275079aa4b4a824ba3f8e upstream.

Make sure to drop the reference to the parent device taken by
class_find_device() after populating the bus.

Fixes: 3b9334ac835b ("mfd: vexpress: Convert custom func API to regmap")
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoblk-mq: Always schedule hctx->next_cpu
Gabriel Krisman Bertazi [Wed, 28 Sep 2016 03:24:24 +0000 (00:24 -0300)]
blk-mq: Always schedule hctx->next_cpu

commit c02ebfdddbafa9a6a0f52fbd715e6bfa229af9d3 upstream.

Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the
wrong CPU") attempts to avoid triggering the WARN_ON in
__blk_mq_run_hw_queue when the expected CPU is dead.  Problem is, in the
last batch execution before round robin, blk_mq_hctx_next_cpu can
schedule a dead CPU and also update next_cpu to the next alive CPU in
the mask, which will trigger the WARN_ON despite the previous
workaround.

The following patch fixes this scenario by always scheduling the value
in hctx->next_cpu.  This changes the moment when we round-robin the CPU
running the hctx, but it really doesn't matter, since it still executes
BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.

Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoACPI / APEI: Fix NMI notification handling
Prarit Bhargava [Wed, 30 Nov 2016 13:19:39 +0000 (08:19 -0500)]
ACPI / APEI: Fix NMI notification handling

commit a545715d2dae8d071c5b06af947b07ffa846b288 upstream.

When removing and adding cpu 0 on a system with GHES NMI the following stack
trace is seen when re-adding the cpu:

WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
Call Trace:
 dump_stack+0x63/0x8e
 __warn+0xd1/0xf0
 warn_slowpath_null+0x1d/0x20
 setup_local_APIC+0x275/0x370
 apic_ap_setup+0xe/0x20
 start_secondary+0x48/0x180
 set_init_arg+0x55/0x55
 early_idt_handler_array+0x120/0x120
 x86_64_start_reservations+0x2a/0x2c
 x86_64_start_kernel+0x13d/0x14c

During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
NMI on CPU 0.  The GHES NMI handler, ghes_notify_nmi() runs the
ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
(0xf6).  The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is  also
0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
that something has set the IRQ_WORK_VECTOR line prior to the APIC being
initialized.

Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
incorrectly modified the behavior such that the handler returns
NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
work queue for every NMI.

This patch modifies the ghes_proc_irq_work() to run as it did prior to
2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
properly returning NMI_HANDLED and only calling the work queue if
NMI_HANDLED has been set.

Fixes: 2383844d4850 (GHES: Elliminate double-loop in the NMI handler)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoblock: cfq_cpd_alloc() should use @gfp
Tejun Heo [Thu, 10 Nov 2016 16:16:37 +0000 (11:16 -0500)]
block: cfq_cpd_alloc() should use @gfp

commit ebc4ff661fbe76781c6b16dfb7b754a5d5073f8e upstream.

cfq_cpd_alloc() which is the cpd_alloc_fn implementation for cfq was
incorrectly hard coding GFP_KERNEL instead of using the mask specified
through the @gfp parameter.  This currently doesn't cause any actual
issues because all current callers specify GFP_KERNEL.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e4a9bde9589f ("blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods")
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocpufreq: powernv: Disable preemption while checking CPU throttling state
Denis Kirjanov [Tue, 8 Nov 2016 10:39:28 +0000 (05:39 -0500)]
cpufreq: powernv: Disable preemption while checking CPU throttling state

commit 8a10c06a20ec8097a68fd7a4a1c0e285095b4d2f upstream.

With preemption turned on we can read incorrect throttling state
while being switched to CPU on a different chip.

 BUG: using smp_processor_id() in preemptible [00000000] code: cat/7343
 caller is .powernv_cpufreq_throttle_check+0x2c/0x710
 CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
 Call Trace:
 [c0000007d25b75b0] [c000000000971378] .dump_stack+0xe4/0x150 (unreliable)
 [c0000007d25b7640] [c0000000005162e4] .check_preemption_disabled+0x134/0x150
 [c0000007d25b76e0] [c0000000007b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
 [c0000007d25b7790] [c0000000007b6d18] .powernv_cpufreq_target_index+0x288/0x360
 [c0000007d25b7870] [c0000000007acee4] .__cpufreq_driver_target+0x394/0x8c0
 [c0000007d25b7920] [c0000000007b22ac] .cpufreq_set+0x7c/0xd0
 [c0000007d25b79b0] [c0000000007adf50] .store_scaling_setspeed+0x80/0xc0
 [c0000007d25b7a40] [c0000000007ae270] .store+0xa0/0x100
 [c0000007d25b7ae0] [c0000000003566e8] .sysfs_kf_write+0x88/0xb0
 [c0000007d25b7b70] [c0000000003553b8] .kernfs_fop_write+0x178/0x260
 [c0000007d25b7c10] [c0000000002ac3cc] .__vfs_write+0x3c/0x1c0
 [c0000007d25b7cf0] [c0000000002ad584] .vfs_write+0xc4/0x230
 [c0000007d25b7d90] [c0000000002aeef8] .SyS_write+0x58/0x100
 [c0000007d25b7e30] [c00000000000bfec] system_call+0x38/0xfc

Fixes: 09a972d16209 (cpufreq: powernv: Report cpu frequency throttling)
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoNFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
NeilBrown [Mon, 19 Dec 2016 00:19:31 +0000 (11:19 +1100)]
NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

commit cfd278c280f997cf2fe4662e0acab0fe465f637b upstream.

Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessasrily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoNFS: Fix a performance regression in readdir
Trond Myklebust [Sat, 19 Nov 2016 15:54:55 +0000 (10:54 -0500)]
NFS: Fix a performance regression in readdir

commit 79f687a3de9e3ba2518b4ea33f38ca6cbe9133eb upstream.

Ben Coddington reports that commit 311324ad1713, by adding the function
nfs_dir_mapping_need_revalidate() that checks page cache validity on
each call to nfs_readdir() causes a performance regression when
the directory is being modified.

If the directory is changing while we're iterating through the directory,
POSIX does not require us to invalidate the page cache unless the user
calls rewinddir(). However, we still do want to ensure that we use
readdirplus in order to avoid a load of stat() calls when the user
is doing an 'ls -l' workload.

The fix should be to invalidate the page cache immediately when we're
setting the NFS_INO_ADVISE_RDPLUS bit.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 311324ad1713 ("NFS: Be more aggressive in using readdirplus...")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopNFS: Fix race in pnfs_wait_on_layoutreturn
Trond Myklebust [Fri, 18 Nov 2016 20:21:30 +0000 (15:21 -0500)]
pNFS: Fix race in pnfs_wait_on_layoutreturn

commit ee284e35d8c71bf5d4d807eaff6f67a17134b359 upstream.

We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.

Fixes: 500d701f336b ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopinctrl: meson: fix gpio request disabling other modes
Neil Armstrong [Tue, 6 Dec 2016 14:08:16 +0000 (15:08 +0100)]
pinctrl: meson: fix gpio request disabling other modes

commit f24d311f92b516a8aadef5056424ccabb4068e7b upstream.

The pinctrl_gpio_request is called with the "full" gpio number, already
containing the base, then meson_pmx_request_gpio is then called with the
final pin number.
Remove the base addition when calling meson_pmx_disable_other_groups.

Fixes: 6ac730951104 ("pinctrl: add driver for Amlogic Meson SoCs")
CC: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobtrfs: fix error handling when run_delayed_extent_op fails
Jeff Mahoney [Tue, 20 Dec 2016 18:28:27 +0000 (13:28 -0500)]
btrfs: fix error handling when run_delayed_extent_op fails

commit aa7c8da35d1905d80e840d075f07d26ec90144b5 upstream.

In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op
fails sets locked_ref->processing = 0 but doesn't re-increment
delayed_refs->num_heads_ready.  As a result, we end up triggering
the WARN_ON in btrfs_select_ref_head.

Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Reported-by: Jon Nelson <jnelson-suse@jamponi.net>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobtrfs: fix locking when we put back a delayed ref that's too new
Jeff Mahoney [Tue, 20 Dec 2016 18:28:28 +0000 (13:28 -0500)]
btrfs: fix locking when we put back a delayed ref that's too new

commit d0280996437081dd12ed1e982ac8aeaa62835ec4 upstream.

In __btrfs_run_delayed_refs, when we put back a delayed ref that's too
new, we have already dropped the lock on locked_ref when we set
->processing = 0.

This patch keeps the lock to cover that assignment.

Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command...
Lukasz Odzioba [Wed, 28 Dec 2016 13:55:40 +0000 (14:55 +0100)]
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option

commit dd853fd216d1485ed3045ff772079cc8689a9a4a upstream.

A negative number can be specified in the cmdline which will be used as
setup_clear_cpu_cap() argument. With that we can clear/set some bit in
memory predceeding boot_cpu_data/cpu_caps_cleared which may cause kernel
to misbehave. This patch adds lower bound check to setup_disablecpuid().

Boris Petkov reproduced a crash:

  [    1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540
  [    1.236535] IP: memcpy_erms+0x6/0x10

Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andi.kleen@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: slaoub@gmail.com
Fixes: ac72e7888a61 ("x86: add generic clearcpuid=... option")
Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: ch341: fix modem-control and B0 handling
Johan Hovold [Fri, 6 Jan 2017 18:15:12 +0000 (19:15 +0100)]
USB: serial: ch341: fix modem-control and B0 handling

commit 030ee7ae52a46a2be52ccc8242c4a330aba8d38e upstream.

The modem-control signals are managed by the tty-layer during open and
should not be asserted prematurely when set_termios is called from
driver open.

Also make sure that the signals are asserted only when changing speed
from B0.

Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: ch341: fix resume after reset
Johan Hovold [Fri, 6 Jan 2017 18:15:14 +0000 (19:15 +0100)]
USB: serial: ch341: fix resume after reset

commit ce5e292828117d1b71cbd3edf9e9137cf31acd30 upstream.

Fix reset-resume handling which failed to resubmit the read and
interrupt URBs, thereby leaving a port that was open before suspend in a
broken state until closed and reopened.

Fixes: 1ded7ea47b88 ("USB: ch341 serial: fix port number changed after
resume")
Fixes: 2bfd1c96a9fb ("USB: serial: ch341: remove reset_resume callback")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/radeon: drop verde dpm quirks
Alex Deucher [Thu, 5 Jan 2017 17:39:01 +0000 (12:39 -0500)]
drm/radeon: drop verde dpm quirks

commit 8a08403bcb39f5d0e733bcf59a8a74f16b538f6e upstream.

fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=98897
https://bugs.launchpad.net/bugs/1651981

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Adrian Fiergolski <A.Fiergolski@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosysctl: Drop reference added by grab_header in proc_sys_readdir
Zhou Chengming [Fri, 6 Jan 2017 01:32:32 +0000 (09:32 +0800)]
sysctl: Drop reference added by grab_header in proc_sys_readdir

commit 93362fa47fe98b62e4a34ab408c4a418432e7939 upstream.

Fixes CVE-2016-9191, proc_sys_readdir doesn't drop reference
added by grab_header when return from !dir_emit_dots path.
It can cause any path called unregister_sysctl_table will
wait forever.

The calltrace of CVE-2016-9191:

[ 5535.960522] Call Trace:
[ 5535.963265]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 5535.968817]  [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
[ 5535.975346]  [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
[ 5535.982256]  [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
[ 5535.988972]  [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
[ 5535.994804]  [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
[ 5536.001227]  [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
[ 5536.007648]  [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
[ 5536.014654]  [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
[ 5536.021657]  [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40
[ 5536.029344]  [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
[ 5536.036447]  [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
[ 5536.043844]  [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
[ 5536.051336]  [<ffffffff8116789d>] update_flag+0x11d/0x210
[ 5536.057373]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.064186]  [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
[ 5536.070899]  [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
[ 5536.077420]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.084234]  [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
[ 5536.091049]  [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
[ 5536.097571]  [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
[ 5536.104207]  [<ffffffff810bc83f>] process_one_work+0x1df/0x710
[ 5536.110736]  [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
[ 5536.117461]  [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
[ 5536.123697]  [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
[ 5536.130426]  [<ffffffff810c3f7e>] kthread+0xfe/0x120
[ 5536.135991]  [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
[ 5536.142041]  [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230

One cgroup maintainer mentioned that "cgroup is trying to offline
a cpuset css, which takes place under cgroup_mutex.  The offlining
ends up trying to drain active usages of a sysctl table which apprently
is not happening."
The real reason is that proc_sys_readdir doesn't drop reference added
by grab_header when return from !dir_emit_dots path. So this cpuset
offline path will wait here forever.

See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13

Fixes: f0c3b5093add ("[readdir] convert procfs")
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: Yang Shukui <yangshukui@huawei.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosysrq: attach sysrq handler correctly for 32-bit kernel
Akinobu Mita [Thu, 5 Jan 2017 17:14:16 +0000 (02:14 +0900)]
sysrq: attach sysrq handler correctly for 32-bit kernel

commit 802c03881f29844af0252b6e22be5d2f65f93fd0 upstream.

The sysrq input handler should be attached to the input device which has
a left alt key.

On 32-bit kernels, some input devices which has a left alt key cannot
attach sysrq handler.  Because the keybit bitmap in struct input_device_id
for sysrq is not correctly initialized.  KEY_LEFTALT is 56 which is
greater than BITS_PER_LONG on 32-bit kernels.

I found this problem when using a matrix keypad device which defines
a KEY_LEFTALT (56) but doesn't have a KEY_O (24 == 56%32).

Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx
Richard Genoud [Tue, 13 Dec 2016 16:27:56 +0000 (17:27 +0100)]
tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx

commit 89d8232411a85b9a6b12fd5da4d07d8a138a8e0c upstream.

If we don't disable the transmitter in atmel_stop_tx, the DMA buffer
continues to send data until it is emptied.
This cause problems with the flow control (CTS is asserted and data are
still sent).

So, disabling the transmitter in atmel_stop_tx is a sane thing to do.

Tested on at91sam9g35-cm(DMA)
Tested for regressions on sama5d2-xplained(Fifo) and at91sam9g20ek(PDC)

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomnt: Protect the mountpoint hashtable with mount_lock
Eric W. Biederman [Tue, 3 Jan 2017 01:18:43 +0000 (14:18 +1300)]
mnt: Protect the mountpoint hashtable with mount_lock

commit 3895dbf8985f656675b5bde610723a29cbce3fa7 upstream.

Protecting the mountpoint hashtable with namespace_sem was sufficient
until a call to umount_mnt was added to mntput_no_expire.  At which
point it became possible for multiple calls of put_mountpoint on
the same hash chain to happen on the same time.

Kristen Johansen <kjlx@templeofstupid.com> reported:
> This can cause a panic when simultaneous callers of put_mountpoint
> attempt to free the same mountpoint.  This occurs because some callers
> hold the mount_hash_lock, while others hold the namespace lock.  Some
> even hold both.
>
> In this submitter's case, the panic manifested itself as a GP fault in
> put_mountpoint() when it called hlist_del() and attempted to dereference
> a m_hash.pprev that had been poisioned by another thread.

Al Viro observed that the simple fix is to switch from using the namespace_sem
to the mount_lock to protect the mountpoint hash table.

I have taken Al's suggested patch moved put_mountpoint in pivot_root
(instead of taking mount_lock an additional time), and have replaced
new_mountpoint with get_mountpoint a function that does the hash table
lookup and addition under the mount_lock.   The introduction of get_mounptoint
ensures that only the mount_lock is needed to manipulate the mountpoint
hashtable.

d_set_mounted is modified to only set DCACHE_MOUNTED if it is not
already set.  This allows get_mountpoint to use the setting of
DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry
happens exactly once.

Fixes: ce07d891a089 ("mnt: Honor MNT_LOCKED when detaching mounts")
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovme: Fix wrong pointer utilization in ca91cx42_slave_get
Augusto Mecking Caringi [Tue, 10 Jan 2017 10:45:00 +0000 (10:45 +0000)]
vme: Fix wrong pointer utilization in ca91cx42_slave_get

commit c8a6a09c1c617402cc9254b2bc8da359a0347d75 upstream.

In ca91cx42_slave_get function, the value pointed by vme_base pointer is
set through:

*vme_base = ioread32(bridge->base + CA91CX42_VSI_BS[i]);

So it must be dereferenced to be used in calculation of pci_base:

*pci_base = (dma_addr_t)*vme_base + pci_offset;

This bug was caught thanks to the following gcc warning:

drivers/vme/bridges/vme_ca91cx42.c: In function ‘ca91cx42_slave_get’:
drivers/vme/bridges/vme_ca91cx42.c:467:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
*pci_base = (dma_addr_t)vme_base + pci_offset;

Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Acked-By: Martyn Welch <martyn@welchs.me.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxhci: fix deadlock at host remove by running watchdog correctly
Mathias Nyman [Wed, 11 Jan 2017 15:10:34 +0000 (17:10 +0200)]
xhci: fix deadlock at host remove by running watchdog correctly

commit d6169d04097fd9ddf811e63eae4e5cd71e6666e2 upstream.

If a URB is killed while the host is removed we can end up in a situation
where the hub thread takes the roothub device lock, and waits for
the URB to be given back by xhci-hcd, blocking the host remove code.

xhci-hcd tries to stop the endpoint and give back the urb, but can't
as the host is removed from PCI bus at the same time, preventing the normal
way of giving back urb.

Instead we need to rely on the stop command timeout function to give back
the urb. This xhci_stop_endpoint_command_watchdog() timeout function
used a XHCI_STATE_DYING flag to indicate if the timeout function is already
running, but later this flag has been taking into use in other places to
mark that xhci is dying.

Remove checks for XHCI_STATE_DYING in xhci_urb_dequeue. We are still
checking that reading from pci state does not return 0xffffffff or that
host is not halted before trying to stop the endpoint.

This whole area of stopping endpoints, giving back URBs, and the wathdog
timeout need rework, this fix focuses on solving a specific deadlock
issue that we can then send to stable before any major rework.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoi2c: fix kernel memory disclosure in dev interface
Vlad Tsyrklevich [Mon, 9 Jan 2017 15:53:36 +0000 (22:53 +0700)]
i2c: fix kernel memory disclosure in dev interface

commit 30f939feaeee23e21391cfc7b484f012eb189c3c upstream.

i2c_smbus_xfer() does not always fill an entire block, allowing
kernel stack memory disclosure through the temp variable. Clear
it before it's read to.

Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoi2c: print correct device invalid address
John Garry [Fri, 6 Jan 2017 11:02:57 +0000 (19:02 +0800)]
i2c: print correct device invalid address

commit 6f724fb3039522486fce2e32e4c0fbe238a6ab02 upstream.

In of_i2c_register_device(), when the check for
device address validity fails we print the info.addr,
which has not been assigned properly.

Fix this by printing the actual invalid address.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes: b4e2f6ac1281 ("i2c: apply DT flags when probing")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoInput: elants_i2c - avoid divide by 0 errors on bad touchscreen data
Guenter Roeck [Thu, 5 Jan 2017 22:14:54 +0000 (14:14 -0800)]
Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data

commit 1c3415a06b1016a596bfe59e0cfee56c773aa958 upstream.

The following crash may be seen if bad data is received from the
touchscreen.

[ 2189.425150] elants_i2c i2c-ELAN0001:00: unknown packet ff ff ff ff
[ 2189.430738] divide error: 0000 [#1] PREEMPT SMP
[ 2189.434679] gsmi: Log Shutdown Reason 0x03
[ 2189.434689] Modules linked in: ip6t_REJECT nf_reject_ipv6 rfcomm evdi
uinput uvcvideo cmac videobuf2_vmalloc videobuf2_memops snd_hda_codec_hdmi
i2c_dev videobuf2_core snd_soc_sst_cht_bsw_rt5645 snd_hda_intel
snd_intel_sst_acpi btusb btrtl btbcm btintel bluetooth snd_soc_sst_acpi
snd_hda_codec snd_intel_sst_core snd_hwdep snd_soc_sst_mfld_platform
snd_hda_core snd_soc_rt5645 memconsole_x86_legacy memconsole zram snd_soc_rl6231
fuse ip6table_filter iwlmvm iwlwifi iwl7000_mac80211 cfg80211 iio_trig_sysfs
joydev cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
kfifo_buf industrialio snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq
snd_seq_device ppp_async ppp_generic slhc tun
[ 2189.434866] CPU: 0 PID: 106 Comm: irq/184-ELAN000 Tainted: G        W
3.18.0-13101-g57e8190 #1
[ 2189.434883] Hardware name: GOOGLE Ultima, BIOS Google_Ultima.7287.131.43 07/20/2016
[ 2189.434898] task: ffff88017a0b6d80 ti: ffff88017a2bc000 task.ti: ffff88017a2bc000
[ 2189.434913] RIP: 0010:[<ffffffffbecc48d5>]  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
[ 2189.434937] RSP: 0018:ffff88017a2bfd98  EFLAGS: 00010293
[ 2189.434948] RAX: 0000000000000000 RBX: ffff88017a967828 RCX: ffff88017a9678e8
[ 2189.434962] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000000
[ 2189.434975] RBP: ffff88017a2bfdd8 R08: 00000000000003e8 R09: 0000000000000000
[ 2189.434989] R10: 0000000000000000 R11: 000000000044a2bd R12: ffff88017a991800
[ 2189.435001] R13: ffffffffbe8a2a53 R14: ffff88017a0b6d80 R15: ffff88017a0b6d80
[ 2189.435011] FS:  0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
[ 2189.435022] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2189.435030] CR2: 00007f678d94b000 CR3: 000000003f41a000 CR4: 00000000001007f0
[ 2189.435039] Stack:
[ 2189.435044]  ffff88017a2bfda8 ffff88017a9678e8 646464647a2bfdd8 0000000006e09574
[ 2189.435060]  0000000000000000 ffff88017a088b80 ffff88017a921000 ffffffffbe8a2a53
[ 2189.435074]  ffff88017a2bfe08 ffffffffbe8a2a73 ffff88017a0b6d80 0000000006e09574
[ 2189.435089] Call Trace:
[ 2189.435101]  [<ffffffffbe8a2a53>] ? irq_thread_dtor+0xa9/0xa9
[ 2189.435112]  [<ffffffffbe8a2a73>] irq_thread_fn+0x20/0x40
[ 2189.435123]  [<ffffffffbe8a2be1>] irq_thread+0x14e/0x222
[ 2189.435135]  [<ffffffffbee8cbeb>] ? __schedule+0x3b3/0x57a
[ 2189.435145]  [<ffffffffbe8a29aa>] ? wake_threads_waitq+0x2d/0x2d
[ 2189.435156]  [<ffffffffbe8a2a93>] ? irq_thread_fn+0x40/0x40
[ 2189.435168]  [<ffffffffbe87c385>] kthread+0x10e/0x116
[ 2189.435178]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
[ 2189.435189]  [<ffffffffbee900ac>] ret_from_fork+0x7c/0xb0
[ 2189.435199]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
[ 2189.435208] Code: ff ff eb 73 0f b6 bb c1 00 00 00 83 ff 03 7e 13 49 8d 7c
24 20 ba 04 00 00 00 48 c7 c6 8a cd 21 bf eb 4d 0f b6 83 c2 00 00 00 99 <f7> ff
83 f8 37 75 15 48 6b f7 37 4c 8d a3 c4 00 00 00 4c 8d ac
[ 2189.435312] RIP  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
[ 2189.435323]  RSP <ffff88017a2bfd98>
[ 2189.435350] ---[ end trace f4945345a75d96dd ]---
[ 2189.443841] Kernel panic - not syncing: Fatal exception
[ 2189.444307] Kernel Offset: 0x3d800000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2189.444519] gsmi: Log Shutdown Reason 0x02

The problem was seen with a 3.18 based kernel, but there is no reason
to believe that the upstream code is safe.

Fixes: 66aee90088da2 ("Input: add support for Elan eKTH I2C touchscreens")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: ch341: fix open and resume after B0
Johan Hovold [Fri, 6 Jan 2017 18:15:11 +0000 (19:15 +0100)]
USB: serial: ch341: fix open and resume after B0

commit a20047f36e2f6a1eea4f1fd261aaa55882369868 upstream.

The private baud_rate variable is used to configure the port at open and
reset-resume and must never be set to (and left at) zero or reset-resume
and all further open attempts will fail.

Fixes: aa91def41a7b ("USB: ch341: set tty baud speed according to tty struct")
Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: ch341: fix control-message error handling
Johan Hovold [Fri, 6 Jan 2017 18:15:18 +0000 (19:15 +0100)]
USB: serial: ch341: fix control-message error handling

commit 2d5a9c72d0c4ac73cf97f4b7814ed6c44b1e49ae upstream.

A short control transfer would currently fail to be detected, something
which could lead to stale buffer data being used as valid input.

Check for short transfers, and make sure to log any transfer errors.

Note that this also avoids leaking heap data to user space (TIOCMGET)
and the remote device (break control).

Fixes: 6ce76104781a ("USB: Driver for CH341 USB-serial adaptor")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: ch341: fix open error handling
Johan Hovold [Fri, 6 Jan 2017 18:15:13 +0000 (19:15 +0100)]
USB: serial: ch341: fix open error handling

commit f2950b78547ffb8475297ada6b92bc2d774d5461 upstream.

Make sure to stop the interrupt URB before returning on errors during
open.

Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: ch341: fix initial modem-control state
Johan Hovold [Fri, 6 Jan 2017 18:15:10 +0000 (19:15 +0100)]
USB: serial: ch341: fix initial modem-control state

commit 4e2da44691cffbfffb1535f478d19bc2dca3e62b upstream.

DTR and RTS will be asserted by the tty-layer when the port is opened
and deasserted on close (if HUPCL is set). Make sure the initial state
is not-asserted before the port is first opened as well.

Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: kl5kusb105: fix line-state error handling
Johan Hovold [Tue, 10 Jan 2017 11:05:37 +0000 (12:05 +0100)]
USB: serial: kl5kusb105: fix line-state error handling

commit 146cc8a17a3b4996f6805ee5c080e7101277c410 upstream.

The current implementation failed to detect short transfers when
attempting to read the line state, and also, to make things worse,
logged the content of the uninitialised heap transfer buffer.

Fixes: abf492e7b3ae ("USB: kl5kusb105: fix DMA buffers on stack")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonl80211: fix sched scan netlink socket owner destruction
Johannes Berg [Thu, 5 Jan 2017 09:57:14 +0000 (10:57 +0100)]
nl80211: fix sched scan netlink socket owner destruction

commit 753aacfd2e95df6a0caf23c03dc309020765bea9 upstream.

A single netlink socket might own multiple interfaces *and* a
scheduled scan request (which might belong to another interface),
so when it goes away both may need to be destroyed.

Remove the schedule_scan_stop indirection to fix this - it's only
needed for interface destruction because of the way this works
right now, with a single work taking care of all interfaces.

Fixes: 93a1e86ce10e4 ("nl80211: Stop scheduled scan if netlink client disappears")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: Introduce segmented_write_std
Steve Rutherford [Thu, 12 Jan 2017 02:28:29 +0000 (18:28 -0800)]
KVM: x86: Introduce segmented_write_std

commit 129a72a0d3c8e139a04512325384fe5ac119e74d upstream.

Introduces segemented_write_std.

Switches from emulated reads/writes to standard read/writes in fxsave,
fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
kernel memory leak.

Since commit 283c95d0e389 ("KVM: x86: emulate FXSAVE and FXRSTOR",
2016-11-09), which is luckily not yet in any final release, this would
also be an exploitable kernel memory *write*!

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 96051572c819194c37a8367624b285be10297eca
Fixes: 283c95d0e3891b64087706b344a4b545d04a6e62
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: emulate FXSAVE and FXRSTOR
Radim Krčmář [Wed, 9 Nov 2016 18:07:06 +0000 (19:07 +0100)]
KVM: x86: emulate FXSAVE and FXRSTOR

commit 283c95d0e3891b64087706b344a4b545d04a6e62 upstream.

Internal errors were reported on 16 bit fxsave and fxrstor with ipxe.
Old Intels don't have unrestricted_guest, so we have to emulate them.

The patch takes advantage of the hardware implementation.

AMD and Intel differ in saving and restoring other fields in first 32
bytes.  A test wrote 0xff to the fxsave area, 0 to upper bits of MCSXR
in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee,
and executed fxsave:

  Intel (Nehalem):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00
    ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00
  Intel (Haswell -- deprecated FPU CS and FPU DS):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00
    ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00
  AMD (Opteron 2300-series):
    7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee
    ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00

fxsave/fxrstor will only be emulated on early Intels, so KVM can't do
much to improve the situation.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: add asm_safe wrapper
Radim Krčmář [Tue, 8 Nov 2016 19:54:18 +0000 (20:54 +0100)]
KVM: x86: add asm_safe wrapper

commit aabba3c6abd50b05b1fc2c6ec44244aa6bcda576 upstream.

Move the existing exception handling for inline assembly into a macro
and switch its return values to X86EMUL type.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: add Align16 instruction flag
Radim Krčmář [Tue, 8 Nov 2016 19:54:16 +0000 (20:54 +0100)]
KVM: x86: add Align16 instruction flag

commit d3fe959f81024072068e9ed86b39c2acfd7462a9 upstream.

Needed for FXSAVE and FXRSTOR.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: flush pending lapic jump label updates on module unload
David Matlack [Fri, 16 Dec 2016 22:30:36 +0000 (14:30 -0800)]
KVM: x86: flush pending lapic jump label updates on module unload

commit cef84c302fe051744b983a92764d3fcca933415d upstream.

KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
These are implemented with delayed_work structs which can still be
pending when the KVM module is unloaded. We've seen this cause kernel
panics when the kvm_intel module is quickly reloaded.

Use the new static_key_deferred_flush() API to flush pending updates on
module unload.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agojump_labels: API for flushing deferred jump label updates
David Matlack [Fri, 16 Dec 2016 22:30:35 +0000 (14:30 -0800)]
jump_labels: API for flushing deferred jump label updates

commit b6416e61012429e0277bd15a229222fd17afc1c1 upstream.

Modules that use static_key_deferred need a way to synchronize with
any delayed work that is still pending when the module is unloaded.
Introduce static_key_deferred_flush() which flushes any pending
jump label updates.

Signed-off-by: David Matlack <dmatlack@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: eventfd: fix NULL deref irqbypass consumer
Wanpeng Li [Fri, 6 Jan 2017 01:39:42 +0000 (17:39 -0800)]
KVM: eventfd: fix NULL deref irqbypass consumer

commit 4f3dbdf47e150016aacd734e663347fcaa768303 upstream.

Reported syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
     irqfd_shutdown+0x66/0xa0 [kvm]
     process_one_work+0x16b/0x480
     worker_thread+0x4b/0x500
     kthread+0x101/0x140
     ? process_one_work+0x480/0x480
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

The syzkaller folks reported a NULL pointer dereference that due to
unregister an consumer which fails registration before. The syzkaller
creates two VMs w/ an equal eventfd occasionally. So the second VM
fails to register an irqbypass consumer. It will make irqfd as inactive
and queue an workqueue work to shutdown irqfd and unregister the irqbypass
consumer when eventfd is closed. However, the second consumer has been
initialized though it fails registration. So the token(same as the first
VM's) is taken to unregister the consumer through the workqueue, the
consumer of the first VM is found and unregistered, then NULL deref incurred
in the path of deleting consumer from the consumers list.

This patch fixes it by making irq_bypass_register/unregister_consumer()
looks for the consumer entry based on consumer pointer itself instead of
token matching.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: fix emulation of "MOV SS, null selector"
Paolo Bonzini [Thu, 12 Jan 2017 14:02:32 +0000 (15:02 +0100)]
KVM: x86: fix emulation of "MOV SS, null selector"

commit 33ab91103b3415e12457e3104f0e4517ce12d0f3 upstream.

This is CVE-2017-2583.  On Intel this causes a failed vmentry because
SS's type is neither 3 nor 7 (even though the manual says this check is
only done for usable SS, and the dmesg splat says that SS is unusable!).
On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.

The fix fabricates a data segment descriptor when SS is set to a null
selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
this in turn ensures CPL < 3 because RPL must be equal to CPL.

Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
the bug and deciphering the manuals.

Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com>
Fixes: 79d5b4c3cd809c770d4bf9812635647016c56011
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/hugetlb.c: fix reservation race when freeing surplus pages
Mike Kravetz [Wed, 11 Jan 2017 00:58:27 +0000 (16:58 -0800)]
mm/hugetlb.c: fix reservation race when freeing surplus pages

commit e5bbc8a6c992901058bc09e2ce01d16c111ff047 upstream.

return_unused_surplus_pages() decrements the global reservation count,
and frees any unused surplus pages that were backing the reservation.

Commit 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in
return_unused_surplus_pages()") added a call to cond_resched_lock in the
loop freeing the pages.

As a result, the hugetlb_lock could be dropped, and someone else could
use the pages that will be freed in subsequent iterations of the loop.
This could result in inconsistent global hugetlb page state, application
api failures (such as mmap) failures or application crashes.

When dropping the lock in return_unused_surplus_pages, make sure that
the global reservation count (resv_huge_pages) remains sufficiently
large to prevent someone else from claiming pages about to be freed.

Analyzed by Paul Cassella.

Fixes: 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Paul Cassella <cassella@cray.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoocfs2: fix crash caused by stale lvb with fsdlm plugin
Eric Ren [Wed, 11 Jan 2017 00:57:33 +0000 (16:57 -0800)]
ocfs2: fix crash caused by stale lvb with fsdlm plugin

commit e7ee2c089e94067d68475990bdeed211c8852917 upstream.

The crash happens rather often when we reset some cluster nodes while
nodes contend fiercely to do truncate and append.

The crash backtrace is below:

   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
   ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: Beginning quota recovery on device (253,18) for slot 2
   ocfs2: Finishing quota recovery on device (253,18) for slot 2
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
   ------------[ cut here ]------------
   kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
   invalid opcode: 0000 [#1] SMP
   Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod    iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport      joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix               drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd       usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
   Supported: No, Unsupported modules are loaded
   CPU: 1 PID: 30154 Comm: truncate Tainted: G           OE   N  4.4.21-69-default #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
   task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
   RIP: 0010:[<ffffffffa05c8c30>]  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
   RSP: 0018:ffff880074e6bd50  EFLAGS: 00010282
   RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
   RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
   RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
   R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
   R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
   FS:  00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
   CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
   Call Trace:
     ocfs2_setattr+0x698/0xa90 [ocfs2]
     notify_change+0x1ae/0x380
     do_truncate+0x5e/0x90
     do_sys_ftruncate.constprop.11+0x108/0x160
     entry_SYSCALL_64_fastpath+0x12/0x6d
   Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
   RIP  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]

It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
not equal to the disk i_size.  We mistakenly trust the LVB because the
underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
DLM_SBF_VALNOTVALID properly for us.  But, why?

The current code tries to downconvert lock without DLM_LKF_VALBLK flag
to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
if the lock resource type needs LVB.  This is not the right way for
fsdlm.

The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
DLM_LKF_VALBLK to decide if we care about the LVB in the LKB.  If
DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
failure happens.

The following diagram briefly illustrates how this crash happens:

RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;

The 1st round:

             Node1                                    Node2
RSB1: PR
                                                  RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
  ocfs2_dlm_lock(no DLM_LKF_VALBLK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

dlm_lock(no DLM_LKF_VALBLK)
  convert_lock(overwrite lkb->lkb_exflags
               with no DLM_LKF_VALBLK)

RSB1: NULL                                        RSB1: EX
                                                  reset Node2
dlm_recover_rsbs()
  recover_lvb()

/* The LVB is not trustable if the node with EX fails and
 * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
 */

 if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
           return;                   * to invalid the LVB here.
                                     */

The 2nd round:

         Node 1                                Node2
RSB1(become master from recovery)

ocfs2_setattr()
  ocfs2_inode_lock(NULL->EX)
    /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
    ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
  ocfs2_truncate_file()
      mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */

The fix is quite straightforward.  We keep to set DLM_LKF_VALBLK flag
for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
is uesed.

Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
Dan Williams [Wed, 11 Jan 2017 00:57:36 +0000 (16:57 -0800)]
mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}

commit f931ab479dd24cf7a2c6e2df19778406892591fb upstream.

Both arch_add_memory() and arch_remove_memory() expect a single threaded
context.

For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
not hold any locks over this check and branch:

    if (pgd_val(*pgd)) {
     pud = (pud_t *)pgd_page_vaddr(*pgd);
     paddr_last = phys_pud_init(pud, __pa(vaddr),
        __pa(vaddr_end),
        page_size_mask);
     continue;
    }

    pud = alloc_low_page();
    paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
        page_size_mask);

The result is that two threads calling devm_memremap_pages()
simultaneously can end up colliding on pgd initialization.  This leads
to crash signatures like the following where the loser of the race
initializes the wrong pgd entry:

    BUG: unable to handle kernel paging request at ffff888ebfff0000
    IP: memcpy_erms+0x6/0x10
    PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
    task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
    RIP: memcpy_erms+0x6/0x10
    [..]
    Call Trace:
      ? pmem_do_bvec+0x205/0x370 [nd_pmem]
      ? blk_queue_enter+0x3a/0x280
      pmem_rw_page+0x38/0x80 [nd_pmem]
      bdev_read_page+0x84/0xb0

Hold the standard memory hotplug mutex over calls to
arch_{add,remove}_memory().

Fixes: 41e94a851304 ("add devm_memremap_pages")
Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoselftests: do not require bash for the generated test
Rolf Eike Beer [Wed, 14 Dec 2016 10:59:34 +0000 (11:59 +0100)]
selftests: do not require bash for the generated test

commit a2b1e8a20c992b01eeb76de00d4f534cbe9f3822 upstream.

Nothing in this minimal script seems to require bash. We often run these
tests on embedded devices where the only shell available is the busybox
ash. Use sh instead.

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoselftests: do not require bash to run netsocktests testcase
Rolf Eike Beer [Wed, 14 Dec 2016 10:59:57 +0000 (11:59 +0100)]
selftests: do not require bash to run netsocktests testcase

commit 3659f98b5375d195f1870c3e508fe51e52206839 upstream.

Nothing in this minimal script seems to require bash. We often run these
tests on embedded devices where the only shell available is the busybox
ash. Use sh instead.

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoInput: i8042 - add Pegatron touchpad to noloop table
Marcos Paulo de Souza [Sun, 18 Dec 2016 23:26:12 +0000 (15:26 -0800)]
Input: i8042 - add Pegatron touchpad to noloop table

commit 41c567a5d7d1a986763e58c3394782813c3bcb03 upstream.

Avoid AUX loopback in Pegatron C15B touchpad, so input subsystem is able
to recognize a Synaptics touchpad in the AUX port.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=93791
(Touchpad is not detected on DNS 0801480 notebook (PEGATRON C15B))

Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoInput: xpad - use correct product id for x360w controllers
Pavel Rojtberg [Tue, 27 Dec 2016 19:44:51 +0000 (11:44 -0800)]
Input: xpad - use correct product id for x360w controllers

commit b6fc513da50c5dbc457a8ad6b58b046a6a68fd9d upstream.

currently the controllers get the same product id as the wireless
receiver. However the controllers actually have their own product id.

The patch makes the driver expose the same product id as the windows
driver.

This improves compatibility when running applications with WINE.

see https://github.com/paroj/xpad/issues/54

Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.4.43
Greg Kroah-Hartman [Sun, 15 Jan 2017 12:41:49 +0000 (13:41 +0100)]
Linux 4.4.43

7 years agomm/init: fix zone boundary creation
Oliver O'Halloran [Tue, 26 Jul 2016 22:22:17 +0000 (15:22 -0700)]
mm/init: fix zone boundary creation

commit 90cae1fe1c3540f791d5b8e025985fa5e699b2bb upstream.

As a part of memory initialisation the architecture passes an array to
free_area_init_nodes() which specifies the max PFN of each memory zone.
This array is not necessarily monotonic (due to unused zones) so this
array is parsed to build monotonic lists of the min and max PFN for each
zone.  ZONE_MOVABLE is special cased here as its limits are managed by
the mm subsystem rather than the architecture.  Unfortunately, this
special casing is broken when ZONE_MOVABLE is the not the last zone in
the zone list.  The core of the issue is:

if (i == ZONE_MOVABLE)
continue;
arch_zone_lowest_possible_pfn[i] =
arch_zone_highest_possible_pfn[i-1];

As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone will
be set to zero.  This patch fixes this bug by adding explicitly tracking
where the next zone should start rather than relying on the contents
arch_zone_highest_possible_pfn[].

Thie is low priority.  To get bitten by this you need to enable a zone
that appears after ZONE_MOVABLE in the zone_type enum.  As far as I can
tell this means running a kernel with ZONE_DEVICE or ZONE_CMA enabled,
so I can't see this affecting too many people.

I only noticed this because I've been fiddling with ZONE_DEVICE on
powerpc and 4.6 broke my test kernel.  This bug, in conjunction with the
changes in Taku Izumi's kernelcore=mirror patch (d91749c1dda71) and
powerpc being the odd architecture which initialises max_zone_pfn[] to
~0ul instead of 0 caused all of system memory to be placed into
ZONE_DEVICE at boot, followed a panic since device memory cannot be used
for kernel allocations.  I've already submitted a patch to fix the
powerpc specific bits, but I figured this should be fixed too.

Link: http://lkml.kernel.org/r/1462435033-15601-1-git-send-email-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoALSA: usb-audio: Add a quirk for Plantronics BT600
Dennis Kadioglu [Mon, 9 Jan 2017 16:10:46 +0000 (17:10 +0100)]
ALSA: usb-audio: Add a quirk for Plantronics BT600

commit 2e40795c3bf344cfb5220d94566205796e3ef19a upstream.

Plantronics BT600 does not support reading the sample rate which leads
to many lines of "cannot get freq at ep 0x1" and "cannot get freq at
ep 0x82". This patch adds the USB ID of the BT600 to quirks.c and
avoids those error messages.

Signed-off-by: Dennis Kadioglu <denk@post.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospi: mvebu: fix baudrate calculation for armada variant
Uwe Kleine-König [Thu, 8 Dec 2016 16:37:08 +0000 (17:37 +0100)]
spi: mvebu: fix baudrate calculation for armada variant

commit 7243e0b20729d372e97763617a7a9c89f29b33e1 upstream.

The calculation of SPR and SPPR doesn't round correctly at several
places which might result in baud rates that are too big. For example
with tclk_hz = 250000001 and target rate 25000000 it determined a
divider of 10 which is wrong.

Instead of fixing all the corner cases replace the calculation by an
algorithm without a loop which should even be quicker to execute apart
from being correct.

Fixes: df59fa7f4bca ("spi: orion: support armada extended baud rates")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: OMAP4+: Fix bad fallthrough for cpuidle
Tony Lindgren [Mon, 7 Nov 2016 23:50:11 +0000 (16:50 -0700)]
ARM: OMAP4+: Fix bad fallthrough for cpuidle

commit cbf2642872333547b56b8c4d943f5ed04ac9a4ee upstream.

We don't want to fall through to a bunch of errors for retention
if PM_OMAP4_CPU_OSWR_DISABLE is not configured for a SoC.

Fixes: 6099dd37c669 ("ARM: OMAP5 / DRA7: Enable CPU RET on suspend")
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: zynq: Reserve correct amount of non-DMA RAM
Kyle Roeschley [Mon, 31 Oct 2016 16:26:17 +0000 (11:26 -0500)]
ARM: zynq: Reserve correct amount of non-DMA RAM

commit 7a3cc2a7b2c723aa552028f4e66841cec183756d upstream.

On Zynq, we haven't been reserving the correct amount of DMA-incapable
RAM to keep DMA away from it (per the Zynq TRM Section 4.1, it should be
the first 512k). In older kernels, this was masked by the
memblock_reserve call in arm_memblock_init(). Now, reserve the correct
amount excplicitly rather than relying on swapper_pg_dir, which is an
address and not a size anyway.

Fixes: 46f5b96 ("ARM: zynq: Reserve not DMAable space in front of the kernel")
Signed-off-by: Kyle Roeschley <kyle.roeschley@ni.com>
Tested-by: Nathan Rossi <nathan@nathanrossi.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopowerpc: Fix build warning on 32-bit PPC
Larry Finger [Fri, 23 Dec 2016 03:06:53 +0000 (21:06 -0600)]
powerpc: Fix build warning on 32-bit PPC

commit 8ae679c4bc2ea2d16d92620da8e3e9332fa4039f upstream.

I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:

    AS      arch/powerpc/kernel/misc_32.o
  arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]

This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created.  That was with commit 9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").

The offending line does not make a lot of sense.  This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.

Thanks to Nicholas Piggin for suggesting this solution.

Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoALSA: firewire-tascam: Fix to handle error from initialization of stream data
Takashi Sakamoto [Tue, 3 Jan 2017 02:58:33 +0000 (11:58 +0900)]
ALSA: firewire-tascam: Fix to handle error from initialization of stream data

commit 6a2a2f45560a9cb7bc49820883b042e44f83726c upstream.

This module has a bug not to return error code in a case that data
structure for transmitted packets fails to be initialized.

This commit fixes the bug.

Fixes: 35efa5c489de ("ALSA: firewire-tascam: add streaming functionality")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoHID: hid-cypress: validate length of report
Greg Kroah-Hartman [Fri, 6 Jan 2017 14:33:36 +0000 (15:33 +0100)]
HID: hid-cypress: validate length of report

commit 1ebb71143758f45dc0fa76e2f48429e13b16d110 upstream.

Make sure we have enough of a report structure to validate before
looking at it.

Reported-by: Benoit Camredon <benoit.camredon@airbus.com>
Tested-by: Benoit Camredon <benoit.camredon@airbus.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
7 years agonet: vrf: do not allow table id 0
David Ahern [Tue, 10 Jan 2017 23:22:25 +0000 (15:22 -0800)]
net: vrf: do not allow table id 0

[ Upstream commit 24c63bbc18e25d5d8439422aa5fd2d66390b88eb ]

Frank reported that vrf devices can be created with a table id of 0.
This breaks many of the run time table id checks and should not be
allowed. Detect this condition at create time and fail with EINVAL.

Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Reported-by: Frank Kellermann <frank.kellermann@atos.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: ipv4: Fix multipath selection with vrf
David Ahern [Tue, 10 Jan 2017 22:37:35 +0000 (14:37 -0800)]
net: ipv4: Fix multipath selection with vrf

[ Upstream commit 7a18c5b9fb31a999afc62b0e60978aa896fc89e9 ]

fib_select_path does not call fib_select_multipath if oif is set in the
flow struct. For VRF use cases oif is always set, so multipath route
selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
check similar to what is done in fib_table_lookup.

Add saddr and proto to the flow struct for the fib lookup done by the
VRF driver to better match hash computation for a flow.

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogro: Disable frag0 optimization on IPv6 ext headers
Herbert Xu [Tue, 10 Jan 2017 20:24:15 +0000 (12:24 -0800)]
gro: Disable frag0 optimization on IPv6 ext headers

[ Upstream commit 57ea52a865144aedbcd619ee0081155e658b6f7d ]

The GRO fast path caches the frag0 address.  This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.

This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.

This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.

Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogro: use min_t() in skb_gro_reset_offset()
Eric Dumazet [Wed, 11 Jan 2017 03:52:43 +0000 (19:52 -0800)]
gro: use min_t() in skb_gro_reset_offset()

[ Upstream commit 7cfd5fd5a9813f1430290d20c0fead9b4582a307 ]

On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agogro: Enter slow-path if there is no tailroom
Herbert Xu [Tue, 10 Jan 2017 20:24:01 +0000 (12:24 -0800)]
gro: Enter slow-path if there is no tailroom

[ Upstream commit 1272ce87fa017ca4cf32920764d879656b7a005a ]

The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0.  However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.

This patch adds the check by capping frag0_len with the skb tailroom.

Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agor8152: fix rx issue for runtime suspend
hayeswang [Tue, 10 Jan 2017 09:04:07 +0000 (17:04 +0800)]
r8152: fix rx issue for runtime suspend

[ Upstream commit 75dc692eda114cb234a46cb11893a9c3ea520934 ]

Pause the rx and make sure the rx fifo is empty when the autosuspend
occurs.

If the rx data comes when the driver is canceling the rx urb, the host
controller would stop getting the data from the device and continue
it after next rx urb is submitted. That is, one continuing data is
split into two different urb buffers. That let the driver take the
data as a rx descriptor, and unexpected behavior happens.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agor8152: split rtl8152_suspend function
hayeswang [Tue, 10 Jan 2017 09:04:06 +0000 (17:04 +0800)]
r8152: split rtl8152_suspend function

[ Upstream commit 8fb280616878b81c0790a0c33acbeec59c5711f4 ]

Split rtl8152_suspend() into rtl8152_system_suspend() and
rtl8152_rumtime_suspend().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules
Alexander Duyck [Mon, 2 Jan 2017 21:32:54 +0000 (13:32 -0800)]
ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules

[ Upstream commit 5350d54f6cd12eaff623e890744c79b700bd3f17 ]

In the case of custom rules being present we need to handle the case of the
LOCAL table being intialized after the new rule has been added.  To address
that I am adding a new check so that we can make certain we don't use an
alias of MAIN for LOCAL when allocating a new table.

Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Oliver Brunel <jjk@jjacky.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoigmp: Make igmp group member RFC 3376 compliant
Michal Tesar [Mon, 2 Jan 2017 13:38:36 +0000 (14:38 +0100)]
igmp: Make igmp group member RFC 3376 compliant

[ Upstream commit 7ababb782690e03b78657e27bd051e20163af2d6 ]

5.2. Action on Reception of a Query

 When a system receives a Query, it does not respond immediately.
 Instead, it delays its response by a random amount of time, bounded
 by the Max Resp Time value derived from the Max Resp Code in the
 received Query message.  A system may receive a variety of Queries on
 different interfaces and of different kinds (e.g., General Queries,
 Group-Specific Queries, and Group-and-Source-Specific Queries), each
 of which may require its own delayed response.

 Before scheduling a response to a Query, the system must first
 consider previously scheduled pending responses and in many cases
 schedule a combined response.  Therefore, the system must be able to
 maintain the following state:

 o A timer per interface for scheduling responses to General Queries.

 o A per-group and interface timer for scheduling responses to Group-
   Specific and Group-and-Source-Specific Queries.

 o A per-group and interface list of sources to be reported in the
   response to a Group-and-Source-Specific Query.

 When a new Query with the Router-Alert option arrives on an
 interface, provided the system has state to report, a delay for a
 response is randomly selected in the range (0, [Max Resp Time]) where
 Max Resp Time is derived from Max Resp Code in the received Query
 message.  The following rules are then used to determine if a Report
 needs to be scheduled and the type of Report to schedule.  The rules
 are considered in order and only the first matching rule is applied.

 1. If there is a pending response to a previous General Query
    scheduled sooner than the selected delay, no additional response
    needs to be scheduled.

 2. If the received Query is a General Query, the interface timer is
    used to schedule a response to the General Query after the
    selected delay.  Any previously pending response to a General
    Query is canceled.
--8<--

Currently the timer is rearmed with new random expiration time for
every incoming query regardless of possibly already pending report.
Which is not aligned with the above RFE.
It also might happen that higher rate of incoming queries can
postpone the report after the expiration time of the first query
causing group membership loss.

Now the per interface general query timer is rearmed only
when there is no pending report already scheduled on that interface or
the newly selected expiration time is before the already pending
scheduled report.

Signed-off-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrop_monitor: consider inserted data in genlmsg_end
Reiter Wolfgang [Tue, 3 Jan 2017 00:39:10 +0000 (01:39 +0100)]
drop_monitor: consider inserted data in genlmsg_end

[ Upstream commit 3b48ab2248e61408910e792fe84d6ec466084c1a ]

Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.

This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrop_monitor: add missing call to genlmsg_end
Reiter Wolfgang [Sat, 31 Dec 2016 20:11:57 +0000 (21:11 +0100)]
drop_monitor: add missing call to genlmsg_end

[ Upstream commit 4200462d88f47f3759bdf4705f87e207b0f5b2e4 ]

Update nlmsg_len field with genlmsg_end to enable userspace processing
using nlmsg_next helper. Also adds error handling.

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Avoid shadowing numa_node
Eli Cohen [Wed, 28 Dec 2016 12:58:34 +0000 (14:58 +0200)]
net/mlx5: Avoid shadowing numa_node

[ Upstream commit d151d73dcc99de87c63bdefebcc4cb69de1cdc40 ]

Avoid using a local variable named numa_node to avoid shadowing a public
one.

Fixes: db058a186f98 ('net/mlx5_core: Set irq affinity hints')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Check FW limitations on log_max_qp before setting it
Noa Osherovich [Wed, 28 Dec 2016 12:58:32 +0000 (14:58 +0200)]
net/mlx5: Check FW limitations on log_max_qp before setting it

[ Upstream commit 883371c453b937f9eb581fb4915210865982736f ]

When setting HCA capabilities, set log_max_qp to be the minimum
between the selected profile's value and the HCA limitation.

Fixes: 938fe83c8dcb ('net/mlx5_core: New device capabilities...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: stmmac: Fix race between stmmac_drv_probe and stmmac_open
Florian Fainelli [Wed, 28 Dec 2016 02:23:06 +0000 (18:23 -0800)]
net: stmmac: Fix race between stmmac_drv_probe and stmmac_open

[ Upstream commit 5701659004d68085182d2fd4199c79172165fa65 ]

There is currently a small window during which the network device registered by
stmmac can be made visible, yet all resources, including and clock and MDIO bus
have not had a chance to be set up, this can lead to the following error to
occur:

[  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                stmmac_dvr_probe: warning: cannot get CSR clock
[  473.919382] stmmaceth 0000:01:00.0: no reset control found
[  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
[  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
[  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
[  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
[  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                Enable RX Mitigation via HW Watchdog Timer
[  473.921395] libphy: PHY stmmac-1:00 not found
[  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
[  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
                PHY (error: -19)
[  473.959710] libphy: stmmac: probed
[  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
                (stmmac-1:00) active
[  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
                (stmmac-1:01)
[  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
                (stmmac-1:02)
[  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
                (stmmac-1:03)

Fix this by making sure that register_netdev() is the last thing being done,
which guarantees that the clock and the MDIO bus are available.

Fixes: 4bfcbd7abce2 ("stmmac: Move the mdio_register/_unregister in probe/remove")
Reported-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet, sched: fix soft lockup in tc_classify
Daniel Borkmann [Wed, 21 Dec 2016 17:04:11 +0000 (18:04 +0100)]
net, sched: fix soft lockup in tc_classify

[ Upstream commit 628185cfddf1dfb701c4efe2cfd72cf5b09f5702 ]

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipv6: handle -EFAULT from skb_copy_bits
Dave Jones [Thu, 22 Dec 2016 16:16:22 +0000 (11:16 -0500)]
ipv6: handle -EFAULT from skb_copy_bits

[ Upstream commit a98f91758995cb59611e61318dddd8a6956b52c3 ]

By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
 [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
int fd;
int zero = 0;
char buf[LEN];

memset(buf, 0, LEN);

fd = socket(AF_INET6, SOCK_RAW, 7);

setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet: vrf: Drop conntrack data after pass through VRF device on Tx
David Ahern [Wed, 14 Dec 2016 22:31:11 +0000 (14:31 -0800)]
net: vrf: Drop conntrack data after pass through VRF device on Tx

[ Upstream commit eb63ecc1706b3e094d0f57438b6c2067cfc299f2 ]

Locally originated traffic in a VRF fails in the presence of a POSTROUTING
rule. For example,

    $ iptables -t nat -A POSTROUTING -s 11.1.1.0/24  -j MASQUERADE
    $ ping -I red -c1 11.1.1.3
    ping: Warning: source address might be selected on device other than red.
    PING 11.1.1.3 (11.1.1.3) from 11.1.1.2 red: 56(84) bytes of data.
    ping: sendmsg: Operation not permitted

Worse, the above causes random corruption resulting in a panic in random
places (I have not seen a consistent backtrace).

Call nf_reset to drop the conntrack info following the pass through the
VRF device.  The nf_reset is needed on Tx but not Rx because of the order
in which NF_HOOK's are hit: on Rx the VRF device is after the real ingress
device and on Tx it is is before the real egress device. Connection
tracking should be tied to the real egress device and not the VRF device.

Fixes: 8f58336d3f78a ("net: Add ethernet header for pass through VRF device")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoser_gigaset: return -ENOMEM on error instead of success
Dan Carpenter [Wed, 7 Dec 2016 11:22:03 +0000 (14:22 +0300)]
ser_gigaset: return -ENOMEM on error instead of success

[ Upstream commit 93a97c50cbf1c007caf12db5cc23e0d5b9c8473c ]

If we can't allocate the resources in gigaset_initdriver() then we
should return -ENOMEM instead of zero.

Fixes: 2869b23e4b95 ("[PATCH] drivers/isdn/gigaset: new M101 driver (v2)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>