Darrick J. Wong [Wed, 26 Nov 2014 01:45:15 +0000 (17:45 -0800)]
dm bufio: fix memleak when using a dm_buffer's inline bio
commit
445559cdcb98a141f5de415b94fd6eaccab87e6d upstream.
When dm-bufio sets out to use the bio built into a struct dm_buffer to
issue an IO, it needs to call bio_reset after it's done with the bio
so that we can free things attached to the bio such as the integrity
payload. Therefore, inject our own endio callback to take care of
the bio_reset after calling submit_io's end_io callback.
Test case:
1. modprobe scsi_debug delay=0 dif=1 dix=199 ato=1 dev_size_mb=300
2. Set up a dm-bufio client, e.g. dm-verity, on the scsi_debug device
3. Repeatedly read metadata and watch kmalloc-192 leak!
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peng Tao [Mon, 17 Nov 2014 03:05:17 +0000 (11:05 +0800)]
nfs41: fix nfs4_proc_layoutget error handling
commit
4bd5a980de87d2b5af417485bde97b8eb3d6cf6a upstream.
nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt
early so that it is safe to call .release in case nfs4_alloc_pages
fails.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Fixes: a47970ff78147 ("NFSv4.1: Hold reference to layout hdr in layoutget")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sumit.Saxena@avagotech.com [Mon, 17 Nov 2014 09:54:23 +0000 (15:24 +0530)]
megaraid_sas: corrected return of wait_event from abort frame path
commit
170c238701ec38b1829321b17c70671c101bac55 upstream.
Corrected wait_event() call which was waiting for wrong completion
status (0xFF).
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Baruch Siach [Mon, 22 Sep 2014 07:12:51 +0000 (10:12 +0300)]
mmc: block: add newline to sysfs display of force_ro
commit
0031a98a85e9fca282624bfc887f9531b2768396 upstream.
Make force_ro consistent with other sysfs entries.
Fixes: 371a689f64b0d ('mmc: MMC boot partitions support')
Cc: Andrei Warkentin <andrey.warkentin@gmail.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Eremin-Solenikov [Fri, 24 Oct 2014 17:19:57 +0000 (21:19 +0400)]
mfd: tc6393xb: Fail ohci suspend if full state restore is required
commit
1a5fb99de4850cba710d91becfa2c65653048589 upstream.
Some boards with TC6393XB chip require full state restore during system
resume thanks to chip's VCC being cut off during suspend (Sharp SL-6000
tosa is one of them). Failing to do so would result in ohci Oops on
resume due to internal memory contentes being changed. Fail ohci suspend
on tc6393xb is full state restore is required.
Recommended workaround is to unbind tmio-ohci driver before suspend and
rebind it after resume.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NeilBrown [Tue, 9 Sep 2014 04:13:51 +0000 (14:13 +1000)]
md/bitmap: always wait for writes on unplug.
commit
4b5060ddae2b03c5387321fafc089d242225697a upstream.
If two threads call bitmap_unplug at the same time, then
one might schedule all the writes, and the other might
decide that it doesn't need to wait. But really it does.
It rarely hurts to wait when it isn't absolutely necessary,
and the current code doesn't really focus on 'absolutely necessary'
anyway. So just wait always.
This can potentially lead to data corruption if a crash happens
at an awkward time and data was written before the bitmap was
updated. It is very unlikely, but this should go to -stable
just to be safe. Appropriate for any -stable.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Sat, 6 Dec 2014 03:03:28 +0000 (19:03 -0800)]
x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
commit
29fa6825463c97e5157284db80107d1bfac5d77b upstream.
paravirt_enabled has the following effects:
- Disables the F00F bug workaround warning. There is no F00F bug
workaround any more because Linux's standard IDT handling already
works around the F00F bug, but the warning still exists. This
is only cosmetic, and, in any event, there is no such thing as
KVM on a CPU with the F00F bug.
- Disables 32-bit APM BIOS detection. On a KVM paravirt system,
there should be no APM BIOS anyway.
- Disables tboot. I think that the tboot code should check the
CPUID hypervisor bit directly if it matters.
- paravirt_enabled disables espfix32. espfix32 should *not* be
disabled under KVM paravirt.
The last point is the purpose of this patch. It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests. Fixes CVE-2014-8134.
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Mon, 8 Dec 2014 21:55:20 +0000 (13:55 -0800)]
x86_64, switch_to(): Load TLS descriptors before switching DS and ES
commit
f647d7c155f069c1a068030255c300663516420e upstream.
Otherwise, if buggy user code points DS or ES into the TLS
array, they would be corrupted after a context switch.
This also significantly improves the comments and documents some
gotchas in the code.
Before this patch, the both tests below failed. With this
patch, the es test passes, although the gsbase test still fails.
----- begin es test -----
/*
* Copyright (c) 2014 Andy Lutomirski
* GPL v2
*/
static unsigned short GDT3(int idx)
{
return (idx << 3) | 3;
}
static int create_tls(int idx, unsigned int base)
{
struct user_desc desc = {
.entry_number = idx,
.base_addr = base,
.limit = 0xfffff,
.seg_32bit = 1,
.contents = 0, /* Data, grow-up */
.read_exec_only = 0,
.limit_in_pages = 1,
.seg_not_present = 0,
.useable = 0,
};
if (syscall(SYS_set_thread_area, &desc) != 0)
err(1, "set_thread_area");
return desc.entry_number;
}
int main()
{
int idx = create_tls(-1, 0);
printf("Allocated GDT index %d\n", idx);
unsigned short orig_es;
asm volatile ("mov %%es,%0" : "=rm" (orig_es));
int errors = 0;
int total = 1000;
for (int i = 0; i < total; i++) {
asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
usleep(100);
unsigned short es;
asm volatile ("mov %%es,%0" : "=rm" (es));
asm volatile ("mov %0,%%es" : : "rm" (orig_es));
if (es != GDT3(idx)) {
if (errors == 0)
printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
GDT3(idx), es);
errors++;
}
}
if (errors) {
printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
return 1;
} else {
printf("[OK]\tES was preserved\n");
return 0;
}
}
----- end es test -----
----- begin gsbase test -----
/*
* gsbase.c, a gsbase test
* Copyright (c) 2014 Andy Lutomirski
* GPL v2
*/
static unsigned char *testptr, *testptr2;
static unsigned char read_gs_testvals(void)
{
unsigned char ret;
asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
return ret;
}
int main()
{
int errors = 0;
testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
if (testptr == MAP_FAILED)
err(1, "mmap");
testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
if (testptr2 == MAP_FAILED)
err(1, "mmap");
*testptr = 0;
*testptr2 = 1;
if (syscall(SYS_arch_prctl, ARCH_SET_GS,
(unsigned long)testptr2 - (unsigned long)testptr) != 0)
err(1, "ARCH_SET_GS");
usleep(100);
if (read_gs_testvals() == 1) {
printf("[OK]\tARCH_SET_GS worked\n");
} else {
printf("[FAIL]\tARCH_SET_GS failed\n");
errors++;
}
asm volatile ("mov %0,%%gs" : : "r" (0));
if (read_gs_testvals() == 0) {
printf("[OK]\tWriting 0 to gs worked\n");
} else {
printf("[FAIL]\tWriting 0 to gs failed\n");
errors++;
}
usleep(100);
if (read_gs_testvals() == 0) {
printf("[OK]\tgsbase is still zero\n");
} else {
printf("[FAIL]\tgsbase was corrupted\n");
errors++;
}
return errors == 0 ? 0 : 1;
}
----- end gsbase test -----
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Fri, 5 Dec 2014 00:48:17 +0000 (16:48 -0800)]
x86/tls: Disallow unusual TLS segments
commit
0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream.
Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.
For completeness, block attempts to set the L bit. (Prior to
this patch, the L bit would have been silently dropped.)
This is an ABI break. I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.
Note to stable maintainers: this is a hardening patch that fixes
no known bugs. Given the possibility of ABI issues, this
probably shouldn't be backported quickly.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Fri, 5 Dec 2014 00:48:16 +0000 (16:48 -0800)]
x86/tls: Validate TLS entries to protect espfix
commit
41bdc78544b8a93a9c6814b8bbbfef966272abbe upstream.
Installing a 16-bit RW data segment into the GDT defeats espfix.
AFAICT this will not affect glibc, Wine, or dosemu at all.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Kara [Mon, 15 Dec 2014 13:22:46 +0000 (14:22 +0100)]
isofs: Fix infinite looping over CE entries
commit
f54e18f1b831c92f6512d2eedb224cd63d607d3d upstream.
Rock Ridge extensions define so called Continuation Entries (CE) which
define where is further space with Rock Ridge data. Corrupted isofs
image can contain arbitrarily long chain of these, including a one
containing loop and thus causing kernel to end in an infinite loop when
traversing these entries.
Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Tue, 16 Dec 2014 17:09:56 +0000 (09:09 -0800)]
Linux 3.10.63
Takashi Iwai [Sat, 6 Dec 2014 17:02:55 +0000 (18:02 +0100)]
ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery
commit
66139a48cee1530c91f37c145384b4ee7043f0b7 upstream.
In snd_usbmidi_error_timer(), the driver tries to resubmit MIDI input
URBs to reactivate the MIDI stream, but this causes the error when
some of URBs are still pending like:
WARNING: CPU: 0 PID: 0 at ../drivers/usb/core/urb.c:339 usb_submit_urb+0x5f/0x70()
URB
ef705c40 submitted while active
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.6-2-desktop #1
Hardware name: FOXCONN TPS01/TPS01, BIOS 080015 03/23/2010
c0984bfa f4009ed4 c078deaf f4009ee4 c024c884 c09a135c f4009f00 00000000
c0984bfa 00000153 c061ac4f c061ac4f 00000009 00000001 ef705c40 e854d1c0
f4009eec c024c8d3 00000009 f4009ee4 c09a135c f4009f00 f4009f04 c061ac4f
Call Trace:
[<
c0205df6>] try_stack_unwind+0x156/0x170
[<
c020482a>] dump_trace+0x5a/0x1b0
[<
c0205e56>] show_trace_log_lvl+0x46/0x50
[<
c02049d1>] show_stack_log_lvl+0x51/0xe0
[<
c0205eb7>] show_stack+0x27/0x50
[<
c078deaf>] dump_stack+0x45/0x65
[<
c024c884>] warn_slowpath_common+0x84/0xa0
[<
c024c8d3>] warn_slowpath_fmt+0x33/0x40
[<
c061ac4f>] usb_submit_urb+0x5f/0x70
[<
f7974104>] snd_usbmidi_submit_urb+0x14/0x60 [snd_usbmidi_lib]
[<
f797483a>] snd_usbmidi_error_timer+0x6a/0xa0 [snd_usbmidi_lib]
[<
c02570c0>] call_timer_fn+0x30/0x130
[<
c0257442>] run_timer_softirq+0x1c2/0x260
[<
c0251493>] __do_softirq+0xc3/0x270
[<
c0204732>] do_softirq_own_stack+0x22/0x30
[<
c025186d>] irq_exit+0x8d/0xa0
[<
c0795228>] smp_apic_timer_interrupt+0x38/0x50
[<
c0794a3c>] apic_timer_interrupt+0x34/0x3c
[<
c0673d9e>] cpuidle_enter_state+0x3e/0xd0
[<
c028bb8d>] cpu_idle_loop+0x29d/0x3e0
[<
c028bd23>] cpu_startup_entry+0x53/0x60
[<
c0bfac1e>] start_kernel+0x415/0x41a
For avoiding these errors, check the pending URBs and skip
resubmitting such ones.
Reported-and-tested-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anton Blanchard [Wed, 26 Nov 2014 21:11:28 +0000 (08:11 +1100)]
powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
commit
152d44a853e42952f6c8a504fb1f8eefd21fd5fd upstream.
I used some 64 bit instructions when adding the 32 bit getcpu VDSO
function. Fix it.
Fixes: 18ad51dd342a ("powerpc: Add VDSO version of getcpu")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Boyd [Mon, 17 Jun 2013 22:40:58 +0000 (15:40 -0700)]
ARM: sched_clock: Load cycle count after epoch stabilizes
commit
336ae1180df5f69b9e0fb6561bec01c5f64361cf upstream.
There is a small race between when the cycle count is read from
the hardware and when the epoch stabilizes. Consider this
scenario:
CPU0 CPU1
---- ----
cyc = read_sched_clock()
cyc_to_sched_clock()
update_sched_clock()
...
cd.epoch_cyc = cyc;
epoch_cyc = cd.epoch_cyc;
...
epoch_ns + cyc_to_ns((cyc - epoch_cyc)
The cyc on cpu0 was read before the epoch changed. But we
calculate the nanoseconds based on the new epoch by subtracting
the new epoch from the old cycle count. Since epoch is most likely
larger than the old cycle count we calculate a large number that
will be converted to nanoseconds and added to epoch_ns, causing
time to jump forward too much.
Fix this problem by reading the hardware after the epoch has
stabilized.
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Todd Fujinaka [Tue, 17 Jun 2014 06:58:11 +0000 (06:58 +0000)]
igb: bring link up when PHY is powered up
commit
aec653c43b0c55667355e26d7de1236bda9fb4e3 upstream.
Call igb_setup_link() when the PHY is powered up.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Reported-by: Jeff Westfahl <jeff.westfahl@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Vincent Donnefort <vdonnefort@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Kara [Tue, 3 Dec 2013 10:20:06 +0000 (11:20 +0100)]
ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
commit
df4e7ac0bb70abc97fbfd9ef09671fc084b3f9db upstream.
ext2_quota_write() doesn't properly setup bh it passes to
ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in
ext2_get_blocks() (or we could actually ask for mapping arbitrary number
of blocks depending on whatever value was on stack).
Fix ext2_quota_write() to properly fill in number of blocks to map.
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nadav Har'El [Mon, 5 Aug 2013 08:07:17 +0000 (11:07 +0300)]
nEPT: Nested INVEPT
commit
bfd0a56b90005f8c8a004baf407ad90045c2b11e upstream.
If we let L1 use EPT, we should probably also support the INVEPT instruction.
In our current nested EPT implementation, when L1 changes its EPT table
for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
the course of this modification already calls INVEPT. But if last level
of shadow page is unsync not all L1's changes to EPT12 are intercepted,
which means roots need to be synced when L1 calls INVEPT. Global INVEPT
should not be different since roots are synced by kvm_mmu_load() each
time EPTP02 changes.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- Adjust context, filename
- Simplify handle_invept() as recommended by Paolo - nEPT is not
supported so we always raise #UD]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann [Wed, 3 Dec 2014 11:13:58 +0000 (12:13 +0100)]
net: sctp: use MAX_HEADER for headroom reserve in output path
[ Upstream commit
9772b54c55266ce80c639a80aa68eeb908f8ecf5 ]
To accomodate for enough headroom for tunnels, use MAX_HEADER instead
of LL_MAX_HEADER. Robert reported that he has hit after roughly 40hrs
of trinity an skb_under_panic() via SCTP output path (see reference).
I couldn't reproduce it from here, but not using MAX_HEADER as elsewhere
in other protocols might be one possible cause for this.
In any case, it looks like accounting on chunks themself seems to look
good as the skb already passed the SCTP output path and did not hit
any skb_over_panic(). Given tunneling was enabled in his .config, the
headroom would have been expanded by MAX_HEADER in this case.
Reported-by: Robert Święcki <robert@swiecki.net>
Reference: https://lkml.org/lkml/2014/12/1/507
Fixes: 594ccc14dfe4d ("[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in sctp_packet_transmit().")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
willy tarreau [Tue, 2 Dec 2014 07:13:04 +0000 (08:13 +0100)]
net: mvneta: fix Tx interrupt delay
[ Upstream commit
aebea2ba0f7495e1a1c9ea5e753d146cb2f6b845 ]
The mvneta driver sets the amount of Tx coalesce packets to 16 by
default. Normally that does not cause any trouble since the driver
uses a much larger Tx ring size (532 packets). But some sockets
might run with very small buffers, much smaller than the equivalent
of 16 packets. This is what ping is doing for example, by setting
SNDBUF to 324 bytes rounded up to 2kB by the kernel.
The problem is that there is no documented method to force a specific
packet to emit an interrupt (eg: the last of the ring) nor is it
possible to make the NIC emit an interrupt after a given delay.
In this case, it causes trouble, because when ping sends packets over
its raw socket, the few first packets leave the system, and the first
15 packets will be emitted without an IRQ being generated, so without
the skbs being freed. And since the socket's buffer is small, there's
no way to reach that amount of packets, and the ping ends up with
"send: no buffer available" after sending 6 packets. Running with 3
instances of ping in parallel is enough to hide the problem, because
with 6 packets per instance, that's 18 packets total, which is enough
to grant a Tx interrupt before all are sent.
The original driver in the LSP kernel worked around this design flaw
by using a software timer to clean up the Tx descriptors. This timer
was slow and caused terrible network performance on some Tx-bound
workloads (such as routing) but was enough to make tools like ping
work correctly.
Instead here, we simply set the packet counts before interrupt to 1.
This ensures that each packet sent will produce an interrupt. NAPI
takes care of coalescing interrupts since the interrupt is disabled
once generated.
No measurable performance impact nor CPU usage were observed on small
nor large packets, including when saturating the link on Tx, and this
fixes tools like ping which rely on too small a send buffer. If one
wants to increase this value for certain workloads where it is safe
to do so, "ethtool -C $dev tx-frames" will override this default
setting.
This fix needs to be applied to stable kernels starting with 3.10.
Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicolas Dichtel [Thu, 27 Nov 2014 09:16:15 +0000 (10:16 +0100)]
rtnetlink: release net refcnt on error in do_setlink()
[ Upstream commit
e0ebde0e131b529fd721b24f62872def5ec3718c ]
rtnl_link_get_net() holds a reference on the 'struct net', we need to release
it in case of error.
CC: Eric W. Biederman <ebiederm@xmission.com>
Fixes: b51642f6d77b ("net: Enable a userns root rtnl calls that are safe for unprivilged users")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jack Morgenstein [Tue, 25 Nov 2014 09:54:31 +0000 (11:54 +0200)]
net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
[ Upstream commit
2d5c57d7fbfaa642fb7f0673df24f32b83d9066c ]
Some VF drivers use the upper byte of "param1" (the qp count field)
in mlx4_qp_reserve_range() to pass flags which are used to optimize
the range allocation.
Under the current code, if any of these flags are set, the 32-bit
count field yields a count greater than 2^24, which is out of range,
and this VF fails.
As these flags represent a "best-effort" allocation hint anyway, they may
safely be ignored. Therefore, the PF driver may simply mask out the bits.
Fixes: c82e9aa0a8 "mlx4_core: resource tracking for HCA resources used by guests"
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thadeu Lima de Souza Cascardo [Tue, 25 Nov 2014 16:21:11 +0000 (14:21 -0200)]
tg3: fix ring init when there are more TX than RX channels
[ Upstream commit
a620a6bc1c94c22d6c312892be1e0ae171523125 ]
If TX channels are set to 4 and RX channels are set to less than 4,
using ethtool -L, the driver will try to initialize more RX channels
than it has allocated, causing an oops.
This fix only initializes the RX ring if it has been allocated.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yuri Chislov [Mon, 24 Nov 2014 10:25:15 +0000 (11:25 +0100)]
ipv6: gre: fix wrong skb->protocol in WCCP
[ Upstream commit
be6572fdb1bfbe23b2624d477de50af50b02f5d6 ]
When using GRE redirection in WCCP, it sets the wrong skb->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com>
Tested-by: Yuri Chislov <yuri.chislov@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Torokhov [Fri, 14 Nov 2014 21:39:05 +0000 (13:39 -0800)]
sata_fsl: fix error handling of irq_of_parse_and_map
commit
aad0b624129709c94c2e19e583b6053520353fa8 upstream.
irq_of_parse_and_map() returns 0 on error (the result is unsigned int),
so testing for negative result never works.
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 4 Dec 2014 18:13:28 +0000 (13:13 -0500)]
ahci: disable MSI on SAMSUNG 0xa800 SSD
commit
2b21ef0aae65f22f5ba86b13c4588f6f0c2dbefb upstream.
Just like 0x1600 which got blacklisted by
66a7cbc303f4 ("ahci: disable
MSI instead of NCQ on Samsung pci-e SSDs on macbooks"), 0xa800 chokes
on NCQ commands if MSI is enabled. Disable MSI.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dominik Mierzejewski <dominik@greysector.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=89171
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Devin Ryles [Fri, 7 Nov 2014 22:59:05 +0000 (17:59 -0500)]
AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
commit
249cd0a187ed4ef1d0af7f74362cc2791ec5581b upstream.
This patch adds DeviceIDs for Sunrise Point-LP.
Signed-off-by: Devin Ryles <devin.ryles@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sakari Ailus [Thu, 6 Nov 2014 20:49:45 +0000 (17:49 -0300)]
media: smiapp: Only some selection targets are settable
commit
b31eb901c4e5eeef4c83c43dfbc7fe0d4348cb21 upstream.
Setting a non-settable selection target caused BUG() to be called. The check
for valid selections only takes the selection target into account, but does
not tell whether it may be set, or only get. Fix the issue by simply
returning an error to the user.
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Vetter [Mon, 1 Dec 2014 16:56:54 +0000 (17:56 +0100)]
drm/i915: Unlock panel even when LVDS is disabled
commit
b0616c5306b342ceca07044dbc4f917d95c4f825 upstream.
Otherwise we'll have backtraces in assert_panel_unlocked because the
BIOS locks the register. In the reporter's case this regression was
introduced in
commit
c31407a3672aaebb4acddf90944a114fa5c8af7b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Oct 18 21:07:01 2012 +0100
drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
Reported-by: Alexey Orishko <alexey.orishko@gmail.com>
Cc: Alexey Orishko <alexey.orishko@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Francois Tigeot <ftigeot@wolfpond.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Tested-by: Alexey Orishko <alexey.orishko@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Petr Mladek [Thu, 27 Nov 2014 15:57:21 +0000 (16:57 +0100)]
drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
commit
f5475cc43c899e33098d4db44b7c5e710f16589d upstream.
I was unable too boot 3.18.0-rc6 because of the following kernel
panic in drm_calc_vbltimestamp_from_scanoutpos():
[drm] Initialized drm 1.1.0
20060810
[drm] radeon kernel modesetting enabled.
[drm] initializing kernel modesetting (RV100 0x1002:0x515E 0x15D9:0x8080).
[drm] register mmio base: 0xC8400000
[drm] register mmio size: 65536
radeon 0000:0b:01.0: VRAM: 128M 0x00000000D0000000 - 0x00000000D7FFFFFF (16M used)
radeon 0000:0b:01.0: GTT: 512M 0x00000000B0000000 - 0x00000000CFFFFFFF
[drm] Detected VRAM RAM=128M, BAR=128M
[drm] RAM width 16bits DDR
[TTM] Zone kernel: Available graphics memory:
3829346 kiB
[TTM] Zone dma32: Available graphics memory:
2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
[drm] radeon: 16M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] PCI GART of 512M enabled (table at 0x0000000037880000).
radeon 0000:0b:01.0: WB disabled
radeon 0000:0b:01.0: fence driver on ring 0 use gpu addr 0x00000000b0000000 and cpu addr 0xffff8800bbbfa000
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] radeon: irq initialized.
[drm] Loading R100 Microcode
radeon 0000:0b:01.0: Direct firmware load for radeon/R100_cp.bin failed with error -2
radeon_cp: Failed to load firmware "radeon/R100_cp.bin"
[drm:r100_cp_init] *ERROR* Failed to load firmware!
radeon 0000:0b:01.0: failed initializing CP (-2).
radeon 0000:0b:01.0: Disabling GPU acceleration
[drm] radeon: cp finalized
BUG: unable to handle kernel NULL pointer dereference at
000000000000025c
IP: [<
ffffffff8150423b>] drm_calc_vbltimestamp_from_scanoutpos+0x4b/0x320
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc6-4-default #2649
Hardware name: Supermicro X7DB8/X7DB8, BIOS 6.00 07/26/2006
task:
ffff880234da2010 ti:
ffff880234da4000 task.ti:
ffff880234da4000
RIP: 0010:[<
ffffffff8150423b>] [<
ffffffff8150423b>] drm_calc_vbltimestamp_from_scanoutpos+0x4b/0x320
RSP: 0000:
ffff880234da7918 EFLAGS:
00010086
RAX:
ffffffff81557890 RBX:
0000000000000000 RCX:
ffff880234da7a48
RDX:
ffff880234da79f4 RSI:
0000000000000000 RDI:
ffff880232e15000
RBP:
ffff880234da79b8 R08:
0000000000000000 R09:
0000000000000000
R10:
000000000000000a R11:
0000000000000001 R12:
ffff880232dda1c0
R13:
ffff880232e1518c R14:
0000000000000292 R15:
ffff880232e15000
FS:
0000000000000000(0000) GS:
ffff88023fc40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
000000000000025c CR3:
0000000002014000 CR4:
00000000000007e0
Stack:
ffff880234da79d8 0000000000000286 ffff880232dcbc00 0000000000002480
ffff880234da7958 0000000000000296 ffff880234da7998 ffffffff8151b51d
ffff880234da7a48 0000000032dcbeb0 ffff880232dcbc00 ffff880232dcbc58
Call Trace:
[<
ffffffff8151b51d>] ? drm_vma_offset_remove+0x1d/0x110
[<
ffffffff8152dc98>] radeon_get_vblank_timestamp_kms+0x38/0x60
[<
ffffffff8152076a>] ? ttm_bo_release_list+0xba/0x180
[<
ffffffff81503751>] drm_get_last_vbltimestamp+0x41/0x70
[<
ffffffff81503933>] vblank_disable_and_save+0x73/0x1d0
[<
ffffffff81106b2f>] ? try_to_del_timer_sync+0x4f/0x70
[<
ffffffff81505245>] drm_vblank_cleanup+0x65/0xa0
[<
ffffffff815604fa>] radeon_irq_kms_fini+0x1a/0x70
[<
ffffffff8156c07e>] r100_init+0x26e/0x410
[<
ffffffff8152ae3e>] radeon_device_init+0x7ae/0xb50
[<
ffffffff8152d57f>] radeon_driver_load_kms+0x8f/0x210
[<
ffffffff81506965>] drm_dev_register+0xb5/0x110
[<
ffffffff8150998f>] drm_get_pci_dev+0x8f/0x200
[<
ffffffff815291cd>] radeon_pci_probe+0xad/0xe0
[<
ffffffff8141a365>] local_pci_probe+0x45/0xa0
[<
ffffffff8141b741>] pci_device_probe+0xd1/0x130
[<
ffffffff81633dad>] driver_probe_device+0x12d/0x3e0
[<
ffffffff8163413b>] __driver_attach+0x9b/0xa0
[<
ffffffff816340a0>] ? __device_attach+0x40/0x40
[<
ffffffff81631cd3>] bus_for_each_dev+0x63/0xa0
[<
ffffffff8163378e>] driver_attach+0x1e/0x20
[<
ffffffff81633390>] bus_add_driver+0x180/0x240
[<
ffffffff81634914>] driver_register+0x64/0xf0
[<
ffffffff81419cac>] __pci_register_driver+0x4c/0x50
[<
ffffffff81509bf5>] drm_pci_init+0xf5/0x120
[<
ffffffff821dc871>] ? ttm_init+0x6a/0x6a
[<
ffffffff821dc908>] radeon_init+0x97/0xb5
[<
ffffffff810002fc>] do_one_initcall+0xbc/0x1f0
[<
ffffffff810e3278>] ? __wake_up+0x48/0x60
[<
ffffffff8218e256>] kernel_init_freeable+0x18a/0x215
[<
ffffffff8218d983>] ? initcall_blacklist+0xc0/0xc0
[<
ffffffff818a78f0>] ? rest_init+0x80/0x80
[<
ffffffff818a78fe>] kernel_init+0xe/0xf0
[<
ffffffff818c0c3c>] ret_from_fork+0x7c/0xb0
[<
ffffffff818a78f0>] ? rest_init+0x80/0x80
Code: 45 ac 0f 88 a8 01 00 00 3b b7 d0 01 00 00 49 89 ff 0f 83 99 01 00 00 48 8b 47 20 48 8b 80 88 00 00 00 48 85 c0 0f 84 cd 01 00 00 <41> 8b b1 5c 02 00 00 41 8b 89 58 02 00 00 89 75 98 41 8b b1 60
RIP [<
ffffffff8150423b>] drm_calc_vbltimestamp_from_scanoutpos+0x4b/0x320
RSP <
ffff880234da7918>
CR2:
000000000000025c
---[ end trace
ad2c0aadf48e2032 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
It has helped me to add a NULL pointer check that was suggested at
http://lists.freedesktop.org/archives/dri-devel/2014-October/070663.html
I am not familiar with the code. But the change looks sane
and we need something fast at this stage of 3.18 development.
Suggested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Tested-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Grygorii Strashko [Mon, 1 Dec 2014 15:34:04 +0000 (17:34 +0200)]
i2c: davinci: generate STP always when NACK is received
commit
9ea359f7314132cbcb5a502d2d8ef095be1f45e4 upstream.
According to I2C specification the NACK should be handled as follows:
"When SDA remains HIGH during this ninth clock pulse, this is defined as the Not
Acknowledge signal. The master can then generate either a STOP condition to
abort the transfer, or a repeated START condition to start a new transfer."
[I2C spec Rev. 6, 3.1.6: http://www.nxp.com/documents/user_manual/UM10204.pdf]
Currently the Davinci i2c driver interrupts the transfer on receipt of a
NACK but fails to send a STOP in some situations and so makes the bus
stuck until next I2C IP reset (idle/enable).
For example, the issue will happen during SMBus read transfer which
consists from two i2c messages write command/address and read data:
S Slave Address Wr A Command Code A Sr Slave Address Rd A D1..Dn A P
<--- write -----------------------> <--- read --------------------->
The I2C client device will send NACK if it can't recognize "Command Code"
and it's expected from I2C master to generate STP in this case.
But now, Davinci i2C driver will just exit with -EREMOTEIO and STP will
not be generated.
Hence, fix it by generating Stop condition (STP) always when NACK is received.
This patch fixes Davinci I2C in the same way it was done for OMAP I2C
commit
cda2109a26eb ("i2c: omap: query STP always when NACK is received").
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reported-by: Hein Tibosch <hein_tibosch@yahoo.es>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Kochetkov [Fri, 21 Nov 2014 00:16:51 +0000 (04:16 +0400)]
i2c: omap: fix i207 errata handling
commit
ccfc866356674cb3a61829d239c685af6e85f197 upstream.
commit
6d9939f651419a63e091105663821f9c7d3fec37 (i2c: omap: split out [XR]DR
and [XR]RDY) changed the way how errata i207 (I2C: RDR Flag May Be Incorrectly
Set) get handled.
6d9939f6514 code doesn't correspond to workaround provided by
errata.
According to errata ISR must filter out spurious RDR before data read not after.
ISR must read RXSTAT to get number of bytes available to read. Because RDR
could be set while there could no data in the receive FIFO.
Restored pre
6d9939f6514 way of handling errata.
Found by code review. Real impact haven't seen.
Tested on Beagleboard XM C.
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Fixes: 6d9939f651419a63e09110 i2c: omap: split out [XR]DR and [XR]RDY
Tested-by: Felipe Balbi <balbi@ti.com>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Kochetkov [Tue, 18 Nov 2014 17:00:58 +0000 (21:00 +0400)]
i2c: omap: fix NACK and Arbitration Lost irq handling
commit
27caca9d2e01c92b26d0690f065aad093fea01c7 upstream.
commit
1d7afc95946487945cc7f5019b41255b72224b70 (i2c: omap: ack IRQ in parts)
changed the interrupt handler to complete transfers without clearing
XRDY (AL case) and ARDY (NACK case) flags. XRDY or ARDY interrupts will be
fired again. As a result, ISR keep processing transfer after it was already
complete (from the driver code point of view).
A didn't see real impacts of the
1d7afc9, but it is really bad idea to
have ISR running on user data after transfer was complete.
It looks, what
1d7afc9 violate TI specs in what how AL and NACK should be
handled (see Note 1, sprugn4r, Figure 17-31 and Figure 17-32).
According to specs (if I understood correctly), in case of NACK and AL driver
must reset NACK, AL, ARDY, RDR, and RRDY (Master Receive Mode), and
NACK, AL, ARDY, and XDR (Master Transmitter Mode).
All that is done down the code under the if condition:
if (stat & (OMAP_I2C_STAT_ARDY | OMAP_I2C_STAT_NACK | OMAP_I2C_STAT_AL)) ...
The patch restore pre
1d7afc9 logic of handling NACK and AL interrupts, so
no interrupts is fired after ISR informs the rest of driver what transfer
complete.
Note: instead of removing break under NACK case, we could just replace 'break'
with 'continue' and allow NACK transfer to finish using ARDY event. I found
that NACK and ARDY bits usually set together. That case confirm TI wiki:
http://processors.wiki.ti.com/index.php/I2C_Tips#Detecting_and_handling_NACK
In order if someone interested in the event traces for NACK and AL cases,
I sent them to mailing list.
Tested on Beagleboard XM C.
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Fixes: 1d7afc9 i2c: omap: ack IRQ in parts
Acked-by: Felipe Balbi <balbi@ti.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Seth Forshee [Wed, 26 Nov 2014 02:28:24 +0000 (20:28 -0600)]
xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
commit
8d609725d4357f499e2103e46011308b32f53513 upstream.
These BUGs can be erroneously triggered by frags which refer to
tail pages within a compound page. The data in these pages may
overrun the hardware page while still being contained within the
compound page, but since compound_order() evaluates to 0 for tail
pages the assertion fails. The code already iterates through
subsequent pages correctly in this scenario, so the BUGs are
unnecessary and can be removed.
Fixes: f36c374782e4 ("xen/netfront: handle compound page fragments on transmit")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hugh Dickins [Tue, 2 Dec 2014 23:59:39 +0000 (15:59 -0800)]
mm: fix swapoff hang after page migration and fork
commit
2022b4d18a491a578218ce7a4eca8666db895a73 upstream.
I've been seeing swapoff hangs in recent testing: it's cycling around
trying unsuccessfully to find an mm for some remaining pages of swap.
I have been exercising swap and page migration more heavily recently,
and now notice a long-standing error in copy_one_pte(): it's trying to
add dst_mm to swapoff's mmlist when it finds a swap entry, but is doing
so even when it's a migration entry or an hwpoison entry.
Which wouldn't matter much, except it adds dst_mm next to src_mm,
assuming src_mm is already on the mmlist: which may not be so. Then if
pages are later swapped out from dst_mm, swapoff won't be able to find
where to replace them.
There's already a !non_swap_entry() test for stats: move that up before
the swap_duplicate() and the addition to mmlist.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kelley Nielsen <kelleynnn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Weijie Yang [Tue, 2 Dec 2014 23:59:25 +0000 (15:59 -0800)]
mm: frontswap: invalidate expired data on a dup-store failure
commit
fb993fa1a2f669215fa03a09eed7848f2663e336 upstream.
If a frontswap dup-store failed, it should invalidate the expired page
in the backend, or it could trigger some data corruption issue.
Such as:
1. use zswap as the frontswap backend with writeback feature
2. store a swap page(version_1) to entry A, success
3. dup-store a newer page(version_2) to the same entry A, fail
4. use __swap_writepage() write version_2 page to swapfile, success
5. zswap do shrink, writeback version_1 page to swapfile
6. version_2 page is overwrited by version_1, data corrupt.
This patch fixes this issue by invalidating expired data immediately
when meet a dup-store failure.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Sat, 6 Dec 2014 23:55:43 +0000 (15:55 -0800)]
Linux 3.10.62
Sergio Gelato [Wed, 24 Sep 2014 06:47:24 +0000 (08:47 +0200)]
nfsd: Fix ACL null pointer deref
BugLink: http://bugs.launchpad.net/bugs/1348670
Fix regression introduced in pre-3.14 kernels by cherry-picking
aa07c713ecfc0522916f3cd57ac628ea6127c0ec
(NFSD: Call ->set_acl with a NULL ACL structure if no entries).
The affected code was removed in 3.14 by commit
4ac7249ea5a0ceef9f8269f63f33cc873c3fac61
(nfsd: use get_acl and ->set_acl).
The ->set_acl methods are already able to cope with a NULL argument.
Signed-off-by: Sergio Gelato <Sergio.Gelato@astro.su.se>
[bwh: Rewrite the subject]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Moritz Mühlenhoff <muehlenhoff@univention.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Benjamin Herrenschmidt [Tue, 7 Oct 2014 05:12:36 +0000 (16:12 +1100)]
powerpc/powernv: Honor the generic "no_64bit_msi" flag
commit
360743814c4082515581aa23ab1d8e699e1fbe88 upstream.
Instead of the arch specific quirk which we are deprecating
and that drivers don't understand.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maurizio Lombardi [Thu, 20 Nov 2014 10:17:33 +0000 (11:17 +0100)]
bnx2fc: do not add shared skbs to the fcoe_rx_list
commit
01a4cc4d0cd6a836c7b923760e8eb1cbb6a47258 upstream.
In some cases, the fcoe_rx_list may contains multiple instances
of the same skb (the so called "shared skbs").
the bnx2fc_l2_rcv thread is a loop that extracts a skb from the list,
modifies (and destroys) its content and then proceed to the next one.
The problem is that if the skb is shared, the remaining instances will
be corrupted.
The solution is to use skb_share_check() before adding the skb to the
fcoe_rx_list.
[ 6286.808725] ------------[ cut here ]------------
[ 6286.808729] WARNING: at include/scsi/fc_frame.h:173 bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc]()
[ 6286.808748] Modules linked in: bnx2x(-) mdio dm_service_time bnx2fc cnic uio fcoe libfcoe 8021q garp stp mrp libfc llc scsi_transport_fc scsi_tgt sg iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel e1000e ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ptp cryptd hpilo serio_raw hpwdt lpc_ich pps_core ipmi_si pcspkr mfd_core ipmi_msghandler shpchp pcc_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c ata_generic pata_acpi sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit ata_piix drm_kms_helper ttm drm libata i2c_core hpsa dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mdio]
[ 6286.808750] CPU: 3 PID: 1304 Comm: bnx2fc_l2_threa Not tainted 3.10.0-121.el7.x86_64 #1
[ 6286.808750] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013
[ 6286.808752]
0000000000000000 000000000b36e715 ffff8800deba1e00 ffffffff815ec0ba
[ 6286.808753]
ffff8800deba1e38 ffffffff8105dee1 ffffffffa05618c0 ffff8801e4c81888
[ 6286.808754]
ffffe8ffff663868 ffff8801f402b180 ffff8801f56bc000 ffff8800deba1e48
[ 6286.808754] Call Trace:
[ 6286.808759] [<
ffffffff815ec0ba>] dump_stack+0x19/0x1b
[ 6286.808762] [<
ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
[ 6286.808763] [<
ffffffff8105e00a>] warn_slowpath_null+0x1a/0x20
[ 6286.808765] [<
ffffffffa054f415>] bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc]
[ 6286.808767] [<
ffffffffa054eff0>] ? bnx2fc_disable+0x90/0x90 [bnx2fc]
[ 6286.808769] [<
ffffffff81085aef>] kthread+0xcf/0xe0
[ 6286.808770] [<
ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[ 6286.808772] [<
ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
[ 6286.808773] [<
ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[ 6286.808774] ---[ end trace
c6cdb939184ccb4e ]---
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
J. Bruce Fields [Thu, 15 Aug 2013 20:55:26 +0000 (16:55 -0400)]
nfsd4: fix leak of inode reference on delegation failure
commit
bf7bd3e98be5c74813bee6ad496139fb0a011b3b upstream.
This fixes a regression from
68a3396178e6688ad7367202cdf0af8ed03c8727
"nfsd4: shut down more of delegation earlier".
After that commit, nfs4_set_delegation() failures result in
nfs4_put_delegation being called, but nfs4_put_delegation doesn't free
the nfs4_file that has already been set by alloc_init_deleg().
This can result in an oops on later unmounting the exported filesystem.
Note also delaying the fi_had_conflict check we're able to return a
better error (hence give 4.1 clients a better idea why the delegation
failed; though note CONFLICT isn't an exact match here, as that's
supposed to indicate a current conflict, but all we know here is that
there was one recently).
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[tuomasjjrasanen: backported to 3.10
Conflicts fs/nfsd/nfs4state.c:
Delegation type flags have been removed from upstream code. In 3.10-series,
they still exists and therefore the commit caused few conflicts in function
signatures.
]
Signed-off-by: Tuomas Räsänen <tuomasjjrasanen@opinsys.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Trond Myklebust [Wed, 19 Nov 2014 17:47:50 +0000 (12:47 -0500)]
nfsd: Fix slot wake up race in the nfsv4.1 callback code
commit
c6c15e1ed303ffc47e696ea1c9a9df1761c1f603 upstream.
The currect code for nfsd41_cb_get_slot() and nfsd4_cb_done() has no
locking in order to guarantee atomicity, and so allows for races of
the form.
Task 1 Task 2
====== ======
if (test_and_set_bit(0) != 0) {
clear_bit(0)
rpc_wake_up_next(queue)
rpc_sleep_on(queue)
return false;
}
This patch breaks the race condition by adding a retest of the bit
after the call to rpc_sleep_on().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stanislaw Gruszka [Tue, 11 Nov 2014 13:28:47 +0000 (14:28 +0100)]
rt2x00: do not align payload on modern H/W
commit
cfd9167af14eb4ec21517a32911d460083ee3d59 upstream.
RT2800 and newer hardware require padding between header and payload if
header length is not multiple of 4.
For historical reasons we also align payload to to 4 bytes boundary, but
such alignment is not needed on modern H/W.
Patch fixes skb_under_panic problems reported from time to time:
https://bugzilla.kernel.org/show_bug.cgi?id=84911
https://bugzilla.kernel.org/show_bug.cgi?id=72471
http://marc.info/?l=linux-wireless&m=
139108549530402&w=2
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/
1087591
Panic happened because we eat 4 bytes of skb headroom on each
(re)transmission when sending frame without the payload and the header
length not being multiple of 4 (i.e. QoS header has 26 bytes). On such
case because paylad_aling=2 is bigger than header_align=0 we increase
header_align by 4 bytes. To prevent that we could change the check to:
if (payload_length && payload_align > header_align)
header_align += 4;
but not aligning payload at all is more effective and alignment is not
really needed by H/W (that has been tested on OpenWrt project for few
years now).
Reported-and-tested-by: Antti S. Lankila <alankila@bel.fi>
Debugged-by: Antti S. Lankila <alankila@bel.fi>
Reported-by: Henrik Asp <solenskiner@gmail.com>
Originally-From: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Körper [Fri, 31 Oct 2014 06:33:54 +0000 (07:33 +0100)]
can: dev: avoid calling kfree_skb() from interrupt context
commit
5247a589c24022ab34e780039cc8000c48f2035e upstream.
ikfree_skb() is Called in can_free_echo_skb(), which might be called from (TX
Error) interrupt, which triggers the folloing warning:
[ 1153.360705] ------------[ cut here ]------------
[ 1153.360715] WARNING: CPU: 0 PID: 31 at net/core/skbuff.c:563 skb_release_head_state+0xb9/0xd0()
[ 1153.360772] Call Trace:
[ 1153.360778] [<
c167906f>] dump_stack+0x41/0x52
[ 1153.360782] [<
c105bb7e>] warn_slowpath_common+0x7e/0xa0
[ 1153.360784] [<
c158b909>] ? skb_release_head_state+0xb9/0xd0
[ 1153.360786] [<
c158b909>] ? skb_release_head_state+0xb9/0xd0
[ 1153.360788] [<
c105bc42>] warn_slowpath_null+0x22/0x30
[ 1153.360791] [<
c158b909>] skb_release_head_state+0xb9/0xd0
[ 1153.360793] [<
c158be90>] skb_release_all+0x10/0x30
[ 1153.360795] [<
c158bf06>] kfree_skb+0x36/0x80
[ 1153.360799] [<
f8486938>] ? can_free_echo_skb+0x28/0x40 [can_dev]
[ 1153.360802] [<
f8486938>] can_free_echo_skb+0x28/0x40 [can_dev]
[ 1153.360805] [<
f849a12c>] esd_pci402_interrupt+0x34c/0x57a [esd402]
[ 1153.360809] [<
c10a75b5>] handle_irq_event_percpu+0x35/0x180
[ 1153.360811] [<
c10a7623>] ? handle_irq_event_percpu+0xa3/0x180
[ 1153.360813] [<
c10a7731>] handle_irq_event+0x31/0x50
[ 1153.360816] [<
c10a9c7f>] handle_fasteoi_irq+0x6f/0x120
[ 1153.360818] [<
c10a9c10>] ? handle_edge_irq+0x110/0x110
[ 1153.360822] [<
c1011b61>] handle_irq+0x71/0x90
[ 1153.360823] <IRQ> [<
c168152c>] do_IRQ+0x3c/0xd0
[ 1153.360829] [<
c1680b6c>] common_interrupt+0x2c/0x34
[ 1153.360834] [<
c107d277>] ? finish_task_switch+0x47/0xf0
[ 1153.360836] [<
c167c27b>] __schedule+0x35b/0x7e0
[ 1153.360839] [<
c10a5334>] ? console_unlock+0x2c4/0x4d0
[ 1153.360842] [<
c13df500>] ? n_tty_receive_buf_common+0x890/0x890
[ 1153.360845] [<
c10707b6>] ? process_one_work+0x196/0x370
[ 1153.360847] [<
c167c723>] schedule+0x23/0x60
[ 1153.360849] [<
c1070de1>] worker_thread+0x161/0x460
[ 1153.360852] [<
c1090fcf>] ? __wake_up_locked+0x1f/0x30
[ 1153.360854] [<
c1070c80>] ? rescuer_thread+0x2f0/0x2f0
[ 1153.360856] [<
c1074f01>] kthread+0xa1/0xc0
[ 1153.360859] [<
c1680401>] ret_from_kernel_thread+0x21/0x30
[ 1153.360861] [<
c1074e60>] ? kthread_create_on_node+0x110/0x110
[ 1153.360863] ---[ end trace
5ff83639cbb74b35 ]---
This patch replaces the kfree_skb() by dev_kfree_skb_any().
Signed-off-by: Thomas Körper <thomas.koerper@esd.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thor Thayer [Thu, 6 Nov 2014 19:54:27 +0000 (13:54 -0600)]
spi: dw: Fix dynamic speed change.
commit
0a8727e69778683495058852f783eeda141a754e upstream.
An IOCTL call that calls spi_setup() and then dw_spi_setup() will
overwrite the persisted last transfer speed. On each transfer, the
SPI speed is compared to the last transfer speed to determine if the
clock divider registers need to be updated (did the speed change?).
This bug was observed with the spidev driver using spi-config to
update the max transfer speed.
This fix: Don't overwrite the persisted last transaction clock speed
when updating the SPI parameters in dw_spi_setup(). On the next
transaction, the new speed won't match the persisted last speed
and the hardware registers will be updated.
On initialization, the persisted last transaction clock
speed will be 0 but will be updated after the first SPI
transaction.
Move zeroed clock divider check into clock change test because
chip->clk_div is zero on startup and would cause a divide-by-zero
error. The calculation was wrong as well (can't support odd #).
Reported-by: Vlastimil Setka <setka@vsis.cz>
Signed-off-by: Vlastimil Setka <setka@vsis.cz>
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sagi Grimberg [Tue, 28 Oct 2014 20:45:03 +0000 (13:45 -0700)]
iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly
commit
3b726ae2de02a406cc91903f80132daee37b6f1b upstream.
In this case the cm_id->context is the isert_np, and the cm_id->qp
is NULL, so use that to distinct the cases.
Since we don't expect any other events on this cm_id we can
just return -1 for explicit termination of the cm_id by the
cma layer.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Roland Dreier [Tue, 14 Oct 2014 21:16:24 +0000 (14:16 -0700)]
target: Don't call TFO->write_pending if data_length == 0
commit
885e7b0e181c14e4d0ddd26c688bad2b84c1ada9 upstream.
If an initiator sends a zero-length command (e.g. TEST UNIT READY) but
sets the transfer direction in the transport layer to indicate a
data-out phase, we still shouldn't try to transfer data. At best it's
a NOP, and depending on the transport, we might crash on an
uninitialized sg list.
Reported-by: Craig Watson <craig.watson@vanguard-rugged.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bart Van Assche [Sun, 19 Oct 2014 15:05:33 +0000 (18:05 +0300)]
srp-target: Retry when QP creation fails with ENOMEM
commit
ab477c1ff5e0a744c072404bf7db51bfe1f05b6e upstream.
It is not guaranteed to that srp_sq_size is supported
by the HCA. So if we failed to create the QP with ENOMEM,
try with a smaller srp_sq_size. Keep it up until we hit
MIN_SRPT_SQ_SIZE, then fail the connection.
Reported-by: Mark Lehrer <lehrer@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Tue, 25 Nov 2014 08:38:17 +0000 (00:38 -0800)]
Input: xpad - use proper endpoint type
commit
a1f9a4072655843fc03186acbad65990cc05dd2d upstream.
The xpad wireless endpoint is not a bulk endpoint on my devices, but
rather an interrupt one, so the USB core complains when it is submitted.
I'm guessing that the author really did mean that this should be an
interrupt urb, but as there are a zillion different xpad devices out
there, let's cover out bases and handle both bulk and interrupt
endpoints just as easily.
Signed-off-by: "Pierre-Loup A. Griffais" <pgriffais@valvesoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Petazzoni [Tue, 25 Nov 2014 17:43:15 +0000 (18:43 +0100)]
ARM: 8222/1: mvebu: enable strex backoff delay
commit
995ab5189d1d7264e79e665dfa032a19b3ac646e upstream.
Under extremely rare conditions, in an MPCore node consisting of at
least 3 CPUs, two CPUs trying to perform a STREX to data on the same
shared cache line can enter a livelock situation.
This patch enables the HW mechanism that overcomes the bug. This fixes
the incorrect setup of the STREX backoff delay bit due to a wrong
description in the specification.
Note that enabling the STREX backoff delay mechanism is done by
leaving the bit *cleared*, while the bit was currently being set by
the proc-v7.S code.
[Thomas: adapt to latest mainline, slightly reword the commit log, add
stable markers.]
Fixes: de4901933f6d ("arm: mm: Add support for PJ4B cpu and init routines")
Signed-off-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Eremin-Solenikov [Fri, 21 Nov 2014 14:29:00 +0000 (15:29 +0100)]
ARM: 8216/1: xscale: correct auxiliary register in suspend/resume
commit
ef59a20ba375aeb97b3150a118318884743452a8 upstream.
According to the manuals I have, XScale auxiliary register should be
reached with opc_2 = 1 instead of crn = 1. cpu_xscale_proc_init
correctly uses c1, c0, 1 arguments, but cpu_xscale_do_suspend and
cpu_xscale_do_resume use c1, c1, 0. Correct suspend/resume functions to
also use c1, c0, 1.
The issue was primarily noticed thanks to qemu reporing "unsupported
instruction" on the pxa suspend path. Confirmed in PXA210/250 and PXA255
XScale Core manuals and in PXA270 and PXA320 Developers Guides.
Harware tested by me on tosa (pxa255). Robert confirmed on pxa270 board.
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jurgen Kramer [Sat, 15 Nov 2014 13:01:21 +0000 (14:01 +0100)]
ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices
commit
6e84a8d7ac3ba246ef44e313e92bc16a1da1b04a upstream.
This patch adds a USB control message delay quirk for a few specific Marantz/Denon
devices. Without the delay the DACs will not work properly and produces the
following type of messages:
Nov 15 10:09:21 orwell kernel: [ 91.342880] usb 3-13: clock source 41 is not valid, cannot use
Nov 15 10:09:21 orwell kernel: [ 91.343775] usb 3-13: clock source 41 is not valid, cannot use
There are likely other Marantz/Denon devices using the same USB module which exhibit the
same problems. But as this cannot be verified I limited the patch to the devices
I could test.
The following two devices are covered by this path:
- Marantz SA-14S1
- Marantz HD-DAC1
Signed-off-by: Jurgen Kramer <gtmkramer@xs4all.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Khoroshilov [Fri, 10 Oct 2014 20:31:07 +0000 (00:31 +0400)]
can: esd_usb2: fix memory leak on disconnect
commit
efbd50d2f62fc1f69a3dcd153e63ba28cc8eb27f upstream.
It seems struct esd_usb2 dev is not deallocated on disconnect. The patch adds
the missing deallocation.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Matthias Fuchs <matthias.fuchs@esd.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Tue, 18 Nov 2014 09:27:11 +0000 (11:27 +0200)]
USB: xhci: don't start a halted endpoint before its new dequeue is set
commit
c3492dbfa1050debf23a5b5cd2bc7514c5b37896 upstream.
A halted endpoint ring must first be reset, then move the ring
dequeue pointer past the problematic TRB. If we start the ring too
early after reset, but before moving the dequeue pointer we
will end up executing the same problematic TRB again.
As we always issue a set transfer dequeue command after a reset
endpoint command we can skip starting endpoint rings at reset endpoint
command completion.
Without this fix we end up trying to handle the same faulty TD for
contol endpoints. causing timeout, and failing testusb ctrl_out write
tests.
Fixes: e9df17e (USB: xhci: Correct assumptions about number of rings per endpoint.)
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hans de Goede [Mon, 24 Nov 2014 10:22:38 +0000 (11:22 +0100)]
usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000
commit
263e80b43559a6103e178a9176938ce171b23872 upstream.
This wireless mouse receiver needs a reset-resume quirk to properly come
out of reset.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1165206
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Troy Clark [Mon, 17 Nov 2014 22:33:17 +0000 (14:33 -0800)]
usb: serial: ftdi_sio: add PIDs for Matrix Orbital products
commit
204ec6e07ea7aff863df0f7c53301f9cbbfbb9d3 upstream.
Add PIDs for new Matrix Orbital GTT series products.
Signed-off-by: Troy Clark <tclark@matrixorbital.ca>
[johan: shorten commit message ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Preston Fick [Sat, 8 Nov 2014 05:26:11 +0000 (23:26 -0600)]
USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick
commit
ffcfe30ebd8dd703d0fc4324ffe56ea21f5479f4 upstream.
Signed-off-by: Preston Fick <pffick@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Tue, 18 Nov 2014 10:25:19 +0000 (11:25 +0100)]
USB: keyspan: fix tty line-status reporting
commit
5d1678a33c731b56e245e888fdae5e88efce0997 upstream.
Fix handling of TTY error flags, which are not bitmasks and must
specifically not be ORed together as this prevents the line discipline
from recognising them.
Also insert null characters when reporting overrun errors as these are
not associated with the received character.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Tue, 18 Nov 2014 10:25:20 +0000 (11:25 +0100)]
USB: keyspan: fix overrun-error reporting
commit
855515a6d3731242d85850a206f2ec084c917338 upstream.
Fix reporting of overrun errors, which are not associated with a
character. Instead insert a null character and report only once.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Tue, 18 Nov 2014 10:25:21 +0000 (11:25 +0100)]
USB: ssu100: fix overrun-error reporting
commit
75bcbf29c284dd0154c3e895a0bd1ef0e796160e upstream.
Fix reporting of overrun errors, which should only be reported once
using the inserted null character.
Fixes: 6b8f1ca5581b ("USB: ssu100: set tty_flags in ssu100_process_packet")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cristina Ciocan [Tue, 11 Nov 2014 14:07:42 +0000 (16:07 +0200)]
iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask
commit
ccf54555da9a5e91e454b909ca6a5303c7d6b910 upstream.
The direction field is set on 7 bits, thus we need to AND it with 0111 111 mask
in order to retrieve it, that is 0x7F, not 0xCF as it is now.
Fixes: ade7ef7ba (staging:iio: Differential channel handling)
Signed-off-by: Cristina Ciocan <cristina.ciocan@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Laurent Dufour [Mon, 24 Nov 2014 14:07:53 +0000 (15:07 +0100)]
powerpc/pseries: Fix endiannes issue in RTAS call from xmon
commit
3b8a3c01096925a824ed3272601082289d9c23a5 upstream.
On pseries system (LPAR) xmon failed to enter when running in LE mode,
system is hunging. Inititating xmon will lead to such an output on the
console:
SysRq : Entering xmon
cpu 0x15: Vector: 0 at [
c0000003f39ffb10]
pc:
c00000000007ed7c: sysrq_handle_xmon+0x5c/0x70
lr:
c00000000007ed7c: sysrq_handle_xmon+0x5c/0x70
sp:
c0000003f39ffc70
msr:
8000000000009033
current = 0xc0000003fafa7180
paca = 0xc000000007d75e80 softe: 0 irq_happened: 0x01
pid = 14617, comm = bash
Bad kernel stack pointer
fafb4b0 at
eca7cc4
cpu 0x15: Vector: 300 (Data Access) at [
c000000007f07d40]
pc:
000000000eca7cc4
lr:
000000000eca7c44
sp:
fafb4b0
msr:
8000000000001000
dar:
10000000
dsisr:
42000000
current = 0xc0000003fafa7180
paca = 0xc000000007d75e80 softe: 0 irq_happened: 0x01
pid = 14617, comm = bash
cpu 0x15: Exception 300 (Data Access) in xmon, returning to main loop
xmon: WARNING: bad recursive fault on cpu 0x15
The root cause is that xmon is calling RTAS to turn off the surveillance
when entering xmon, and RTAS is requiring big endian parameters.
This patch is byte swapping the RTAS arguments when running in LE mode.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Benjamin Herrenschmidt [Tue, 7 Oct 2014 05:12:55 +0000 (16:12 +1100)]
powerpc/pseries: Honor the generic "no_64bit_msi" flag
commit
415072a041bf50dbd6d56934ffc0cbbe14c97be8 upstream.
Instead of the arch specific quirk which we are deprecating
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Benjamin Herrenschmidt [Fri, 14 Nov 2014 06:55:03 +0000 (17:55 +1100)]
of/base: Fix PowerPC address parsing hack
commit
746c9e9f92dde2789908e51a354ba90a1962a2eb upstream.
We have a historical hack that treats missing ranges properties as the
equivalent of an empty one. This is needed for ancient PowerMac "bad"
device-trees, and shouldn't be enabled for any other PowerPC platform,
otherwise we get some nasty layout of devices in sysfs or even
duplication when a set of otherwise identically named devices is
created multiple times under a different parent node with no ranges
property.
This fix is needed for the PowerNV i2c busses to be exposed properly
and will fix a number of other embedded cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Charles Keepax [Mon, 17 Nov 2014 10:48:21 +0000 (10:48 +0000)]
ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use
commit
9da7a5a9fdeeb76b2243f6b473363a7e6147ab6f upstream.
We should not free any buffers associated with writing out coefficients
to the DSP until all the async writes have completed. This patch updates
the out of memory path when allocating a new buffer to include a call to
regmap_async_complete.
Reported-by: JS Park <aitdark.park@samsung.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fabio Estevam [Fri, 14 Nov 2014 04:14:47 +0000 (02:14 -0200)]
ASoC: sgtl5000: Fix SMALL_POP bit definition
commit
c251ea7bd7a04f1f2575467e0de76e803cf59149 upstream.
On a mx28evk with a sgtl5000 codec we notice a loud 'click' sound to happen
5 seconds after the end of a playback.
The SMALL_POP bit should fix this, but its definition is incorrect:
according to the sgtl5000 manual it is bit 0 of CHIP_REF_CTRL register, not
bit 1.
Fix the definition accordingly and enable the bit as intended per the code
comment.
After applying this change, no loud 'click' sound is heard after playback
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Benjamin Herrenschmidt [Fri, 3 Oct 2014 05:13:24 +0000 (15:13 +1000)]
PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
commit
f144d1496b47e7450f41b767d0d91c724c2198bc upstream.
This can be set by quirks/drivers to be used by the architecture code
that assigns the MSI addresses.
We additionally add verification in the core MSI code that the values
assigned by the architecture do satisfy the limitation in order to fail
gracefully if they don't (ie. the arch hasn't been updated to deal with
that quirk yet).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jiri Bohac [Wed, 19 Nov 2014 22:05:49 +0000 (23:05 +0100)]
ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
[ Upstream commit
01462405f0c093b2f8dfddafcadcda6c9e4c5cdf ]
This fixes an old regression introduced by commit
b0d0d915 (ipx: remove the BKL).
When a recvmsg syscall blocks waiting for new data, no data can be sent on the
same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.
This breaks mars-nwe (NetWare emulator):
- the ncpserv process reads the request using recvmsg
- ncpserv forks and spawns nwconn
- ncpserv calls a (blocking) recvmsg and waits for new requests
- nwconn deadlocks in sendmsg on the same socket
Commit
b0d0d915 has simply replaced BKL locking with
lock_sock/release_sock. Unlike now, BKL got unlocked while
sleeping, so a blocking recvmsg did not block a concurrent
sendmsg.
Only keep the socket locked while actually working with the socket data and
release it prior to calling skb_recv_datagram().
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Krause [Wed, 19 Nov 2014 17:05:26 +0000 (18:05 +0100)]
pptp: fix stack info leak in pptp_getname()
[ Upstream commit
a5f6fc28d6e6cc379c6839f21820e62262419584 ]
pptp_getname() only partially initializes the stack variable sa,
particularly only fills the pptp part of the sa_addr union. The code
thereby discloses 16 bytes of kernel stack memory via getsockname().
Fix this by memset(0)'ing the union before.
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Martin Hauke [Sun, 16 Nov 2014 18:55:25 +0000 (19:55 +0100)]
qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
[ Upstream commit
bb2bdeb83fb125c95e47fc7eca2a3e8f868e2a74 ]
Added the USB VID/PID for the HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e)
Signed-off-by: Martin Hauke <mardnh@gmx.de>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Khoroshilov [Fri, 14 Nov 2014 23:11:59 +0000 (02:11 +0300)]
ieee802154: fix error handling in ieee802154fake_probe()
[ Upstream commit
8c2dd54485ccee7fc4086611e188478584758c8d ]
In case of any failure ieee802154fake_probe() just calls unregister_netdev().
But it does not look safe to unregister netdevice before it was registered.
The patch implements straightforward resource deallocation in case of
failure in ieee802154fake_probe().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Panu Matilainen [Fri, 14 Nov 2014 11:14:32 +0000 (13:14 +0200)]
ipv4: Fix incorrect error code when adding an unreachable route
[ Upstream commit
49dd18ba4615eaa72f15c9087dea1c2ab4744cf5 ]
Trying to add an unreachable route incorrectly returns -ESRCH if
if custom FIB rules are present:
[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: Network is unreachable
[root@localhost ~]# ip rule add to 55.66.77.88 table 200
[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: No such process
[root@localhost ~]#
Commit
83886b6b636173b206f475929e58fac75c6f2446 ("[NET]: Change "not found"
return value for rule lookup") changed fib_rules_lookup()
to use -ESRCH as a "not found" code internally, but for user space it
should be translated into -ENETUNREACH. Handle the translation centrally in
ipv4-specific fib_lookup(), leaving the DECnet case alone.
On a related note, commit
b7a71b51ee37d919e4098cd961d59a883fd272d8
("ipv4: removed redundant conditional") removed a similar translation from
ip_route_input_slow() prematurely AIUI.
Fixes: b7a71b51ee37 ("ipv4: removed redundant conditional")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vincent BENAYOUN [Thu, 13 Nov 2014 12:47:26 +0000 (13:47 +0100)]
inetdevice: fixed signed integer overflow
[ Upstream commit
84bc88688e3f6ef843aa8803dbcd90168bb89faf ]
There could be a signed overflow in the following code.
The expression, (32-logmask) is comprised between 0 and 31 included.
It may be equal to 31.
In such a case the left shift will produce a signed integer overflow.
According to the C99 Standard, this is an undefined behavior.
A simple fix is to replace the signed int 1 with the unsigned int 1U.
Signed-off-by: Vincent BENAYOUN <vincent.benayoun@trust-in-soft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David S. Miller [Sun, 16 Nov 2014 21:19:32 +0000 (13:19 -0800)]
sparc64: Fix constraints on swab helpers.
[ Upstream commit
5a2b59d3993e8ca4f7788a48a23e5cb303f26954 ]
We are reading the memory location, so we have to have a memory
constraint in there purely for the sake of showing the data flow
to the compiler.
Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Fri, 21 Nov 2014 21:26:07 +0000 (13:26 -0800)]
uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
commit
82975bc6a6df743b9a01810fb32cb65d0ec5d60b upstream.
x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns. I suspect that this is a mistake and that
the code only works because int3 is paranoid.
Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kees Cook [Fri, 14 Nov 2014 19:47:37 +0000 (11:47 -0800)]
x86, mm: Set NX across entire PMD at boot
commit
45e2a9d4701d8c624d4a4bcdd1084eae31e92f58 upstream.
When setting up permissions on kernel memory at boot, the end of the
PMD that was split from bss remained executable. It should be NX like
the rest. This performs a PMD alignment instead of a PAGE alignment to
get the correct span of memory.
Before:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte
0xffffffff82200000-0xffffffff82c00000 10M RW PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000 2004K RW GLB NX pte
0xffffffff82df5000-0xffffffff82e00000 44K RW GLB x pte
0xffffffff82e00000-0xffffffffc0000000 978M pmd
After:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte
0xffffffff82200000-0xffffffff82e00000 12M RW PSE GLB NX pmd
0xffffffff82e00000-0xffffffffc0000000 978M pmd
[ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment.
We really should unmap the reminder along with the holes
caused by init,initdata etc. but thats a different issue ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dave Hansen [Tue, 11 Nov 2014 22:01:33 +0000 (14:01 -0800)]
x86: Require exact match for 'noxsave' command line option
commit
2cd3949f702692cf4c5d05b463f19cd706a92dd3 upstream.
We have some very similarly named command-line options:
arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);
__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:
__setup("foo", x86_foo_func...);
The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar". If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.
This makes the "noxsave" handler only return success when it finds
an *exact* match.
[ tglx: We really need to make __setup() more robust. ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Sun, 23 Nov 2014 02:00:33 +0000 (18:00 -0800)]
x86_64, traps: Rework bad_iret
commit
b645af2d5905c4e32399005b867987919cbfc3ae upstream.
It's possible for iretq to userspace to fail. This can happen because
of a bad CS, SS, or RIP.
Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace. To make this work, there's
an extra fixup to fudge the gs base into a usable state.
This is suboptimal because it loses the original exception. It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with. For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.
This patch throws out bad_iret entirely. As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C. It's should be clearer and more correct.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Sun, 23 Nov 2014 02:00:32 +0000 (18:00 -0800)]
x86_64, traps: Stop using IST for #SS
commit
6f442be2fb22be02cafa606f1769fa1e6f894441 upstream.
On a 32-bit kernel, this has no effect, since there are no IST stacks.
On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code. The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.
This fixes a bug in which the espfix64 code mishandles a stack segment
violation.
This saves 4k of memory per CPU and a tiny bit of code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Sun, 23 Nov 2014 02:00:31 +0000 (18:00 -0800)]
x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
commit
af726f21ed8af2cdaa4e93098dc211521218ae65 upstream.
There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly. Move it to C.
This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.
Fixes: 3891a04aafd668686239349ea58f3314ea2af86b
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Aaro Koskinen [Wed, 19 Nov 2014 23:05:38 +0000 (01:05 +0200)]
MIPS: Loongson: Make platform serial setup always built-in.
commit
26927f76499849e095714452b8a4e09350f6a3b9 upstream.
If SERIAL_8250 is compiled as a module, the platform specific setup
for Loongson will be a module too, and it will not work very well.
At least on Loongson 3 it will trigger a build failure,
since loongson_sysconf is not exported to modules.
Fix by making the platform specific serial code always built-in.
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reported-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Markos Chandras <Markos.Chandras@imgtec.com>
Patchwork: https://patchwork.linux-mips.org/patch/8533/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Aaro Koskinen [Fri, 17 Oct 2014 15:10:24 +0000 (18:10 +0300)]
MIPS: oprofile: Fix backtrace on 64-bit kernel
commit
bbaf113a481b6ce32444c125807ad3618643ce57 upstream.
Fix incorrect cast that always results in wrong address for the new
frame on 64-bit kernels.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8110/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 21 Nov 2014 17:23:22 +0000 (09:23 -0800)]
Linux 3.10.61
Johannes Weiner [Wed, 16 Oct 2013 20:46:59 +0000 (13:46 -0700)]
mm: memcg: handle non-error OOM situations more gracefully
commit
4942642080ea82d99ab5b653abb9a12b7ba31f4a upstream.
Commit
3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead. But there are many more and it's impractical to annotate
them all.
First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well. This simplifies the code quite a bit for
added bonus.
Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts. If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.
Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Thu, 12 Sep 2013 22:13:44 +0000 (15:13 -0700)]
mm: memcg: do not trap chargers with full callstack on OOM
commit
3812c8c8f3953921ef18544110dafc3505c1ac62 upstream.
The memcg OOM handling is incredibly fragile and can deadlock. When a
task fails to charge memory, it invokes the OOM killer and loops right
there in the charge code until it succeeds. Comparably, any other task
that enters the charge path at this point will go to a waitqueue right
then and there and sleep until the OOM situation is resolved. The problem
is that these tasks may hold filesystem locks and the mmap_sem; locks that
the selected OOM victim may need to exit.
For example, in one reported case, the task invoking the OOM killer was
about to charge a page cache page during a write(), which holds the
i_mutex. The OOM killer selected a task that was just entering truncate()
and trying to acquire the i_mutex:
OOM invoking task:
mem_cgroup_handle_oom+0x241/0x3b0
mem_cgroup_cache_charge+0xbe/0xe0
add_to_page_cache_locked+0x4c/0x140
add_to_page_cache_lru+0x22/0x50
grab_cache_page_write_begin+0x8b/0xe0
ext3_write_begin+0x88/0x270
generic_file_buffered_write+0x116/0x290
__generic_file_aio_write+0x27c/0x480
generic_file_aio_write+0x76/0xf0 # takes ->i_mutex
do_sync_write+0xea/0x130
vfs_write+0xf3/0x1f0
sys_write+0x51/0x90
system_call_fastpath+0x18/0x1d
OOM kill victim:
do_truncate+0x58/0xa0 # takes i_mutex
do_last+0x250/0xa30
path_openat+0xd7/0x440
do_filp_open+0x49/0xa0
do_sys_open+0x106/0x240
sys_open+0x20/0x30
system_call_fastpath+0x18/0x1d
The OOM handling task will retry the charge indefinitely while the OOM
killed task is not releasing any resources.
A similar scenario can happen when the kernel OOM killer for a memcg is
disabled and a userspace task is in charge of resolving OOM situations.
In this case, ALL tasks that enter the OOM path will be made to sleep on
the OOM waitqueue and wait for userspace to free resources or increase
the group's limit. But a userspace OOM handler is prone to deadlock
itself on the locks held by the waiting tasks. For example one of the
sleeping tasks may be stuck in a brk() call with the mmap_sem held for
writing but the userspace handler, in order to pick an optimal victim,
may need to read files from /proc/<pid>, which tries to acquire the same
mmap_sem for reading and deadlocks.
This patch changes the way tasks behave after detecting a memcg OOM and
makes sure nobody loops or sleeps with locks held:
1. When OOMing in a user fault, invoke the OOM killer and restart the
fault instead of looping on the charge attempt. This way, the OOM
victim can not get stuck on locks the looping task may hold.
2. When OOMing in a user fault but somebody else is handling it
(either the kernel OOM killer or a userspace handler), don't go to
sleep in the charge context. Instead, remember the OOMing memcg in
the task struct and then fully unwind the page fault stack with
-ENOMEM. pagefault_out_of_memory() will then call back into the
memcg code to check if the -ENOMEM came from the memcg, and then
either put the task to sleep on the memcg's OOM waitqueue or just
restart the fault. The OOM victim can no longer get stuck on any
lock a sleeping task may hold.
Debugged by Michal Hocko.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: azurIt <azurit@pobox.sk>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Thu, 12 Sep 2013 22:13:43 +0000 (15:13 -0700)]
mm: memcg: rework and document OOM waiting and wakeup
commit
fb2a6fc56be66c169f8b80e07ed999ba453a2db2 upstream.
The memcg OOM handler open-codes a sleeping lock for OOM serialization
(trylock, wait, repeat) because the required locking is so specific to
memcg hierarchies. However, it would be nice if this construct would be
clearly recognizable and not be as obfuscated as it is right now. Clean
up as follows:
1. Remove the return value of mem_cgroup_oom_unlock()
2. Rename mem_cgroup_oom_lock() to mem_cgroup_oom_trylock().
3. Pull the prepare_to_wait() out of the memcg_oom_lock scope. This
makes it more obvious that the task has to be on the waitqueue
before attempting to OOM-trylock the hierarchy, to not miss any
wakeups before going to sleep. It just didn't matter until now
because it was all lumped together into the global memcg_oom_lock
spinlock section.
4. Pull the mem_cgroup_oom_notify() out of the memcg_oom_lock scope.
It is proctected by the hierarchical OOM-lock.
5. The memcg_oom_lock spinlock is only required to propagate the OOM
lock in any given hierarchy atomically. Restrict its scope to
mem_cgroup_oom_(trylock|unlock).
6. Do not wake up the waitqueue unconditionally at the end of the
function. Only the lockholder has to wake up the next in line
after releasing the lock.
Note that the lockholder kicks off the OOM-killer, which in turn
leads to wakeups from the uncharges of the exiting task. But a
contender is not guaranteed to see them if it enters the OOM path
after the OOM kills but before the lockholder releases the lock.
Thus there has to be an explicit wakeup after releasing the lock.
7. Put the OOM task on the waitqueue before marking the hierarchy as
under OOM as that is the point where we start to receive wakeups.
No point in listening before being on the waitqueue.
8. Likewise, unmark the hierarchy before finishing the sleep, for
symmetry.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Thu, 12 Sep 2013 22:13:42 +0000 (15:13 -0700)]
mm: memcg: enable memcg OOM killer only for user faults
commit
519e52473ebe9db5cdef44670d5a97f1fd53d721 upstream.
System calls and kernel faults (uaccess, gup) can handle an out of memory
situation gracefully and just return -ENOMEM.
Enable the memcg OOM killer only for user faults, where it's really the
only option available.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Thu, 12 Sep 2013 22:13:40 +0000 (15:13 -0700)]
x86: finish user fault error path with fatal signal
commit
3a13c4d761b4b979ba8767f42345fed3274991b0 upstream.
The x86 fault handler bails in the middle of error handling when the
task has a fatal signal pending. For a subsequent patch this is a
problem in OOM situations because it relies on pagefault_out_of_memory()
being called even when the task has been killed, to perform proper
per-task OOM state unwinding.
Shortcutting the fault like this is a rather minor optimization that
saves a few instructions in rare cases. Just remove it for
user-triggered faults.
Use the opportunity to split the fault retry handling from actual fault
errors and add locking documentation that reads suprisingly similar to
ARM's.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Thu, 12 Sep 2013 22:13:39 +0000 (15:13 -0700)]
arch: mm: pass userspace fault flag to generic fault handler
commit
759496ba6407c6994d6a5ce3a5e74937d7816208 upstream.
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.
Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Thu, 12 Sep 2013 22:13:38 +0000 (15:13 -0700)]
arch: mm: do not invoke OOM killer on kernel fault OOM
commit
871341023c771ad233620b7a1fb3d9c7031c4e5c upstream.
Kernel faults are expected to handle OOM conditions gracefully (gup,
uaccess etc.), so they should never invoke the OOM killer. Reserve this
for faults triggered in user context when it is the only option.
Most architectures already do this, fix up the remaining few.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Thu, 12 Sep 2013 22:13:36 +0000 (15:13 -0700)]
arch: mm: remove obsolete init OOM protection
commit
94bce453c78996cc4373d5da6cfabe07fcc6d9f9 upstream.
The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved. They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.
This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.
Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.
Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel
faults, because they can gracefully unwind the stack with -ENOMEM. OOM
handling is restricted to user triggered faults that have no other
option.
Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.
Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.
This patch:
Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.
Now that all fault handlers call into the generic OOM killer (see commit
609838cfed97: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Mon, 8 Jul 2013 22:59:50 +0000 (15:59 -0700)]
mm: invoke oom-killer from remaining unconverted page fault handlers
commit
609838cfed972d49a65aac7923a9ff5cbe482e30 upstream.
A few remaining architectures directly kill the page faulting task in an
out of memory situation. This is usually not a good idea since that
task might not even use a significant amount of memory and so may not be
the optimal victim to resolve the situation.
Since 2.6.29's
1c0fe6e ("mm: invoke oom-killer from page fault") there
is a hook that architecture page fault handlers are supposed to call to
invoke the OOM killer and let it pick the right task to kill. Convert
the remaining architectures over to this hook.
To have the previous behavior of simply taking out the faulting task the
vm.oom_kill_allocating_task sysctl can be set to 1.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann [Thu, 9 Oct 2014 20:55:31 +0000 (22:55 +0200)]
net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
commit
9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.
Commit
6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
skb_over_panic: text:
ffffffffa01ea1c3 len:31056 put:30768
head:
ffff88011bd81800 data:
ffff88011bd81800 tail:0x7950
end:0x440 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<
ffffffff8144fb1c>] skb_put+0x5c/0x70
[<
ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
[<
ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
[<
ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
[<
ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
[<
ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<
ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
[<
ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
[<
ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
[<
ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<
ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<
ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<
ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<
ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<
ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<
ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<
ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
[<
ffffffff81497078>] ip_local_deliver+0x98/0xa0
[<
ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
[<
ffffffff81496ac5>] ip_rcv+0x275/0x350
[<
ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
[<
ffffffff81460588>] netif_receive_skb+0x58/0x60
This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
------------------ ASCONF; UNKNOWN ------------------>
... where ASCONF chunk of length 280 contains 2 parameters ...
1) Add IP address parameter (param length: 16)
2) Add/del IP address parameter (param length: 255)
... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.
The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.
In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.
When walking to the next parameter this time, we proceed
with ...
length = ntohs(asconf_param->param_hdr.length);
asconf_param = (void *)asconf_param + length;
... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.
Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.
Joint work with Vlad Yasevich.
Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann [Thu, 9 Oct 2014 20:55:32 +0000 (22:55 +0200)]
net: sctp: fix panic on duplicate ASCONF chunks
commit
b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.
When receiving a e.g. semi-good formed connection scan in the
form of ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---------------- ASCONF_a; ASCONF_b ----------------->
... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!
The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.
Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():
While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit
2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.
Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.
Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.
Joint work with Vlad Yasevich.
Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann [Thu, 9 Oct 2014 20:55:33 +0000 (22:55 +0200)]
net: sctp: fix remote memory pressure from excessive queueing
commit
26b87c7881006311828bb0ab271a551a62dcceb4 upstream.
This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
[...]
---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>
... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.
We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].
The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.
In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.
One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.
Joint work with Vlad Yasevich.
Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nadav Amit [Tue, 16 Sep 2014 23:50:50 +0000 (02:50 +0300)]
KVM: x86: Don't report guest userspace emulation error to userspace
commit
a2b9e6c1a35afcc0973acb72e591c714e78885ff upstream.
Commit
fc3a9157d314 ("KVM: X86: Don't report L2 emulation failures to
user-space") disabled the reporting of L2 (nested guest) emulation failures to
userspace due to race-condition between a vmexit and the instruction emulator.
The same rational applies also to userspace applications that are permitted by
the guest OS to access MMIO area or perform PIO.
This patch extends the current behavior - of injecting a #UD instead of
reporting it to userspace - also for guest userspace code.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tomas Henzl [Thu, 1 Aug 2013 13:14:00 +0000 (15:14 +0200)]
SCSI: hpsa: fix a race in cmd_free/scsi_done
commit
2cc5bfaf854463d9d1aa52091f60110fbf102a96 upstream.
When the driver calls scsi_done and after that frees it's internal
preallocated memory it can happen that a new job is enqueud before
the memory is freed. The allocation fails and the message
"cmd_alloc returned NULL" is shown.
Patch below fixes it by moving cmd->scsi_done after cmd_free.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: Masoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eugenia Emantayev [Thu, 25 Jul 2013 16:21:23 +0000 (19:21 +0300)]
net/mlx4_en: Fix BlueFlame race
commit
2d4b646613d6b12175b017aca18113945af1faf3 upstream.
Fix a race between BlueFlame flow and stamping in post send flow.
Example:
SW: Build WQE 0 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 0 on the TX buffer
SW: Ring doorbell for WQE 0
SW: Build WQE 1 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 1 on the TX buffer
HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
HW: Produce CQEs for WQE 0 and WQE 1
SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
HW: CQE error with index 0xFFFF - the BF WQE's control segment is STAMPED,
so the BF index is 0xFFFF. Error: Invalid Opcode.
As a result QP enters the error state and no traffic can be sent.
Solution:
When stamping - do not stamp last completed wqe.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Dooks [Thu, 25 Jul 2013 13:38:03 +0000 (14:38 +0100)]
ARM: Correct BUG() assembly to ensure it is endian-agnostic
commit
63328070eff2f4fd730c86966a0dbc976147c39f upstream.
Currently BUG() uses .word or .hword to create the necessary illegal
instructions. However if we are building BE8 then these get swapped
by the linker into different illegal instructions in the text. This
means that the BUG() macro does not get trapped properly.
Change to using <asm/opcodes.h> to provide the necessary ARM instruction
building as we cannot rely on gcc/gas having the `.inst` instructions
which where added to try and resolve this issue (reported by Dave Martin
<Dave.Martin@arm.com>).
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vince Weaver [Mon, 14 Jul 2014 19:33:25 +0000 (15:33 -0400)]
perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
commit
1996388e9f4e3444db8273bc08d25164d2967c21 upstream.
This was discussed back in February:
https://lkml.org/lkml/2014/2/18/956
But I never saw a patch come out of it.
On IvyBridge we share the SandyBridge cache event tables, but the
dTLB-load-miss event is not compatible. Patch it up after
the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>