firefly-linux-kernel-4.4.55.git
9 years agokvm: svm: Only propagate next_rip when guest supports it
Joerg Roedel [Wed, 14 Oct 2015 13:10:54 +0000 (15:10 +0200)]
kvm: svm: Only propagate next_rip when guest supports it

Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das <bsd@redhat.com>
Cc: Dirk Mueller <dmueller@suse.com>
Tested-By: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: manually unroll bad_mt_xwr loop
Paolo Bonzini [Wed, 23 Sep 2015 08:34:26 +0000 (10:34 +0200)]
KVM: x86: manually unroll bad_mt_xwr loop

The loop is computing one of two constants, it can be simpler to write
everything inline.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: expose VPID capability to L1
Wanpeng Li [Tue, 13 Oct 2015 16:18:37 +0000 (09:18 -0700)]
KVM: nVMX: expose VPID capability to L1

Expose VPID capability to L1. For nested guests, we don't do anything
specific for single context invalidation. Hence, only advertise support
for global context invalidation. The major benefit of nested VPID comes
from having separate vpids when switching between L1 and L2, and also
when L2's vCPUs not sched in/out on L1.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: nested VPID emulation
Wanpeng Li [Tue, 13 Oct 2015 16:18:36 +0000 (09:18 -0700)]
KVM: nVMX: nested VPID emulation

VPID is used to tag address space and avoid a TLB flush. Currently L0 use
the same VPID to run L1 and all its guests. KVM flushes VPID when switching
between L1 and L2.

This patch advertises VPID to the L1 hypervisor, then address space of L1
and L2 can be separately treated and avoid TLB flush when swithing between
L1 and L2. For each nested vmentry, if vpid12 is changed, reuse shadow vpid
w/ an invvpid.

Performance:

run lmbench on L2 w/ 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.60000 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: emulate the INVVPID instruction
Wanpeng Li [Tue, 13 Oct 2015 16:12:21 +0000 (09:12 -0700)]
KVM: nVMX: emulate the INVVPID instruction

Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
Wanpeng Li [Wed, 23 Sep 2015 10:26:57 +0000 (18:26 +0800)]
KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid

Introduce __vmx_flush_tlb() to handle specific vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: adjust interface to allocate/free_vpid
Wanpeng Li [Wed, 16 Sep 2015 09:30:05 +0000 (17:30 +0800)]
KVM: VMX: adjust interface to allocate/free_vpid

Adjust allocate/free_vid so that they can be reused for the nested vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
Kosuke Tatsukawa [Fri, 9 Oct 2015 12:21:55 +0000 (12:21 +0000)]
kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.

        async_pf_execute                    kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                      TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                      ...
                                      return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                        !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                     ...
                                     return 0;
list_add_tail(&apf->link,
  &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: don't notify userspace IOAPIC on edge EOI
Radim Krčmář [Thu, 8 Oct 2015 18:30:00 +0000 (20:30 +0200)]
KVM: x86: don't notify userspace IOAPIC on edge EOI

On real hardware, edge-triggered interrupts don't set a bit in TMR,
which means that IOAPIC isn't notified on EOI.  Do the same here.

Staying in guest/kernel mode after edge EOI is what we want for most
devices.  If some bugs could be nicely worked around with edge EOI
notifications, we should invest in a better interface.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: fix edge EOI and IOAPIC reconfig race
Radim Krčmář [Thu, 8 Oct 2015 18:23:34 +0000 (20:23 +0200)]
KVM: x86: fix edge EOI and IOAPIC reconfig race

KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration;  thus EOI from the pending interrupt is not
recognized.

(Reconfiguration is not a problem for level interrupts, because IOAPIC
 sends interrupt with the new configuration.)

For an edge interrupt with ACK notifiers, like i8254 timer; things can
happen in this order
 1) IOAPIC inject a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
    on original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
    not in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.

This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated.  The simple solution also doesn't work
if we are reconfiguring the vector.  (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

The are no races because ioapic injection and reconfig are locked.

Fixes: b053b2aef25d ("KVM: x86: Add EOI exit bitmap inference")
[Before b053b2aef25d, this bug happened only with APICv.]
Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm: x86: set KVM_REQ_EVENT when updating IRR
Radim Krčmář [Thu, 8 Oct 2015 18:23:33 +0000 (20:23 +0200)]
kvm: x86: set KVM_REQ_EVENT when updating IRR

After moving PIR to IRR, the interrupt needs to be delivered manually.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMerge branch 'kvm-master' into HEAD
Paolo Bonzini [Wed, 14 Oct 2015 14:35:15 +0000 (16:35 +0200)]
Merge branch 'kvm-master' into HEAD

Merge more important SMM fixes.

9 years agoKVM: x86: fix RSM into 64-bit protected mode
Paolo Bonzini [Wed, 14 Oct 2015 13:25:52 +0000 (15:25 +0200)]
KVM: x86: fix RSM into 64-bit protected mode

In order to get into 64-bit protected mode, you need to enable
paging while EFER.LMA=1.  For this to work, CS.L must be 0.
Currently, we load the segments before CR0 and CR4, which means
that if RSM returns into 64-bit protected mode CS.L is already 1
and everything breaks.

Luckily, CS.L=0 is always the case when executing RSM, because it
is forbidden to execute RSM from 64-bit protected mode.  Hence it
is enough to load CR0 and CR4 first, and only then the segments.

Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: fix previous commit for 32-bit
Paolo Bonzini [Wed, 14 Oct 2015 13:51:08 +0000 (15:51 +0200)]
KVM: x86: fix previous commit for 32-bit

Unfortunately I only noticed this after pushing.

Fixes: f0d648bdf0a5bbc91da6099d5282f77996558ea4
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMerge branch 'kvm-master' into HEAD
Paolo Bonzini [Tue, 13 Oct 2015 19:32:50 +0000 (21:32 +0200)]
Merge branch 'kvm-master' into HEAD

This merge brings in a couple important SMM fixes, which makes it
easier to test latest KVM with unrestricted_guest=0 and to test
the in-progress work on SMM support in the firmware.

Conflicts:
arch/x86/kvm/x86.c

9 years agoKVM: x86: fix SMI to halted VCPU
Paolo Bonzini [Tue, 13 Oct 2015 08:19:35 +0000 (10:19 +0200)]
KVM: x86: fix SMI to halted VCPU

An SMI to a halted VCPU must wake it up, hence a VCPU with a pending
SMI must be considered runnable.

Fixes: 64d6067057d9658acb8675afcfba549abdb7fc16
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: clean up kvm_arch_vcpu_runnable
Paolo Bonzini [Tue, 13 Oct 2015 08:18:53 +0000 (10:18 +0200)]
KVM: x86: clean up kvm_arch_vcpu_runnable

Split the huge conditional in two functions.

Fixes: 64d6067057d9658acb8675afcfba549abdb7fc16
Cc: stable@vger.kernel.org
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: map/unmap private slots in __x86_set_memory_region
Paolo Bonzini [Mon, 12 Oct 2015 11:56:27 +0000 (13:56 +0200)]
KVM: x86: map/unmap private slots in __x86_set_memory_region

Otherwise, two copies (one of them never populated and thus bogus)
are allocated for the regular and SMM address spaces.  This breaks
SMM with EPT but without unrestricted guest support, because the
SMM copy of the identity page map is all zeros.

By moving the allocation to the caller we also remove the last
vestiges of kernel-allocated memory regions (not accessible anymore
in userspace since commit b74a07beed0e, "KVM: Remove kernel-allocated
memory regions", 2010-06-21); that is a nice bonus.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
Paolo Bonzini [Mon, 12 Oct 2015 11:38:32 +0000 (13:38 +0200)]
KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region

The next patch will make x86_set_memory_region fill the
userspace_addr.  Since the struct is not used untouched
anymore, it makes sense to build it in x86_set_memory_region
directly; it also simplifies the callers.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMerge tag 'kvm-s390-next-20151013' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Tue, 13 Oct 2015 14:44:51 +0000 (16:44 +0200)]
Merge tag 'kvm-s390-next-20151013' of git://git./linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 4.4

A bunch of fixes and optimizations for interrupt and time
handling. No fix is important enough to qualify for 4.3 or
stable.

9 years agoKVM: s390: factor out reading of the guest TOD clock
David Hildenbrand [Tue, 29 Sep 2015 14:20:36 +0000 (16:20 +0200)]
KVM: s390: factor out reading of the guest TOD clock

Let's factor this out and always use get_tod_clock_fast() when
reading the guest TOD.

STORE CLOCK FAST does not do serialization and, therefore, might
result in some fuzziness between different processors in a way
that subsequent calls on different CPUs might have time stamps that
are earlier. This semantics is fine though for all KVM use cases.
To make it obvious that the new function has STORE CLOCK FAST
semantics we name it kvm_s390_get_tod_clock_fast.

With this patch, we only have a handful of places were we
have to care about STP sync (using preempt_disable() logic).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: factor out and fix setting of guest TOD clock
David Hildenbrand [Tue, 12 May 2015 07:49:14 +0000 (09:49 +0200)]
KVM: s390: factor out and fix setting of guest TOD clock

Let's move that whole logic into one function. We now always use unsigned
values when calculating the epoch (to avoid over/underflow defined).
Also, we always have to get all VCPUs out of SIE before doing the update
to avoid running differing VCPUs with different TODs.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: switch to get_tod_clock() and fix STP sync races
David Hildenbrand [Tue, 29 Sep 2015 14:27:24 +0000 (16:27 +0200)]
KVM: s390: switch to get_tod_clock() and fix STP sync races

Nobody except early.c makes use of store_tod_clock() to handle the
cc. So if we would get a cc != 0, we would be in more trouble.

Let's replace all users with get_tod_clock(). Returning a cc
on an ioctl sounded strange either way.

We can now also easily move the get_tod_clock() call into the
preempt_disable() section. This is in fact necessary to make the
STP sync work as expected. Otherwise the host TOD could change
and we would end up with a wrong epoch calculation.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: correctly handle injection of pgm irqs and per events
David Hildenbrand [Mon, 4 May 2015 10:38:48 +0000 (12:38 +0200)]
KVM: s390: correctly handle injection of pgm irqs and per events

PER events can always co-exist with other program interrupts.

For now, we always overwrite all program interrupt parameters when
injecting any type of program interrupt.

Let's handle that correctly by only overwriting the relevant portion of
the program interrupt parameters. Therefore we can now inject PER events
and ordinary program interrupts concurrently, resulting in no loss of
program interrupts. This will especially by helpful when manually detecting
PER events later - as both types might be triggered during one SIE exit.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: simplify in-kernel program irq injection
David Hildenbrand [Thu, 20 Nov 2014 12:49:32 +0000 (13:49 +0100)]
KVM: s390: simplify in-kernel program irq injection

The main reason to keep program injection in kernel separated until now
was that we were able to do some checking, if really only the owning
thread injects program interrupts (via waitqueue_active(li->wq)).

This BUG_ON was never triggered and the chances of really hitting it, if
another thread injected a program irq to another vcpu, were very small.

Let's drop this check and turn kvm_s390_inject_program_int() and
kvm_s390_inject_prog_irq() into simple inline functions that makes use of
kvm_s390_inject_vcpu().

__must_check can be dropped as they are implicitely given by
kvm_s390_inject_vcpu(), to avoid ugly long function prototypes.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: drop out early in kvm_s390_has_irq()
David Hildenbrand [Wed, 6 May 2015 11:51:29 +0000 (13:51 +0200)]
KVM: s390: drop out early in kvm_s390_has_irq()

Let's get rid of the local variable and exit directly if we found
any pending interrupt. This is not only faster, but also better
readable.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: kvm_arch_vcpu_runnable already cares about timer interrupts
David Hildenbrand [Wed, 23 Sep 2015 10:25:15 +0000 (12:25 +0200)]
KVM: s390: kvm_arch_vcpu_runnable already cares about timer interrupts

We can remove that double check.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: set interception requests for all floating irqs
David Hildenbrand [Mon, 28 Sep 2015 12:27:51 +0000 (14:27 +0200)]
KVM: s390: set interception requests for all floating irqs

No need to separate pending and floating irqs when setting interception
requests. Let's do it for all equally.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: disabled wait cares about machine checks, not PER
David Hildenbrand [Mon, 28 Sep 2015 11:32:38 +0000 (13:32 +0200)]
KVM: s390: disabled wait cares about machine checks, not PER

We don't care about program event recording irqs (synchronous
program irqs) but asynchronous irqs when checking for disabled
wait. Machine checks were missing.

Let's directly switch to the functions we have for that purpose
instead of testing once again for magic bits.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: remove unused variable in __inject_vm
Christian Borntraeger [Wed, 16 Sep 2015 10:14:52 +0000 (12:14 +0200)]
KVM: s390: remove unused variable in __inject_vm

the float int structure is no longer used in __inject_vm.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoiommu/vt-d: Add a command line parameter for VT-d posted-interrupts
Feng Wu [Fri, 18 Sep 2015 14:29:56 +0000 (22:29 +0800)]
iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

Enable VT-d Posted-Interrtups and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Update Posted-Interrupts Descriptor when vCPU is blocked
Feng Wu [Fri, 18 Sep 2015 14:29:55 +0000 (22:29 +0800)]
KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng.wu@intel.com>
[Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Update Posted-Interrupts Descriptor when vCPU is preempted
Feng Wu [Fri, 18 Sep 2015 14:29:54 +0000 (22:29 +0800)]
KVM: Update Posted-Interrupts Descriptor when vCPU is preempted

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress furture non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <feng.wu@intel.com>
[Include asm/cpu.h to fix !CONFIG_SMP compilation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: select IRQ_BYPASS_MANAGER
Feng Wu [Fri, 18 Sep 2015 14:29:40 +0000 (22:29 +0800)]
KVM: x86: select IRQ_BYPASS_MANAGER

Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Update IRTE for posted-interrupts
Feng Wu [Fri, 18 Sep 2015 14:29:51 +0000 (22:29 +0800)]
KVM: x86: Update IRTE for posted-interrupts

This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Squashed in automatically generated patch from the build robot
 "KVM: x86: vcpu_to_pi_desc() can be static" - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agovfio: Register/unregister irq_bypass_producer
Feng Wu [Fri, 18 Sep 2015 14:29:50 +0000 (22:29 +0800)]
vfio: Register/unregister irq_bypass_producer

This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSIx on vfio pci devices.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: make kvm_set_msi_irq() public
Feng Wu [Fri, 18 Sep 2015 14:29:49 +0000 (22:29 +0800)]
KVM: make kvm_set_msi_irq() public

Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Define a new interface kvm_intr_is_single_vcpu()
Feng Wu [Fri, 18 Sep 2015 14:29:47 +0000 (22:29 +0800)]
KVM: Define a new interface kvm_intr_is_single_vcpu()

This patch defines a new interface kvm_intr_is_single_vcpu(),
which can returns whether the interrupt is for single-CPU or not.

It is used by VT-d PI, since now we only support single-CPU
interrupts, For lowest-priority interrupts, if user configures
it via /proc/irq or uses irqbalance to make it single-CPU, we
can use PI to deliver the interrupts to it. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Add some helper functions for Posted-Interrupts
Feng Wu [Fri, 18 Sep 2015 14:29:46 +0000 (22:29 +0800)]
KVM: Add some helper functions for Posted-Interrupts

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
[Make the new functions inline. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Extend struct pi_desc for VT-d Posted-Interrupts
Feng Wu [Fri, 18 Sep 2015 14:29:45 +0000 (22:29 +0800)]
KVM: Extend struct pi_desc for VT-d Posted-Interrupts

Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
Feng Wu [Fri, 18 Sep 2015 14:29:53 +0000 (22:29 +0800)]
KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'

This patch adds an arch specific hooks 'arch_update' in
'struct kvm_kernel_irqfd'. On Intel side, it is used to
update the IRTE when VT-d posted-interrupts is used.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: eventfd: add irq bypass consumer management
Eric Auger [Fri, 18 Sep 2015 14:29:44 +0000 (22:29 +0800)]
KVM: eventfd: add irq bypass consumer management

This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: introduce kvm_arch functions for IRQ bypass
Eric Auger [Fri, 18 Sep 2015 14:29:43 +0000 (22:29 +0800)]
KVM: introduce kvm_arch functions for IRQ bypass

This patch introduces
- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_start

They make possible to specialize the KVM IRQ bypass consumer in
case CONFIG_KVM_HAVE_IRQ_BYPASS is set.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
[Add weak implementations of the callbacks. - Feng]
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: create kvm_irqfd.h
Eric Auger [Fri, 18 Sep 2015 14:29:42 +0000 (22:29 +0800)]
KVM: create kvm_irqfd.h

Move _irqfd_resampler and _irqfd struct declarations in a new
public header: kvm_irqfd.h. They are respectively renamed into
kvm_kernel_irqfd_resampler and kvm_kernel_irqfd. Those datatypes
will be used by architecture specific code, in the context of
IRQ bypass manager integration.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agovirt: Add virt directory to the top Makefile
Feng Wu [Tue, 22 Sep 2015 08:47:29 +0000 (16:47 +0800)]
virt: Add virt directory to the top Makefile

We need to build files in virt/lib/, which are now used by
KVM and VFIO, so add virt directory to the top Makefile.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agovirt: IRQ bypass manager
Alex Williamson [Fri, 18 Sep 2015 14:29:39 +0000 (22:29 +0800)]
virt: IRQ bypass manager

When a physical I/O device is assigned to a virtual machine through
facilities like VFIO and KVM, the interrupt for the device generally
bounces through the host system before being injected into the VM.
However, hardware technologies exist that often allow the host to be
bypassed for some of these scenarios.  Intel Posted Interrupts allow
the specified physical edge interrupts to be directly injected into a
guest when delivered to a physical processor while the vCPU is
running.  ARM IRQ Forwarding allows forwarded physical interrupts to
be directly deactivated by the guest.

The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanism.  To do this, we base the connection on a
shared, opaque token.  For KVM-VFIO this is expected to be an
eventfd_ctx since this is the connection we already use to connect an
eventfd to an irqfd on the in-kernel path.  When a producer and
consumer with matching tokens is found, callbacks via both registered
participants allow the bypass facilities to be automatically enabled.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoirq_remapping: move structs outside #ifdef
Paolo Bonzini [Fri, 18 Sep 2015 16:04:17 +0000 (18:04 +0200)]
irq_remapping: move structs outside #ifdef

This is friendlier to clients of the code, who are going to prepare
vcpu_data structs unconditionally, even if CONFIG_IRQ_REMAP is not
defined.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agox86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO
Radim Krčmář [Fri, 18 Sep 2015 15:54:29 +0000 (17:54 +0200)]
x86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO

Newer KVM won't be exposing PVCLOCK_COUNTS_FROM_ZERO anymore.
The purpose of that flags was to start counting system time from 0 when
the KVM clock has been initialized.
We can achieve the same by selecting one read as the initial point.

A simple subtraction will work unless the KVM clock count overflows
earlier (has smaller width) than scheduler's cycle count.  We should be
safe till x86_128.

Because PVCLOCK_COUNTS_FROM_ZERO was enabled only on new hypervisors,
setting sched clock as stable based on PVCLOCK_TSC_STABLE_BIT might
regress on older ones.

I presume we don't need to change kvm_clock_read instead of introducing
kvm_sched_clock_read.  A problem could arise in case sched_clock is
expected to return the same value as get_cycles, but we should have
merged those clocks in that case.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: drop rdtscp_enabled field
Xiao Guangrong [Wed, 9 Sep 2015 06:05:57 +0000 (14:05 +0800)]
KVM: VMX: drop rdtscp_enabled field

Check cpuid bit instead of it

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: clean up bit operation on SECONDARY_VM_EXEC_CONTROL
Xiao Guangrong [Wed, 9 Sep 2015 06:05:56 +0000 (14:05 +0800)]
KVM: VMX: clean up bit operation on SECONDARY_VM_EXEC_CONTROL

Use vmcs_set_bits() and vmcs_clear_bits() to clean up the code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update
Xiao Guangrong [Wed, 9 Sep 2015 06:05:55 +0000 (14:05 +0800)]
KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update

Unify the update in vmx_cpuid_update()

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: align vmx->nested.nested_vmx_secondary_ctls_high to vmx->rdtscp_enabled
Paolo Bonzini [Tue, 15 Sep 2015 15:34:42 +0000 (17:34 +0200)]
KVM: VMX: align vmx->nested.nested_vmx_secondary_ctls_high to vmx->rdtscp_enabled

The SECONDARY_EXEC_RDTSCP must be available iff RDTSCP is enabled in the
guest.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: simplify invpcid handling in vmx_cpuid_update()
Xiao Guangrong [Wed, 9 Sep 2015 06:05:54 +0000 (14:05 +0800)]
KVM: VMX: simplify invpcid handling in vmx_cpuid_update()

If vmx_invpcid_supported() is true, second execution control
filed must be supported and SECONDARY_EXEC_ENABLE_INVPCID
must have already been set in current vmcs by
vmx_secondary_exec_control()

If vmx_invpcid_supported() is false, no need to clear
SECONDARY_EXEC_ENABLE_INVPCID

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: simplify rdtscp handling in vmx_cpuid_update()
Xiao Guangrong [Wed, 9 Sep 2015 06:05:53 +0000 (14:05 +0800)]
KVM: VMX: simplify rdtscp handling in vmx_cpuid_update()

if vmx_rdtscp_supported() is true SECONDARY_EXEC_RDTSCP must
have already been set in current vmcs by
vmx_secondary_exec_control()

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: VMX: drop rdtscp_enabled check in prepare_vmcs02()
Xiao Guangrong [Wed, 9 Sep 2015 06:05:52 +0000 (14:05 +0800)]
KVM: VMX: drop rdtscp_enabled check in prepare_vmcs02()

SECONDARY_EXEC_RDTSCP set for L2 guest comes from vmcs12

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: add pcommit support
Xiao Guangrong [Wed, 9 Sep 2015 06:05:51 +0000 (14:05 +0800)]
KVM: x86: add pcommit support

Pass PCOMMIT CPU feature to guest to enable PCOMMIT instruction

Currently we do not catch pcommit instruction for L1 guest and
allow L1 to catch this instruction for L2 if, as required by the spec,
L1 can enumerate the PCOMMIT instruction via CPUID:
| IA32_VMX_PROCBASED_CTLS2[53] (which enumerates support for the
| 1-setting of PCOMMIT exiting) is always the same as
| CPUID.07H:EBX.PCOMMIT[bit 22]. Thus, software can set PCOMMIT exiting
| to 1 if and only if the PCOMMIT instruction is enumerated via CPUID

The spec can be found at
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: allow guest to use cflushopt and clwb
Xiao Guangrong [Wed, 9 Sep 2015 06:05:50 +0000 (14:05 +0800)]
KVM: x86: allow guest to use cflushopt and clwb

Pass these CPU features to guest to enable them in guest

They are needed by nvdimm drivers

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: vmx: disable posted interrupts if no local APIC
Paolo Bonzini [Mon, 28 Sep 2015 09:58:14 +0000 (11:58 +0200)]
KVM: vmx: disable posted interrupts if no local APIC

Uniprocessor 32-bit randconfigs can disable the local APIC, and posted
interrupts require reserving a vector on the LAPIC, so they are
incompatible.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support
Andrey Smetanin [Wed, 16 Sep 2015 09:29:50 +0000 (12:29 +0300)]
kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support

HV_X64_MSR_VP_RUNTIME msr used by guest to get
"the time the virtual processor consumes running guest code,
and the time the associated logical processor spends running
hypervisor code on behalf of that guest."

Calculation of this time is performed by task_cputime_adjusted()
for vcpu task.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm/x86: Hyper-V HV_X64_MSR_VP_INDEX export for QEMU.
Andrey Smetanin [Wed, 16 Sep 2015 09:29:49 +0000 (12:29 +0300)]
kvm/x86: Hyper-V HV_X64_MSR_VP_INDEX export for QEMU.

Insert Hyper-V HV_X64_MSR_VP_INDEX into msr's emulated list,
so QEMU can set Hyper-V features cpuid HV_X64_MSR_VP_INDEX_AVAILABLE
bit correctly. KVM emulation part is in place already.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm/x86: Hyper-V HV_X64_MSR_RESET msr
Andrey Smetanin [Wed, 16 Sep 2015 09:29:48 +0000 (12:29 +0300)]
kvm/x86: Hyper-V HV_X64_MSR_RESET msr

HV_X64_MSR_RESET msr is used by Hyper-V based Windows guest
to reset guest VM by hypervisor.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm: add capability for any-length ioeventfds
Jason Wang [Tue, 15 Sep 2015 06:41:59 +0000 (14:41 +0800)]
kvm: add capability for any-length ioeventfds

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm: add tracepoint for fast mmio
Jason Wang [Tue, 15 Sep 2015 06:41:58 +0000 (14:41 +0800)]
kvm: add tracepoint for fast mmio

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm: use kmalloc() instead of kzalloc() during iodev register/unregister
Jason Wang [Tue, 25 Aug 2015 09:05:46 +0000 (17:05 +0800)]
kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

All fields of kvm_io_range were initialized or copied explicitly
afterwards. So switch to use kmalloc().

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Add support for local interrupt requests from userspace
Steve Rutherford [Thu, 30 Jul 2015 09:27:16 +0000 (11:27 +0200)]
KVM: x86: Add support for local interrupt requests from userspace

In order to enable userspace PIC support, the userspace PIC needs to
be able to inject local interrupts even when the APICs are in the
kernel.

KVM_INTERRUPT now supports sending local interrupts to an APIC when
APICs are in the kernel.

The ready_for_interrupt_request flag is now only set when the CPU/APIC
will immediately accept and inject an interrupt (i.e. APIC has not
masked the PIC).

When the PIC wishes to initiate an INTA cycle with, say, CPU0, it
kicks CPU0 out of the guest, and renedezvous with CPU0 once it arrives
in userspace.

When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is
triggered, so that userspace has a chance to inject a PIC interrupt
if it had been pending.

Overall, this design can lead to a small number of spurious userspace
renedezvous. In particular, whenever the PIC transistions from low to
high while it is masked and whenever the PIC becomes unmasked while
it is low.

Note: this does not buffer more than one local interrupt in the
kernel, so the VMM needs to enter the guest in order to complete
interrupt injection before injecting an additional interrupt.

Compiles for x86.

Can pass the KVM Unit Tests.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Add EOI exit bitmap inference
Steve Rutherford [Thu, 30 Jul 2015 06:32:35 +0000 (23:32 -0700)]
KVM: x86: Add EOI exit bitmap inference

In order to support a userspace IOAPIC interacting with an in kernel
APIC, the EOI exit bitmaps need to be configurable.

If the IOAPIC is in userspace (i.e. the irqchip has been split), the
EOI exit bitmaps will be set whenever the GSI Routes are configured.
In particular, for the low MSI routes are reservable for userspace
IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the
destination vector of the route will be set for the destination VCPU.

The intention is for the userspace IOAPICs to use the reservable MSI
routes to inject interrupts into the guest.

This is a slight abuse of the notion of an MSI Route, given that MSIs
classically bypass the IOAPIC. It might be worthwhile to add an
additional route type to improve clarity.

Compile tested for Intel x86.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Add KVM exit for IOAPIC EOIs
Steve Rutherford [Thu, 30 Jul 2015 06:21:41 +0000 (23:21 -0700)]
KVM: x86: Add KVM exit for IOAPIC EOIs

Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI
level-triggered IOAPIC interrupts.

Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
to be informed (which is identical to the EOI_EXIT_BITMAP field used
by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
exits on older processors).

[Note: A prototype using ResampleFDs found that decoupling the EOI
from the VCPU's thread made it possible for the VCPU to not see a
recent EOI after reentering the guest. This does not match real
hardware.]

Compile tested for Intel x86.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Split the APIC from the rest of IRQCHIP.
Steve Rutherford [Thu, 30 Jul 2015 06:21:40 +0000 (23:21 -0700)]
KVM: x86: Split the APIC from the rest of IRQCHIP.

First patch in a series which enables the relocation of the
PIC/IOAPIC to userspace.

Adds capability KVM_CAP_SPLIT_IRQCHIP;

KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
rest of the irqchip.

Compile tested for x86.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Suggested-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: unify handling of interrupt window
Paolo Bonzini [Thu, 30 Jul 2015 08:32:16 +0000 (10:32 +0200)]
KVM: x86: unify handling of interrupt window

The interrupt window is currently checked twice, once in vmx.c/svm.c and
once in dm_request_for_irq_injection.  The only difference is the extra
check for kvm_arch_interrupt_allowed in dm_request_for_irq_injection,
and the different return value (EINTR/KVM_EXIT_INTR for vmx.c/svm.c vs.
0/KVM_EXIT_IRQ_WINDOW_OPEN for dm_request_for_irq_injection).

However, dm_request_for_irq_injection is basically dead code!  Revive it
by removing the checks in vmx.c and svm.c's vmexit handlers, and
fixing the returned values for the dm_request_for_irq_injection case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: introduce lapic_in_kernel
Paolo Bonzini [Wed, 29 Jul 2015 10:05:37 +0000 (12:05 +0200)]
KVM: x86: introduce lapic_in_kernel

Avoid pointer chasing and memory barriers, and simplify the code
when split irqchip (LAPIC in kernel, IOAPIC/PIC in userspace)
is introduced.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: replace vm_has_apicv hook with cpu_uses_apicv
Paolo Bonzini [Wed, 29 Jul 2015 09:49:59 +0000 (11:49 +0200)]
KVM: x86: replace vm_has_apicv hook with cpu_uses_apicv

This will avoid an unnecessary trip to ->kvm and from there to the VPIC.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: store IOAPIC-handled vectors in each VCPU
Paolo Bonzini [Wed, 29 Jul 2015 08:43:18 +0000 (10:43 +0200)]
KVM: x86: store IOAPIC-handled vectors in each VCPU

We can reuse the algorithm that computes the EOI exit bitmap to figure
out which vectors are handled by the IOAPIC.  The only difference
between the two is for edge-triggered interrupts other than IRQ8
that have no notifiers active; however, the IOAPIC does not have to
do anything special for these interrupts anyway.

This again limits the interactions between the IOAPIC and the LAPIC,
making it easier to move the former to userspace.

Inspired by a patch from Steve Rutherford.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: set TMR when the interrupt is accepted
Paolo Bonzini [Wed, 29 Jul 2015 13:03:06 +0000 (15:03 +0200)]
KVM: x86: set TMR when the interrupt is accepted

Do not compute TMR in advance.  Instead, set the TMR just before the interrupt
is accepted into the IRR.  This limits the coupling between IOAPIC and LAPIC.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agotools lib traceevent: update KVM plugin
Paolo Bonzini [Thu, 1 Oct 2015 10:26:50 +0000 (12:26 +0200)]
tools lib traceevent: update KVM plugin

The format of the role word has changed through the years and the
plugin was never updated; some VMX exit reasons were missing too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMerge branch 'x86/for-kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip...
Paolo Bonzini [Thu, 1 Oct 2015 13:02:45 +0000 (15:02 +0200)]
Merge branch 'x86/for-kvm' of git://git./linux/kernel/git/tip/tip into HEAD

This merges a cleanup of asm/apic.h, which is needed by the KVM patches
to support VT-d posted interrupts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoUse WARN_ON_ONCE for missing X86_FEATURE_NRIPS
Dirk Müller [Thu, 1 Oct 2015 11:43:42 +0000 (13:43 +0200)]
Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

The cpu feature flags are not ever going to change, so warning
everytime can cause a lot of kernel log spam
(in our case more than 10GB/hour).

The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug.  This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases anyway.

Cc: stable@vger.kernel.org
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoUpdate KVM homepage Url
Dirk Müller [Thu, 1 Oct 2015 11:46:01 +0000 (13:46 +0200)]
Update KVM homepage Url

The old one appears to be a generic catch all page, which
is unhelpful.

Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoRevert "KVM: SVM: use NPT page attributes"
Paolo Bonzini [Thu, 1 Oct 2015 11:20:22 +0000 (13:20 +0200)]
Revert "KVM: SVM: use NPT page attributes"

This reverts commit 3c2e7f7de3240216042b61073803b61b9b3cfb22.
Initializing the mapping from MTRR to PAT values was reported to
fail nondeterministically, and it also caused extremely slow boot
(due to caching getting disabled---bug 103321) with assigned devices.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Sebastian Schuette <dracon@ewetel.net>
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoRevert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"
Paolo Bonzini [Thu, 1 Oct 2015 11:19:55 +0000 (13:19 +0200)]
Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"

This reverts commit 5492830370171b6a4ede8a3bfba687a8d0f25fa5.
It builds on the commit that is being reverted next.

Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoRevert "KVM: SVM: Sync g_pat with guest-written PAT value"
Paolo Bonzini [Thu, 1 Oct 2015 11:28:15 +0000 (13:28 +0200)]
Revert "KVM: SVM: Sync g_pat with guest-written PAT value"

This reverts commit e098223b789b4a618dacd79e5e0dad4a9d5018d1,
which has a dependency on other commits being reverted.

Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoRevert "KVM: x86: apply guest MTRR virtualization on host reserved pages"
Paolo Bonzini [Thu, 1 Oct 2015 11:12:47 +0000 (13:12 +0200)]
Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages"

This reverts commit fd717f11015f673487ffc826e59b2bad69d20fe5.
It was reported to cause Machine Check Exceptions (bug 104091).

Reported-by: harn-solo@gmx.de
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agox86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC
Paolo Bonzini [Mon, 28 Sep 2015 10:26:31 +0000 (12:26 +0200)]
x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC

Some CONFIG_X86_X2APIC functions, especially x2apic_enabled(), are not
declared if !CONFIG_X86_LOCAL_APIC.  However, the same stubs that work
for !CONFIG_X86_X2APIC are okay even if there is no local APIC support
at all.

Avoid the introduction of #ifdefs by moving the x2apic declarations
completely outside the CONFIG_X86_LOCAL_APIC block.  (Unfortunately,
diff generation messes up the actual change that this patch makes).
There is no semantic change because CONFIG_X86_X2APIC depends on
CONFIG_X86_LOCAL_APIC.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Feng Wu <feng.wu@intel.com>
Link: http://lkml.kernel.org/r/1443435991-35750-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
9 years agoRevert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"
Radim Krčmář [Fri, 18 Sep 2015 15:54:30 +0000 (17:54 +0200)]
Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"

Shifting pvclock_vcpu_time_info.system_time on write to KVM system time
MSR is a change of ABI.  Probably only 2.6.16 based SLES 10 breaks due
to its custom enhancements to kvmclock, but KVM never declared the MSR
only for one-shot initialization.  (Doc says that only one write is
needed.)

This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5.
And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoLinux 4.3-rc3
Linus Torvalds [Sun, 27 Sep 2015 11:50:08 +0000 (07:50 -0400)]
Linux 4.3-rc3

9 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Sep 2015 10:51:42 +0000 (06:51 -0400)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two bugfixes from Andy addressing at least some of the subtle NMI
  related wreckage which has been reported by Sasha Levin"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI code
  x86/paravirt: Replace the paravirt nop with a bona fide empty function

9 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Sep 2015 10:50:23 +0000 (06:50 -0400)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomass Gleixner:
 "A bugfix for the atmel aic5 irq chip driver which caches the wrong
  data and thereby breaking resume"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/atmel-aic5: Use per chip mask caches in mask/unmask()

9 years agoMerge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds [Sun, 27 Sep 2015 10:48:48 +0000 (06:48 -0400)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "Just two fixes: wire up the new system calls added during the last
  merge window, and fix another user access site"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: alignment: fix alignment handling for uaccess changes
  ARM: wire up new syscalls

9 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Sun, 27 Sep 2015 10:45:18 +0000 (06:45 -0400)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "Our first real batch of fixes this release cycle.  Nothing really
  concerning, and diffstat is a bit inflated due to some DT contents
  moving around on STi platforms.

  There's a collection of them here:

   - A fixup for a build breakage that hits on arm64 allmodconfig in
     QCOM SCM firmware drivers
   - MMC fixes for OMAP that had quite a bit of breakage this merge
     window.
   - Misc build/warning fixes on PXA and OMAP
   - A couple of minor fixes for Beagleboard X15 which is now starting
     to see a few more users in the wild"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
  ARM: sti: dt: adapt DT to fix probe/bind issues in DRM driver
  ARM: dts: fix omap2+ address translation for pbias
  firmware: qcom: scm: Add function stubs for ARM64
  ARM: dts: am57xx-beagle-x15: use palmas-usb for USB2
  ARM: omap2plus_defconfig: enable GPIO_PCA953X
  ARM: dts: omap5-uevm.dts: fix i2c5 pinctrl offsets
  ARM: OMAP2+: AM43XX: Enable autoidle for clks in am43xx_init_late
  ARM: dts: am57xx-beagle-x15: Update Phy supplies
  ARM: pxa: balloon3: Fix build error
  ARM: dts: Fixup model name for HP t410 dts
  ARM: dts: DRA7: fix a typo in ethernet
  ARM: omap2plus_defconfig: make PCF857x built-in
  ARM: dts: Use ti,pbias compatible string for pbias
  ARM: OMAP5: Cleanup options for SoC only build
  ARM: DRA7: Select missing options for SoC only build
  ARM: OMAP2+: board-generic: Remove stale of_irq macros
  ARM: OMAP4+: PM: erratum is used by OMAP5 and DRA7 as well
  ARM: dts: omap3-igep: Move eth IRQ pinmux to IGEPv2 common dtsi
  ARM: dts: am57xx-beagle-x15: Add wakeup irq for mcp79410
  ARM: dts: am335x-phycore-som: Fix mpu voltage
  ...

9 years agoMerge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 27 Sep 2015 10:42:54 +0000 (06:42 -0400)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
 "Four fixes from testing at the recent SMB3 Plugfest including two
  important authentication ones (one fixes authentication problems to
  some popular servers when clock times differ more than two hours
  between systems, the other fixes Kerberos authentication for SMB3)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  fix encryption error checks on mount
  [SMB3] Fix sec=krb5 on smb3 mounts
  cifs: use server timestamp for ntlmv2 authentication
  disabling oplocks/leases via module parm enable_oplocks broken for SMB3

9 years agoMerge tag 'pxa-fixes-v4.3' of https://github.com/rjarzmik/linux into fixes
Olof Johansson [Sun, 27 Sep 2015 05:23:26 +0000 (22:23 -0700)]
Merge tag 'pxa-fixes-v4.3' of https://github.com/rjarzmik/linux into fixes

ARM: pxa: fixes for v4.3

These fixes are mainly regression fixes triggered by irq changes,
common clock framework introduction and sound side-effect of
other platforms.

* tag 'pxa-fixes-v4.3' of https://github.com/rjarzmik/linux:
  ARM: pxa: balloon3: Fix build error
  ARM: pxa: ssp: Fix build error by removing originally incorrect DT binding
  ARM: pxa: fix DFI bus lockups on startup

Signed-off-by: Olof Johansson <olof@lixom.net>
9 years agoMerge tag 'omap-for-v4.3/fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Olof Johansson [Sun, 27 Sep 2015 05:22:31 +0000 (22:22 -0700)]
Merge tag 'omap-for-v4.3/fixes-rc2' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

Fixes for omaps for v4.3-rc cycle:

- Two more patches to fix most of the MMC regressions with the
  PBIAS regulator changes. At least two MMC driver related issues
  still seems to remain for omap3 legacy booting and omap4 duovero.
  Note that the dts changes depend on a recent regulator fix, and
  are based on the regulator commit now in mainline kernel

- Enable autoidle for am43xx clocks to prevent clocks from staying
  always on

- Fix i2c5 pinctrl offsets for omap5-uevm

- Enable PCA953X as that's needed for HDMI to work on omap5

- Update phy supplies for beagle x15 beta board

- Use palmas-usb for on beagle x15 to start using the related
  driver that recently got merged

* tag 'omap-for-v4.3/fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: fix omap2+ address translation for pbias
  ARM: dts: am57xx-beagle-x15: use palmas-usb for USB2
  ARM: omap2plus_defconfig: enable GPIO_PCA953X
  ARM: dts: omap5-uevm.dts: fix i2c5 pinctrl offsets
  ARM: OMAP2+: AM43XX: Enable autoidle for clks in am43xx_init_late
  ARM: dts: am57xx-beagle-x15: Update Phy supplies
  regulator: pbias: program pbias register offset in pbias driver
  ARM: omap2plus_defconfig: Enable MUSB DMA support
  ARM: DRA752: Add ID detect for ES2.0
  ARM: OMAP3: vc: fix 'or' always true warning
  ARM: OMAP2+: Fix booting if no timer parent clock is available
  ARM: OMAP2+: omap-device: fix race deferred probe of omap_hsmmc vs omap_device_late_init

Signed-off-by: Olof Johansson <olof@lixom.net>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Sun, 27 Sep 2015 01:05:23 +0000 (21:05 -0400)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - check the return value of platform_get_irq as signed int in xgene.

   - skip adf_dev_restore on virtual functions in qat.

   - fix double-free with backlogged requests in marvell_cesa"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: xgene - fix handling platform_get_irq
  crypto: qat - VF should never trigger SBR on PH
  crypto: marvell - properly handle CRYPTO_TFM_REQ_MAY_BACKLOG-flagged requests

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Linus Torvalds [Sun, 27 Sep 2015 01:02:42 +0000 (21:02 -0400)]
Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
 "This includes a iser-target series from Jenny + Sagi @ Mellanox that
  addresses the few remaining active I/O shutdown bugs, along with a
  patch to support zero-copy for immediate data payloads that gives a
  nice performance improvement for small block WRITEs.

  Also included are some recent >= v4.2 regression bug-fixes.  The most
  notable is a RCU conversion regression for SPC-3 PR registrations, and
  recent removal of obsolete RFC-3720 markers that introduced a login
  regression bug with MSFT iSCSI initiators.

  Thanks to everyone who has been testing + reporting bugs for v4.x"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: Avoid OFMarker + IFMarker negotiation
  target: Make TCM_WRITE_PROTECT failure honor D_SENSE bit
  target: Fix target_sense_desc_format NULL pointer dereference
  target: Propigate backend read-only to core_tpg_add_lun
  target: Fix PR registration + APTPL RCU conversion regression
  iser-target: Skip data copy if all the command data comes as immediate
  iser-target: Change the recv buffers posting logic
  iser-target: Fix pending connections handling in target stack shutdown sequnce
  iser-target: Remove np_ prefix from isert_np members
  iser-target: Remove unused variables
  iser-target: Put the reference on commands waiting for unsol data
  iser-target: remove command with state ISTATE_REMOVE

9 years agoMerge tag 'usb-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 27 Sep 2015 01:00:28 +0000 (21:00 -0400)]
Merge tag 'usb-4.3-rc3' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some USB driver fixes for 4.3-rc3.

  There's the usual assortment of new device ids, combined with xhci and
  gadget driver fixes.  Full details in the shortlog.  All of these have
  been in linux-next with no reported problems"

* tag 'usb-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits)
  MAINTAINERS: remove amd5536udc USB gadget driver maintainer
  USB: whiteheat: fix potential null-deref at probe
  xhci: init command timeout timer earlier to avoid deleting it uninitialized
  xhci: change xhci 1.0 only restrictions to support xhci 1.1
  usb: xhci: exit early in xhci_setup_device() if we're halted or dying
  usb: xhci: stop everything on the first call to xhci_stop
  usb: xhci: Clear XHCI_STATE_DYING on start
  usb: xhci: lock mutex on xhci_stop
  xhci: Move xhci_pme_quirk() behind #ifdef CONFIG_PM
  xhci: give command abortion one more chance before killing xhci
  usb: Use the USB_SS_MULT() macro to get the burst multiplier.
  usb: dwc3: gadget: Fix BUG in RT config
  usb: musb: fix cppi channel teardown for isoch transfer
  usb: phy: isp1301: Export I2C module alias information
  usb: gadget: drop null test before destroy functions
  usb: gadget: dummy_hcd: in transfer(), return data sent, not limit
  usb: gadget: dummy_hcd: fix rescan logic for transfer
  usb: gadget: dummy_hcd: fix unneeded else-if condition
  usb: gadget: dummy_hcd: emulate sending zlp in packet logic
  usb: musb: dsps: fix polling in device-only mode
  ...

9 years agoMerge tag 'tty-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 27 Sep 2015 00:58:38 +0000 (20:58 -0400)]
Merge tag 'tty-4.3-rc3' of git://git./linux/kernel/git/gregkh/tty

Pull serial driver fix from Greg KH:
 "Here is one serial driver fix for 4.3-rc3 that resolves a module
  loading issue due to splitting up of the 8250 driver into smaller
  pieces.  It's been in linux-next with no reported problems"

* tag 'tty-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: Add missing module license for 8250_base.ko

9 years agoMerge tag 'staging-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 27 Sep 2015 00:56:50 +0000 (20:56 -0400)]
Merge tag 'staging-4.3-rc3' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some tiny staging driver and documentation fixes for 4.3-rc3.

  All of these resolve reported issues that people have found and have
  been in the linux-next tree for a while with no problems"

* tag 'staging-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  MAINTAINERS: Update email address for Martyn Welch
  staging: ion: fix corruption of ion_import_dma_buf
  staging: dgap: Remove myself from the MAINTAINERS file
  staging: most: Add dependency to HAS_IOMEM
  staging: unisys: remove reference of visorutil
  staging: unisys: visornic: handle error return from device registration
  staging: unisys: stop device registration before visorbus registration
  staging: unisys: visorbus: Unregister driver on error
  staging: unisys: visornic: Fix receive bytes statistics
  staging: unisys: unregister netdev when create debugfs fails
  staging: fbtft: replace master->setup() with spi_setup()
  staging: fbtft: fix 9-bit SPI support detection
  staging/lustre: change Lustre URLs and mailing list
  staging/android: Update ION TODO per LPC discussion
  Staging: most: MOST and MOSTCORE should depend on HAS_DMA
  staging: most: fix HDM_USB dependencies and build errors

9 years agoMerge tag 'driver-core-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 27 Sep 2015 00:54:53 +0000 (20:54 -0400)]
Merge tag 'driver-core-4.3-rc3' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "Here is one driver core fix for 4.3-rc3 that resolves a reported oops"

* tag 'driver-core-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  cpu/cacheinfo: Fix teardown path

9 years agoMerge tag 'char-misc-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 27 Sep 2015 00:53:15 +0000 (20:53 -0400)]
Merge tag 'char-misc-4.3-rc3' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here's some tiny char and misc driver fixes that resolve some reported
  errors for 4.3-rc3.

  All of these have been in linux-next with no problems for a while"

* tag 'char-misc-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  extcon: Fix attached value returned by is_extcon_changed
  Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc
  mei: fix debugfs files leak on error path
  thunderbolt: Allow loading of module on recent Apple MacBooks with thunderbolt 2 controller

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 26 Sep 2015 10:01:33 +0000 (06:01 -0400)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
    instead of cloning them.  From Daniel Borkmann.

 2) When converting classical BPF into eBPF, fix the setting of the
    source reg to BPF_REG_X.  From Tycho Andersen.

 3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
    Linus Lussing.

 4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.

 5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
    Roopa Prabhu.

 6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.

 7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
    Florian Westphal.

 8) Check DMA mapping errors in bna driver, from Ivan Vecera.

 9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
    misclearing of interrupt status in TX timeout handler, etc.) from
    David Woodhouse.

10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
    Hugne.

11) Fix autobind races et al. in netlink code, from Herbert Xu with
    help from Tejun Heo and others.

12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.

13) Fix various races in timewait timer and reqsk_queue_hadh_req, from
    Eric Dumazet.

14) Fix array overruns in mac80211, from Johannes Berg and Dan
    Carpenter.

15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.

16) Fix race between poll_one_napi and napi_disable, from Neil Horman.

17) Fix byte order in geneve tunnel port config, from John W Linville.

18) Fix handling of ARP replies over lightweight tunnels, from Jiri
    Benc.

19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
    Kok and Roopa Prabhu.

20) Several reference count handling bug fixes in the PHY/MDIO layer
    from Russel King.

21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.

22) Fix crash in icmp_route_lookup(), from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net: Fix panic in icmp_route_lookup
  net: update docbook comment for __mdiobus_register()
  ppp: fix lockdep splat in ppp_dev_uninit()
  net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
  phy: marvell: add link partner advertised modes
  net: fix net_device refcounting
  phy: add phy_device_remove()
  phy: fixed-phy: properly validate phy in fixed_phy_update_state()
  net: fix phy refcounting in a bunch of drivers
  of_mdio: fix MDIO phy device refcounting
  phy: add proper phy struct device refcounting
  phy: fix mdiobus module safety
  net: dsa: fix of_mdio_find_bus() device refcount leak
  phy: fix of_mdio_find_bus() device refcount leak
  ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
  ip6_gre: Reduce log level in ip6gre_err() to debug
  fib_rules: fix fib rule dumps across multiple skbs
  bnx2x: byte swap rss_key to comply to Toeplitz specs
  net: revert "net_sched: move tp->root allocation into fw_init()"
  lwtunnel: remove source and destination UDP port config option
  ...

9 years agonet: Fix panic in icmp_route_lookup
David Ahern [Thu, 24 Sep 2015 21:31:29 +0000 (15:31 -0600)]
net: Fix panic in icmp_route_lookup

Andrey reported a panic:

[ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
[ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
[ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
[ 7249.865637] Oops: 0000 [#1]
...
[ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.3.0-999-generic #201509220155
[ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014  08/02/2006
[ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
[ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
[ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
[ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
[ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
[ 7249.867077]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
[ 7249.867141] Stack:
[ 7249.867165]  320310ee 00000000 00000042 320310ee 00000000 c1aeca00
f3920240 f0c69180
[ 7249.867268]  f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000
e659266c f483ba54
[ 7249.867361]  8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0
c16b0e00 00000064
[ 7249.867448] Call Trace:
[ 7249.867494]  [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
[ 7249.867534]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867576]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867615]  [<c16b0e00>] ? icmp_send+0xa0/0x380
[ 7249.867648]  [<c16b102f>] icmp_send+0x2cf/0x380
[ 7249.867681]  [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
[ 7249.867714]  [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
[ 7249.867746]  [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
[ 7249.867780]  [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0
[nf_conntrack]
[ 7249.867838]  [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
[ 7249.867889]  [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
[ 7249.867933]  [<c16776a1>] nf_iterate+0x71/0x80
[ 7249.867970]  [<c1677715>] nf_hook_slow+0x65/0xc0
[ 7249.868002]  [<c1681811>] __ip_local_out_sk+0xc1/0xd0
[ 7249.868034]  [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
[ 7249.868066]  [<c1681836>] ip_local_out_sk+0x16/0x30
[ 7249.868097]  [<c1684054>] ip_send_skb+0x14/0x80
[ 7249.868129]  [<c16840f4>] ip_push_pending_frames+0x34/0x40
[ 7249.868163]  [<c16844a2>] ip_send_unicast_reply+0x282/0x310
[ 7249.868196]  [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
[ 7249.868227]  [<c16a1b63>] tcp_v4_rcv+0x323/0x990
[ 7249.868257]  [<c16776a1>] ? nf_iterate+0x71/0x80
[ 7249.868289]  [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
[ 7249.868322]  [<c167df4c>] ip_local_deliver+0x4c/0xa0
[ 7249.868353]  [<c167dba0>] ? ip_rcv_finish+0x390/0x390
[ 7249.868384]  [<c167d88c>] ip_rcv_finish+0x7c/0x390
[ 7249.868415]  [<c167e280>] ip_rcv+0x2e0/0x420
...

Prior to the VRF change the oif was not set in the flow struct, so the
VRF support should really have only added the vrf_master_ifindex lookup.

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Cc: Andrey Melnikov <temnota.am@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>