firefly-linux-kernel-4.4.55.git
Glauber Costa [Tue, 11 May 2010 16:17:45 +0000 (12:17 -0400)]
x86, paravirt: don't compute pvclock adjustments if we trust the tsc

If the HV told us we can fully trust the TSC, skip any
correction.
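
The guest-side effect is a short-circuit in the pvclock read path, roughly
(a sketch, assuming the PVCLOCK_TSC_STABLE_BIT flag and valid_flags
negotiation introduced elsewhere in this series; not the literal diff):

    /* inside pvclock_clocksource_read(), after the offset-based read: */
    if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
        (flags & PVCLOCK_TSC_STABLE_BIT))
            return ret;     /* TSC is trusted: skip the last_value correction */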

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 11 May 2010 16:17:44 +0000 (12:17 -0400)]
x86: KVM guest: Try using new kvm clock msrs

We have now added a new set of clock-related MSRs to replace the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existent MSR, we raise a lethal #GP, since our
IDT handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of MSRs is supported.
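
The registration path then picks the MSR pair up front. A sketch, assuming
the feature and MSR names introduced by this series:

    static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
    static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;

    void __init kvmclock_init(void)
    {
            if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
                    msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
                    msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
            } else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
                    return;         /* no kvmclock at all: don't touch any MSR */
            }
            /* from here on, writing msr_kvm_system_time cannot #GP */
    }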

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 11 May 2010 16:17:43 +0000 (12:17 -0400)]
KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID

Right now, we use individual KVM_CAP entities to tell userspace
which cpuid features we support. This is suboptimal, since it
generates a delay between a feature arriving in the host and
becoming available to the guest.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. Otherwise,
whenever we add a new cpuid bit in the future, we would have to do this
dance again, which creates complexity and delays feature adoption.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 11 May 2010 16:17:42 +0000 (12:17 -0400)]
KVM: x86: add new KVMCLOCK cpuid feature

This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest
that kvmclock is available through a new set of MSRs. The old ones
are deprecated.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 11 May 2010 16:17:41 +0000 (12:17 -0400)]
KVM: x86: change msr numbers for kvmclock

Avi pointed out a while ago that those MSRs fall into the Pentium
PMU range. So the idea here is to add new ones and, after a while,
deprecate the old ones.
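
For reference, the old registers are MSRs 0x11/0x12 (the P5 CESR/CTR0
performance MSRs live at 0x11-0x13), while the replacements sit in a
KVM-reserved range:

    #define MSR_KVM_WALL_CLOCK      0x11            /* old; P5 PMU range */
    #define MSR_KVM_SYSTEM_TIME     0x12            /* old; P5 PMU range */

    #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00      /* 0x4b564d == "KVM" */
    #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01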

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 11 May 2010 16:17:40 +0000 (12:17 -0400)]
x86, paravirt: Add a global synchronization point for pvclock

In recent stress tests, it was found that pvclock-based systems
could seriously warp on SMP systems. Using Ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5 million warps a minute on some
systems (to be fair, it wasn't that bad on most of them). Investigating
further, I found out that such warps were caused by the very offset-based
calculation pvclock is built on.

This happens even on some machines that report constant_tsc in their
tsc flags, especially on multi-socket ones.

Two reads of the same kernel timestamp at approximately the same time
will likely be TSC-timestamped on different occasions too. This means
the delta we calculate is unpredictable at best, and a cpu that is
legitimately reading the clock later can end up with a smaller value.

Some adjustments on the host could make this window less likely to happen,
but it still remains an intrinsic problem of the mechanism.

A while ago, I thought about using a shared variable anyway, to hold the
last clock state, but gave up due to the high contention that locking was
likely to introduce, possibly rendering the thing useless on big machines.
I argue, however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between the read and the
return, the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detect that our
clock timestamp is less than the current global value, we know that we need
to return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're ahead of the current time source, we atomically
replace the value with our new reading. This does cause contention on big
boxes (but big here means *BIG*), but it seems like a good trade-off, since
it provides us with a time source guaranteed to be stable wrt time warps.
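
In outline, the read side becomes a compare-and-swap loop against a global
accumulator. A sketch of the idea (close to, but not necessarily identical
to, the patch; pvclock_raw_read() is a hypothetical stand-in for the
existing offset-based calculation):

    static atomic64_t last_value = ATOMIC64_INIT(0);

    cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
    {
            u64 ret = pvclock_raw_read(src);        /* hypothetical helper */
            u64 last;

            do {
                    last = atomic64_read(&last_value);
                    if (ret < last)         /* someone published a later time: */
                            return last;    /* never return a smaller timestamp */
                    /* we're ahead: try to publish our reading */
            } while (atomic64_cmpxchg(&last_value, last, ret) != last);

            return ret;
    }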

After this patch is applied, I don't see a single warp during 5 days
of execution on any of the machines where I saw them before.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 11 May 2010 16:17:39 +0000 (12:17 -0400)]
x86, paravirt: Enable pvclock flags in vcpu_time_info structure

This patch removes one padding byte and transforms it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
is usually devised via HV negotiation.
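
With the change, the shared structure looks roughly like this (layout per
the pvclock ABI; the flags byte takes over one of the former padding bytes):

    struct pvclock_vcpu_time_info {
            u32 version;
            u32 pad0;
            u64 tsc_timestamp;
            u64 system_time;
            u32 tsc_to_system_mul;
            s8  tsc_shift;
            u8  flags;              /* new: queried by the guest on each read */
            u8  pad[2];
    } __attribute__((__packed__));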

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Roedel, Joerg [Thu, 6 May 2010 09:38:43 +0000 (11:38 +0200)]
KVM: x86: Inject #GP with the right rip on efer writes

This patch fixes a bug in the KVM efer-msr write path. If a
guest writes to a reserved efer bit, the set_efer function
injects the #GP directly. The architecture-dependent wrmsr
function does not see this, assumes success and advances the
rip. This results in a #GP in the guest with the wrong rip.
This patch fixes this by reporting efer write errors back to
the architectural wrmsr function.
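
A sketch of the reworked flow (illustrative; the key point is that set_efer
returns a status instead of injecting):

    static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
    {
            if (efer & efer_reserved_bits)
                    return 1;       /* report failure instead of injecting #GP */

            /* ... remaining validity checks likewise return 1 on failure ... */

            vcpu->arch.efer = efer;
            return 0;
    }

kvm_set_msr_common() then returns this status, so the wrmsr emulation path
can inject the #GP itself, before advancing the rip.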

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 5 May 2010 14:04:45 +0000 (16:04 +0200)]
KVM: SVM: Don't allow nested guest to VMMCALL into host

This patch disables the possibility for an l2-guest to do a
VMMCALL directly into the host. This would happen if the
l1-hypervisor doesn't intercept VMMCALL and the l2-guest
executes this instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 5 May 2010 14:04:41 +0000 (16:04 +0200)]
KVM: x86: Fix exception reinjection forced to true

The recently merged patch which allowed marking an exception
as reinjected has a bug: it always marks the exception as
reinjected. This breaks the nested-svm shadow-on-shadow
implementation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 4 May 2010 12:00:37 +0000 (15:00 +0300)]
KVM: Fix wallclock version writing race

Wallclock writing uses an unprotected global variable to hold the version;
this can cause one guest to interfere with another if both write their
wallclock at the same time.
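
The fix is to derive the version from the guest's own wallclock page rather
than from a global. A sketch of the resulting seqlock-style protocol (not
necessarily the literal patch):

    static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
    {
            int version;

            /* per-guest version lives in guest memory, not in a global */
            if (kvm_read_guest(kvm, wall_clock, &version, sizeof(version)))
                    return;

            if (version & 1)
                    ++version;      /* first write, or recovering from junk */

            ++version;              /* odd: update in progress */
            kvm_write_guest(kvm, wall_clock, &version, sizeof(version));

            /* ... fill in and write the wallclock data ... */

            ++version;              /* even again: update complete */
            kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
    }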

Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 4 May 2010 09:58:32 +0000 (12:58 +0300)]
KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots

On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.

Push the spinlock into mmu_alloc_roots(), and only take it after we've read
the pdptr.

Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Shane Wang [Thu, 29 Apr 2010 16:09:01 +0000 (12:09 -0400)]
KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)

Per the documentation for the feature control MSR:

  Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
        of VMXON in SMX operation causes a general-protection exception.
  Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
        of VMXON outside SMX operation causes a general-protection exception.

This patch is to enable this kind of check with SMX for VMXON in KVM.
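
In hardware_enable(), the check then looks roughly like this (bit names per
the feature control MSR definitions; tboot_enabled() indicates SMX/TXT
operation):

    u64 old, test_bits;

    rdmsrl(MSR_IA32_FEATURE_CONTROL, old);

    test_bits = FEATURE_CONTROL_LOCKED;
    test_bits |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
    if (tboot_enabled())
            test_bits |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;

    if ((old & test_bits) != test_bits) {
            /* MSR not locked yet: enable VMXON for our mode and lock it */
            wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
    }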

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Tue, 4 May 2010 02:04:27 +0000 (23:04 -0300)]
KVM: x86: properly update ready_for_interrupt_injection

The recent changes to emulate string instructions without entering guest
mode exposed a bug where pending interrupts are not properly reflected
in ready_for_interrupt_injection.

The result is that userspace overwrites a previously queued interrupt
when irqchips are emulated in userspace.

Fix by always updating state before returning to userspace.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 28 Apr 2010 13:42:29 +0000 (16:42 +0300)]
KVM: VMX: Atomically switch efer if EPT && !EFER.NX

When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page
tables.  This causes accesses through ptes with bit 63 set to succeed instead
of failing a reserved bit check.

Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 28 Apr 2010 13:40:38 +0000 (16:40 +0300)]
KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit

Some guest msr values cannot be used on the host (for example, EFER.NX=0),
so we need to switch them atomically during guest entry or exit.

Add a facility to program the vmx msr autoload registers accordingly.
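
The facility amounts to keeping parallel guest/host entries in the VMX MSR
autoload areas. A sketch (entry format per the VMX spec; the helper shape
is illustrative):

    struct vmx_msr_entry {
            u32 index;
            u32 reserved;
            u64 value;
    } __aligned(16);

    static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
                                      u64 guest_val, u64 host_val)
    {
            struct msr_autoload *m = &vmx->msr_autoload;
            unsigned i;

            /* reuse an existing slot if this MSR is already programmed */
            for (i = 0; i < m->nr; ++i)
                    if (m->guest[i].index == msr)
                            break;

            if (i == m->nr) {
                    ++m->nr;
                    vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr);
                    vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr);
            }
            m->guest[i].index = msr;
            m->guest[i].value = guest_val;  /* loaded on VM entry */
            m->host[i].index = msr;
            m->host[i].value = host_val;    /* loaded back on VM exit */
    }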

Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 28 Apr 2010 12:41:03 +0000 (15:41 +0300)]
KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries

Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 28 Apr 2010 12:40:31 +0000 (15:40 +0300)]
KVM: VMX: Add definition for msr autoload entry

Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 28 Apr 2010 12:39:01 +0000 (15:39 +0300)]
KVM: Let vcpu structure alignment be determined at runtime

vmx and svm vcpus have different contents and therefore may have different
alignment requirements.  Let each specify its required alignment.

Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Wed, 28 Apr 2010 03:55:15 +0000 (11:55 +0800)]
KVM: MMU: cleanup invlpg code

Use is_last_spte() to clean up the invlpg code.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Wed, 28 Apr 2010 03:55:06 +0000 (11:55 +0800)]
KVM: MMU: move unsync/sync tracepoints to proper place

Move the unsync/sync tracepoints to the proper place; it's good
for us to be able to observe unsync page lifetime.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Wed, 28 Apr 2010 03:54:55 +0000 (11:54 +0800)]
KVM: MMU: convert mmu tracepoints

Convert the mmu tracepoints to use DECLARE_EVENT_CLASS.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Wed, 28 Apr 2010 03:54:44 +0000 (11:54 +0800)]
KVM: MMU: fix for calculating gpa in invlpg code

If the guest is 32-bit, we should use 'quadrant' to adjust the gpa
offset.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Stephen Rothwell [Tue, 27 Apr 2010 05:49:17 +0000 (15:49 +1000)]
KVM: powerpc: use of kzalloc/kfree requires including slab.h

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Tue, 27 Apr 2010 02:39:49 +0000 (10:39 +0800)]
KVM: Fix mmu shrinker error

kvm_mmu_remove_one_alloc_mmu_page() assumes that kvm_mmu_zap_page() reclaims
only one sp, but that's not the case. This causes the mmu shrinker to return
a wrong number. This patch fixes the counting error.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Eric Northup [Tue, 27 Apr 2010 00:00:05 +0000 (17:00 -0700)]
KVM: MMU: fix hashing for TDP and non-paging modes

For TDP mode, avoid creating multiple page table roots for the single
guest-to-host physical address map by fixing the inputs used for the
shadow page table hash in mmu_alloc_roots().

Signed-off-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 26 Apr 2010 08:59:21 +0000 (11:59 +0300)]
KVM: Minor MMU documentation edits

Reported by Andrew Jones.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Sun, 25 Apr 2010 12:51:46 +0000 (15:51 +0300)]
KVM: Document KVM_GET_MP_STATE and KVM_SET_MP_STATE

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Thu, 22 Apr 2010 09:33:57 +0000 (17:33 +0800)]
KVM: MMU: fix sp->unsync type error in trace event definition

sp->unsync is bool now, so update trace event declaration.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:14 +0000 (12:33 +0200)]
KVM: SVM: Handle MCE intercepts always on host level

This patch prevents MCE intercepts from being propagated
into the L1 guest if they happened in an L2 guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:13 +0000 (12:33 +0200)]
KVM: x86: Allow marking an exception as reinjected

This patch adds logic to kvm/x86 which allows marking an
injected exception as reinjected. This allows removing an
ugly hack from svm_complete_interrupts that prevented
exceptions from being reinjected at all in the nested case.
The hack was necessary because an exception reinjected into
the nested guest could cause a nested vmexit emulation, but
reinjected exceptions must not intercept. The downside of
the hack is that an exception that is injected could get
lost.
This patch fixes the problem and puts the code for it into
generic x86 files, because Nested-VMX will likely have the
same problem and could reuse the code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:12 +0000 (12:33 +0200)]
KVM: SVM: Report emulated SVM features to userspace

This patch implements the reporting of the emulated SVM
features to userspace instead of the real hardware
capabilities. Every real hardware capability needs emulation
in nested svm, so the old behavior was broken.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:11 +0000 (12:33 +0200)]
KVM: x86: Add callback to let modules decide over some supported cpuid bits

This patch adds the get_supported_cpuid callback to
kvm_x86_ops. It will be used in do_cpuid_ent to delegate the
decision about some supported cpuid bits to the
architecture modules.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:10 +0000 (12:33 +0200)]
KVM: SVM: Propagate nested entry failure into guest hypervisor

This patch implements propagation of a failed guest vmrun
back into the guest instead of killing the whole guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:09 +0000 (12:33 +0200)]
KVM: SVM: Sync cr0 and cr3 to kvm state before nested handling

This patch syncs cr0 and cr3 from the vmcb to the kvm state
before nested intercept handling is done. This allows us to
simplify the vmexit path.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:08 +0000 (12:33 +0200)]
KVM: SVM: Make sure rip is synced to vmcb before nested vmexit

This patch fixes a bug where a nested guest kept re-executing
the same instruction because the rip was not advanced on a
nested vmexit.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 22 Apr 2010 10:33:07 +0000 (12:33 +0200)]
KVM: SVM: Fix nested nmi handling

The patch introducing nested nmi handling had a bug. The
check does not belong in enable_nmi_window but must be in
nmi_allowed. This patch fixes this.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Fri, 23 Apr 2010 10:49:06 +0000 (13:49 +0300)]
Merge remote branch 'tip/perf/core'

Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Fri, 23 Apr 2010 08:48:35 +0000 (17:48 +0900)]
KVM: Remove test-before-set optimization for dirty bits

As Avi pointed out, testing the bit in mark_page_dirty() was important
in the days of shadow paging, but EPT and NPT have since become
common and the chance of faulting a page more than once per iteration is
small. So let's remove the test and avoid the extra access.
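
Illustratively (names approximate):

    unsigned long rel_gfn = gfn - memslot->base_gfn;

    /* before: read first to avoid dirtying the cacheline needlessly */
    if (!test_bit(rel_gfn, memslot->dirty_bitmap))
            set_bit(rel_gfn, memslot->dirty_bitmap);

    /* after: just set it; the test rarely pays off anymore */
    set_bit(rel_gfn, memslot->dirty_bitmap);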

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 21 Apr 2010 13:08:20 +0000 (16:08 +0300)]
KVM: Document mmu

Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Sat, 17 Apr 2010 08:41:47 +0000 (16:41 +0800)]
KVM: VMX: free vpid when fail to create vcpu

Fix a bug in the exception path: free the allocated vpid when we fail
to create a vcpu.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:54 +0000 (02:49 +0200)]
KVM: PPC: Enable native paired singles

When we're on a paired single capable host, we can just always enable
paired singles and expose them to the guest directly.

This approach breaks when multiple VMs run and access PS concurrently,
but this should suffice until we get a proper framework for it in Linux.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:53 +0000 (02:49 +0200)]
KVM: PPC: Find HTAB ourselves

For KVM we need to find the location of the HTAB. We can either rely
on internal data structures of the kernel or ask the hardware.

Ben issued complaints about the internal data structure method, so
let's switch to our own inquiry of the HTAB. Now we're fully
independent :-).

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:52 +0000 (02:49 +0200)]
KVM: PPC: Fix Book3S_64 Host MMU debug output

We have some debug output in Book3S_64. Some of it was invalid, though,
partially not even compiling because it accessed incorrect variables.

So let's fix that up, making debugging more fun again.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:51 +0000 (02:49 +0200)]
KVM: PPC: Set VSID_PR also for Book3S_64

Book3S_64 didn't set VSID_PR when we're in PR=1. This led to pretty bad
behavior when searching for the shadow segment, as part of the code relied
on VSID_PR being set.

This patch fixes booting Book3S_64 guests.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:50 +0000 (02:49 +0200)]
KVM: PPC: Be more informative on BUG

We have a condition in the ppc64 host mmu code that should never occur.
Unfortunately, it just did happen to me and I was rather puzzled as to
why, because BUG_ON doesn't tell me anything useful.

So let's add some more debug output in case this goes wrong. Also change
BUG to WARN, since I don't want to reboot every time I mess something up.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:49 +0000 (02:49 +0200)]
KVM: PPC: Make Alignment interrupts work again

In the process of merging Book3S_32 and 64 I somehow ended up with the
alignment interrupt handler using last_inst, while the fetching code did
not fetch it. So we ended up with stale last_inst values.

Let's just enable last_inst fetching for alignment interrupts too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:48 +0000 (02:49 +0200)]
KVM: PPC: Improve split mode

When in split mode, instruction relocation and data relocation are not equal.

So far we implemented this mode by reserving a special pseudo-VSID for the
two cases and flushing all PTEs when going into split mode, which is slow.

Unfortunately 32bit Linux and Mac OS X use split mode extensively. So to not
slow down things too much, I came up with a different idea: Mark the split
mode with a bit in the VSID and then treat it like any other segment.

This means we can just flush the shadow segment cache, but keep the PTEs
intact. I verified that this works with ppc32 Linux and Mac OS X 10.4
guests and does speed them up.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:47 +0000 (02:49 +0200)]
KVM: PPC: Make Performance Counters work

When we get a performance counter interrupt, we need to route it on to the
Linux handler after we get out of the guest context. We also need to tell
our handling code that this particular interrupt doesn't need treatment.

So let's add those two bits in, making perf work while having a KVM guest
running.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 20 Apr 2010 00:49:46 +0000 (02:49 +0200)]
KVM: PPC: Convert u64 -> ulong

There are some pieces in the code that I overlooked that still use
u64s instead of longs. This slows down 32 bit hosts unnecessarily, so
let's just move them to ulong.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:58 +0000 (00:11 +0200)]
KVM: PPC: Enable Book3S_32 KVM building

Now that we have all the bits and pieces in place, let's enable building
of the Book3S_32 target.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:57 +0000 (00:11 +0200)]
KVM: PPC: Add KVM intercept handlers

When an interrupt occurs we don't know yet if we're in guest context or
in host context. When in guest context, KVM needs to handle it.

So let's pull the same trick we did on Book3S_64: Just add a macro to
determine if we're in guest context or not and if so jump on to KVM code.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:56 +0000 (00:11 +0200)]
KVM: PPC: Check max IRQ prio

We have a define for the highest bit of IRQ priorities. So we can
just as well use it in the bit-checking code and avoid invalid IRQ values
being triggered.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:55 +0000 (00:11 +0200)]
PPC: Export SWITCH_FRAME_SIZE

We need the SWITCH_FRAME_SIZE define on Book3S_32 now too.
So let's export it unconditionally.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:54 +0000 (00:11 +0200)]
KVM: PPC: Export MMU variables

Our shadow MMU code needs to know where the HTAB is located and how
big it is. So we need some variables from the kernel exported to
module space if KVM is built as a module.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:53 +0000 (00:11 +0200)]
KVM: PPC: Add Book3S compatibility code

Some of the code we had so far required defines or was completely
Book3S_64 specific. Since we now opened book3s.c to Book3S_32 too, we need
to take care of these pieces.

So let's add some minor code where it makes sense to avoid the Book3S_64
code paths and add compat defines elsewhere.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:52 +0000 (00:11 +0200)]
KVM: PPC: Emulate segment fault

Book3S_32 doesn't know about segment faults. It only knows about page faults.
So in order to know that we didn't map a segment, we need to fake segment
faults.

We do this by setting invalid segment registers to an invalid VSID and then
checking for that VSID on normal page faults.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:51 +0000 (00:11 +0200)]
KVM: PPC: Add SVCPU to Book3S_32

We need to keep the pointer to the shadow vcpu somewhere accessible from
within really early interrupt code. The best fit I found was the thread
struct, as that resides in an SPRG.

So let's put a pointer to the shadow vcpu in the thread struct and add
an asm-offset so we can find it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:50 +0000 (00:11 +0200)]
KVM: PPC: Remove fetch fail code

When an instruction fetch fails, the inline function hook automatically
detects that and starts the internal guest memory load function. So
whenever we access kvmppc_get_last_inst(), we're sure the result is sane.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:49 +0000 (00:11 +0200)]
KVM: PPC: Release clean pages as clean

When we map a page as read-only, we can just release it as clean to
KVM's page claim mechanisms, because we're pretty sure it hasn't been
touched.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:48 +0000 (00:11 +0200)]
KVM: PPC: Make SLB switching code the new segment framework

We just introduced generic segment switching code that only needs to call
small macros to do the actual switching, but keeps most of the entry / exit
code generic.

So let's move the SLB switching code over to use this new mechanism.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:47 +0000 (00:11 +0200)]
KVM: PPC: Make highmem code generic

Since we now have several fields in the shadow VCPU, we also change
the internal calling convention between the different entry/exit code
layers.

Let's reflect that in the IR=1 code and make sure we use "long" defines
for long field access.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:46 +0000 (00:11 +0200)]
KVM: PPC: Make real mode handler generic

The real mode handler code was originally written for 64 bit Book3S only.
But since we now add 32 bit functionality too, we need to make some tweaks
to it.

This patch basically combines using the "long" access defines and using
fields from the shadow VCPU we just moved there.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:45 +0000 (00:11 +0200)]
KVM: PPC: Extract MMU init

The host shadow mmu code needs to get initialized. It needs to fetch a
segment it can use to put shadow PTEs into.

That initialization code was in generic code, which is icky. Let's move
it over to the respective MMU file.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:44 +0000 (00:11 +0200)]
KVM: PPC: Use now shadowed vcpu fields

The shadow vcpu now contains some fields we don't use from the vcpu anymore.
Access to them happens using inline functions that happily use the shadow
vcpu fields.

So let's now ifdef them out to booke only and add asm-offsets.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:43 +0000 (00:11 +0200)]
PPC: Add STLU

For assembly code there are several "long" load and store defines already.
The one that's missing is the typical stack store, stdu/stwu.

So let's add that define as well, making my KVM code happy.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:42 +0000 (00:11 +0200)]
KVM: PPC: Use CONFIG_PPC_BOOK3S define

Upstream recently added a new name for PPC64: Book3S_64.

So instead of using CONFIG_PPC64 we should use CONFIG_PPC_BOOK3S consistently.
That makes understanding the code easier (I hope).

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:41 +0000 (00:11 +0200)]
KVM: PPC: Use KVM_BOOK3S_HANDLER

So far we had a lot of conditional code on CONFIG_KVM_BOOK3S_64_HANDLER.
As we're moving towards common code between 32 and 64 bits, most of
these ifdefs can be moved to a more generic term define, called
CONFIG_KVM_BOOK3S_HANDLER.

This patch adds the new generic config option and moves ifdefs over.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:40 +0000 (00:11 +0200)]
KVM: PPC: Improve indirect svcpu accessors

We already have some inline functions we use to access vcpu or svcpu structs,
depending on whether we're on booke or book3s. Since we just put a few more
registers into the svcpu, we also need to make sure the respective callbacks
are available and get used.

So this patch moves direct use of the fields now in the svcpu struct to
inline function calls. While at it, it also moves the definition of those
inline function calls to respective header files for booke and book3s,
greatly improving readability.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:39 +0000 (00:11 +0200)]
KVM: PPC: Add fields to shadow vcpu

After a lot of thought on how to make the entry / exit code easier,
I figured it'd be clever to put even more register state into the
shadow vcpu. That way we have more registers available to use, making
the code easier to read.

So this patch adds a few new fields to that shadow vcpu. Later on we
will remove the originals from the vcpu and paca.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:38 +0000 (00:11 +0200)]
KVM: PPC: Add kvm_book3s_32.h

In analogy to the 64 bit specific header file, this is the 32 bit
counterpart. With this in place we can just always call to_svcpu and
be assured we get the right pointer anywhere.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:37 +0000 (00:11 +0200)]
KVM: PPC: Add kvm_book3s_64.h

In the process of generalizing as much code as possible, I also moved
the shadow vcpu code together to a generic book3s file. Unfortunately
the location of the shadow vcpu is different on 32 and 64 bit, so we
need a wrapper function to tell us where it is.

That sounded like a perfect fit for a subarch specific header file.
Here we can put anything that needs to be different between those two.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:36 +0000 (00:11 +0200)]
PPC: Split context init/destroy functions

We need to reserve a context from KVM to make sure we have our own
segment space. While we did that split for Book3S_64 already, 32 bit
is still outstanding.

So let's split it now.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:35 +0000 (00:11 +0200)]
KVM: PPC: Add generic segment switching code

This is the code that will later be used instead of book3s_64_slb.S. It
does the last step of guest entry and the first generic steps of guest
exiting, once we have determined the interrupt is a KVM interrupt.

It also reads the last used instruction from the guest virtual address
space if necessary, to speed up that path.

The new thing about this file is that it makes use of generic long load
and store functions and calls a macro to fill in the actual segment
switching code. That still needs to be done differently for book3s_32 and
book3s_64.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:34 +0000 (00:11 +0200)]
KVM: PPC: Add SR swapping code

Later in this series we will move the current segment switch code to
generic code and make that call hooks for the specific sub-archs (32
vs. 64 bit). This is the hook for 32 bits.

It enables the entry and exit code to swap segment registers with
values from the shadow cpu structure.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:33 +0000 (00:11 +0200)]
KVM: PPC: Add host MMU Support

In order to support 32 bit Book3S, we need to add code to enable our
shadow MMU to actually add shadow PTEs. This is the module enabling
that support.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 15 Apr 2010 22:11:32 +0000 (00:11 +0200)]
KVM: PPC: Name generic 64-bit code generic

We have quite some code that can be used by Book3S_32 and Book3S_64 alike,
so let's call it "Book3S" instead of "Book3S_64", so we can later on
use it from the 32 bit port too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 16 Apr 2010 08:21:42 +0000 (16:21 +0800)]
KVM: MMU: cleanup for function unaccount_shadowed()

Since gfn is not changed in the for loop, we do not need to call
gfn_to_memslot_unaliased() under the loop, and it is safe to move
it out.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Fri, 16 Apr 2010 09:19:48 +0000 (17:19 +0800)]
KVM: Get rid of dead function gva_to_page()

Nobody uses gva_to_page() anymore; get rid of it.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Fri, 16 Apr 2010 09:18:54 +0000 (17:18 +0800)]
KVM: MMU: Remove unused variable in rmap_next()

Remove an unused variable in rmap_next().

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Fri, 16 Apr 2010 09:18:01 +0000 (17:18 +0800)]
KVM: MMU: Make use of is_large_pte() in walker

Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK
bit directly.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Fri, 16 Apr 2010 09:16:40 +0000 (17:16 +0800)]
KVM: MMU: Move sync_page() first pte address calculation out of loop

Move first pte address calculation out of loop to save some cycles.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Sat, 17 Apr 2010 09:00:19 +0000 (17:00 +0800)]
KVM: do not call hardware_disable() on CPU_UP_CANCELED

On CPU_UP_CANCELED, hardware_enable() has not been called on the CPU
that is going up, because raw_notifier_call_chain(CPU_ONLINE)
has not been called for this cpu.

Drop the handling for CPU_UP_CANCELED.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 19 Apr 2010 14:25:53 +0000 (17:25 +0300)]
KVM: MMU: Drop cr4.pge from shadow page role

Since commit bf47a760f66ad, we no longer handle ptes with the global bit
set specially, so there is no reason to distinguish between shadow pages
created with cr4.pge set and clear.

Such tracking is expensive when the guest toggles cr4.pge, so drop it.

Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Mon, 19 Apr 2010 09:41:23 +0000 (17:41 +0800)]
KVM: use the correct RCU API for PROVE_RCU=y

The RCU/SRCU API has already changed for proving RCU usage.

I got the following dmesg with PROVE_RCU=y because we used an incorrect API.
This patch converts rcu_dereference() to srcu_dereference() or family APIs.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm]
 #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
 [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3
 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
 [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
 [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm]
 [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm]
 [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
 [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
 [<ffffffff810a8692>] ? unlock_page+0x27/0x2c
 [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1
 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm]
 [<ffffffff81060cfa>] ? up_read+0x23/0x3d
 [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6
 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db
 [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241
 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116
 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13
 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a
 [<ffffffff810021db>] system_call_fastpath+0x16/0x1b
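
The conversion pattern for an SRCU-protected pointer like kvm->memslots is
(sketch):

    /* before: triggers the lockdep splat above under PROVE_RCU */
    slots = rcu_dereference(kvm->memslots);

    /* after: name the SRCU domain that actually protects the pointer */
    slots = srcu_dereference(kvm->memslots, &kvm->srcu);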

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 19 Apr 2010 09:52:53 +0000 (12:52 +0300)]
Merge branch 'perf'

Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 16 Apr 2010 08:35:54 +0000 (16:35 +0800)]
KVM: MMU: cleanup for hlist walk restart

Quote from Avi:

|Just change the assignment to a 'goto restart;' please,
|I don't like playing with list_for_each internals.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 15 Apr 2010 18:03:50 +0000 (21:03 +0300)]
KVM: prevent spurious exit to userspace during task switch emulation.

If kvm_task_switch() fails, the code exits to userspace without specifying
an exit reason, so the previous exit reason is reused by userspace. Fix
this by specifying the exit reason correctly.
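
Sketch of the shape of the fix in the task-switch exit handlers (argument
list abbreviated; the exact error-reporting fields are an assumption here):

    if (!kvm_task_switch(vcpu, tss_selector, reason /* , ... */)) {
            /* tell userspace why we bailed instead of reusing a stale reason */
            vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
            vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
            vcpu->run->internal.ndata = 0;
            return 0;
    }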

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 16 Apr 2010 13:29:17 +0000 (21:29 +0800)]
KVM: MMU: remove unused parameter in mmu_parent_walk()

'vcpu' is unused, remove it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 16 Apr 2010 13:27:54 +0000 (21:27 +0800)]
KVM: MMU: reduce 'struct kvm_mmu_page' size

Define 'multimapped' as 'bool'.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 16 Apr 2010 13:23:41 +0000 (21:23 +0800)]
KVM: MMU: remove unused struct kvm_unsync_walk

Remove 'struct kvm_unsync_walk' since it's not used.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 15 Apr 2010 09:29:50 +0000 (12:29 +0300)]
KVM: fix emulator_task_switch() return value.

emulator_task_switch() should return -1 for failure and 0 for success to
the caller, just like x86_emulate_insn() does.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Wed, 14 Apr 2010 16:20:03 +0000 (19:20 +0300)]
KVM: MMU: Replace role.glevels with role.cr4_pae

There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code treats them exactly the same way.  Drop
role.glevels and replace it with role.cr4_pae (which is meaningful).  This
simplifies the code a bit.

As a side effect, it allows sharing shadow page tables between pae and
longmode guest page tables at the same guest page.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 14 Apr 2010 13:51:09 +0000 (15:51 +0200)]
KVM: x86: Push potential exception error code on task switches

When a fault triggers a task switch, the error code, if existent, has to
be pushed on the new task's stack. Implement the missing bits.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 14 Apr 2010 13:50:57 +0000 (15:50 +0200)]
KVM: x86: Terminate early if task_switch_16/32 failed

Stop the switch immediately if task_switch_16/32 returns an error. Only
if that step succeeded should the switch actually take place and update
any register state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Tue, 13 Apr 2010 07:21:56 +0000 (10:21 +0300)]
KVM: x86: get rid of mmu_only parameter in emulator_write_emulated()

We can call kvm_mmu_pte_write() directly from
emulator_cmpxchg_emulated() instead of passing mmu_only down to
emulator_write_emulated_onepage() and call it there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Tue, 13 Apr 2010 13:47:24 +0000 (22:47 +0900)]
KVM: limit the number of pages per memory slot

This patch limits the number of pages per memory slot to free
us from extra care about type issues.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Tue, 13 Apr 2010 07:05:23 +0000 (10:05 +0300)]
KVM: move DR register access handling into generic code

Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.

Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andre Przywara [Sun, 11 Apr 2010 21:07:28 +0000 (23:07 +0200)]
KVM: SVM: implement NEXTRIPsave SVM feature

On SVM we set the instruction length of skipped instructions
to hard-coded, well known values, which could be wrong when (bogus,
but valid) prefixes (REX, segment override) are used.
Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or
AthlonII) have an explicit NEXTRIP field in the VMCB containing the
desired information.
Since it is cheap to do so, we use this field to override the guessed
value on newer processors.
A fix for older CPUs would be rather expensive, as it would require
fetching and partially decoding the instruction. As the problem is not
a security issue and needs special, handcrafted code to trigger
(no compiler will ever generate such code), I omit a fix for older
CPUs.
If someone is interested, I have both a patch for these CPUs as well as
demo code triggering this issue: It segfaults under KVM, but runs
perfectly on native Linux.
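
On the affected processors, the skip logic reduces to reading the new VMCB
field. A sketch (next_rip is the field this patch consumes; zero means the
CPU did not provide it):

    static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
    {
            struct vcpu_svm *svm = to_svm(vcpu);

            /* Fam10h 45nm+ reports the precise next rip; let it override
             * the hard-coded guess made by the intercept handler */
            if (svm->vmcb->control.next_rip != 0)
                    svm->next_rip = svm->vmcb->control.next_rip;

            /* ... existing path: kvm_rip_write(vcpu, svm->next_rip); ... */
    }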

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Sun, 11 Apr 2010 12:33:32 +0000 (15:33 +0300)]
KVM: Fix MAXPHYADDR calculation when cpuid does not support it

MAXPHYADDR is derived from cpuid 0x80000008, but when that isn't present, we
get some random value.

Fix by checking first that cpuid 0x80000008 is supported.
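
A sketch of the check (close to the shape of the fix; the fallback of 36
bits is an assumption here):

    static int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
    {
            struct kvm_cpuid_entry2 *best;

            best = kvm_find_cpuid_entry(vcpu, 0x80000000, 0);
            if (!best || best->eax < 0x80000008)
                    return 36;      /* leaf not reported: use a sane default */

            best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
            if (!best)
                    return 36;
            return best->eax & 0xff;        /* physical address bits */
    }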

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>