firefly-linux-kernel-4.4.55.git
16 years agoKVM: kill file->f_count abuse in kvm
Al Viro [Sat, 19 Apr 2008 19:33:56 +0000 (20:33 +0100)]
KVM: kill file->f_count abuse in kvm

Use kvm own refcounting instead of playing with ->filp->f_count.
That will allow to get rid of a lot of crap in anon_inode_getfd() and
kill a race in kvm_dev_ioctl_create_vm() (file might have been closed
immediately by another thread, so ->filp might point to already freed
struct file when we get around to setting it).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: kvm_pv_mmu_op should not take mmap_sem
Marcelo Tosatti [Wed, 16 Apr 2008 20:19:06 +0000 (17:19 -0300)]
KVM: MMU: kvm_pv_mmu_op should not take mmap_sem

kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down
in the MMU processing will take it if necessary, so as it is it can
deadlock.

Apparently a leftover from the days before slots_lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: remove selective CR0 comment
Joerg Roedel [Wed, 16 Apr 2008 15:01:05 +0000 (17:01 +0200)]
KVM: SVM: remove selective CR0 comment

There is not selective cr0 intercept bug. The code in the comment sets the
CR0.PG bit. But KVM sets the CR4.PG bit for SVM always to implement the paged
real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit.
Selective CR0 intercepts only occur when a bit is actually changed. So its the
right behavior that there is no intercept on this instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: remove now obsolete FIXME comment
Joerg Roedel [Wed, 16 Apr 2008 14:51:19 +0000 (16:51 +0200)]
KVM: SVM: remove now obsolete FIXME comment

With the usage of the V_TPR field this comment is now obsolete.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: disable CR8 intercept when tpr is not masking interrupts
Joerg Roedel [Wed, 16 Apr 2008 14:51:18 +0000 (16:51 +0200)]
KVM: SVM: disable CR8 intercept when tpr is not masking interrupts

This patch disables the intercept of CR8 writes if the TPR is not masking
interrupts. This reduces the total number CR8 intercepts to below 1 percent of
what we have without this patch using Windows 64 bit guests.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
Joerg Roedel [Wed, 16 Apr 2008 14:51:17 +0000 (16:51 +0200)]
KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled

If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be
synced with the TPR field in the local apic.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: export kvm_lapic_set_tpr() to modules
Joerg Roedel [Wed, 16 Apr 2008 14:51:16 +0000 (16:51 +0200)]
KVM: export kvm_lapic_set_tpr() to modules

This patch exports the kvm_lapic_set_tpr() function from the lapic code to
modules. It is required in the kvm-amd module to optimize CR8 intercepts.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: sync TPR value to V_TPR field in the VMCB
Joerg Roedel [Wed, 16 Apr 2008 14:51:15 +0000 (16:51 +0200)]
KVM: SVM: sync TPR value to V_TPR field in the VMCB

This patch adds syncing of the lapic.tpr field to the V_TPR field of the VMCB.
With this change we can safely remove the CR8 read intercept.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ppc: PowerPC 440 KVM implementation
Hollis Blanchard [Thu, 17 Apr 2008 04:28:09 +0000 (23:28 -0500)]
KVM: ppc: PowerPC 440 KVM implementation

This functionality is definitely experimental, but is capable of running
unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only
tested with 440EP "Bamboo" guests so far, but with appropriate userspace
support other SoC/board combinations should work.)

See Documentation/powerpc/kvm_440.txt for technical details.

[stephen: build fix]

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Add MAINTAINERS entry for PowerPC KVM
Hollis Blanchard [Thu, 17 Apr 2008 04:28:08 +0000 (23:28 -0500)]
KVM: Add MAINTAINERS entry for PowerPC KVM

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ppc: Add DCR access information to struct kvm_run
Hollis Blanchard [Thu, 17 Apr 2008 04:28:07 +0000 (23:28 -0500)]
KVM: ppc: Add DCR access information to struct kvm_run

Device Control Registers are essentially another address space found on PowerPC
4xx processors, analogous to PIO on x86. DCRs are always 32 bits, and can be
identified by a 32-bit number. We forward most DCR accesses to userspace for
emulation (with the exception of CPR0 registers, which can be read directly
for simplicity in timebase frequency determination).

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoppc: Export tlb_44x_hwater for KVM
Hollis Blanchard [Thu, 17 Apr 2008 04:28:06 +0000 (23:28 -0500)]
ppc: Export tlb_44x_hwater for KVM

PowerPC 440 KVM needs to know how many TLB entries are used for the host kernel
linear mapping (it does not modify these mappings when switching between guest
and host execution).

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Rename debugfs_dir to kvm_debugfs_dir
Hollis Blanchard [Tue, 15 Apr 2008 21:05:42 +0000 (16:05 -0500)]
KVM: Rename debugfs_dir to kvm_debugfs_dir

It's a globally exported symbol now.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: fix lea to really get the effective address
Avi Kivity [Mon, 14 Apr 2008 20:46:37 +0000 (23:46 +0300)]
KVM: x86 emulator: fix lea to really get the effective address

We never hit this, since there is currently no reason to emulate lea.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: fix smsw and lmsw with a memory operand
Avi Kivity [Mon, 14 Apr 2008 11:40:50 +0000 (14:40 +0300)]
KVM: x86 emulator: fix smsw and lmsw with a memory operand

lmsw and smsw were implemented only with a register operand.  Extend them
to support a memory operand as well.  Fixes Windows running some display
compatibility test on AMD hosts.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: initialize src.val and dst.val for register operands
Avi Kivity [Mon, 14 Apr 2008 20:27:07 +0000 (23:27 +0300)]
KVM: x86 emulator: initialize src.val and dst.val for register operands

This lets us treat the case where mod == 3 in the same manner as other cases.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: force a new asid when initializing the vmcb
Avi Kivity [Mon, 14 Apr 2008 10:10:21 +0000 (13:10 +0300)]
KVM: SVM: force a new asid when initializing the vmcb

Shutdown interception clears the vmcb, leaving the asid at zero (which is
illegal.  so force a new asid on vmcb initialization.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: fix kvm_vcpu_kick vs __vcpu_run race
Marcelo Tosatti [Fri, 11 Apr 2008 18:01:22 +0000 (15:01 -0300)]
KVM: fix kvm_vcpu_kick vs __vcpu_run race

There is a window open between testing of pending IRQ's
and assignment of guest_mode in __vcpu_run.

Injection of IRQ's can race with __vcpu_run as follows:

CPU0                                CPU1
kvm_x86_ops->run()
vcpu->guest_mode = 0                SET_IRQ_LINE ioctl
..
kvm_x86_ops->inject_pending_irq
kvm_cpu_has_interrupt()

                                    apic_test_and_set_irr()
                                    kvm_vcpu_kick
                                    if (vcpu->guest_mode)
                                        send_ipi()

vcpu->guest_mode = 1

So move guest_mode=1 assignment before ->inject_pending_irq, and make
sure that it won't reorder after it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: add ioctls to save/store mpstate
Marcelo Tosatti [Fri, 11 Apr 2008 16:24:45 +0000 (13:24 -0300)]
KVM: add ioctls to save/store mpstate

So userspace can save/restore the mpstate during migration.

[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
Avi Kivity [Sun, 13 Apr 2008 14:54:35 +0000 (17:54 +0300)]
KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*

We wish to export it to userspace, so move it into the kvm namespace.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: hlt emulation should take in-kernel APIC/PIT timers into account
Marcelo Tosatti [Fri, 11 Apr 2008 17:53:26 +0000 (14:53 -0300)]
KVM: hlt emulation should take in-kernel APIC/PIT timers into account

Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.

Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:

CPU0                                        CPU1
                                            if (waitqueue_active(wq))
add_wait_queue()
if (!atomic_read(pit_timer->pending))
    schedule()
                                            atomic_inc(pit_timer->pending)

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: do not intercept task switch with NPT
Joerg Roedel [Wed, 9 Apr 2008 14:04:32 +0000 (16:04 +0200)]
KVM: SVM: do not intercept task switch with NPT

When KVM uses NPT there is no reason to intercept task switches. This patch
removes the intercept for it in that case.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Add kvm trace userspace interface
Feng(Eric) Liu [Thu, 10 Apr 2008 12:47:53 +0000 (08:47 -0400)]
KVM: Add kvm trace userspace interface

This interface allows user a space application to read the trace of kvm
related events through relayfs.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Stub out kvmtrace
Avi Kivity [Thu, 10 Apr 2008 23:51:52 +0000 (02:51 +0300)]
KVM: ia64: Stub out kvmtrace

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: Stub out kvmtrace
Avi Kivity [Thu, 10 Apr 2008 23:50:40 +0000 (02:50 +0300)]
KVM: s390: Stub out kvmtrace

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Add trace markers
Feng (Eric) Liu [Thu, 10 Apr 2008 19:31:10 +0000 (15:31 -0400)]
KVM: Add trace markers

Trace markers allow userspace to trace execution of a virtual machine
in order to monitor its performance.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: add intercept for machine check exception
Joerg Roedel [Wed, 9 Apr 2008 12:15:30 +0000 (14:15 +0200)]
KVM: SVM: add intercept for machine check exception

To properly forward a MCE occured while the guest is running to the host, we
have to intercept this exception and call the host handler by hand. This is
implemented by this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: align shadow CR4.MCE with host
Joerg Roedel [Wed, 9 Apr 2008 12:15:29 +0000 (14:15 +0200)]
KVM: SVM: align shadow CR4.MCE with host

This patch aligns the host version of the CR4.MCE bit with the CR4 active in
the guest. This is necessary to get MCE exceptions when the guest is running.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: indent svm_set_cr4 with tabs instead of spaces
Joerg Roedel [Wed, 9 Apr 2008 12:15:28 +0000 (14:15 +0200)]
KVM: SVM: indent svm_set_cr4 with tabs instead of spaces

The svm_set_cr4 function is indented with spaces. This patch replaces
them with tabs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Register ioctl range
Avi Kivity [Sun, 6 Apr 2008 11:25:46 +0000 (14:25 +0300)]
KVM: Register ioctl range

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Don't assume struct page for x86
Anthony Liguori [Wed, 2 Apr 2008 19:46:56 +0000 (14:46 -0500)]
KVM: MMU: Don't assume struct page for x86

This patch introduces a gfn_to_pfn() function and corresponding functions like
kvm_release_pfn_dirty().  Using these new functions, we can modify the x86
MMU to no longer assume that it can always get a struct page for any given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results.  When we support
IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory.  However, it should make those things pretty straight
forward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those.  I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add a guide about how to create kvm guests on ia64
Xiantao Zhang [Tue, 1 Apr 2008 07:08:29 +0000 (15:08 +0800)]
KVM: ia64: Add a guide about how to create kvm guests on ia64

Guide for creating virtual machine on kvm/ia64.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Enable kvm build for ia64
Xiantao Zhang [Fri, 28 Mar 2008 06:58:47 +0000 (14:58 +0800)]
KVM: ia64: Enable kvm build for ia64

Update the related Makefile and KConfig for kvm build

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add kvm sal/pal virtulization support
Xiantao Zhang [Tue, 1 Apr 2008 06:59:30 +0000 (14:59 +0800)]
KVM: ia64: Add kvm sal/pal virtulization support

Some sal/pal calls would be traped to kvm for virtulization
from guest firmware.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add guest interruption injection support
Xiantao Zhang [Tue, 1 Apr 2008 06:58:42 +0000 (14:58 +0800)]
KVM: ia64: Add guest interruption injection support

process.c mainly handle interruption injection, and some faults handling.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Generate offset values for assembly code use
Xiantao Zhang [Tue, 1 Apr 2008 06:57:53 +0000 (14:57 +0800)]
KVM: ia64: Generate offset values for assembly code use

asm-offsets.c will generate offset values used for assembly code
for some fileds of special structures.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add optimization for some virtulization faults
Xiantao Zhang [Tue, 1 Apr 2008 06:57:09 +0000 (14:57 +0800)]
KVM: ia64: Add optimization for some virtulization faults

optvfault.S Add optimization for some performance-critical
virtualization faults.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add processor virtulization support
Xiantao Zhang [Tue, 1 Apr 2008 08:14:28 +0000 (16:14 +0800)]
KVM: ia64: Add processor virtulization support

vcpu.c provides processor virtualization logic for kvm.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add trampoline for guest/host mode switch
Xiantao Zhang [Tue, 1 Apr 2008 06:54:42 +0000 (14:54 +0800)]
KVM: ia64: Add trampoline for guest/host mode switch

trampoline code targets for guest/host world switch.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add mmio decoder for kvm/ia64
Xiantao Zhang [Tue, 1 Apr 2008 06:53:32 +0000 (14:53 +0800)]
KVM: ia64: Add mmio decoder for kvm/ia64

mmio.c includes mmio decoder, and related mmio logics.

Signed-off-by: Anthony Xu <Anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add interruption vector table for vmm
Xiantao Zhang [Tue, 1 Apr 2008 06:52:19 +0000 (14:52 +0800)]
KVM: ia64: Add interruption vector table for vmm

vmm_ivt.S includes an ivt for vmm use.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add TLB virtulization support
Xiantao Zhang [Tue, 1 Apr 2008 06:50:59 +0000 (14:50 +0800)]
KVM: ia64: Add TLB virtulization support

vtlb.c includes tlb/VHPT virtulization.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: VMM module interfaces
Xiantao Zhang [Tue, 1 Apr 2008 06:49:24 +0000 (14:49 +0800)]
KVM: ia64: VMM module interfaces

vmm.c adds the interfaces with kvm/module, and initialize global data area.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add header files for kvm/ia64
Xiantao Zhang [Tue, 1 Apr 2008 08:00:24 +0000 (16:00 +0800)]
KVM: ia64: Add header files for kvm/ia64

kvm_minstate.h : Marcos about Min save routines.
lapic.h: apic structure definition.
vcpu.h : routions related to vcpu virtualization.
vti.h  : Some macros or routines for VT support on Itanium.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add kvm arch-specific core code for kvm/ia64
Xiantao Zhang [Tue, 1 Apr 2008 07:29:29 +0000 (15:29 +0800)]
KVM: ia64: Add kvm arch-specific core code for kvm/ia64

kvm_ia64.c is created to handle kvm ia64-specific core logic.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Add header files for kvm/ia64
Xiantao Zhang [Tue, 1 Apr 2008 06:45:06 +0000 (14:45 +0800)]
KVM: ia64: Add header files for kvm/ia64

Three header files are added:
asm-ia64/kvm.h
asm-ia64/kvm_host.h
asm-ia64/kvm_para.h

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: ia64: Prepare some structure and routines for kvm use
Xiantao Zhang [Tue, 1 Apr 2008 06:42:00 +0000 (14:42 +0800)]
KVM: ia64: Prepare some structure and routines for kvm use

Register structures are defined per SDM.
Add three small routines for kernel:
ia64_ttag, ia64_loadrs, ia64_flushrs

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: Improve pgste accesses
Heiko Carstens [Fri, 4 Apr 2008 14:03:34 +0000 (16:03 +0200)]
KVM: s390: Improve pgste accesses

There is no need to use interlocked updates when the rcp
lock is held. Therefore the simple bitops variants can be
used. This should improve performance.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: rename stfl to kvm_stfl
Heiko Carstens [Fri, 4 Apr 2008 13:12:40 +0000 (15:12 +0200)]
KVM: s390: rename stfl to kvm_stfl

Temporarily rename this function to avoid merge conflicts and/or
dependencies. This function will be removed as soon as git-s390
and kvm.git are finally upstream.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: Fix incorrect return value
Heiko Carstens [Fri, 4 Apr 2008 13:12:35 +0000 (15:12 +0200)]
KVM: s390: Fix incorrect return value

kvm_arch_vcpu_ioctl_run currently incorrectly always returns 0.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: prepopulate guest pages after write-protecting
Marcelo Tosatti [Fri, 4 Apr 2008 17:56:44 +0000 (14:56 -0300)]
KVM: MMU: prepopulate guest pages after write-protecting

Zdenek reported a bug where a looping "dmsetup status" eventually hangs
on SMP guests.

The problem is that kvm_mmu_get_page() prepopulates the shadow MMU
before write protecting the guest page tables. By doing so, it leaves a
window open where the guest can mark a pte as present while the host has
shadow cached such pte as "notrap". Accesses to such address will fault
in the guest without the host having a chance to fix the situation.

Fix by moving the write protection before the pte prefetch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Only mark_page_accessed() if the page was accessed by the guest
Avi Kivity [Thu, 3 Apr 2008 09:02:21 +0000 (12:02 +0300)]
KVM: MMU: Only mark_page_accessed() if the page was accessed by the guest

If the accessed bit is not set, the guest has never accessed this page
(at least through this spte), so there's no need to mark the page
accessed.  This provides more accurate data for the eviction algortithm.

Noted by Andrea Arcangeli.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: add vm refcounting
Izik Eidus [Sun, 30 Mar 2008 13:01:25 +0000 (16:01 +0300)]
KVM: add vm refcounting

the main purpose of adding this functions is the abilaty to release the
spinlock that protect the kvm list while still be able to do operations
on a specific kvm in a safe way.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: function declaration parameter name cleanup
Joerg Roedel [Tue, 1 Apr 2008 14:44:56 +0000 (16:44 +0200)]
KVM: function declaration parameter name cleanup

The kvm_host.h file for x86 declares the functions kvm_set_cr[0348]. In the
header file their second parameter is named cr0 in all cases. This patch
renames the parameters so that they match the function name.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Free apic access page on vm destruction
Avi Kivity [Tue, 25 Mar 2008 09:26:13 +0000 (11:26 +0200)]
KVM: Free apic access page on vm destruction

Noticed by Marcelo Tosatti.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: allow the vm to shrink the kvm mmu shadow caches
Izik Eidus [Sun, 30 Mar 2008 12:17:21 +0000 (15:17 +0300)]
KVM: MMU: allow the vm to shrink the kvm mmu shadow caches

Allow the Linux memory manager to reclaim memory in the kvm shadow cache.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: unify slots_lock usage
Marcelo Tosatti [Sat, 29 Mar 2008 23:17:59 +0000 (20:17 -0300)]
KVM: MMU: unify slots_lock usage

Unify slots_lock acquision around vcpu_run(). This is simpler and less
error-prone.

Also fix some callsites that were not grabbing the lock properly.

[avi: drop slots_lock while in guest mode to avoid holding the lock
      for indefinite periods]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: VMX: Enable MSR Bitmap feature
Sheng Yang [Fri, 28 Mar 2008 05:18:56 +0000 (13:18 +0800)]
KVM: VMX: Enable MSR Bitmap feature

MSR Bitmap controls whether the accessing of an MSR causes VM Exit.
Eliminating exits on automatically saved and restored MSRs yields a
small performance gain.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agos390: KVM guest: virtio device support, and kvm hypercalls
Christian Borntraeger [Tue, 25 Mar 2008 17:47:46 +0000 (18:47 +0100)]
s390: KVM guest: virtio device support, and kvm hypercalls

This patch implements kvm guest kernel support for paravirtualized devices
and contains two parts:
o a basic virtio stub using virtio_ring and external interrupts and hypercalls
o full hypercall implementation in kvm_para.h

Currently we dont have PCI on s390. Making virtio_pci usable for s390 seems
more complicated that providing an own stub. This virtio stub is similar to
the lguest one, the memory for the descriptors and the device detection is made
via additional mapped memory on top of the guest storage. We use an external
interrupt with extint code 0x2603 for host->guest notification.

The hypercall definition uses the diag instruction for issuing a hypercall. The
parameters are written in R2-R7, the hypercall number is written in R1. This is
similar to the system call ABI (svc) which can use R1 for the number and R2-R6
for the parameters.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agos390: KVM guest: detect when running on kvm
Carsten Otte [Tue, 25 Mar 2008 17:47:44 +0000 (18:47 +0100)]
s390: KVM guest: detect when running on kvm

This patch adds functionality to detect if the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
allows drivers to skip device detection if the systems runs non-virtualized.
We also define a preferred console to avoid having the ttyS0, which is a line
mode only console.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: update maintainers
Christian Borntraeger [Tue, 25 Mar 2008 17:47:41 +0000 (18:47 +0100)]
KVM: s390: update maintainers

This patch adds an entry for kvm on s390 to the MAINTAINERS file :-). We intend
to push all patches regarding this via Avi's kvm.git.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: API documentation
Carsten Otte [Tue, 25 Mar 2008 17:47:38 +0000 (18:47 +0100)]
KVM: s390: API documentation

This patch adds Documentation/s390/kvm.txt, which describes specifics of kvm's
user interface that are unique to s390 architecture.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: add kvm to kconfig on s390
Christian Borntraeger [Tue, 25 Mar 2008 17:47:36 +0000 (18:47 +0100)]
KVM: s390: add kvm to kconfig on s390

This patch adds the virtualization submenu and the kvm option to the kernel
config. It also defines HAVE_KVM for 64bit kernels.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: intercepts for diagnose instructions
Christian Borntraeger [Tue, 25 Mar 2008 17:47:34 +0000 (18:47 +0100)]
KVM: s390: intercepts for diagnose instructions

This patch introduces interpretation of some diagnose instruction intercepts.
Diagnose is our classic architected way of doing a hypercall. This patch
features the following diagnose codes:
- vm storage size, that tells the guest about its memory layout
- time slice end, which is used by the guest to indicate that it waits
  for a lock and thus cannot use up its time slice in a useful way
- ipl functions, which a guest can use to reset and reboot itself

In order to implement ipl functions, we also introduce an exit reason that
causes userspace to perform various resets on the virtual machine. All resets
are described in the principles of operation book, except KVM_S390_RESET_IPL
which causes a reboot of the machine.

Acked-by: Martin Schwidefsky <martin.schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: interprocessor communication via sigp
Christian Borntraeger [Tue, 25 Mar 2008 17:47:31 +0000 (18:47 +0100)]
KVM: s390: interprocessor communication via sigp

This patch introduces in-kernel handling of _some_ sigp interprocessor
signals (similar to ipi).
kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
handlers depending on the operation requested:
- sigp sense tries to retrieve information such as existence or running state
  of the remote cpu
- sigp emergency sends an external interrupt to the remove cpu
- sigp stop stops a remove cpu
- sigp stop store status stops a remote cpu, and stores its entire internal
  state to the cpus lowcore
- sigp set arch sets the architecture mode of the remote cpu. setting to
  ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
  denied, all others are passed to userland
- sigp set prefix sets the prefix register of a remote cpu

For implementation of this, the stop intercept indication starts to get reused
on purpose: a set of action bits defines what to do once a cpu gets stopped:
ACTION_STOP_ON_STOP  really stops the cpu when a stop intercept is recognized
ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
                     recognized

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: intercepts for privileged instructions
Christian Borntraeger [Tue, 25 Mar 2008 17:47:29 +0000 (18:47 +0100)]
KVM: s390: intercepts for privileged instructions

This patch introduces in-kernel handling of some intercepts for privileged
instructions:

handle_set_prefix()        sets the prefix register of the local cpu
handle_store_prefix()      stores the content of the prefix register to memory
handle_store_cpu_address() stores the cpu number of the current cpu to memory
handle_skey()              just decrements the instruction address and retries
handle_stsch()             delivers condition code 3 "operation not supported"
handle_chsc()              same here
handle_stfl()              stores the facility list which contains the
                           capabilities of the cpu
handle_stidp()             stores cpu type/model/revision and such
handle_stsi()              stores information about the system topology

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: interrupt subsystem, cpu timer, waitpsw
Carsten Otte [Tue, 25 Mar 2008 17:47:26 +0000 (18:47 +0100)]
KVM: s390: interrupt subsystem, cpu timer, waitpsw

This patch contains the s390 interrupt subsystem (similar to in kernel apic)
including timer interrupts (similar to in-kernel-pit) and enabled wait
(similar to in kernel hlt).

In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.

This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. In case this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.

The following interrupts are supported:
SIGP STOP       - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
                  (stopped) remote cpu
INT EMERGENCY   - interprocessor interrupt, usually used to signal need_reshed
                  and for smp_call_function() in the guest.
PROGRAM INT     - exception during program execution such as page fault, illegal
                  instruction and friends
RESTART         - interprocessor signal that starts a stopped cpu
INT VIRTIO      - floating interrupt for virtio signalisation
INT SERVICE     - floating interrupt for signalisations from the system
                  service processor

struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
an interrupt, also carrys parameter data for interrupts along with the interrupt
type. Interrupts on s390 usually have a state that represents the current
operation, or identifies which device has caused the interruption on s390.

kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a
disabled wait (that is, disabled for interrupts), we exit to userspace. In case
of an enabled wait we set up a timer that equals the cpu clock comparator value
and sleep on a wait queue.

[christian: change virtio interrupt to 0x2603]

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: sie intercept handling
Christian Borntraeger [Tue, 25 Mar 2008 17:47:23 +0000 (18:47 +0100)]
KVM: s390: sie intercept handling

This path introduces handling of sie intercepts in three flavors: Intercepts
are either handled completely in-kernel by kvm_handle_sie_intercept(),
or passed to userspace with corresponding data in struct kvm_run in case
kvm_handle_sie_intercept() returns -ENOTSUPP.
In case of partial execution in kernel with the need of userspace support,
kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
-EREMOTE.

The trivial intercept reasons are handled in this patch:
handle_noop() just does nothing for intercepts that don't require our support
  at all
handle_stop() is called when a cpu enters stopped state, and it drops out to
  userland after updating our vcpu state
handle_validity() faults in the cpu lowcore if needed, or passes the request
  to userland

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: s390: arch backend for the kvm kernel module
Heiko Carstens [Tue, 25 Mar 2008 17:47:20 +0000 (18:47 +0100)]
KVM: s390: arch backend for the kvm kernel module

This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
 (aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.

The following source files are introduced by this patch
arch/s390/kvm/kvm-s390.c    similar to arch/x86/kvm/x86.c, this implements all
                            arch callbacks for kvm. __vcpu_run calls back into
                            sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S      assembler function sie64a, which enters guest
                            context via SIE, and switches world before and after                            that
include/asm-s390/kvm_host.h contains all vital data structures needed to run
                            virtual machines on the mainframe
include/asm-s390/kvm.h      defines kvm_regs and friends for user access to
                            guest register content
arch/s390/kvm/gaccess.h     functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h    header file for kvm-s390 internals, extended by
                            later patches

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agos390: KVM preparation: address of the 64bit extint parm in lowcore
Christian Borntraeger [Tue, 25 Mar 2008 17:47:15 +0000 (18:47 +0100)]
s390: KVM preparation: address of the 64bit extint parm in lowcore

The address 0x11b8 is used by z/VM for pfault and diag 250 I/O to
provide a 64 bit extint parameter. virtio uses the same address, so
its time to update the lowcore structure.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agos390: KVM preparation: host memory management changes for s390 kvm
Christian Borntraeger [Tue, 25 Mar 2008 17:47:12 +0000 (18:47 +0100)]
s390: KVM preparation: host memory management changes for s390 kvm

This patch changes the s390 memory management defintions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390,
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.

There is a common code change in mm/rmap.c, the call to
page_test_and_clear_young must be moved. This is a no-op for all
architecture but s390. page_referenced checks the referenced bits for
the physiscal page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390) the physical page
check is implemented but the mapping callbacks are no-ops because dirty
and referenced are not tracked in the s390 page tables. The pgstes introduces
guest and host dirty and reference bits for s390 in the host mapping. These
mapping must be checked before page_test_and_clear_young resets the reference
bit.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agos390: KVM preparation: provide hook to enable pgstes in user pagetable
Carsten Otte [Tue, 25 Mar 2008 17:47:10 +0000 (18:47 +0100)]
s390: KVM preparation: provide hook to enable pgstes in user pagetable

The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.

Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.

Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86: hardware task switching support
Izik Eidus [Mon, 24 Mar 2008 21:14:53 +0000 (23:14 +0200)]
KVM: x86: hardware task switching support

This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm.  It allows operating systems which use it,
like freedos, to run as kvm guests.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86: add functions to get the cpl of vcpu
Izik Eidus [Mon, 24 Mar 2008 17:38:34 +0000 (19:38 +0200)]
KVM: x86: add functions to get the cpl of vcpu

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: VMX: Add module option to disable flexpriority
Avi Kivity [Mon, 24 Mar 2008 16:15:14 +0000 (18:15 +0200)]
KVM: VMX: Add module option to disable flexpriority

Useful for debugging.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: no longer EXPERIMENTAL
Avi Kivity [Sun, 23 Mar 2008 16:36:30 +0000 (18:36 +0200)]
KVM: no longer EXPERIMENTAL

Long overdue.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Introduce and use spte_to_page()
Avi Kivity [Sun, 23 Mar 2008 13:06:23 +0000 (15:06 +0200)]
KVM: MMU: Introduce and use spte_to_page()

Encapsulate the pte mask'n'shift in a function.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: fix dirty bit setting when removing write permissions
Izik Eidus [Thu, 20 Mar 2008 16:17:24 +0000 (18:17 +0200)]
KVM: MMU: fix dirty bit setting when removing write permissions

When mmu_set_spte() checks if a page related to spte should be release as
dirty or clean, it check if the shadow pte was writeble, but in case
rmap_write_protect() is called called it is possible for shadow ptes that were
writeble to become readonly and therefor mmu_set_spte will release the pages
as clean.

This patch fix this issue by marking the page as dirty inside
rmap_write_protect().

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Move some x86 specific constants and structures to include/asm-x86
Avi Kivity [Fri, 21 Mar 2008 10:38:23 +0000 (12:38 +0200)]
KVM: Move some x86 specific constants and structures to include/asm-x86

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Set the accessed bit on non-speculative shadow ptes
Avi Kivity [Tue, 18 Mar 2008 09:05:52 +0000 (11:05 +0200)]
KVM: MMU: Set the accessed bit on non-speculative shadow ptes

If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction.  This saves a read-modify-write
operation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: kvm.h: __user requires compiler.h
Christian Borntraeger [Wed, 12 Mar 2008 17:10:45 +0000 (18:10 +0100)]
KVM: kvm.h: __user requires compiler.h

include/linux/kvm.h defines struct kvm_dirty_log to
[...]
union {
void __user *dirty_bitmap; /* one bit per page */
__u64 padding;
};

__user requires compiler.h to compile. Currently, this works on x86
only coincidentally due to other include files. This patch makes
kvm.h compile in all cases.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agox86: KVM guest: disable clock before rebooting.
Glauber Costa [Mon, 17 Mar 2008 19:08:40 +0000 (16:08 -0300)]
x86: KVM guest: disable clock before rebooting.

This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.

Without it, we can have a random memory location being written
when the guest comes back

It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agox86: make native_machine_shutdown non-static
Glauber Costa [Mon, 17 Mar 2008 19:08:39 +0000 (16:08 -0300)]
x86: make native_machine_shutdown non-static

it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agox86: allow machine_crash_shutdown to be replaced
Glauber Costa [Mon, 17 Mar 2008 19:08:38 +0000 (16:08 -0300)]
x86: allow machine_crash_shutdown to be replaced

This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agox86: KVM guest: hypercall batching
Marcelo Tosatti [Fri, 22 Feb 2008 17:21:38 +0000 (12:21 -0500)]
x86: KVM guest: hypercall batching

Batch pte updates and tlb flushes in lazy MMU mode.

[avi:
 - adjust to mmu_op
 - helper for getting para_state without debug warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agox86: KVM guest: hypercall based pte updates and TLB flushes
Marcelo Tosatti [Fri, 22 Feb 2008 17:21:37 +0000 (12:21 -0500)]
x86: KVM guest: hypercall based pte updates and TLB flushes

Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - guest/host split
 - fix 32-bit truncation issues
 - adjust to mmu_op
 - adjust to ->release_*() renamed
 - add ->release_pud()]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: hypercall based pte updates and TLB flushes
Marcelo Tosatti [Fri, 22 Feb 2008 17:21:37 +0000 (12:21 -0500)]
KVM: MMU: hypercall based pte updates and TLB flushes

Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - one mmu_op hypercall instead of one per op
 - allow 64-bit gpa on hypercall
 - don't pass host errors (-ENOMEM) to guest]

[akpm: warning fix on i386]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Provide unlocked version of emulator_write_phys()
Avi Kivity [Sun, 2 Mar 2008 12:06:05 +0000 (14:06 +0200)]
KVM: Provide unlocked version of emulator_write_phys()

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agox86: KVM guest: add basic paravirt support
Marcelo Tosatti [Fri, 22 Feb 2008 17:21:36 +0000 (12:21 -0500)]
x86: KVM guest: add basic paravirt support

Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: add basic paravirt support
Marcelo Tosatti [Fri, 22 Feb 2008 17:21:36 +0000 (12:21 -0500)]
KVM: add basic paravirt support

Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Add reset support for in kernel PIT
Sheng Yang [Thu, 13 Mar 2008 02:22:26 +0000 (10:22 +0800)]
KVM: Add reset support for in kernel PIT

Separate the reset part and prepare for reset support.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Add save/restore supporting of in kernel PIT
Sheng Yang [Mon, 3 Mar 2008 16:50:59 +0000 (00:50 +0800)]
KVM: Add save/restore supporting of in kernel PIT

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: In kernel PIT model
Sheng Yang [Sun, 27 Jan 2008 21:10:22 +0000 (05:10 +0800)]
KVM: In kernel PIT model

The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.

[marcelo: make last_injected_time per-guest]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Remove pointless desc_ptr #ifdef
Avi Kivity [Wed, 5 Mar 2008 07:33:44 +0000 (09:33 +0200)]
KVM: Remove pointless desc_ptr #ifdef

The desc_struct changes left an unnecessary #ifdef; remove it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: VMX: Don't adjust tsc offset forward
Avi Kivity [Tue, 4 Mar 2008 08:44:51 +0000 (10:44 +0200)]
KVM: VMX: Don't adjust tsc offset forward

Most Intel hosts have a stable tsc, and playing with the offset only
reduces accuracy.  By limiting tsc offset adjustment only to forward updates,
we effectively disable tsc offset adjustment on these hosts.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: replace remaining __FUNCTION__ occurances
Harvey Harrison [Mon, 3 Mar 2008 20:59:56 +0000 (12:59 -0800)]
KVM: replace remaining __FUNCTION__ occurances

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: detect if VCPU triple faults
Joerg Roedel [Tue, 26 Feb 2008 15:49:16 +0000 (16:49 +0100)]
KVM: detect if VCPU triple faults

In the current inject_page_fault path KVM only checks if there is another PF
pending and injects a DF then. But it has to check for a pending DF too to
detect a shutdown condition in the VCPU.  If this is not detected the VCPU goes
to a PF -> DF -> PF loop when it should triple fault. This patch detects this
condition and handles it with an KVM_SHUTDOWN exit to userspace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Use kzalloc to avoid allocating kvm_regs from kernel stack
Xiantao Zhang [Mon, 25 Feb 2008 10:52:20 +0000 (18:52 +0800)]
KVM: Use kzalloc to avoid allocating kvm_regs from kernel stack

Since the size of kvm_regs is too big to allocate from kernel stack on ia64,
use kzalloc to allocate it.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Prefix control register accessors with kvm_ to avoid namespace pollution
Avi Kivity [Sun, 24 Feb 2008 09:20:43 +0000 (11:20 +0200)]
KVM: Prefix control register accessors with kvm_ to avoid namespace pollution

Names like 'set_cr3()' look dangerously close to affecting the host.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: large page support
Marcelo Tosatti [Sat, 23 Feb 2008 14:44:30 +0000 (11:44 -0300)]
KVM: MMU: large page support

Create large pages mappings if the guest PTE's are marked as such and
the underlying memory is hugetlbfs backed.  If the largepage contains
write-protected pages, a large pte is not used.

Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>