firefly-linux-kernel-4.4.55.git
17 years agoKVM: Add get/set irqchip ioctls for in-kernel PIC live migration support
He, Qing [Thu, 26 Jul 2007 08:05:18 +0000 (11:05 +0300)]
KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support

This patch adds two new ioctls to dump and write kernel irqchips for
save/restore and live migration. PIC s/r and l/m is implemented in this
patch.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Protect in-kernel pio using kvm->lock
Eddie Dong [Sun, 22 Jul 2007 07:36:31 +0000 (10:36 +0300)]
KVM: Protect in-kernel pio using kvm->lock

The pio operation and the IRQ_LINE kvm_vm_ioctl are not protected by
kvm->lock.  Add the lock, the same as with IOAPIC MMIO operations.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Emulate hlt in the kernel
Eddie Dong [Wed, 18 Jul 2007 09:15:21 +0000 (12:15 +0300)]
KVM: Emulate hlt in the kernel

By sleeping in the kernel when hlt is executed, we simplify the in-kernel
guest interrupt path considerably.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: In-kernel I/O APIC model
Eddie Dong [Wed, 18 Jul 2007 09:03:39 +0000 (12:03 +0300)]
KVM: In-kernel I/O APIC model

This allows in-kernel host-side device drivers to raise guest interrupts
without going to userspace.

[avi: fix level-triggered interrupt redelivery on eoi]
[avi: add missing #include]
[avi: avoid redelivery of edge-triggered interrupt]
[avi: implement polarity]
[avi: don't deliver edge-triggered interrupts when unmasking]
[avi: fix host oops on invalid guest access]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Emulate local APIC in kernel
Eddie Dong [Wed, 12 Sep 2007 07:58:04 +0000 (10:58 +0300)]
KVM: Emulate local APIC in kernel

Because lightweight exits (exits which don't involve userspace) are many
times faster than heavyweight exits, it makes sense to emulate high usage
devices in the kernel.  The local APIC is one such device, especially for
Windows and for SMP, so we add an APIC model to kvm.

It also allows in-kernel host-side drivers to inject interrupts without
going through userspace.

[compile fix on i386 from Jindrich Makovicka]

Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Define and use cr8 access functions
Eddie Dong [Wed, 18 Jul 2007 08:34:57 +0000 (11:34 +0300)]
KVM: Define and use cr8 access functions

This patch wraps the APIC base register and CR8 operations to provide a
uniform API for the user-level irqchip and the kernel irqchip.
It is preparation for merging the lapic/ioapic patch.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Add support for in-kernel PIC emulation
Eddie Dong [Fri, 6 Jul 2007 09:20:49 +0000 (12:20 +0300)]
KVM: Add support for in-kernel PIC emulation

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Split segments reload in vmx_load_host_state()
Laurent Vivier [Thu, 23 Aug 2007 14:33:11 +0000 (16:33 +0200)]
KVM: VMX: Split segments reload in vmx_load_host_state()

vmx_load_host_state() bundles fs, gs, ldt, and tss reloading into
one in the hope that it is infrequent. With smp guests, fs reloading is
frequent due to fs being used by threads.

Unbundle the reloads so as to reduce expensive gs reloads.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: X86 emulator: fix 'push reg' writeback
Avi Kivity [Wed, 22 Aug 2007 15:09:29 +0000 (18:09 +0300)]
KVM: X86 emulator: fix 'push reg' writeback

Pointed out by Rusty Russell.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Support more memory slots
Izik Eidus [Mon, 20 Aug 2007 15:11:00 +0000 (18:11 +0300)]
KVM: Support more memory slots

Needed for mapping memory at 4GB.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: allow rmode_tss_base() to work with >2G of guest memory
Izik Eidus [Sun, 19 Aug 2007 19:24:58 +0000 (22:24 +0300)]
KVM: VMX: allow rmode_tss_base() to work with >2G of guest memory

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: x86 emulator: implement 'push reg' (opcodes 0x50-0x57)
Nitin A Kamble [Sun, 19 Aug 2007 08:07:06 +0000 (11:07 +0300)]
KVM: x86 emulator: implement 'push reg' (opcodes 0x50-0x57)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: x86 emulator: Implement 'jmp rel short' instruction (opcode 0xeb)
Nitin A Kamble [Sun, 19 Aug 2007 08:03:13 +0000 (11:03 +0300)]
KVM: x86 emulator: Implement 'jmp rel short' instruction (opcode 0xeb)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: x86 emulator: implement 'jmp rel' instruction (opcode 0xe9)
Nitin A Kamble [Sun, 19 Aug 2007 08:00:36 +0000 (11:00 +0300)]
KVM: x86 emulator: implement 'jmp rel' instruction (opcode 0xe9)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: x86 emulator: implement 'and $imm, %{al|ax|eax}'
Nitin A Kamble [Fri, 17 Aug 2007 12:17:41 +0000 (15:17 +0300)]
KVM: x86 emulator: implement 'and $imm, %{al|ax|eax}'

Implement emulation of instruction
    and al imm8 (opcode 0x24)
    and ax/eax imm16/imm32 (opcode 0x25)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Communicate cr8 changes to userspace
Yang, Sheng [Thu, 16 Aug 2007 10:01:00 +0000 (13:01 +0300)]
KVM: Communicate cr8 changes to userspace

This allows running 64-bit Windows.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Close minor race in signal handling
Avi Kivity [Wed, 15 Aug 2007 12:23:34 +0000 (15:23 +0300)]
KVM: Close minor race in signal handling

We need to check for signals inside the critical section, otherwise a
signal can be sent which we will not notice.  Also move the check
before entry, so that if the signal happens before the first entry,
we exit immediately instead of waiting for something to happen to the
guest.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Clean up kvm_setup_pio()
Laurent Vivier [Sun, 5 Aug 2007 07:43:32 +0000 (10:43 +0300)]
KVM: Clean up kvm_setup_pio()

Split kvm_setup_pio() into two functions, one to set up in/out pio
(kvm_emulate_pio()) and one to set up ins/outs pio (kvm_emulate_pio_string()).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Cleanup string I/O instruction emulation
Laurent Vivier [Sun, 5 Aug 2007 07:36:40 +0000 (10:36 +0300)]
KVM: Cleanup string I/O instruction emulation

Both vmx and svm decode the I/O instructions, and both botch the job,
requiring the instruction prefixes to be fetched in order to completely
decode the instruction.

So, if we see a string I/O instruction, use the x86 emulator to decode it,
as it already has all the prefix decoding machinery.

This patch defines ins/outs opcodes in x86_emulate.c and calls
emulate_instruction() from io_interception() (svm.c) and from handle_io()
(vmx.c).  It removes all vmx/svm prefix instruction decoders
(get_addr_size(), io_get_override(), io_address(), get_io_count())

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Remove useless assignment
Laurent Vivier [Wed, 1 Aug 2007 18:51:09 +0000 (21:51 +0300)]
KVM: Remove useless assignment

Line 1809 of kvm_main.c is useless, value is overwritten in line 1815:

1809         now = min(count, PAGE_SIZE / size);
1810
1811         if (!down)
1812                 in_page = PAGE_SIZE - offset_in_page(address);
1813         else
1814                 in_page = offset_in_page(address) + size;
1815         now = min(count, (unsigned long)in_page / size);
1816         if (!now) {

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
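The surviving computation quoted in the commit above can be tried out in userspace. This is a minimal sketch, assuming 4096-byte pages; items_this_pass() is a hypothetical name for illustration, not a function from kvm_main.c:

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SIZE 4096UL

static unsigned long offset_in_page(unsigned long address)
{
    return address & (PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: how many items of `size` bytes can be handled
 * before the transfer crosses a page boundary, capped at `count`.
 * `down` is non-zero when the address decrements between items.
 */
static unsigned long items_this_pass(unsigned long address,
                                     unsigned long count,
                                     unsigned long size, int down)
{
    unsigned long in_page, fit;

    assert(size > 0);
    if (!down)
        in_page = PAGE_SIZE - offset_in_page(address);
    else
        in_page = offset_in_page(address) + size;

    /* This is the only assignment that matters; an earlier one is dead. */
    fit = in_page / size;
    return count < fit ? count : fit;
}
```

Because the result depends on in_page, any value assigned before in_page is known is simply thrown away, which is why the earlier assignment could be dropped.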
17 years agoKVM: VMX: Remove a duplicated ia32e mode vm entry control
Li, Xin B [Wed, 1 Aug 2007 18:49:10 +0000 (21:49 +0300)]
KVM: VMX: Remove a duplicated ia32e mode vm entry control

Remove a duplicated ia32e mode VM Entry control definition and use the
proper one.

Signed-off-by: Xin Li <xin.b.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Use kmem_cache_free for kmem_cache_zalloc'ed objects
Rusty Russell [Wed, 1 Aug 2007 04:46:11 +0000 (14:46 +1000)]
KVM: Use kmem_cache_free for kmem_cache_zalloc'ed objects

We use kfree in svm.c and vmx.c, and this works, but it could break at
any time.  kfree() is supposed to match up with kmalloc().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Add and use pr_unimpl for standard formatting of unimplemented features
Rusty Russell [Wed, 1 Aug 2007 00:48:02 +0000 (10:48 +1000)]
KVM: Add and use pr_unimpl for standard formatting of unimplemented features

All guest-invokable printks should be ratelimited to prevent malicious
guests from flooding logs.  This is a start.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
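To illustrate what ratelimiting a guest-triggerable message means, here is a minimal fixed-window limiter. It is only a sketch under stated assumptions: the struct, the function name and the explicit `now` tick argument are hypothetical, and this is not how pr_unimpl or the kernel's own ratelimit helper is implemented:

```c
#include <stdbool.h>

/* Hypothetical limiter state: at most `burst` messages per `interval` ticks. */
struct ratelimit {
    unsigned long interval;
    unsigned int burst;
    unsigned long window_start;
    unsigned int printed;
};

/* Returns true if a message may be printed at tick `now`. */
static bool ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
    /* Start a fresh window once the current one has expired. */
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    return false;   /* over budget: drop the message */
}
```

A caller would wrap each guest-invokable message in `if (ratelimit_allow(&rl, now))`, so a guest that triggers the path in a loop produces a bounded number of log lines per window.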
17 years agoKVM: Remove unneeded kvm_dev_open and kvm_dev_release functions.
Rusty Russell [Wed, 1 Aug 2007 00:17:06 +0000 (10:17 +1000)]
KVM: Remove unneeded kvm_dev_open and kvm_dev_release functions.

Devices don't need open or release functions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Remove stat_set from debugfs
Rusty Russell [Wed, 1 Aug 2007 00:12:22 +0000 (10:12 +1000)]
KVM: Remove stat_set from debugfs

We shouldn't define stat_set on the debug attributes, since that will
cause silent failure on writing: without a set argument, userspace
will get -EACCES.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Fix defined but not used warning in drivers/kvm/vmx.c
Gabriel C [Wed, 1 Aug 2007 14:23:10 +0000 (16:23 +0200)]
KVM: Fix defined but not used warning in drivers/kvm/vmx.c

move_msr_up() is used only on X86_64 and generates a warning on !X86_64

Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Remove redundant alloc_vmcs_cpu declaration
Rusty Russell [Tue, 31 Jul 2007 10:46:12 +0000 (20:46 +1000)]
KVM: Remove redundant alloc_vmcs_cpu declaration

alloc_vmcs_cpu is already declared (static) above, no need to
redeclare.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: SVM: Make set_msr_interception more reliable
Rusty Russell [Tue, 31 Jul 2007 10:42:42 +0000 (20:42 +1000)]
KVM: SVM: Make set_msr_interception more reliable

set_msr_interception() is used by svm to set up which MSRs should be
intercepted.  It can only fail if someone has changed the code to try
to intercept an MSR without updating the array of ranges.

The return value is ignored anyway: it should just BUG() if it doesn't
work.  (A build-time failure would be better, but that's tricky).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Cleanup mark_page_dirty
Rusty Russell [Tue, 31 Jul 2007 10:41:14 +0000 (20:41 +1000)]
KVM: Cleanup mark_page_dirty

For some reason, mark_page_dirty open-codes __gfn_to_memslot().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Don't assign vcpu->cr3 if it's invalid: check first, set last
Rusty Russell [Tue, 31 Jul 2007 10:45:03 +0000 (20:45 +1000)]
KVM: Don't assign vcpu->cr3 if it's invalid: check first, set last

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Add cpu consistency check
Yang, Sheng [Tue, 31 Jul 2007 11:23:01 +0000 (14:23 +0300)]
KVM: VMX: Add cpu consistency check

All the physical CPUs on the board should support the same VMX feature
set.  Add check_processor_compatibility to kvm_arch_ops for the consistency
check.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: kvm_vm_ioctl_get_dirty_log restore "nothing dirty" optimization
Rusty Russell [Tue, 31 Jul 2007 09:57:47 +0000 (19:57 +1000)]
KVM: kvm_vm_ioctl_get_dirty_log restore "nothing dirty" optimization

kvm_vm_ioctl_get_dirty_log scans bitmap to see if it's all zero, but
doesn't use that information.

Avi says:
Looks like it was used to guard kvm_mmu_slot_remove_write_access();
optimizing the case where the guest just leaves the screen alone (which
it usually does, especially in benchmarks).

I'd rather reinstate that optimization.  See
90cb0529dd230548a7f0d6b315997be854caea1b where the damage was done.

It's pretty simple: if the bitmap is all zero, we don't need to do anything to
clean it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
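The "nothing dirty" optimization described in the commit above can be sketched in plain C. The names below are hypothetical, and the counter merely stands in for the expensive MMU work (removing write access from the slot); the real ioctl differs in detail:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Counts how often the expensive step ran; stands in for the MMU work. */
static unsigned int expensive_cleanups;

static bool bitmap_any_set(const unsigned long *bitmap, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (bitmap[i])
            return true;
    return false;
}

/* Copy the log out, then clean up only if at least one bit was set. */
static bool harvest_dirty_log(unsigned long *bitmap, size_t nwords,
                              unsigned long *out)
{
    memcpy(out, bitmap, nwords * sizeof(*bitmap));
    if (!bitmap_any_set(bitmap, nwords))
        return false;           /* nothing dirty: skip the cleanup */

    expensive_cleanups++;       /* placeholder for write-protecting the slot */
    memset(bitmap, 0, nwords * sizeof(*bitmap));
    return true;
}
```

When the guest leaves the tracked region alone, the scan finds no set bit and the function returns before touching anything expensive.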
17 years agoKVM: Use alignment properties of vcpu to simplify FPU ops
Rusty Russell [Mon, 30 Jul 2007 11:13:43 +0000 (21:13 +1000)]
KVM: Use alignment properties of vcpu to simplify FPU ops

Now that we use a kmem cache for allocating vcpus, we can get the 16-byte
alignment required by fxsave & fxrstor instructions, and avoid
manually aligning the buffer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
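The idea of letting the type carry the alignment, instead of rounding a pointer up by hand, can be shown with C11 alignas. This is a sketch only: struct vcpu_sketch and its field names are hypothetical, and the 512-byte size is the architectural size of the fxsave area:

```c
#include <stdalign.h>
#include <stdint.h>

/* fxsave/fxrstor need a 512-byte area aligned to 16 bytes. */
struct fx_area {
    alignas(16) unsigned char bytes[512];
};

/* Hypothetical vcpu layout: the members inherit the 16-byte alignment. */
struct vcpu_sketch {
    int vcpu_id;
    struct fx_area host_fx_image;
    struct fx_area guest_fx_image;
};

static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}
```

The guarantee only holds if the containing object is itself allocated with the alignment of its type, which is what an allocator configured with that alignment provides.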
17 years agoKVM: Use kmem cache for allocating vcpus
Rusty Russell [Mon, 30 Jul 2007 11:12:19 +0000 (21:12 +1000)]
KVM: Use kmem cache for allocating vcpus

Avi wants the allocations of vcpus centralized again.  The easiest way
is to add a "size" arg to kvm_init_arch, and expose the thus-prepared
cache to the modules.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Remove kvm_{read,write}_guest()
Laurent Vivier [Mon, 30 Jul 2007 10:41:19 +0000 (13:41 +0300)]
KVM: Remove kvm_{read,write}_guest()

... in favor of the more general emulator_{read,write}_*.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Change the emulator_{read,write,cmpxchg}_* functions to take a vcpu
Laurent Vivier [Mon, 30 Jul 2007 10:35:24 +0000 (13:35 +0300)]
KVM: Change the emulator_{read,write,cmpxchg}_* functions to take a vcpu

... instead of an x86_emulate_ctxt, so that other callers can use it easily.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: SVM: internal function name cleanup
Rusty Russell [Mon, 30 Jul 2007 10:08:05 +0000 (20:08 +1000)]
KVM: SVM: internal function name cleanup

Changes some svm.c internal function names:
1) io_adress -> io_address  (de-germanify the spelling)
2) kvm_reput_irq -> reput_irq  (it's not a generic kvm function)
3) kvm_do_inject_irq -> (it's not a generic kvm function)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: SVM: de-containization
Rusty Russell [Mon, 30 Jul 2007 10:07:08 +0000 (20:07 +1000)]
KVM: SVM: de-containization

container_of is wonderful, but not casting at all is better.  This
patch changes svm.c's internal functions to pass "struct vcpu_svm"
instead of "struct kvm_vcpu" and using container_of.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
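For readers unfamiliar with the pattern, the following userspace sketch shows what container_of does and what passing the specific type avoids. The struct layouts are simplified assumptions, and svm_vcpu_id() is a hypothetical helper:

```c
#include <stddef.h>

/* Userspace equivalent of the kernel macro, without type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical layouts. */
struct kvm_vcpu {
    int vcpu_id;
};

struct vcpu_svm {
    unsigned long asid;
    struct kvm_vcpu vcpu;   /* generic part embedded in the specific one */
};

/* Conversion is needed only at the boundary with generic code. */
static struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
{
    return container_of(vcpu, struct vcpu_svm, vcpu);
}

/* Internal helper takes the specific type, so no conversion is needed. */
static int svm_vcpu_id(const struct vcpu_svm *svm)
{
    return svm->vcpu.vcpu_id;
}
```

Going from the specific struct to the generic one is a plain member access (`&svm->vcpu`), so internal functions that accept `struct vcpu_svm` never need the pointer arithmetic.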
17 years agoKVM: Remove three magic numbers
Rusty Russell [Mon, 30 Jul 2007 06:41:57 +0000 (16:41 +1000)]
KVM: Remove three magic numbers

There are several places where hardcoded numbers are used in place of
the easily-available constant, which is poor form.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: pass vcpu_vmx internally
Rusty Russell [Mon, 30 Jul 2007 06:31:43 +0000 (16:31 +1000)]
KVM: VMX: pass vcpu_vmx internally

container_of is wonderful, but not casting at all is better.  This
patch changes vmx.c's internal functions to pass "struct vcpu_vmx"
instead of "struct kvm_vcpu" and using container_of.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: fx_init() needs preemption disabled while it plays with the FPU state
Rusty Russell [Mon, 30 Jul 2007 06:29:56 +0000 (16:29 +1000)]
KVM: fx_init() needs preemption disabled while it plays with the FPU state

Now that kvm generally runs with preemption enabled, we need to protect
the fpu initialization sequence.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Convert vm lock to a mutex
Shaohua Li [Mon, 23 Jul 2007 06:51:37 +0000 (14:51 +0800)]
KVM: Convert vm lock to a mutex

This allows the kvm mmu to perform sleepy operations, such as memory
allocation.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Use the scheduler preemption notifiers to make kvm preemptible
Avi Kivity [Wed, 11 Jul 2007 15:17:21 +0000 (18:17 +0300)]
KVM: Use the scheduler preemption notifiers to make kvm preemptible

Current kvm disables preemption while the new virtualization registers are
in use.  This of course is not very good for latency sensitive workloads (one
use of virtualization is to offload user interface and other latency
insensitive stuff to a container, so that it is easier to analyze the
remaining workload).  This patch re-enables preemption for kvm; preemption
is now only disabled when switching the registers in and out, and during
the switch to guest mode and back.

Contains fixes from Shaohua Li <shaohua.li@intel.com>.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: add hypercall nr to kvm_run
Jeff Dike [Mon, 16 Jul 2007 19:24:47 +0000 (15:24 -0400)]
KVM: add hypercall nr to kvm_run

Add the hypercall number to kvm_run and initialize it.  This changes the ABI,
but as this particular ABI was unusable before this no users are affected.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Improve the method of writing vmcs control
Yang, Sheng [Sun, 29 Jul 2007 08:07:42 +0000 (11:07 +0300)]
KVM: VMX: Improve the method of writing vmcs control

Put the cpu feature detection in hardware_setup, and store the vmcs
condition in a global variable for later checks.

[glommer: fix for some i386-only machines not supporting CR8 load/store
 exiting]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Dynamically allocate vcpus
Rusty Russell [Fri, 27 Jul 2007 07:16:56 +0000 (17:16 +1000)]
KVM: Dynamically allocate vcpus

This patch converts the vcpus array in "struct kvm" to a pointer
array, and changes the "vcpu_create" and "vcpu_setup" hooks into one
"vcpu_create" call which does the allocation and initialization of the
vcpu (calling back into the kvm_vcpu_init core helper).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Remove arch specific components from the general code
Gregory Haskins [Fri, 27 Jul 2007 12:13:10 +0000 (08:13 -0400)]
KVM: Remove arch specific components from the general code

struct kvm_vcpu has vmx-specific members; remove them to a private structure.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: load_pdptrs() cleanups
Rusty Russell [Wed, 25 Jul 2007 03:29:51 +0000 (13:29 +1000)]
KVM: load_pdptrs() cleanups

load_pdptrs can be handed an invalid cr3, and it should not oops.
This can happen because we injected #gp in set_cr3() after we set
vcpu->cr3 to the invalid value, or from kvm_vcpu_ioctl_set_sregs(), or
memory configuration changes after the guest did set_cr3().

We should also copy the pdpte array once, before checking and
assigning, otherwise an SMP guest can potentially alter the values
between the check and the set.

Finally one nitpick: ret = 1 should be done as late as possible: this
allows GCC to check for unset "ret" should the function change in
future.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
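The "copy once, then check, then assign" rule from the commit above can be sketched as follows. The function name and the reserved-bit mask are illustrative assumptions, not the values KVM uses:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PDPTE_PRESENT       0x1ULL
/* Illustrative reserved-bit mask; an assumption, not the value KVM uses. */
#define PDPTE_RESERVED_BITS 0x1e6ULL

/*
 * Validate four guest-controlled entries and cache them.  The entries are
 * read exactly once into a private snapshot, so another guest CPU cannot
 * change them between the check and the assignment.
 */
static bool load_pdptrs_sketch(const volatile uint64_t guest_pdpt[4],
                               uint64_t cached[4])
{
    uint64_t snapshot[4];

    for (int i = 0; i < 4; i++)
        snapshot[i] = guest_pdpt[i];

    for (int i = 0; i < 4; i++)
        if ((snapshot[i] & PDPTE_PRESENT) &&
            (snapshot[i] & PDPTE_RESERVED_BITS))
            return false;       /* invalid: leave the cache untouched */

    memcpy(cached, snapshot, sizeof(snapshot));
    return true;
}
```

Checking guest memory directly and then reading it again for the assignment would leave a window in which the validated value and the stored value differ.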
17 years agoKVM: Remove dead code in the cmpxchg instruction emulation
Aurelien Jarno [Wed, 25 Jul 2007 09:41:57 +0000 (11:41 +0200)]
KVM: Remove dead code in the cmpxchg instruction emulation

The writeback fixes (02c03a326a5df825cc01de426f72e160db2b9538) left
some dead code in the cmpxchg instruction emulation. Remove it.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Import some constants of vmcs from IA32 SDM
Yang, Sheng [Wed, 25 Jul 2007 09:17:06 +0000 (12:17 +0300)]
KVM: VMX: Import some constants of vmcs from IA32 SDM

This patch mainly imports some vmcs constants and renames two existing
constants according to the IA32 SDM.

It also adds two constants for the Lock bit and the Enable bit in
MSR_IA32_FEATURE_CONTROL, and replaces the hardcoded _5_ with these two
bits.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
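A small sketch of how a hardcoded 5 decomposes into two named bits. The bit positions follow the Intel SDM layout of IA32_FEATURE_CONTROL (bit 0 is the lock bit, bit 2 enables VMXON outside SMX); the macro and function names are illustrative assumptions and are not copied from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; bit positions per the Intel SDM. */
#define FEATURE_CONTROL_LOCKED         (1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED  (1ULL << 2)
#define FEATURE_CONTROL_VMX_BITS \
    (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED)

/* Locked by firmware without VMXON enabled: VMX cannot be turned on. */
static bool vmx_disabled_by_bios(uint64_t msr)
{
    return (msr & FEATURE_CONTROL_VMX_BITS) == FEATURE_CONTROL_LOCKED;
}

/* Value to write back when both bits should be set. */
static uint64_t feature_control_enable(uint64_t msr)
{
    return msr | FEATURE_CONTROL_VMX_BITS;
}
```

With the names in place, a test such as `(msr & 5) == 1` reads as "locked but not enabled" without a comment.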
17 years agoKVM: Move gfn_to_page out of kmap/unmap pairs
Shaohua Li [Mon, 23 Jul 2007 06:51:39 +0000 (14:51 +0800)]
KVM: Move gfn_to_page out of kmap/unmap pairs

gfn_to_page might sleep with swap support. Move it out of the kmap calls.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Hoist kvm_mmu_reload() out of the critical section
Shaohua Li [Mon, 23 Jul 2007 06:51:32 +0000 (14:51 +0800)]
KVM: Hoist kvm_mmu_reload() out of the critical section

vmx_cpu_run doesn't handle errors correctly, and kvm_mmu_reload might
sleep now that the vm lock is a mutex, so move it above the critical
section.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Return if the pdptrs are invalid when the guest turns on PAE.
Rusty Russell [Mon, 23 Jul 2007 07:11:02 +0000 (17:11 +1000)]
KVM: Return if the pdptrs are invalid when the guest turns on PAE.

Don't fall through and turn on PAE in this case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: x86 emulator: fix faulty check for two-byte opcode
Avi Kivity [Sun, 22 Jul 2007 12:51:58 +0000 (15:51 +0300)]
KVM: x86 emulator: fix faulty check for two-byte opcode

Right now, the bug is harmless as we never emulate one-byte 0xb6 or 0xb7.
But things may change.

Noted by the mysterious Gabriel C.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: x86 emulator: fix cmov for writeback changes
Avi Kivity [Fri, 20 Jul 2007 09:30:58 +0000 (12:30 +0300)]
KVM: x86 emulator: fix cmov for writeback changes

The writeback fixes (02c03a326a5df825cc01de426f72e160db2b9538) broke
cmov emulation.  Fix.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Use standard CR8 flags, and fix TPR definition
Rusty Russell [Tue, 17 Jul 2007 13:37:17 +0000 (23:37 +1000)]
KVM: Use standard CR8 flags, and fix TPR definition

Intel manual (and KVM definition) say the TPR is 4 bits wide.  Also fix
CR8_RESEVED_BITS typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
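A sketch of what a 4-bit TPR implies for CR8 validation. The macro and function names are assumptions for illustration; the mapping of CR8 bits 3:0 onto bits 7:4 of the local APIC task priority register is architectural:

```c
#include <stdbool.h>
#include <stdint.h>

#define CR8_TPR_MASK       0xFULL            /* 4-bit task priority */
#define CR8_RESERVED_BITS  (~CR8_TPR_MASK)   /* everything above it */

/* A CR8 write is acceptable only if no reserved bit is set. */
static bool cr8_is_valid(uint64_t cr8)
{
    return (cr8 & CR8_RESERVED_BITS) == 0;
}

/* CR8[3:0] corresponds to bits 7:4 of the APIC task priority register. */
static uint32_t apic_tpr_from_cr8(uint64_t cr8)
{
    return (uint32_t)((cr8 & CR8_TPR_MASK) << 4);
}
```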
17 years agoKVM: Set exit_reason to KVM_EXIT_MMIO where run->mmio is initialized.
Jeff Dike [Tue, 17 Jul 2007 16:26:59 +0000 (12:26 -0400)]
KVM: Set exit_reason to KVM_EXIT_MMIO where run->mmio is initialized.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed header
Rusty Russell [Wed, 18 Jul 2007 03:05:58 +0000 (13:05 +1000)]
KVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed header

Creating one's own BITMAP macro seems suboptimal: if we use manual
arithmetic in the one place exposed to userspace, we can use standard
macros elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Use standard CR4 flags, tighten checking
Rusty Russell [Tue, 17 Jul 2007 13:34:16 +0000 (23:34 +1000)]
KVM: Use standard CR4 flags, tighten checking

On this machine (Intel), writing to the CR4 bits 0x00000800 and
0x00001000 causes a GPF.  The Intel manual is a little unclear, but
AFAICT they're reserved, too.

Also fix spelling of CR4_RESEVED_BITS.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Use standard CR3 flags, tighten checking
Rusty Russell [Tue, 17 Jul 2007 13:32:55 +0000 (23:32 +1000)]
KVM: Use standard CR3 flags, tighten checking

The kernel now has asm/cpu-features.h: use those macros instead of inventing
our own.

Also spell out definition of CR3_RESEVED_BITS, fix spelling and
tighten it for the non-PAE case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.h
Rusty Russell [Tue, 17 Jul 2007 13:19:08 +0000 (23:19 +1000)]
KVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.h

The kernel now has asm/cpu-features.h: use those macros instead of
inventing our own.

Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Trivial: Avoid hardware_disable predeclaration
Rusty Russell [Tue, 17 Jul 2007 13:17:55 +0000 (23:17 +1000)]
KVM: Trivial: Avoid hardware_disable predeclaration

Don't pre-declare hardware_disable: shuffle the reboot hook down.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Trivial: Comment spelling may escape grep
Rusty Russell [Tue, 17 Jul 2007 13:16:56 +0000 (23:16 +1000)]
KVM: Trivial: Comment spelling may escape grep

Speling error in comment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Trivial: Make decode_register() static
Rusty Russell [Tue, 17 Jul 2007 13:16:11 +0000 (23:16 +1000)]
KVM: Trivial: Make decode_register() static

I have shied away from touching x86_emulate.c (it could definitely use
some love, but it is forked from the Xen code, and it would be more
productive to cross-merge fixes).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Trivial: Remove unused struct cpu_user_regs declaration
Rusty Russell [Tue, 17 Jul 2007 13:15:29 +0000 (23:15 +1000)]
KVM: Trivial: Remove unused struct cpu_user_regs declaration

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Trivial: /dev/kvm interface is no longer experimental.
Rusty Russell [Tue, 17 Jul 2007 13:12:26 +0000 (23:12 +1000)]
KVM: Trivial: /dev/kvm interface is no longer experimental.

KVM interface is no longer experimental.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: In-kernel string pio write support
Eddie Dong [Tue, 17 Jul 2007 08:52:33 +0000 (11:52 +0300)]
KVM: In-kernel string pio write support

Add string pio write support to support some versions of Windows.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Future-proof the exit information union ABI
Avi Kivity [Tue, 17 Jul 2007 08:45:55 +0000 (11:45 +0300)]
KVM: Future-proof the exit information union ABI

Note that as the size of struct kvm_run is not part of the ABI, we can add
things at the end.

Signed-off-by: Avi Kivity <avi@qumranet.com>
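One common way to future-proof a union like this is to pad it to a fixed size, so that adding a larger exit type later does not move whatever follows it. The sketch below shows that technique under stated assumptions: the struct, its members and the 256-byte figure are illustrative and are not the actual struct kvm_run definition:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exit payloads plus explicit padding to a fixed size. */
union exit_info {
    struct {
        uint8_t  direction;
        uint8_t  size;
        uint16_t port;
        uint32_t count;
        uint64_t data_offset;
    } io;
    struct {
        uint64_t phys_addr;
        uint8_t  data[8];
        uint32_t len;
        uint8_t  is_write;
    } mmio;
    char padding[256];      /* pins the union size for future members */
};

struct run_sketch {
    uint32_t exit_reason;
    uint32_t pad;
    union exit_info u;
    uint64_t later_field;   /* fields can still be appended at the end */
};

/* The offset of later_field must not depend on the largest payload. */
_Static_assert(sizeof(union exit_info) == 256, "union size is pinned");
```

New payloads smaller than the padding leave every existing offset unchanged, and fields appended after the union are invisible to older users that never read that far.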
17 years agoKVM: SMP: Add vcpu_id field in struct vcpu
Qing He [Thu, 12 Jul 2007 09:33:56 +0000 (12:33 +0300)]
KVM: SMP: Add vcpu_id field in struct vcpu

This patch adds a `vcpu_id' field in `struct vcpu', so we can
differentiate BSP and APs without pointer comparison or arithmetic.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Fix *nopage() in kvm_main.c
Nguyen Anh Quynh [Wed, 11 Jul 2007 11:30:54 +0000 (14:30 +0300)]
KVM: Fix *nopage() in kvm_main.c

*nopage() in kvm_main.c should only store the type of mmap() fault if
the pointers are not NULL. This patch fixes the problem.

Signed-off-by: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
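The guard described in the commit above amounts to a NULL check before writing through an optional out-parameter. A minimal sketch, in which nopage_sketch() and FAULT_MINOR are hypothetical stand-ins for the kernel handler and its fault-type constant:

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's minor-fault code. */
#define FAULT_MINOR 1

/* Report the fault type only when the caller asked for it. */
static void *nopage_sketch(void *page, int *type)
{
    if (type != NULL)
        *type = FAULT_MINOR;
    return page;
}
```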
17 years agoi386: Expose IOAPIC register definitions even if CONFIG_X86_IO_APIC is not set
Avi Kivity [Thu, 27 Sep 2007 08:07:04 +0000 (10:07 +0200)]
i386: Expose IOAPIC register definitions even if CONFIG_X86_IO_APIC is not set

KVM reuses the IOAPIC register definitions, and needs them even if the
host is not compiled with IOAPIC support.  Move the #ifdef below so that only
the IOAPIC variables and functions are protected, and the register definitions
are available to all.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agox86/pci/acpi: fix DMI const-ification fallout
Jeff Garzik [Sat, 13 Oct 2007 02:34:40 +0000 (22:34 -0400)]
x86/pci/acpi: fix DMI const-ification fallout

Fix DMI const-ification fallout that appeared when merging subsystem
trees.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agox86: optimise barriers
Nick Piggin [Sat, 13 Oct 2007 01:07:38 +0000 (03:07 +0200)]
x86: optimise barriers

According to latest memory ordering specification documents from Intel
and AMD, both manufacturers are committed to in-order loads from
cacheable memory for the x86 architecture.  Hence, smp_rmb() may be a
simple barrier.

Also according to those documents, and according to existing practice in
Linux (eg.  spin_unlock doesn't enforce ordering), stores to cacheable
memory are visible in program order too.  Special string stores are safe
-- their constituent stores may be out of order, but they must complete
in order WRT surrounding stores.  Nontemporal stores to WB memory can go
out of order, and so they should be fenced explicitly to make them
appear in-order WRT other stores.  Hence, smp_wmb() may be a simple
barrier.

    http://developer.intel.com/products/processor/manuals/318147.pdf
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

In userspace microbenchmarks on a core2 system, fence instructions range
anywhere from around 15 cycles to 50, which may not be totally
insignificant in performance critical paths (code size will go down
too).

However the primary motivation for this is to have the canonical barrier
implementation for x86 architecture.

smp_rmb on buggy pentium pros remains a locked op, which is apparently
required.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
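What "a simple barrier" means can be sketched with a compiler-only barrier. This is an illustration under stated assumptions: it uses GCC/Clang inline-asm syntax, the publish()/consume() helpers are hypothetical, and the ordering argument holds only for cacheable memory on x86 as described in the commit above, not on weakly ordered architectures:

```c
/* Compiler barrier: stops the compiler from reordering memory accesses
 * across this point, but emits no fence instruction. */
#define barrier()  __asm__ __volatile__("" ::: "memory")

/* Assumption: x86 cacheable memory, where loads and stores are not
 * reordered against accesses of the same kind by the CPU. */
#define smp_rmb()  barrier()
#define smp_wmb()  barrier()

static int payload;
static volatile int ready;

/* Writer: make the payload visible before the flag. */
static void publish(int value)
{
    payload = value;
    smp_wmb();
    ready = 1;
}

/* Reader: read the payload only after seeing the flag. */
static int consume(int *out)
{
    if (!ready)
        return 0;
    smp_rmb();
    *out = payload;
    return 1;
}
```

Nontemporal stores and device memory are outside this guarantee, which is why the commit message calls for explicit fences in those cases.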
17 years agox86: fix IO write barrier
Nick Piggin [Sat, 13 Oct 2007 01:06:55 +0000 (03:06 +0200)]
x86: fix IO write barrier

wmb() on x86 must always include a barrier, because stores can go out of
order in many cases when dealing with devices (e.g. WC memory).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agox86: fence oostores on 64-bit
Nick Piggin [Sat, 13 Oct 2007 01:06:00 +0000 (03:06 +0200)]
x86: fence oostores on 64-bit

movnt* instructions are not strongly ordered with respect to other stores,
so if we are to assume stores are strongly ordered in the rest of the 64
bit code, we must fence these off (see similar examples in 32 bit code).

[ The AMD memory ordering document seems to say that nontemporal stores can
  also pass earlier regular stores, so maybe we need sfences _before_
  movnt* everywhere too? ]

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
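
A minimal userspace sketch of the idea, assuming SSE2 intrinsics are available: a copy loop built on nontemporal stores is followed by an sfence so that later ordinary stores cannot appear to pass it. The function name copy_nontemporal() is hypothetical, and the fallback branch exists only so the sketch builds on targets without SSE2. It does not address the open question in the bracketed note about fencing before the movnt* sequence.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy n ints using nontemporal (movnti) stores where available,
 * then fence so the stores are ordered before anything that follows. */
static void copy_nontemporal(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], src[i]);   /* weakly ordered store */
    _mm_sfence();                           /* order against later stores */
#else
    /* Fallback for non-SSE2 builds: ordinary stores, no fence needed
     * for the purpose of this illustration. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
#endif
}
```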
17 years agoOnly enable BLOCK_COMPAT if COMPAT is needed
Linus Torvalds [Sat, 13 Oct 2007 00:58:36 +0000 (17:58 -0700)]
Only enable BLOCK_COMPAT if COMPAT is needed

IOW, it needs to depend on both CONFIG_BLOCK and CONFIG_COMPAT.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Fri, 12 Oct 2007 23:16:41 +0000 (16:16 -0700)]
Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (119 commits)
  [libata] struct pci_dev related cleanups
  libata: use ata_exec_internal() for PMP register access
  libata: implement ATA_PFLAG_RESETTING
  libata: add @timeout to ata_exec_internal[_sg]()
  ahci: fix notification handling
  ahci: clean up PORT_IRQ_BAD_PMP enabling
  ahci: kill leftover from enabling NCQ over PMP
  libata: wrap schedule_timeout_uninterruptible() in loop
  libata: skip suppress reporting if ATA_EHI_QUIET
  libata: clear ehi description after initial host report
  pata_jmicron: match vendor and class code only
  libata: add ST9160821AS / 3.ALD to NCQ blacklist
  pata_acpi: ACPI driver support
  libata-core: Expose gtm methods for driver use
  libata: add HDT722516DLA380 to NCQ blacklist
  libata: blacklist NCQ on Seagate Barracuda ST380817AS
  [libata] Turn on ACPI by default
  libata_scsi: Fix ATAPI transfer lengths
  libata: correct handling of SRST reset sequences
  libata: Integrate ACPI-based PATA/SATA hotplug - version 5
  ...

17 years agoUpdate maintainers file
Andi Kleen [Fri, 12 Oct 2007 23:01:08 +0000 (01:01 +0200)]
Update maintainers file

Since there is no x86-64 architecture anymore it cannot be maintained.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
Linus Torvalds [Fri, 12 Oct 2007 22:50:23 +0000 (15:50 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (37 commits)
  PCI: merge almost all of pci_32.h and pci_64.h together
  PCI: X86: Introduce and enable PCI domain support
  PCI: Add 'nodomains' boot option, and pci_domains_supported global
  PCI: modify PCI bridge control ISA flag for clarity
  PCI: use _CRS for PCI resource allocation
  PCI: avoid P2P prefetch window for expansion ROMs
  PCI: skip ISA ioresource alignment on some systems
  PCI: remove transparent bridge sizing
  pci: write file size to inode on proc bus file write
  pci: use size stored in proc_dir_entry for proc bus files
  pci: implement "pci=noaer"
  PCI: fix IDE legacy mode resources
  MSI: Use correct data offset for 32-bit MSI in read_msi_msg()
  PCI: Fix incorrect argument order to list_add_tail() in PCI dynamic ID code
  PCI: i386: Compaq EVO N800c needs PCI bus renumbering
  PCI: Remove no longer correct documentation regarding MSI vector assignment
  PCI: re-enable onboard sound on "MSI K8T Neo2-FIR"
  PCI: quirk_vt82c586_acpi: Omit reading PCI revision ID
  PCI: quirk amd_8131_mmrbc: Omit reading pci revision ID
  cpqphp: Use PCI_CLASS_REVISION instead of PCI_REVISION_ID for read
  ...

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
Linus Torvalds [Fri, 12 Oct 2007 22:49:37 +0000 (15:49 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
  PM: merge device power-management source files
  sysfs: add copyrights
  kobject: update the copyrights
  kset: add some kerneldoc to help describe what these strange things are
  Driver core: rename ktype_edd and ktype_efivar
  Driver core: rename ktype_driver
  Driver core: rename ktype_device
  Driver core: rename ktype_class
  driver core: remove subsystem_init()
  sysfs: move sysfs file poll implementation to sysfs_open_dirent
  sysfs: implement sysfs_open_dirent
  sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
  sysfs: make sysfs_root a regular directory dirent
  sysfs: open code sysfs_attach_dentry()
  sysfs: make s_elem an anonymous union
  sysfs: make bin attr open get active reference of parent too
  sysfs: kill unnecessary NULL pointer check in sysfs_release()
  sysfs: kill unnecessary sysfs_get() in open paths
  sysfs: reposition sysfs_dirent->s_mode.
  sysfs: kill sysfs_update_file()
  ...

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
Linus Torvalds [Fri, 12 Oct 2007 22:49:10 +0000 (15:49 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6: (142 commits)
  USB: fix race in autosuspend reschedule
  atmel_usba_udc: Keep track of the device status
  USB: Nikon D40X unusual_devs entry
  USB: serial core should respect driver requirements
  USB: documentation for USB power management
  USB: skip autosuspended devices during system resume
  USB: mutual exclusion for EHCI init and port resets
  USB: allow usbstorage to have LUNS greater than 2Tb
  USB: Adding support for SHARP WS011SH to ipaq.c
  USB: add atmel_usba_udc driver
  USB: ohci SSB bus glue
  USB: ehci build fixes on au1xxx, ppc-soc
  USB: add runtime frame_no quirk for big-endian OHCI
  USB: funsoft: Fix termios
  USB: visor: termios bits
  USB: unusual_devs entry for Nikon DSC D2Xs
  USB: re-remove <linux/usb_sl811.h>
  USB: move <linux/usb_gadget.h> to <linux/usb/gadget.h>
  USB: Export URB statistics for powertop
  USB: serial gadget: Disable endpoints on unload
  ...

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
Linus Torvalds [Fri, 12 Oct 2007 22:42:01 +0000 (15:42 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq

* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Don't take semaphore in cpufreq_quick_get()
  [CPUFREQ] Support different families in fid/did to frequency conversion
  [CPUFREQ] cpufreq_stats: misc cpuinit section annotations
  [CPUFREQ] implement !CONFIG_CPU_FREQ stub for  cpufreq_unregister_notifier()
  [CPUFREQ] mark hotplug notifier callback as __cpuinit
  [CPUFREQ] Only check for transition latency on problematic governors (kconfig fix)
  [CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default
  [CPUFREQ] move policy's governor initialisation out of low-level drivers into cpufreq core
  [CPUFREQ] Longhaul - Add support for PM133 northbridge
  [CPUFREQ] x86: use num_online_nodes to get physical cpus numbers for

17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86
Linus Torvalds [Fri, 12 Oct 2007 22:39:39 +0000 (15:39 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86

* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (40 commits)
  x86: HPET add another ICH7 PCI id
  x86: HPET force enable ICH5 suspend/resume fix
  x86: HPET force enable for ICH5
  x86: HPET try to activate force detected hpet
  x86: HPET force enable on ICH7 and later
  x86: HPET restructure hpet code for hpet force enable
  clock events: allow replacement of broadcast timer
  i386/x8664: cleanup the shared hpet code
  i386: Remove the useless #ifdef in i8253.h
  ACPI: remove the now unused ifdef code
  jiffies: remove unused macros
  x86_64: cleanup apic.c after clock events switch
  x86_64: remove now unused code
  x86: unify timex.h variants
  x86: kill 8253pit.h
  x86: disable apic timer for AMD C1E enabled CPUs
  x86: Fix irq0 / local apic timer accounting
  x86_64: convert to clock events
  x86_64: Add (not yet used) clock event functions
  x86_64: prepare idle loop for dynamic ticks
  ...

17 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Fri, 12 Oct 2007 22:04:00 +0000 (15:04 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (23 commits)
  ocfs2: Optionally return filldir errors
  ocfs2: Write support for directories with inline data
  ocfs2: Read support for directories with inline data
  ocfs2: Write support for inline data
  ocfs2: Read support for inline data
  ocfs2: Structure updates for inline data
  ocfs2: Cleanup dirent size check
  ocfs2: Rename cleanups
  ocfs2: Provide convenience function for ino lookup
  ocfs2: Implement ocfs2_empty_dir() as a caller of ocfs2_dir_foreach()
  ocfs2: Remove open coded readdir()
  ocfs2: Pass raw u64 to filldir
  ocfs2: Abstract out core dir listing functionality
  ocfs2: Move directory manipulation code into dir.c
  ocfs2: Small refactor of truncate zeroing code
  ocfs2: move nonsparse hole-filling into ocfs2_write_begin()
  ocfs2: Sync ocfs2_fs.h with ocfs2-tools
  [PATCH] fs/ocfs2/: removed unneeded initial value and function's return value
  ocfs2: Implement show_options()
  ocfs2: Clear slot map when umounting a local volume
  ...

17 years agoMerge branch 'isdn-cleanups' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Fri, 12 Oct 2007 22:03:35 +0000 (15:03 -0700)]
Merge branch 'isdn-cleanups' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6

* 'isdn-cleanups' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
  [ISDN] HiSax diva: split setup into three smaller functions
  [ISDN] HiSax sedlbauer: move ISAPNP and PCI code into functions of their own
  [ISDN] HiSax elsa: split huge setup function into four smaller functions
  [ISDN] HiSax avm_pci: split setup into three smaller functions
  [ISDN] Remove CONFIG_PCI ifdefs from 100% PCI source code

17 years agoPCI: merge almost all of pci_32.h and pci_64.h together
Greg Kroah-Hartman [Fri, 12 Oct 2007 21:07:23 +0000 (14:07 -0700)]
PCI: merge almost all of pci_32.h and pci_64.h together

It was just duplicated code...

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agoPCI: X86: Introduce and enable PCI domain support
Jeff Garzik [Thu, 11 Oct 2007 20:58:30 +0000 (16:58 -0400)]
PCI: X86: Introduce and enable PCI domain support

* fix bug in pci_read() and pci_write() which prevented PCI domain
  support from working (hardcoded domain 0).

* unconditionally enable CONFIG_PCI_DOMAINS

* implement pci_domain_nr() and pci_proc_domain(), as required of
  all arches when CONFIG_PCI_DOMAINS is enabled.

* store domain in struct pci_sysdata, as assigned by ACPI

* support "pci=nodomains"

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agoPCI: Add 'nodomains' boot option, and pci_domains_supported global
Jeff Garzik [Thu, 11 Oct 2007 20:57:27 +0000 (16:57 -0400)]
PCI: Add 'nodomains' boot option, and pci_domains_supported global

* Introduce pci_domains_supported global, hardcoded to zero if
  !CONFIG_PCI_DOMAINS.

* Introduce 'nodomains' boot option, which clears pci_domains_supported
  on platforms that enable it by default (x86, x86-64, and others when
  they are converted to use this).

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
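
The behaviour described above can be sketched as a global flag that defaults to enabled and is cleared when the 'nodomains' option is seen. Only pci_domains_supported and the option string come from the commit message; the parser function sketch_pci_option() is a hypothetical stand-in, not the kernel's actual option handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Enabled by default on platforms that support PCI domains. */
static int pci_domains_supported = 1;

/* Hypothetical handler for one "pci=" sub-option.
 * Returns 1 if the option was recognised, 0 otherwise. */
static int sketch_pci_option(const char *opt)
{
    assert(opt != NULL);
    if (strcmp(opt, "nodomains") == 0) {
        pci_domains_supported = 0;  /* user asked to disable domains */
        return 1;
    }
    return 0;
}
```

Code that cares about domains would then test pci_domains_supported rather than the config symbol directly.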
17 years agoPCI: modify PCI bridge control ISA flag for clarity
Gary Hade [Mon, 8 Oct 2007 23:24:16 +0000 (16:24 -0700)]
PCI: modify PCI bridge control ISA flag for clarity

Modify PCI Bridge Control ISA flag for clarity

This patch changes PCI_BRIDGE_CTL_NO_ISA to PCI_BRIDGE_CTL_ISA
and modifies its clarifying comment and locations where used.
The change reduces the chance of future confusion since it makes
the set/unset meaning of the bit the same in both the bridge
control register and bridge_ctl field of the pci_bus struct.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agoPCI: use _CRS for PCI resource allocation
Gary Hade [Wed, 3 Oct 2007 22:56:51 +0000 (15:56 -0700)]
PCI: use _CRS for PCI resource allocation

Use _CRS for PCI resource allocation

This patch resolves an issue where incorrect PCI memory and i/o ranges
are being assigned to hotplugged PCI devices on some IBM systems.  The
resource mis-allocation not only makes the PCI device unusable but
often makes the entire system unusable due to resulting machine checks.

The hotplug capable PCI slots on the affected systems are not located
under a standard P2P bridge but are instead located under PCI root
bridges or subtractive decode P2P bridges.  For example, the IBM x3850
contains 2 hotplug capable PCI-X slots and 4 hotplug capable PCIe slots
with the PCI-X slots each located under a PCI root bridge and the PCIe
slots each located under a subtractive decode P2P bridge.

The current i386/x86_64 PCI resource allocation code does not use _CRS
returned resource information.  No other resource information source is
available for slots that are not below a standard P2P bridge so
incorrect ranges are being allocated from e820 hole causing the bad
result.

This patch causes the kernel to use _CRS returned resource info.  It is
roughly based on a change provided by Matthew Wilcox for the ia64 kernel
in 2005.  Because BIOSes may be buggy and kernel issues may remain
undiscovered, the feature is disabled by default and can be enabled
with pci=use_crs.

Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agoPCI: avoid P2P prefetch window for expansion ROMs
Gary Hade [Wed, 3 Oct 2007 22:56:30 +0000 (15:56 -0700)]
PCI: avoid P2P prefetch window for expansion ROMs

Avoid creating P2P prefetch window for expansion ROMs

Because of the future possibility that P2P prefetch windows will contain
address ranges above 4GB some BIOSes are providing space in the P2P
non-prefetch windows for expansion ROMs.  This is due to expansion ROM
BAR 32-bit limitation.  When expansion ROM BARs without BIOS assigned
address(es) are currently found behind a P2P bridge, the kernel attempts
to create a P2P prefetch window for them even though space for them has
already been provided in the non-prefetch window.  _CRS on some systems
with certain resource conservation conscious BIOSes may not provide the
extra 1MB or more memory resource needed for the expansion ROM motivated
prefetch window causing resource allocation errors.

This change corrects the problem by removing IORESOURCE_PREFETCH from
the expansion ROM flags initialization.  It also removes
IORESOURCE_CACHEABLE which seems inappropriate if only non-cacheable
memory is available.

Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agoPCI: skip ISA ioresource alignment on some systems
Gary Hade [Wed, 3 Oct 2007 22:56:14 +0000 (15:56 -0700)]
PCI: skip ISA ioresource alignment on some systems

Skip ISA ioresource alignment on some systems

To conserve limited PCI i/o resource on some IBM multi-node systems, the
BIOS allocates (via _CRS) and expects the kernel to use addresses in
ranges currently excluded by pcibios_align_resource() [i386/pci/i386.c].
This change allows the kernel to use the currently excluded address
ranges on the IBM x3800, x3850, and x3950.

Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
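
To make the "excluded ranges" concrete, here is a hedged sketch of the kind of ISA alias avoidance the message refers to. The 0x300 mask and the rounding to the next 1 KiB boundary are assumptions recalled from the i386 alignment code, not details stated in the commit, and align_io_start() with its skip_isa_align parameter is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Return the I/O start address after (optional) ISA alias avoidance.
 * Assumption: addresses with bits 0x300 set are treated as ISA aliases
 * and pushed up to the next 1 KiB boundary unless skipping is requested. */
static uint32_t align_io_start(uint32_t start, int skip_isa_align)
{
    assert(start <= UINT32_MAX - 0x3ffu);   /* guard the rounding below */
    if (!skip_isa_align && (start & 0x300u))
        start = (start + 0x3ffu) & ~(uint32_t)0x3ffu;
    return start;
}
```

With skipping enabled, a start address such as 0x1100 is kept as assigned by the BIOS instead of being moved to 0x1400.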
17 years agoPCI: remove transparent bridge sizing
Gary Hade [Wed, 3 Oct 2007 22:55:51 +0000 (15:55 -0700)]
PCI: remove transparent bridge sizing

Remove transparent bridge sizing.

Due to code in pci_read_bridge_bases() [drivers/pci/probe.c] the child
bus of a transparent bridge already has access to the parent bus
resources so transparent bridge sizing appears unnecessary.  The bridge
sizing includes alignment and granularity adjustments that can cause
significantly more memory to be reserved from the parent bus than
required by devices on the child bus and allotted by _CRS.

Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agopci: write file size to inode on proc bus file write
David Rientjes [Thu, 27 Sep 2007 20:41:17 +0000 (13:41 -0700)]
pci: write file size to inode on proc bus file write

When a /proc/bus/pci file is written to, the size of that PCI device's
configuration space must be written to the inode.  Otherwise, it is
possible for the file to specify a size of 0 on stat if a task is holding
the same file open.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agopci: use size stored in proc_dir_entry for proc bus files
David Rientjes [Thu, 27 Sep 2007 20:41:16 +0000 (13:41 -0700)]
pci: use size stored in proc_dir_entry for proc bus files

On pci_proc_attach_device(), the size of the PCI configuration space is
stored in the proc_dir_entry as the size of the file.  Thus, the procfs
interface to PCI devices should use it instead of the device directly.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agopci: implement "pci=noaer"
Randy Dunlap [Fri, 5 Oct 2007 20:17:58 +0000 (13:17 -0700)]
pci: implement "pci=noaer"

For cases in which CONFIG_PCIEAER=y (such as distro kernels), allow users
to disable PCIE Advanced Error Reporting by using "pci=noaer" on the
kernel command line.

This can be used to work around hardware or (kernel) software problems.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
17 years agoPCI: fix IDE legacy mode resources
Yoichi Yuasa [Tue, 2 Oct 2007 21:19:23 +0000 (14:19 -0700)]
PCI: fix IDE legacy mode resources

I got the following error on MIPS Cobalt.

PCI: Unable to reserve I/O region #1:8@f00001f0 for device 0000:00:09.1
pata_via 0000:00:09.1: failed to request/iomap BARs for port 0 (errno=-16)
PCI: Unable to reserve I/O region #3:8@f0000170 for device 0000:00:09.1
pata_via 0000:00:09.1: failed to request/iomap BARs for port 1 (errno=-16)
pata_via 0000:00:09.1: no available native port

The legacy mode IDE resources are set up in the following order.

pci_setup_device()
    Legacy mode ATA controllers have fixed addresses.
    IDE resources: 0x1F0-0x1F7, 0x3F6, 0x170-0x177, 0x376
    |
    V
pcibios_fixup_bus()
    MIPS Cobalt PCI bus regions have a -0x10000000 offset from PCI resources.
    pcibios_fixup_bus() fixes up the PCI bus regions.
    0x1F0 - 0x10000000 = 0xF00001F0
    |
    V
ata_pci_init_one()
    PCI: Unable to reserve I/O region #1:8@f00001f0 for device 0000:00:09.1

On some architectures, PCI bus regions are offset from PCI resources.
For this reason, pci_setup_device() should set PCI bus regions to
dev->resource[].

[akpm@linux-foundation.org: use struct initialiser]
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
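
The address arithmetic in the trace above can be checked with 32-bit unsigned wraparound. This is only a worked example of the numbers quoted in the message; apply_bus_offset() and check_cobalt_example() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract a bus offset from a 32-bit address.  Unsigned arithmetic
 * wraps modulo 2^32, which is what produces the 0xF00001F0 value. */
static uint32_t apply_bus_offset(uint32_t addr, uint32_t offset)
{
    return (uint32_t)(addr - offset);
}

/* Reproduce the two failing regions from the error log. */
static void check_cobalt_example(void)
{
    assert(apply_bus_offset(0x1F0u, 0x10000000u) == 0xF00001F0u);
    assert(apply_bus_offset(0x170u, 0x10000000u) == 0xF0000170u);
}
```

Both results match the addresses in the "Unable to reserve I/O region" messages, which is consistent with the fixed legacy port numbers having been adjusted as if they were bus addresses.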
17 years agoMSI: Use correct data offset for 32-bit MSI in read_msi_msg()
Roland Dreier [Wed, 3 Oct 2007 18:15:11 +0000 (11:15 -0700)]
MSI: Use correct data offset for 32-bit MSI in read_msi_msg()

While reading the MSI code trying to find a reason why MSI wouldn't
work for devices that have a 32-bit MSI address capability, I noticed
that read_msi_msg() seems to read the message data from the wrong
offset in this case.

Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
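
For illustration, the sketch below shows why the data register offset depends on the address width: a 64-bit capable MSI capability has an extra upper-address dword ahead of the data register. The offsets 8 and 12 follow the standard MSI capability layout; the enum names and msi_data_offset() are hypothetical and are not the helpers used in the kernel source.

```c
#include <assert.h>

/* Offsets within the MSI capability structure (standard layout). */
enum {
    MSI_ADDRESS_LO = 4,   /* lower 32 bits of message address */
    MSI_ADDRESS_HI = 8,   /* upper 32 bits, 64-bit capable devices only */
    MSI_DATA_32    = 8,   /* message data when address is 32-bit */
    MSI_DATA_64    = 12   /* message data when address is 64-bit */
};

/* Config-space offset of the message data register for a capability
 * located at cap_pos. */
static unsigned msi_data_offset(unsigned cap_pos, int is_64bit)
{
    assert(cap_pos >= 0x40);  /* capabilities live above the standard header */
    return cap_pos + (is_64bit ? MSI_DATA_64 : MSI_DATA_32);
}
```

Reading the 64-bit offset on a 32-bit device would fetch whatever follows the data register rather than the message data itself.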
17 years agoPCI: Fix incorrect argument order to list_add_tail() in PCI dynamic ID code
Michael Ellerman [Fri, 14 Sep 2007 05:33:13 +0000 (15:33 +1000)]
PCI: Fix incorrect argument order to list_add_tail() in PCI dynamic ID code

The code for dynamically assigning new ids to PCI drivers,
store_new_id(), calls list_add_tail() with the list head and new node
arguments in reversed order.

The result is that every new id written essentially overwrites the
previous list of ids.

Caught with the help of Rusty's "horribly bad" list_node patch:
 http://lkml.org/lkml/2007/6/10/10

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
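
The effect of swapping the arguments can be reproduced with a small userspace re-creation of a circular doubly linked list. This is a sketch, not the kernel's list.h; it assumes each new node is freshly initialised (pointing at itself) before being added, and the helper names list_insert(), list_count() and count_after_three_adds() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Link 'entry' between two neighbouring nodes. */
static void list_insert(struct list_head *entry,
                        struct list_head *prev,
                        struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Correct order: the node to add comes first, the list head second. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    list_insert(entry, head->prev, head);
}

/* Number of nodes reachable from head, not counting head itself. */
static int list_count(const struct list_head *head)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Add three freshly initialised nodes, optionally with the arguments
 * swapped as in the bug, and report how many the head can still reach. */
static int count_after_three_adds(int swapped)
{
    struct list_head head, nodes[3];

    init_list_head(&head);
    for (int i = 0; i < 3; i++) {
        init_list_head(&nodes[i]);
        if (swapped)
            list_add_tail(&head, &nodes[i]);   /* buggy argument order */
        else
            list_add_tail(&nodes[i], &head);   /* intended order */
    }
    return list_count(&head);
}
```

With the intended order all three nodes are reachable from the head; with the swapped order each call relinks the head next to the newest node only, so just one node remains reachable.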
17 years agoPCI: i386: Compaq EVO N800c needs PCI bus renumbering
Juha Laiho [Thu, 13 Sep 2007 18:21:34 +0000 (21:21 +0300)]
PCI: i386: Compaq EVO N800c needs PCI bus renumbering

Force PCI bus renumbering for Compaq EVO N800c laptop, in order to get
the cardbus slot recognised.

Signed-off-by: Juha Laiho <Juha.Laiho@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>