firefly-linux-kernel-4.4.55.git
13 years agoKVM: x86 emulator: fix const value warning on i386 in svm insn RAX check
Randy Dunlap [Thu, 21 Apr 2011 16:09:22 +0000 (09:09 -0700)]
KVM: x86 emulator: fix const value warning on i386 in svm insn RAX check

arch/x86/kvm/emulate.c:2598: warning: integer constant is too large for 'long' type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
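
Note: the warning appears because an unsuffixed 64-bit constant does not fit in a 32-bit 'long' on i386. A minimal sketch of the usual remedy, an explicit ULL suffix, follows. The mask name and the helper are hypothetical and are not copied from emulate.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask covering bits 63..48. The ULL suffix makes the
 * constant 'unsigned long long' on both 32-bit and 64-bit builds. */
#define RAX_HIGH_BITS 0xffff000000000000ULL

/* Returns true when any of the upper 16 bits of rax is set. */
static bool rax_has_high_bits(uint64_t rax)
{
	return (rax & RAX_HIGH_BITS) != 0;
}
```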
13 years agoKVM: x86 emulator: avoid calling wbinvd() macro
Clemens Noss [Thu, 21 Apr 2011 19:16:05 +0000 (21:16 +0200)]
KVM: x86 emulator: avoid calling wbinvd() macro

Commit 0b56652e33c72092956c651ab6ceb9f0ad081153 fails to build:

  CC [M]  arch/x86/kvm/emulate.o
arch/x86/kvm/emulate.c: In function 'x86_emulate_insn':
arch/x86/kvm/emulate.c:4095:25: error: macro "wbinvd" passed 1 arguments, but takes just 0
arch/x86/kvm/emulate.c:4095:3: warning: statement with no effect
make[2]: *** [arch/x86/kvm/emulate.o] Error 1
make[1]: *** [arch/x86/kvm] Error 2
make: *** [arch/x86] Error 2

Work around this for now.

Signed-off-by: Clemens Noss <cnoss@gmx.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
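
Note: the build error comes from a function-like macro named wbinvd that takes no arguments colliding with a callback member of the same name. One common workaround is to parenthesize the member access so the preprocessor does not see the name followed by '('. The sketch below uses a stand-in macro and hypothetical struct and function names; it illustrates the technique, not the exact change that was applied.

```c
/* Stand-in for a function-like macro that takes zero arguments. */
#define wbinvd() ((void)0)

struct demo_ops {
	void (*wbinvd)(int *flag);	/* name not followed by '(', so no expansion */
};

static void demo_set_flag(int *flag)
{
	*flag = 1;
}

static int demo_call_wbinvd(const struct demo_ops *ops)
{
	int flag = 0;

	/* Writing the member name directly followed by '(' would expand
	 * the macro and fail with "passed 1 arguments, but takes just 0".
	 * Wrapping the member access in parentheses suppresses expansion. */
	(ops->wbinvd)(&flag);
	return flag;
}
```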
13 years agoKVM: ioapic: Fix an error field reference
Liu Yuan [Thu, 21 Apr 2011 06:53:57 +0000 (14:53 +0800)]
KVM: ioapic: Fix an error field reference

The ioapic_debug() call in ioapic_deliver() references one
field by the wrong name. This patch corrects it.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Make cmpxchg_gpte aware of nesting too
Roedel, Joerg [Wed, 20 Apr 2011 13:33:16 +0000 (15:33 +0200)]
KVM: MMU: Make cmpxchg_gpte aware of nesting too

This patch makes the cmpxchg_gpte() function aware of the
difference between l1-gfns and l2-gfns when nested
virtualization is in use.  This fixes a potential
data-corruption problem in the l1-guest and makes the code
work correctly (at least as correctly as the hardware which is
emulated in this code) again.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: drop x86_emulate_ctxt::vcpu
Avi Kivity [Wed, 20 Apr 2011 12:56:20 +0000 (15:56 +0300)]
KVM: x86 emulator: drop x86_emulate_ctxt::vcpu

No longer used.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Avoid using x86_emulate_ctxt.vcpu
Avi Kivity [Wed, 20 Apr 2011 12:55:40 +0000 (15:55 +0300)]
KVM: Avoid using x86_emulate_ctxt.vcpu

We can use container_of() instead.

Signed-off-by: Avi Kivity <avi@redhat.com>
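
Note: container_of() recovers a pointer to an enclosing structure from a pointer to one of its members, which is why a back-pointer in the context becomes unnecessary. A minimal userspace sketch follows. The struct layouts and the helper name are hypothetical, and the macro is a simplified version without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified container_of: subtract the member offset from the
 * member pointer to get the address of the enclosing object. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct emulate_ctxt {
	unsigned long eip;
};

struct vcpu {
	int id;
	struct emulate_ctxt ctxt;	/* embedded, not a pointer */
};

static struct vcpu *ctxt_to_vcpu(struct emulate_ctxt *ctxt)
{
	return container_of(ctxt, struct vcpu, ctxt);
}
```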
13 years agoKVM: x86 emulator: add new ->wbinvd() callback
Avi Kivity [Wed, 20 Apr 2011 12:53:23 +0000 (15:53 +0300)]
KVM: x86 emulator: add new ->wbinvd() callback

Instead of calling kvm_emulate_wbinvd() directly.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add ->fix_hypercall() callback
Avi Kivity [Wed, 20 Apr 2011 12:47:13 +0000 (15:47 +0300)]
KVM: x86 emulator: add ->fix_hypercall() callback

Artificial, but needed to remove direct calls to KVM.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add new ->halt() callback
Avi Kivity [Wed, 20 Apr 2011 12:43:05 +0000 (15:43 +0300)]
KVM: x86 emulator: add new ->halt() callback

Instead of reaching into vcpu internals.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: make emulate_invlpg() an emulator callback
Avi Kivity [Wed, 20 Apr 2011 12:38:44 +0000 (15:38 +0300)]
KVM: x86 emulator: make emulate_invlpg() an emulator callback

Removing direct calls to KVM.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: emulate CLTS internally
Avi Kivity [Wed, 20 Apr 2011 12:32:49 +0000 (15:32 +0300)]
KVM: x86 emulator: emulate CLTS internally

Avoid using ctxt->vcpu; we can do everything with ->get_cr() and ->set_cr().

A side effect is that we no longer activate the fpu on emulated CLTS; but that
should be very rare.

Signed-off-by: Avi Kivity <avi@redhat.com>
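
Note: CLTS clears the task-switched flag, CR0.TS (bit 3), so it can be expressed purely as a read-modify-write through the control register callbacks. The sketch below assumes simplified callback signatures; the struct names, the demo backend, and the function name are hypothetical.

```c
#define CR0_TS (1UL << 3)	/* CR0 task-switched flag */

struct cr_ctxt;

struct cr_ops {
	unsigned long (*get_cr)(struct cr_ctxt *ctxt, int cr);
	int (*set_cr)(struct cr_ctxt *ctxt, int cr, unsigned long val);
};

struct cr_ctxt {
	const struct cr_ops *ops;
	unsigned long cr[5];	/* demo storage for CR0..CR4 */
};

static unsigned long demo_get_cr(struct cr_ctxt *ctxt, int cr)
{
	return ctxt->cr[cr];
}

static int demo_set_cr(struct cr_ctxt *ctxt, int cr, unsigned long val)
{
	ctxt->cr[cr] = val;
	return 0;
}

static const struct cr_ops demo_cr_ops = {
	.get_cr = demo_get_cr,
	.set_cr = demo_set_cr,
};

/* Emulate CLTS using only the callbacks: read CR0, clear TS, write back. */
static int demo_clts(struct cr_ctxt *ctxt)
{
	unsigned long cr0 = ctxt->ops->get_cr(ctxt, 0);

	cr0 &= ~CR0_TS;
	return ctxt->ops->set_cr(ctxt, 0, cr0);
}
```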
13 years agoKVM: x86 emulator: Replace calls to is_pae() and is_paging with ->get_cr()
Avi Kivity [Wed, 20 Apr 2011 12:24:32 +0000 (15:24 +0300)]
KVM: x86 emulator: Replace calls to is_pae() and is_paging with ->get_cr()

Avoid use of ctxt->vcpu.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: drop use of is_long_mode()
Avi Kivity [Wed, 20 Apr 2011 12:21:35 +0000 (15:21 +0300)]
KVM: x86 emulator: drop use of is_long_mode()

Requires ctxt->vcpu, which is to be abolished.  Replace with open calls
to get_msr().

Signed-off-by: Avi Kivity <avi@redhat.com>
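
Note: long mode is active when EFER.LMA (bit 10 of MSR 0xc0000080) is set, so the check can be open-coded on top of an MSR read callback. The sketch below assumes a simplified get_msr() signature; the struct names, the demo backend, and the helper name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_EFER 0xc0000080u
#define EFER_LMA (1ULL << 10)	/* long mode active */

struct msr_ctxt;

struct msr_ops {
	int (*get_msr)(struct msr_ctxt *ctxt, uint32_t index, uint64_t *val);
};

struct msr_ctxt {
	const struct msr_ops *ops;
	uint64_t efer;		/* demo storage for the EFER value */
};

static int demo_get_msr(struct msr_ctxt *ctxt, uint32_t index, uint64_t *val)
{
	if (index != MSR_EFER)
		return -1;
	*val = ctxt->efer;
	return 0;
}

static const struct msr_ops demo_msr_ops = { .get_msr = demo_get_msr };

/* Open-coded replacement for a vcpu-based long mode check. */
static bool demo_in_long_mode(struct msr_ctxt *ctxt)
{
	uint64_t efer = 0;

	ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
	return (efer & EFER_LMA) != 0;
}
```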
13 years agoKVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()
Avi Kivity [Wed, 20 Apr 2011 12:12:00 +0000 (15:12 +0300)]
KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()

Replacing direct calls to realmode_lgdt(), realmode_lidt().

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks
Avi Kivity [Wed, 20 Apr 2011 12:01:23 +0000 (15:01 +0300)]
KVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks

Unneeded for register access.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: drop vcpu argument from intercept callback
Avi Kivity [Wed, 20 Apr 2011 10:37:53 +0000 (13:37 +0300)]
KVM: x86 emulator: drop vcpu argument from intercept callback

Making the emulator caller agnostic.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks
Avi Kivity [Wed, 20 Apr 2011 10:37:53 +0000 (13:37 +0300)]
KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks

Making the emulator caller agnostic.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks
Avi Kivity [Wed, 20 Apr 2011 10:37:53 +0000 (13:37 +0300)]
KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks

Making the emulator caller agnostic.

[Takuya Yoshikawa: fix typo leading to LDT failures]

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: drop vcpu argument from pio callbacks
Avi Kivity [Wed, 20 Apr 2011 10:37:53 +0000 (13:37 +0300)]
KVM: x86 emulator: drop vcpu argument from pio callbacks

Making the emulator caller agnostic.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: drop vcpu argument from memory read/write callbacks
Avi Kivity [Wed, 20 Apr 2011 10:37:53 +0000 (13:37 +0300)]
KVM: x86 emulator: drop vcpu argument from memory read/write callbacks

Making the emulator caller agnostic.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: whitespace cleanups
Avi Kivity [Wed, 20 Apr 2011 10:12:27 +0000 (13:12 +0300)]
KVM: x86 emulator: whitespace cleanups

Clean up lines longer than 80 columns.  No code changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: emulator: Use linearize() when fetching instructions
Nelson Elhage [Mon, 18 Apr 2011 16:05:53 +0000 (12:05 -0400)]
KVM: emulator: Use linearize() when fetching instructions

Since segments need to be handled slightly differently when fetching
instructions, we add a __linearize helper that accepts a new 'fetch' boolean.

[avi: fix oops caused by wrong segmented_address initialization order]

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: X86: Update last_guest_tsc in vcpu_put
Joerg Roedel [Mon, 18 Apr 2011 09:42:53 +0000 (11:42 +0200)]
KVM: X86: Update last_guest_tsc in vcpu_put

The last_guest_tsc has been used in vcpu_load to adjust the
tsc_offset since tsc-scaling was merged. So the
last_guest_tsc needs to be updated in vcpu_put instead of
the last_host_tsc. This is fixed with this patch.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Fix nested sel_cr0 intercept path with decode-assists
Joerg Roedel [Mon, 18 Apr 2011 09:42:52 +0000 (11:42 +0200)]
KVM: SVM: Fix nested sel_cr0 intercept path with decode-assists

This patch fixes a bug in the nested-svm path when
decode-assists is available on the machine. After a
selective-cr0 intercept is detected the rip is advanced
unconditionally. This causes the l1-guest to continue
running with an l2-rip.
This bug was found with the sel_cr0 unit-test on decode-assists
capable hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Handle wraparound in (cs_base + offset) when fetching insns
Nelson Elhage [Wed, 13 Apr 2011 15:44:13 +0000 (11:44 -0400)]
KVM: x86 emulator: Handle wraparound in (cs_base + offset) when fetching insns

Currently, setting a large (i.e. negative) base address for %cs does not work on
a 64-bit host. The "JOS" teaching operating system, used by MIT and other
universities, relies on such segments while bootstrapping its way to full
virtual memory management.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
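
Note: outside long mode, linear addresses are 32 bits wide, so base + offset has to wrap at 4 GiB. On a 64-bit host the sum is computed in 64 bits and must be truncated explicitly. A minimal sketch of that truncation follows; the function name is hypothetical.

```c
#include <stdint.h>

/* Compute a 32-bit linear address from a segment base and an offset.
 * The cast truncates the 64-bit sum so that a "negative" base such as
 * 0xf0000000 wraps around as it would on real 32-bit hardware. */
static uint64_t demo_linearize32(uint64_t base, uint64_t offset)
{
	return (uint32_t)(base + offset);
}
```

For example, a base of 0xf0000000 plus an offset of 0x10100000 sums to 0x100100000 in 64-bit arithmetic, and the truncation yields 0x00100000.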
13 years agoKVM: remove useless function declaration kvm_inject_pit_timer_irqs()
Duan Jiong [Mon, 11 Apr 2011 04:44:06 +0000 (12:44 +0800)]
KVM: remove useless function declaration kvm_inject_pit_timer_irqs()

Remove the unused function declaration kvm_inject_pit_timer_irqs() from
arch/x86/kvm/i8254.h.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: remove useless function declarations from file arch/x86/kvm/irq.h
Duan Jiong [Mon, 11 Apr 2011 04:56:01 +0000 (12:56 +0800)]
KVM: remove useless function declarations from file arch/x86/kvm/irq.h

Remove the unused function declarations kvm_pic_clear_isr_ack() and
pit_has_pending_timer().

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Fix off by one in kvm_for_each_vcpu iteration
Jeff Mahoney [Wed, 13 Apr 2011 01:30:17 +0000 (21:30 -0400)]
KVM: Fix off by one in kvm_for_each_vcpu iteration

This patch avoids gcc issuing the following warning when KVM_MAX_VCPUS=1:
warning: array subscript is above array bounds

kvm_for_each_vcpu currently checks to see if the index for the vcpu is
valid /after/ loading it. We don't run into problems because the address
is still inside the enclosing struct kvm and we never dereference or write
to it, so this isn't a security issue.

The warning occurs when KVM_MAX_VCPUS=1 because the increment portion of
the loop will *always* cause the loop to load an invalid location since
++idx will always be > 0.

This patch moves the load so that the check occurs before the load and
we don't run into the compiler warning.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
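
Note: the reordering described above can be sketched as a loop whose bounds check runs before the array load. The struct, macro, and helper names below are hypothetical and only model the idea with a plain pointer array.

```c
#include <stddef.h>

#define DEMO_MAX_VCPUS 1

struct demo_vm {
	int online_vcpus;
	int *vcpus[DEMO_MAX_VCPUS];
};

/* The index is validated first; the array element is loaded only when
 * the index is in range, so vcpus[DEMO_MAX_VCPUS] is never touched. */
#define demo_for_each_vcpu(idx, vcpup, vm)				\
	for ((idx) = 0;							\
	     (idx) < (vm)->online_vcpus &&				\
	     ((vcpup) = (vm)->vcpus[(idx)]) != NULL;			\
	     (idx)++)

static int demo_count_vcpus(struct demo_vm *vm)
{
	int idx, count = 0;
	int *vcpu;

	demo_for_each_vcpu(idx, vcpu, vm)
		count++;
	return count;
}
```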
13 years agoKVM: fix push of wrong eip when doing softint
Serge E. Hallyn [Wed, 13 Apr 2011 14:12:54 +0000 (09:12 -0500)]
KVM: fix push of wrong eip when doing softint

When doing a soft int, we need to bump eip before pushing it to
the stack.  Otherwise we'll do the int a second time.

[apw@canonical.com: merged eip update as per Jan's recommendation.]
Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Use em_push() instead of emulate_push()
Takuya Yoshikawa [Tue, 12 Apr 2011 15:31:23 +0000 (00:31 +0900)]
KVM: x86 emulator: Use em_push() instead of emulate_push()

em_push() is a simple wrapper of emulate_push().  So this patch replaces
emulate_push() with em_push() and removes the unnecessary former.

In addition, the unused ops arguments are removed from emulate_pusha()
and emulate_grp45().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Make emulate_push() store the value directly
Takuya Yoshikawa [Tue, 12 Apr 2011 15:29:09 +0000 (00:29 +0900)]
KVM: x86 emulator: Make emulate_push() store the value directly

PUSH emulation stores the value by calling writeback() after setting
the dst operand appropriately in emulate_push().

This writeback() using dst is not needed at all because we know the
target is the stack.  So this patch makes emulate_push() call the newly
introduced segmented_write() directly.

This removes many inlined writeback() calls.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Disable writeback for CMP emulation
Takuya Yoshikawa [Tue, 12 Apr 2011 15:24:55 +0000 (00:24 +0900)]
KVM: x86 emulator: Disable writeback for CMP emulation

This stops "CMP r/m, reg" from writing the data back into memory.
Pointed out by Avi.

The writeback suppression now covers CMP, CMPS, SCAS.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Ensure that vmx_create_vcpu always returns proper error
Jan Kiszka [Tue, 12 Apr 2011 23:27:55 +0000 (01:27 +0200)]
KVM: VMX: Ensure that vmx_create_vcpu always returns proper error

In case certain allocations fail, vmx_create_vcpu may return 0 as error
instead of a negative value encoded via ERR_PTR. This causes a NULL
pointer dereferencing later on in kvm_vm_ioctl_vcpu_create.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
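
Note: the ERR_PTR convention encodes a negative errno in the pointer value, so a caller that tests with IS_ERR() treats a plain 0 (NULL) return as success and dereferences it. The sketch below re-implements the helpers in userspace for illustration only; the struct and the allocator function are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(intptr_t err)
{
	return (void *)err;
}

static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct demo_vcpu {
	int id;
};

/* Every failure path sets a negative errno before jumping to the
 * common exit, so the caller never sees a 0 "error" value. */
static struct demo_vcpu *demo_create_vcpu(int id, int simulate_oom)
{
	struct demo_vcpu *vcpu;
	int err;

	if (simulate_oom) {
		err = -ENOMEM;
		goto fail;
	}

	vcpu = malloc(sizeof(*vcpu));
	if (!vcpu) {
		err = -ENOMEM;
		goto fail;
	}

	vcpu->id = id;
	return vcpu;

fail:
	return ERR_PTR(err);
}
```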
13 years agoKVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
Gleb Natapov [Thu, 31 Mar 2011 10:06:41 +0000 (12:06 +0200)]
KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

Currently we sync registers back and forth before/after exiting
to userspace for IO, but during IO the device model shouldn't need to
read/write the registers, so we can just as well skip those sync points. The
only exception is the broken vmware backdoor interface. The new code syncs
register contents during IO only if registers are read from/written to
by userspace in the middle of the IO operation, and this almost never
happens in practice.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: implement segment permission checks
Avi Kivity [Sun, 3 Apr 2011 09:32:09 +0000 (12:32 +0300)]
KVM: x86 emulator: implement segment permission checks

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: move desc_limit_scaled()
Avi Kivity [Sun, 3 Apr 2011 11:08:51 +0000 (14:08 +0300)]
KVM: x86 emulator: move desc_limit_scaled()

For reuse later.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: move linearize() downwards
Avi Kivity [Sun, 3 Apr 2011 09:33:12 +0000 (12:33 +0300)]
KVM: x86 emulator: move linearize() downwards

So it can call emulate_gp() without forward declarations.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: pass access size and read/write intent to linearize()
Avi Kivity [Sun, 3 Apr 2011 08:31:19 +0000 (11:31 +0300)]
KVM: x86 emulator: pass access size and read/write intent to linearize()

Needed for segment read/write checks.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: change address linearization to return an error code
Avi Kivity [Thu, 31 Mar 2011 16:54:30 +0000 (18:54 +0200)]
KVM: x86 emulator: change address linearization to return an error code

Preparing to add segment checks.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: move invlpg emulation into a function
Avi Kivity [Thu, 31 Mar 2011 16:48:09 +0000 (18:48 +0200)]
KVM: x86 emulator: move invlpg emulation into a function

It's going to get more complicated soon.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Add helpers for memory access using segmented addresses
Avi Kivity [Thu, 31 Mar 2011 14:52:26 +0000 (16:52 +0200)]
KVM: x86 emulator: Add helpers for memory access using segmented addresses

Will help later adding proper segment checks.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Fix fault-rip on vmsave/vmload emulation
Joerg Roedel [Wed, 6 Apr 2011 10:30:03 +0000 (12:30 +0200)]
KVM: SVM: Fix fault-rip on vmsave/vmload emulation

When the emulation of vmload or vmsave fails because the
guest passed an unsupported physical address, the guest gets a #GP
with rip pointing to the instruction after vmsave/vmload.
This is a bug, and it is fixed by this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: X86: Implement userspace interface to set virtual_tsc_khz
Joerg Roedel [Fri, 25 Mar 2011 08:44:51 +0000 (09:44 +0100)]
KVM: X86: Implement userspace interface to set virtual_tsc_khz

This patch implements two new vm-ioctls to get and set the
virtual_tsc_khz if the machine supports tsc-scaling. Setting
the tsc-frequency is only possible before userspace creates
any vcpu.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: X86: Delegate tsc-offset calculation to architecture code
Joerg Roedel [Fri, 25 Mar 2011 08:44:50 +0000 (09:44 +0100)]
KVM: X86: Delegate tsc-offset calculation to architecture code

With TSC scaling in SVM the tsc-offset needs to be
calculated differently. This patch propagates this
calculation into the architecture specific modules so that
this complexity can be handled there.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: X86: Implement call-back to propagate virtual_tsc_khz
Joerg Roedel [Fri, 25 Mar 2011 08:44:49 +0000 (09:44 +0100)]
KVM: X86: Implement call-back to propagate virtual_tsc_khz

This patch implements a call-back into the architecture code
to allow the propagation of changes to the virtual tsc_khz
of the vcpu.
On SVM it updates the tsc_ratio variable, on VMX it does
nothing.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: X86: Make tsc_delta calculation a function of guest tsc
Joerg Roedel [Fri, 25 Mar 2011 08:44:48 +0000 (09:44 +0100)]
KVM: X86: Make tsc_delta calculation a function of guest tsc

The calculation of the tsc_delta value to ensure a
forward-going tsc for the guest is a function of the
host-tsc. This works as long as the guest's tsc_khz is equal
to the host's tsc_khz. With tsc-scaling hardware support this
is no longer true and the tsc_delta needs to be calculated
using guest_tsc values.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: X86: Let kvm-clock report the right tsc frequency
Joerg Roedel [Fri, 25 Mar 2011 08:44:47 +0000 (09:44 +0100)]
KVM: X86: Let kvm-clock report the right tsc frequency

This patch changes the kvm_guest_time_update function to use
the TSC frequency the guest actually has for updating its clock.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Implement infrastructure for TSC_RATE_MSR
Joerg Roedel [Fri, 25 Mar 2011 08:44:46 +0000 (09:44 +0100)]
KVM: SVM: Implement infrastructure for TSC_RATE_MSR

This patch enhances the kvm_amd module with functions to
support the TSC_RATE_MSR which can be used to set a given
tsc frequency for the guest vcpu.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
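
Note: a guest TSC frequency is typically expressed to the hardware as a fixed-point ratio of guest to host frequency. The sketch below assumes an 8.32 fixed-point layout (integer part above bit 31, 32 fractional bits) and uses hypothetical function names. It shows the arithmetic only and is not the code added by this patch.

```c
#include <stdint.h>

/* Ratio of guest to host frequency in assumed 8.32 fixed point. */
static uint64_t demo_tsc_ratio(uint32_t guest_khz, uint32_t host_khz)
{
	return ((uint64_t)guest_khz << 32) / host_khz;
}

/* Multiply a 64-bit TSC value by the fixed-point ratio without a
 * 128-bit intermediate: split the ratio into integer and fractional
 * parts and split the TSC into 32-bit halves.  The result wraps
 * modulo 2^64 if it does not fit. */
static uint64_t demo_scale_tsc(uint64_t tsc, uint64_t ratio)
{
	uint64_t mult = ratio >> 32;
	uint64_t frac = ratio & 0xffffffffULL;
	uint64_t hi = tsc >> 32;
	uint64_t lo = tsc & 0xffffffffULL;

	return tsc * mult + hi * frac + ((lo * frac) >> 32);
}
```

With a 1 GHz guest on a 2 GHz host, the ratio is 0x80000000 (one half), so 1000 host ticks scale to 500 guest ticks.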
13 years agoKVM: x86 emulator: Drop EFER.SVME requirement from VMMCALL
Avi Kivity [Tue, 5 Apr 2011 13:25:20 +0000 (16:25 +0300)]
KVM: x86 emulator: Drop EFER.SVME requirement from VMMCALL

VMMCALL requires EFER.SVME to be enabled in the host, not in the guest, which
is what check_svme() checks.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Re-add VendorSpecific tag to VMMCALL insn
Avi Kivity [Tue, 5 Apr 2011 13:21:58 +0000 (16:21 +0300)]
KVM: x86 emulator: Re-add VendorSpecific tag to VMMCALL insn

VMMCALL needs the VendorSpecific tag so that #UD emulation
(called if a guest running on AMD was migrated to an Intel host)
is allowed to process the instruction.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: PPC: Fix issue clearing exit timing counters
Bharat Bhushan [Fri, 25 Mar 2011 05:02:13 +0000 (10:32 +0530)]
KVM: PPC: Fix issue clearing exit timing counters

The following dump is observed on the host when clearing the exit timing counters:

[root@p1021mds kvm]# echo -n 'c' > vm1200_vcpu0_timing
INFO: task echo:1276 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
echo          D 0ff5bf94     0  1276   1190 0x00000000
Call Trace:
[c2157e40] [c0007908] __switch_to+0x9c/0xc4
[c2157e50] [c040293c] schedule+0x1b4/0x3bc
[c2157e90] [c04032dc] __mutex_lock_slowpath+0x74/0xc0
[c2157ec0] [c00369e4] kvmppc_init_timing_stats+0x20/0xb8
[c2157ed0] [c0036b00] kvmppc_exit_timing_write+0x84/0x98
[c2157ef0] [c00b9f90] vfs_write+0xc0/0x16c
[c2157f10] [c00ba284] sys_write+0x4c/0x90
[c2157f40] [c000e320] ret_from_syscall+0x0/0x3c

The vcpu->mutex is used by kvm_ioctl_* (KVM_RUN etc.), and the same mutex was
used when clearing the stats (in kvmppc_init_timing_stats()).  When the guest
is idle it holds vcpu->mutex, so the process clearing the exit timing stats
waits for the guest to release vcpu->mutex and hangs.

Use a separate lock for the exit timing stats instead.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: remove mmu_seq verification on pte update path
Xiao Guangrong [Mon, 28 Mar 2011 02:29:27 +0000 (10:29 +0800)]
KVM: MMU: remove mmu_seq verification on pte update path

The mmu_seq verification can be removed since we get the pfn under the
protection of mmu_lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: do not open code return values from the emulator
Gleb Natapov [Mon, 28 Mar 2011 14:57:49 +0000 (16:57 +0200)]
KVM: x86 emulator: do not open code return values from the emulator

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Remove base_addresss in kvm_pit since it is unused
Justin P. Mattock [Wed, 30 Mar 2011 16:54:47 +0000 (09:54 -0700)]
KVM: Remove base_addresss in kvm_pit since it is unused

The patch below removes unsigned long base_addresss; in i8254.h
since it is unused.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Remove nested sel_cr0_write handling code
Joerg Roedel [Mon, 4 Apr 2011 10:39:36 +0000 (12:39 +0200)]
KVM: SVM: Remove nested sel_cr0_write handling code

This patch removes all the old code which handled the nested
selective cr0 write intercepts. This code was only in place
as a work-around until the instruction emulator is capable
of doing the same. This is the case with this patch-set and
so the code can be removed.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add checks for IO instructions
Joerg Roedel [Mon, 4 Apr 2011 10:39:35 +0000 (12:39 +0200)]
KVM: SVM: Add checks for IO instructions

This patch adds code to check for IOIO intercepts on
instructions decoded by the KVM instruction emulator.

[avi: fix build error due to missing #define D2bvIP]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add intercept checks for one-byte instructions
Joerg Roedel [Mon, 4 Apr 2011 10:39:34 +0000 (12:39 +0200)]
KVM: SVM: Add intercept checks for one-byte instructions

This patch adds intercept checks for emulated one-byte
instructions to the KVM instruction emulation path.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add intercept checks for remaining twobyte instructions
Joerg Roedel [Mon, 4 Apr 2011 10:39:33 +0000 (12:39 +0200)]
KVM: SVM: Add intercept checks for remaining twobyte instructions

This patch adds intercept checks for the remaining twobyte
instructions to the KVM instruction emulator.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add intercept checks for remaining group7 instructions
Joerg Roedel [Mon, 4 Apr 2011 10:39:32 +0000 (12:39 +0200)]
KVM: SVM: Add intercept checks for remaining group7 instructions

This patch implements the emulator intercept checks for the
RDTSCP, MONITOR, and MWAIT instructions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add intercept checks for SVM instructions
Joerg Roedel [Mon, 4 Apr 2011 10:39:31 +0000 (12:39 +0200)]
KVM: SVM: Add intercept checks for SVM instructions

This patch adds the necessary code changes in the
instruction emulator and the extensions to svm.c to
implement intercept checks for the svm instructions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add intercept checks for descriptor table accesses
Joerg Roedel [Mon, 4 Apr 2011 10:39:30 +0000 (12:39 +0200)]
KVM: SVM: Add intercept checks for descriptor table accesses

This patch adds intercept checks to the KVM instruction
emulator for the 8 instructions that access the
descriptor table addresses.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add intercept check for accessing dr registers
Joerg Roedel [Mon, 4 Apr 2011 10:39:29 +0000 (12:39 +0200)]
KVM: SVM: Add intercept check for accessing dr registers

This patch adds the intercept checks for instructions
accessing the debug registers.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Add intercept check for emulated cr accesses
Joerg Roedel [Mon, 4 Apr 2011 10:39:28 +0000 (12:39 +0200)]
KVM: SVM: Add intercept check for emulated cr accesses

This patch adds all necessary intercept checks for
instructions that access the crX registers.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Add x86 callback for intercept check
Joerg Roedel [Mon, 4 Apr 2011 10:39:27 +0000 (12:39 +0200)]
KVM: x86: Add x86 callback for intercept check

This patch adds a callback into kvm_x86_ops so that svm and
vmx code can do intercept checks on emulated instructions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Add flag to check for protected mode instructions
Joerg Roedel [Mon, 4 Apr 2011 10:39:26 +0000 (12:39 +0200)]
KVM: x86 emulator: Add flag to check for protected mode instructions

This patch adds an opcode flag to tag instructions
which are only recognized in protected mode. The necessary
check is added too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Add check_perm callback
Joerg Roedel [Mon, 4 Apr 2011 10:39:25 +0000 (12:39 +0200)]
KVM: x86 emulator: Add check_perm callback

This patch adds a check_perm callback for each opcode into
the instruction emulator. This will be used to do all
necessary permission checks on instructions before checking
whether they are intercepted or not.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Don't write-back cpu-state on X86EMUL_INTERCEPTED
Joerg Roedel [Mon, 4 Apr 2011 10:39:24 +0000 (12:39 +0200)]
KVM: x86 emulator: Don't write-back cpu-state on X86EMUL_INTERCEPTED

This patch prevents the changed CPU state from being written back
when the emulator detects that the instruction was
intercepted by the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add SVM intercepts
Avi Kivity [Mon, 4 Apr 2011 10:39:23 +0000 (12:39 +0200)]
KVM: x86 emulator: add SVM intercepts

Add intercept codes for instructions defined by SVM as
interceptable.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add framework for instruction intercepts
Avi Kivity [Mon, 4 Apr 2011 10:39:22 +0000 (12:39 +0200)]
KVM: x86 emulator: add framework for instruction intercepts

When running in guest mode, certain instructions can be intercepted by
hardware.  This also holds for nested guests running on emulated
virtualization hardware, in particular instructions emulated by kvm
itself.

This patch adds a framework for intercepting instructions.  If an
instruction is marked for interception, and if we're running in guest
mode, a callback is called to check whether an intercept is needed or
not.  The callback is called at three points in time: immediately after
beginning execution, after checking privilege exceptions, and after
checking memory exceptions.  This suits the different interception points
defined for different instructions and for the various virtualization
instruction sets.

In addition, a new X86EMUL_INTERCEPTED return code is defined, which any
callback or memory access may return, allowing the more complicated
intercepts to be implemented in existing callbacks.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
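
Note: the three callback points described above can be sketched as a small staged loop. All names below (the enums, the struct, the callback type, and the demo callback) are hypothetical; the sketch models only the control flow, with the per-stage checks left as comments.

```c
#include <stdbool.h>

enum demo_stage {
	DEMO_PRE_EXCEPT,	/* immediately after beginning execution */
	DEMO_POST_EXCEPT,	/* after privilege exception checks */
	DEMO_POST_MEMACCESS,	/* after memory exception checks */
};

enum demo_result { DEMO_CONTINUE, DEMO_INTERCEPTED };

struct demo_insn {
	int intercept;		/* 0 means "not marked for interception" */
};

typedef enum demo_result (*demo_icpt_cb)(const struct demo_insn *insn,
					 enum demo_stage stage, void *opaque);

static enum demo_result demo_emulate(const struct demo_insn *insn,
				     bool guest_mode, demo_icpt_cb cb,
				     void *opaque)
{
	static const enum demo_stage stages[] = {
		DEMO_PRE_EXCEPT, DEMO_POST_EXCEPT, DEMO_POST_MEMACCESS,
	};
	unsigned int i;

	for (i = 0; i < sizeof(stages) / sizeof(stages[0]); i++) {
		/* Privilege checks would run before the POST_EXCEPT
		 * callback and memory operand checks before the
		 * POST_MEMACCESS callback; both are omitted here. */
		if (insn->intercept && guest_mode &&
		    cb(insn, stages[i], opaque) == DEMO_INTERCEPTED)
			return DEMO_INTERCEPTED;
	}

	/* Instruction execution and writeback would follow here. */
	return DEMO_CONTINUE;
}

/* Demo callback: counts invocations and intercepts at POST_EXCEPT. */
static enum demo_result demo_cb(const struct demo_insn *insn,
				enum demo_stage stage, void *opaque)
{
	int *calls = opaque;

	(void)insn;
	(*calls)++;
	return stage == DEMO_POST_EXCEPT ? DEMO_INTERCEPTED : DEMO_CONTINUE;
}
```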
13 years agoKVM: x86 emulator: implement movdqu instruction (f3 0f 6f, f3 0f 7f)
Avi Kivity [Wed, 20 Jan 2010 16:09:23 +0000 (18:09 +0200)]
KVM: x86 emulator: implement movdqu instruction (f3 0f 6f, f3 0f 7f)

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: SSE support
Avi Kivity [Tue, 29 Mar 2011 09:41:27 +0000 (11:41 +0200)]
KVM: x86 emulator: SSE support

Add support for marking an instruction as SSE, switching registers used
to the SSE register file.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Specialize decoding for insns with 66/f2/f3 prefixes
Avi Kivity [Tue, 29 Mar 2011 09:34:38 +0000 (11:34 +0200)]
KVM: x86 emulator: Specialize decoding for insns with 66/f2/f3 prefixes

Most SIMD instructions use the 66/f2/f3 prefixes to distinguish between
different variants of the same instruction.  Usually the encoding is quite
regular, but in some cases (including non-SIMD instructions) the prefixes
generate very different instructions.  Examples include XCHG/PAUSE,
MOVQ/MOVDQA/MOVDQU, and MOVBE/CRC32.

Allow the emulator to handle these special cases by splitting such opcodes
into groups, with different decode flags and execution functions for different
prefixes.

Signed-off-by: Avi Kivity <avi@redhat.com>
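
Note: splitting an opcode into a prefix group can be pictured as a four-entry table indexed by the mandatory prefix. The sketch below uses opcode 0f 6f as the example (MOVQ without a prefix, MOVDQA with 66, MOVDQU with f3). The struct layout and function name are hypothetical, and mnemonic strings stand in for the decode flags and execution functions a real table would hold.

```c
#include <stddef.h>
#include <string.h>

struct demo_group_prefix {
	const char *pfx_no;	/* no mandatory prefix */
	const char *pfx_66;
	const char *pfx_f2;
	const char *pfx_f3;
};

/* Variants of opcode 0f 6f selected by prefix; f2 has no variant. */
static const struct demo_group_prefix demo_opcode_0f_6f = {
	.pfx_no = "movq",
	.pfx_66 = "movdqa",
	.pfx_f2 = NULL,
	.pfx_f3 = "movdqu",
};

/* Returns the variant for the given prefix byte (0 for none), or NULL
 * when the combination is not defined. */
static const char *demo_select_variant(const struct demo_group_prefix *g,
				       unsigned char simd_prefix)
{
	switch (simd_prefix) {
	case 0x00: return g->pfx_no;
	case 0x66: return g->pfx_66;
	case 0xf2: return g->pfx_f2;
	case 0xf3: return g->pfx_f3;
	default:   return NULL;
	}
}
```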
13 years agoKVM: x86 emulator: define callbacks for using the guest fpu within the emulator
Avi Kivity [Mon, 28 Mar 2011 14:53:59 +0000 (16:53 +0200)]
KVM: x86 emulator: define callbacks for using the guest fpu within the emulator

Needed for emulating fpu instructions.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: do not munge rep prefix
Avi Kivity [Wed, 20 Jan 2010 14:00:35 +0000 (16:00 +0200)]
KVM: x86 emulator: do not munge rep prefix

Currently we store a rep prefix as 1 or 2 depending on whether it is a REPE or
REPNE.  Since sse instructions depend on the prefix value, store it as the
original opcode to simplify things further on.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: 16-byte mmio support
Avi Kivity [Wed, 20 Jan 2010 10:01:20 +0000 (12:01 +0200)]
KVM: 16-byte mmio support

Since sse instructions can issue 16-byte mmios, we need to support them.  We
can't increase the kvm_run mmio buffer size to 16 bytes without breaking
compatibility, so instead we break the large mmios into two smaller 8-byte
ones.  Since the bus is 64-bit we aren't breaking any atomicity guarantees.

Signed-off-by: Avi Kivity <avi@redhat.com>
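
The splitting idea can be sketched in user space as follows. This is only an illustration under the assumption of a fixed 8-byte fragment limit; the struct and function names are hypothetical and do not mirror the kvm_run layout.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MMIO_MAX 8  /* largest fragment a single mmio exit can carry */

/* Hypothetical description of one mmio fragment. */
struct sketch_mmio_fragment {
    uint64_t gpa;  /* guest physical address of this fragment */
    size_t len;    /* fragment length in bytes, at most SKETCH_MMIO_MAX */
};

/*
 * Break an access of len bytes at gpa into fragments of at most 8 bytes.
 * Writes up to max fragments into out and returns how many were written.
 */
static size_t split_mmio(uint64_t gpa, size_t len,
                         struct sketch_mmio_fragment *out, size_t max)
{
    size_t n = 0;

    while (len > 0 && n < max) {
        size_t chunk = len < SKETCH_MMIO_MAX ? len : SKETCH_MMIO_MAX;

        out[n].gpa = gpa;
        out[n].len = chunk;
        gpa += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

A 16-byte access at 0x1000 comes out as two fragments, one at 0x1000 and one at 0x1008, each 8 bytes long.
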
13 years agoKVM: Split mmio completion into a function
Avi Kivity [Tue, 19 Jan 2010 12:20:10 +0000 (14:20 +0200)]
KVM: Split mmio completion into a function

Make room for sse mmio completions.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: extend in-kernel mmio to handle >8 byte transactions
Avi Kivity [Tue, 19 Jan 2010 10:51:22 +0000 (12:51 +0200)]
KVM: extend in-kernel mmio to handle >8 byte transactions

Needed for coalesced mmio using sse.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: better fix for race between nmi injection and enabling nmi window
Gleb Natapov [Fri, 1 Apr 2011 14:26:29 +0000 (11:26 -0300)]
KVM: x86: better fix for race between nmi injection and enabling nmi window

Fix race between nmi injection and enabling nmi window in a simpler way.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoRevert "KVM: Fix race between nmi injection and enabling nmi window"
Marcelo Tosatti [Fri, 1 Apr 2011 14:25:03 +0000 (11:25 -0300)]
Revert "KVM: Fix race between nmi injection and enabling nmi window"

This reverts commit f86368493ec038218e8663cc1b6e5393cd8e008a.

Simpler fix to follow.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: expose async pf through our standard mechanism
Glauber Costa [Wed, 23 Mar 2011 16:40:42 +0000 (13:40 -0300)]
KVM: expose async pf through our standard mechanism

As Avi recently mentioned, the new standard mechanism for exposing features
is KVM_GET_SUPPORTED_CPUID, not spamming CAPs. For some reason async pf
missed that.

So expose async_pf here.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Gleb Natapov <gleb@redhat.com>
CC: Avi Kivity <avi@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: simplify NMI mask management
Avi Kivity [Wed, 23 Mar 2011 13:02:47 +0000 (15:02 +0200)]
KVM: VMX: simplify NMI mask management

Use vmx_set_nmi_mask() instead of open-coding management of
the hardware bit and the software hint (nmi_known_unmasked).

There's a slight change of behaviour when running without
hardware virtual NMI support - we now clear the NMI mask if
NMI delivery faulted in that case as well.  This improves
emulation accuracy.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Remove unused svm_features
Jan Kiszka [Thu, 24 Mar 2011 08:45:10 +0000 (09:45 +0100)]
KVM: SVM: Remove unused svm_features

We use boot_cpu_has now.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception
Avi Kivity [Mon, 7 Mar 2011 15:39:45 +0000 (17:39 +0200)]
KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception

vmx_complete_atomic_exit() cached it for us, so we can use it here.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally
Avi Kivity [Mon, 7 Mar 2011 15:37:37 +0000 (17:37 +0200)]
KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally

Only read it if we're going to use it later.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Refactor vmx_complete_atomic_exit()
Avi Kivity [Mon, 7 Mar 2011 15:24:54 +0000 (17:24 +0200)]
KVM: VMX: Refactor vmx_complete_atomic_exit()

Move the exit reason checks to the front of the function, for early
exit in the common case.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Qualify check for host NMI
Avi Kivity [Mon, 7 Mar 2011 15:20:29 +0000 (17:20 +0200)]
KVM: VMX: Qualify check for host NMI

Check for the exit reason first; this allows us, later,
to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded
Avi Kivity [Mon, 7 Mar 2011 14:52:07 +0000 (16:52 +0200)]
KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded

When we haven't injected an interrupt, we don't need to recover
the nmi blocking state (since the guest can't set it by itself).
This allows us to avoid a VMREAD later on.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Cache cpl
Avi Kivity [Mon, 7 Mar 2011 13:26:44 +0000 (15:26 +0200)]
KVM: VMX: Cache cpl

We may read the cpl quite often in the same vmexit (instruction privilege
check, memory access checks for instruction and operands), so we gain
a bit if we cache the value.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Optimize vmx_get_cpl()
Avi Kivity [Mon, 7 Mar 2011 12:54:28 +0000 (14:54 +0200)]
KVM: VMX: Optimize vmx_get_cpl()

In long mode, vm86 mode is disallowed, so we need not check for
it.  Reading rflags.vm may require a VMREAD, so it is expensive.

Signed-off-by: Avi Kivity <avi@redhat.com>
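
A minimal sketch of the check ordering, assuming a simplified guest state: the struct, the read counter, and the function names are hypothetical, and the counter merely stands in for the cost of a VMREAD. Only EFLAGS.VM being bit 17 and CPL being the low two bits of the CS selector are architectural facts.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EFLAGS_VM (1u << 17)  /* EFLAGS.VM, virtual-8086 mode */

/* Hypothetical guest state for illustration. */
struct sketch_guest {
    bool protected_mode;
    bool long_mode;
    uint64_t rflags;
    uint16_t cs_selector;
    unsigned rflags_reads;  /* counts simulated expensive reads */
};

/* Stand-in for an rflags read that may need a VMREAD. */
static uint64_t read_rflags(struct sketch_guest *g)
{
    g->rflags_reads++;
    return g->rflags;
}

static int get_cpl(struct sketch_guest *g)
{
    if (!g->protected_mode)
        return 0;  /* real mode runs at CPL 0 */
    /* Short-circuit: in long mode rflags is never read. */
    if (!g->long_mode && (read_rflags(g) & SKETCH_EFLAGS_VM))
        return 3;  /* vm86 code runs at CPL 3 */
    return g->cs_selector & 3;
}
```

Testing long_mode first means the expensive read is skipped entirely for 64-bit guests.
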
13 years agoKVM: VMX: Optimize vmx_get_rflags()
Avi Kivity [Mon, 7 Mar 2011 10:51:22 +0000 (12:51 +0200)]
KVM: VMX: Optimize vmx_get_rflags()

If called several times within the same exit, return cached results.

Signed-off-by: Avi Kivity <avi@redhat.com>
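
The caching pattern can be sketched as an availability bitmask that is cleared on every exit. All names below are hypothetical, and slow_read_rflags() only simulates an expensive hardware read by counting calls.

```c
#include <stdint.h>

enum { SKETCH_REG_RFLAGS = 0 };  /* bit index in the availability mask */

/* Hypothetical per-vcpu cache for illustration. */
struct sketch_vcpu {
    uint32_t avail;        /* bit n set: cached register n is valid */
    uint64_t rflags;       /* cached value */
    uint64_t hw_rflags;    /* what the "hardware" currently holds */
    unsigned slow_reads;   /* counts simulated expensive reads */
};

/* Stand-in for the expensive read (a VMREAD on real hardware). */
static uint64_t slow_read_rflags(struct sketch_vcpu *v)
{
    v->slow_reads++;
    return v->hw_rflags;
}

/* Return the cached value, doing the slow read at most once per exit. */
static uint64_t get_rflags(struct sketch_vcpu *v)
{
    if (!(v->avail & (1u << SKETCH_REG_RFLAGS))) {
        v->rflags = slow_read_rflags(v);
        v->avail |= 1u << SKETCH_REG_RFLAGS;
    }
    return v->rflags;
}

/* The guest may have changed the register, so drop the cache on exit. */
static void on_vmexit(struct sketch_vcpu *v)
{
    v->avail = 0;
}
```

Repeated calls to get_rflags() within one exit cost a single slow read; the next exit invalidates the cache and the first call after it reads again.
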
13 years agoKVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions
Avi Kivity [Mon, 2 Aug 2010 12:30:20 +0000 (15:30 +0300)]
KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions

Some rflags bits are owned by the host, not guest, so we need to use
kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them
back.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: cleanup memslot_id function
Xiao Guangrong [Wed, 9 Mar 2011 07:41:59 +0000 (15:41 +0800)]
KVM: cleanup memslot_id function

We can get the memslot id from memslot->id directly.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Wed, 11 May 2011 00:39:01 +0000 (17:39 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  slcan: fix ldisc->open retval
  net/usb: mark LG VL600 LTE modem ethernet interface as WWAN
  xfrm: Don't allow esn with disabled anti replay detection
  xfrm: Assign the inner mode output function to the dst entry
  net: dev_close() should check IFF_UP
  vlan: fix GVRP at dismantle time
  netfilter: revert a2361c8735e07322023aedc36e4938b35af31eb0
  netfilter: IPv6: fix DSCP mangle code
  netfilter: IPv6: initialize TOS field in REJECT target module
  IPVS: init and cleanup restructuring
  IPVS: Change of socket usage to enable name space exit.
  netfilter: ebtables: only call xt_compat_add_offset once per rule
  netfilter: fix ebtables compat support
  netfilter: ctnetlink: fix timestamp support for new conntracks
  pch_gbe: support ML7223 IOH
  PCH_GbE : Fixed the issue of checksum judgment
  PCH_GbE : Fixed the issue of collision detection
  NET: slip, fix ldisc->open retval
  be2net: Fixed bugs related to PVID.
  ehea: fix wrongly reported speed and port
  ...

13 years agoslub: Revert "[PARISC] slub: fix panic with DISCONTIGMEM"
David Rientjes [Wed, 11 May 2011 00:08:54 +0000 (17:08 -0700)]
slub: Revert "[PARISC] slub: fix panic with DISCONTIGMEM"

This reverts commit 4a5fa3590f09, which prevented SLUB from being used
on architectures that use DISCONTIGMEM without NUMA support compiled in
unless CONFIG_BROKEN was also set.

The slub panic that it was intended to prevent is addressed by
d9b41e0b54fd ("[PARISC] set memory ranges in N_NORMAL_MEMORY when
onlined") on parisc, so there are no further slub issues with such a
configuration.

The revert now allows SLUB to be used on such architectures, since
there haven't been any reports of additional errors.

Cc: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6
David S. Miller [Tue, 10 May 2011 22:04:35 +0000 (15:04 -0700)]
Merge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6

13 years agoslcan: fix ldisc->open retval
Oliver Hartkopp [Tue, 10 May 2011 20:12:30 +0000 (13:12 -0700)]
slcan: fix ldisc->open retval

TTY layer expects 0 if the ldisc->open operation succeeded.

Reported-by: Matvejchikov Ilya <matvejchikov@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agonet/usb: mark LG VL600 LTE modem ethernet interface as WWAN
Dan Williams [Mon, 9 May 2011 07:43:20 +0000 (07:43 +0000)]
net/usb: mark LG VL600 LTE modem ethernet interface as WWAN

Like other mobile broadband device ethernet interfaces, mark the LG
VL600 with the 'wwan' devtype so userspace knows it needs additional
configuration via the AT port before the interface can be used.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agoxfrm: Don't allow esn with disabled anti replay detection
Steffen Klassert [Mon, 9 May 2011 19:43:05 +0000 (19:43 +0000)]
xfrm: Don't allow esn with disabled anti replay detection

Unlike the standard case, disabled anti replay detection needs some
nontrivial extra treatment on ESN. RFC 4303 states:

Note: If a receiver chooses to not enable anti-replay for an SA, then
the receiver SHOULD NOT negotiate ESN in an SA management protocol.
Use of ESN creates a need for the receiver to manage the anti-replay
window (in order to determine the correct value for the high-order
bits of the ESN, which are employed in the ICV computation), which is
generally contrary to the notion of disabling anti-replay for an SA.

So return an error if an ESN state with disabled anti replay detection
is inserted for now and add the extra treatment later if we need it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
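
The rejection rule can be illustrated with a stand-alone check. This is a sketch only: the struct, the flag value, and the function name are hypothetical and do not reproduce the xfrm data structures; the one assumption taken from the text is that an ESN state with anti-replay disabled (a replay window of zero here) is refused with an error.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_STATE_ESN 0x1u  /* hypothetical "SA uses ESN" flag */

/* Hypothetical security association state for illustration. */
struct sketch_sa {
    uint32_t flags;
    uint32_t replay_window;  /* 0 means anti-replay detection is disabled */
};

/* Refuse ESN when anti-replay is disabled, as RFC 4303 advises against it. */
static int check_esn(const struct sketch_sa *sa)
{
    if ((sa->flags & SKETCH_STATE_ESN) && sa->replay_window == 0)
        return -EINVAL;
    return 0;
}
```

States without ESN, or with ESN and a nonzero replay window, pass the check unchanged.
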
13 years agoxfrm: Assign the inner mode output function to the dst entry
Steffen Klassert [Mon, 9 May 2011 19:36:38 +0000 (19:36 +0000)]
xfrm: Assign the inner mode output function to the dst entry

As it is, we assign the outer mode's output function to the dst entry
when we create the xfrm bundle. This leads to two problems in interfamily
scenarios. We might insert ipv4 packets into ip6_fragment when called
from xfrm6_output. The system crashes if we try to fragment an ipv4
packet with ip6_fragment. This issue was introduced with git commit
ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
as needed). The second issue is that we might insert ipv4 packets into
netfilter6, and vice versa, in interfamily scenarios.

With this patch we assign the inner mode output function to the dst entry
when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
mode is used and the right fragmentation and netfilter functions are called.
We then switch to the outer mode with the output_finish functions.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agonet: dev_close() should check IFF_UP
Eric Dumazet [Tue, 10 May 2011 19:26:06 +0000 (12:26 -0700)]
net: dev_close() should check IFF_UP

Commit 443457242beb (factorize sync-rcu call in
unregister_netdevice_many) mistakenly removed one test from dev_close().

The following actions trigger a BUG:

modprobe bonding
modprobe dummy
ifconfig bond0 up
ifenslave bond0 dummy0
rmmod dummy

dev_close() must not close a non IFF_UP device.

With help from Frank Blaschka and Einar EL Lueck

Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reported-by: Einar EL Lueck <ELELUECK@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
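
The guard can be sketched in user space as below. The struct and function names are hypothetical and the body is not the kernel's dev_close(); it only illustrates that closing a device that is not IFF_UP should be a no-op.

```c
#ifndef IFF_UP
#define IFF_UP 0x1  /* interface is up */
#endif

/* Hypothetical stand-in for a network device. */
struct sketch_netdev {
    unsigned flags;
    unsigned stop_calls;  /* how often the teardown path actually ran */
};

/* Close the device only if it is currently up. */
static int dev_close_sketch(struct sketch_netdev *dev)
{
    if (!(dev->flags & IFF_UP))
        return 0;  /* already down: nothing to tear down */

    dev->stop_calls++;       /* stands in for the real teardown work */
    dev->flags &= ~IFF_UP;
    return 0;
}
```

Calling dev_close_sketch() twice on an up device runs the teardown once; the second call sees the cleared flag and returns immediately.
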