Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

KVM: s390/cpacf: Enable/disable protected key functions for kvm guest

Created new KVM device attributes for indicating whether the AES and
DES/TDES protected key functions are available for programs running
on the KVM guest. The attributes are used to set up the controls in
the guest SIE block that specify whether programs running on the
guest will be given access to the protected key functions available
on the s390 hardware.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]

KVM: s390: Provide guest TOD Clock Get/Set Controls

Provide controls for setting/getting the guest TOD clock based on the VM
attribute interface.

Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of
Day clock value.

TOD_HIGH is presently always set to 0. In the future it will contain a high
order expansion of the tod clock value after it overflows the 64-bits of
the TOD.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: trace correct values for set prefix and machine checks

When injecting SIGP set prefix or a machine check, we trace
the values in our per-vcpu local_int data structure instead
of the parameters passed to the function.

Fix this by changing the trace statement to use the correct values.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: fix bug in sigp emergency signal injection

Currently we are always setting the wrong bit in the
bitmap for pending emergency signals. Instead of using
emerg.code from the passed in irq parameter, we use the
value in our per-vcpu local_int structure, which is always zero.
That means all emergency signals will have address 0 as parameter.
If two CPUs send a SIGP to the same target, one might be lost.

Let's fix this by using the value from the parameter and
also trace the correct value.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: Take addressing mode into account for MVPG interception

The handler for MVPG partial execution interception does not take
the current CPU addressing mode into account yet, so addresses are
always treated as 64-bit addresses. For correct behaviour, we should
properly handle 24-bit and 31-bit addresses, too.

Since MVPG is defined to work with logical addresses, we can simply
use guest_translate_address() to achieve the required behaviour
(since DAT is disabled here, guest_translate_address() skips the MMU
translation and only translates the address via kvm_s390_logical_to_effective()
and kvm_s390_real_to_abs(), which is exactly what we want here).

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: no need to hold the kvm->mutex for floating interrupts

The kvm mutex was (probably) used to protect against cpu hotplug.
The current code no longer needs to protect against that, as we only
rely on CPU data structures that are guaranteed to be available
if we can access the CPU. (e.g. vcpu_create will put the cpu
in the array AFTER the cpu is ready).

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>

KVM: s390: forward most SIGP orders to user space

Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU

We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).

This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: clear the pfault queue if user space sets the invalid token

We need a way to clear the async pfault queue from user space (e.g.
for resets and SIGP SET ARCHITECTURE).

This patch simply clears the queue as soon as user space sets the
invalid pfault token. The definition of the invalid token is moved
to uapi.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: only one external call may be pending at a time

Only one external call may be pending at a vcpu at a time. For this
reason, we have to detect whether the SIGP externcal call interpretation
facility is available. If so, all external calls have to be injected
using this mechanism.

SIGP EXTERNAL CALL orders have to return whether another external
call is already pending. This check was missing until now.

SIGP SENSE hasn't returned yet in all conditions whether an external
call was pending.

If a SIGP EXTERNAL CALL irq is to be injected and one is already
pending, -EBUSY is returned.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

s390/sclp: introduce check for the SIGP Interpretation Facility

This patch introduces the infrastructure to check whether the SIGP
Interpretation Facility is installed on all VCPUs in the configuration.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: SIGP SET PREFIX cleanup

This patch cleanes up the the SIGP SET PREFIX code.

A SIGP SET PREFIX irq may only be injected if the target vcpu is
stopped. Let's move the checking code into the injection code and
return -EBUSY if the target vcpu is not stopped.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: a VCPU may only stop when no interrupts are left pending

As a SIGP STOP is an interrupt with the least priority, it may only result
in stop of the vcpu when no other interrupts are left pending.

To detect whether a non-stop irq is pending, we need a way to mask out
stop irqs from the general kvm_cpu_has_interrupt() function. For this
reason, the existing function (with an outdated name) is replaced by
kvm_s390_vcpu_has_irq() which allows to mask out pending stop irqs.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: handle stop irqs without action_bits

This patch removes the famous action_bits and moves the handling of
SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt.

The new local interrupt infrastructure is used to track pending stop
requests.

STOP irqs are the only irqs that don't get actively delivered. They
remain pending until the stop function is executed (=stop intercept).

If another STOP irq is already pending, -EBUSY will now be returned
(needed for the SIGP handling code).

Migration of pending SIGP STOP (AND STORE STATUS) orders should now
be supported out of the box.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: new parameter for SIGP STOP irqs

In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.

For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: forward hrtimer if guest ckc not pending yet

Patch 0759d0681cae ("KVM: s390: cleanup handle_wait by reusing
kvm_vcpu_block") changed the way pending guest clock comparator
interrupts are detected. It was assumed that as soon as the hrtimer
wakes up, the condition for the guest ckc is satisfied.

This is however only true as long as adjclock() doesn't speed
up the monotonic clock. Reason is that the hrtimer is based on
CLOCK_MONOTONIC, the guest clock comparator detection is based
on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
TOD clock, the hrtimer wakes the target VCPU up too early and
the target VCPU will not detect any pending interrupts, therefore
going back to sleep. It will never be woken up again because the
hrtimer has finished. The VCPU is stuck.

As a quick fix, we have to forward the hrtimer until the guest
clock comparator is really due, to guarantee properly timed wake
ups.

As the hrtimer callback might be triggered on another cpu, we
have to make sure that the timer is really stopped and not currently
executing the callback on another cpu. This can happen if the vcpu
thread is scheduled onto another physical cpu, but the timer base
is not migrated. So lets use hrtimer_cancel instead of try_to_cancel.

A proper fix might be to introduce a RAW based hrtimer.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: base hrtimer on a monotonic clock

The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.

This patch changes our hrtimer to be based on a monotonic clock.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: prevent sleep duration underflows in handle_wait()

We sometimes get an underflow for the sleep duration, which most
likely won't result in the short sleep time we wanted.

So let's check for sleep duration underflows and directly continue
to run the guest if we get one.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: Allow userspace to limit guest memory size

With commit c6c956b80bdf ("KVM: s390/mm: support gmap page tables with less
than 5 levels") we are able to define a limit for the guest memory size.

As we round up the guest size in respect to the levels of page tables
we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB.
We currently limit the guest size to 16 TB, which means we end up
creating a page table structure supporting guest sizes up to 8192 TB.

This patch introduces an interface that allows userspace to tune
this limit. This may bring performance improvements for small guests.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: move vcpu specific initalization to a later point

As we will allow in a later patch to recreate gmaps with new limits,
we need to make sure that vcpus get their reference for that gmap
after they increased the online_vcpu counter, so there is no possible race.

While we are doing this, we also can simplify the vcpu_init function, by
moving ucontrol specifics to an own function.
That way we also start now setting the kvm_valid_regs for the ucontrol path.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: make local function static

sparse rightfully complains about
warning: symbol '__inject_extcall' was not declared. Should it be static?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: remove unneeded return value of vcpu_postcreate

The return value of kvm_arch_vcpu_postcreate is not checked in its
caller. This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed. But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

kvm: x86: Remove kvm_make_request from lapic.c

Adds a function kvm_vcpu_set_pending_timer instead of calling
kvm_make_request in lapic.c.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Access to LDT/GDT that wraparound is incorrect

When access to descriptor in LDT/GDT wraparound outside long-mode, the address
of the descriptor should be truncated to 32-bit. Citing Intel SDM 2.1.1.1
"Global and Local Descriptor Tables in IA-32e Mode": "GDTR and LDTR registers
are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode and
compatibility mode)."

So in other cases, we need to truncate. Creating new function to return a
pointer to descriptor table to avoid too much code duplication.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Wrap 64-bit check with #ifdef CONFIG_X86_64, to avoid a "right shift count
>= width of type" warning and consequent undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Do not set access bit on accessed segments

When segment is loaded, the segment access bit is set unconditionally. In
fact, it should be set conditionally, based on whether the segment had the
accessed bit set before. In addition, it can improve performance.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: POP [ESP] is not emulated correctly

According to Intel SDM: "If the ESP register is used as a base register for
addressing a destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register."

The current emulation does not behave so. The fix required to waste another
of the precious instruction flags and to check the flag in decode_modrm.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: em_call_far should return failure result

Currently, if em_call_far fails it returns success instead of the resulting
error-code. Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: JMP/CALL using call- or task-gate causes exception

The KVM emulator does not emulate JMP and CALL that target a call gate or a
task gate. This patch does not try to implement these scenario as they are
presumably rare; yet it returns X86EMUL_UNHANDLEABLE error in such cases
instead of generating an exception.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: fnstcw and fnstsw may cause spurious exception

Since the operand size of fnstcw and fnstsw is updated during the execution,
the emulation may cause spurious exceptions as it reads the memory beforehand.

Marking these instructions as Mov (since the previous value is ignored) and
DstMem16 to simplify the setting of operand size.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: pop sreg accesses only 2 bytes

Although pop sreg updates RSP according to the operand size, only 2 bytes are
read. The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: mmu: replace assertions with MMU_WARN_ON, a conditional WARN_ON

This makes the direction of the conditions consistent with code that
is already using WARN_ON.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: mmu: remove ASSERT(vcpu)

Because ASSERT is just a printk, these would oops right away.
The assertion thus hardly adds anything.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: mmu: remove argument to kvm_init_shadow_mmu and kvm_init_shadow_ept_mmu

The initialization function in mmu.c can always use walk_mmu, which
is known to be vcpu->arch.mmu. Only init_kvm_nested_mmu is used to
initialize vcpu->arch.nested_mmu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: mmu: do not use return to tail-call functions that return void

This is, pedantically, not valid C. It also looks weird.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: add tracepoint to wait_lapic_expire

Add tracepoint to wait_lapic_expire.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[Remind reader if early or late. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: add option to advance tscdeadline hrtimer expiration

For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario
which requires strict timing regarding timer expiration).

Reduces average cyclictest latency from 12us to 8us
on Core i5 desktop.

Note: this option requires tuning to find the appropriate value
for a particular hardware/guest combination. One method is to measure the
average delay between apic_timer_fn and VM-entry.
Another method is to start with 1000ns, and increase the value
in say 500ns increments until avg cyclictest numbers stop decreasing.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: add method to test PIR bitmap vector

kvm_x86_ops->test_posted_interrupt() returns true/false depending
whether 'vector' is set.

Next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv

In most cases calling hwapic_isr_update(), we always check if
kvm_apic_vid_enabled() == 1, but actually,
kvm_apic_vid_enabled()
    -> kvm_x86_ops->vm_has_apicv()
        -> vmx_vm_has_apicv() or '0' in svm case
            -> return enable_apicv && irqchip_in_kernel(kvm)

So its a little cost to recall vmx_vm_has_apicv() inside
hwapic_isr_update(), here just NULL out hwapic_isr_update() in
case of !enable_apicv inside hardware_setup() then make all
related stuffs follow this. Note we don't check this under that
condition of irqchip_in_kernel() since we should make sure
definitely any caller don't work  without in-kernel irqchip.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Remove FIXMEs in emulate.c for the function,task_switch_32

Remove FIXME comments about needing fault addresses to be returned. These
are propaagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit

When generating #PF VM-exit, check equality:
(PFEC & PFEC_MASK) == PFEC_MATCH
If there is equality, the 14 bit of exception bitmap is used to take decision
about generating #PF VM-exit. If there is inequality, inverted 14 bit is used.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Improve nested msr switch checking

This patch improve checks required by Intel Software Developer Manual.
- SMM MSRs are not allowed.
- microcode MSRs are not allowed.
- check x2apic MSRs only when LAPIC is in x2apic mode.
- MSR switch areas must be aligned to 16 bytes.
- address of first and last byte in MSR switch areas should not set any bits
beyond the processor's physical-address width.

Also it adds warning messages on failures during MSR switch. These messages
are useful for people who debug their VMMs in nVMX.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Add nested msr load/restore algorithm

Several hypervisors need MSR auto load/restore feature.
We read MSRs from VM-entry MSR load area which specified by L1,
and load them via kvm_set_msr in the nested entry.
When nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1`s MSR store area. After this, we read MSRs from VM-exit
MSR load area, and load them via kvm_set_msr.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Linux 3.19-rc3

Merge tag 'powerpc-3.19-3' of git://git./linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

- Wire up sys_execveat(). Tested on 32 & 64 bit.

- Fix for kdump on LE systems with cpus hot unplugged.

- Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this
   broke other platforms, we'll do a proper fix for 3.20.

* tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"
  powerpc/kdump: Ignore failure in enabling big endian exception during crash
  powerpc: Wire up sys_execveat() syscall

Merge tag 'please-pull-syscall' of git://git./linux/kernel/git/aegl/linux

Pull ia64 fixlet from Tony Luck:
"Add execveat syscall"

* tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] Enable execveat syscall for ia64

[IA64] Enable execveat syscall for ia64

See commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
syscalls: implement execveat() system call

Signed-off-by: Tony Luck <tony.luck@intel.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
"Two fixes for UML regressions. Nothing exciting"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
x86, um: actually mark system call tables readonly
um: Skip futex_atomic_cmpxchg_inatomic() test

Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo"

Commit 9fc2105aeaaf ("ARM: 7830/1: delay: don't bother reporting
bogomips in /proc/cpuinfo") breaks audio in python, and probably
elsewhere, with message

  FATAL: cannot locate cpu MHz in /proc/cpuinfo

I'm not the first one to hit it, see for example

  https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
  https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1

Reading original changelog, I have to say "Stop breaking working setups.
You know who you are!".

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86, um: actually mark system call tables readonly

Commit a074335a370e ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.

Before:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev
0000000000000000 D sys_call_table
0000000000000000 D syscall_table_size

After:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev
0000000000000000 R sys_call_table
0000000000000000 D syscall_table_size

Fixes: a074335a370e ("x86, um: Mark system call tables readonly")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

um: Skip futex_atomic_cmpxchg_inatomic() test

futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Daniel Walter <d.walter@0x90.at>

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of three fixes: one to correct an abort path thinko
  causing failures (and a panic) in USB on device misbehaviour, One to
  fix an out of order issue in the fnic driver and one to match discard
  expectations to qemu which otherwise cause Linux to behave badly as a
  guest"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  SCSI: fix regression in scsi_send_eh_cmnd()
  fnic: IOMMU Fault occurs when IO and abort IO is out of order
  sd: tweak discard heuristics to work around QEMU SCSI issue

Merge tag 'sound-3.19-rc3' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Nothing too exciting as a new year's start here: most of fixes are for
  ASoC, a boot crash fix on OMAP for deferred probe, a few driver
  specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
  fixes in kerneldoc comments for PCM"

* tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Fix kerneldoc for params_*() functions
  ASoC: rockchip: i2s: fix maxburst of dma data to 4
  ASoC: rockchip: i2s: fix error defination of transmit data level
  ASoC: Intel: correct the fixed free block allocation
  ASoC: rt5677: fixed rt5677_dsp_vad_put rt5677_dsp_vad_get panic
  ASoC: Intel: Fix BYTCR machine driver MODULE_ALIAS
  ASoC: Intel: Fix BYTCR firmware name
  ASoC: dwc: Iterate over all channels
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: Intel: Add I2C dependency to two new machines
  ASoC: dapm: Remove snd_soc_of_parse_audio_routing() due to deferred probe

Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull vhost cleanup and virtio bugfix
"There's a single change here, fixing a vhost bug where vhost
  initialization fails due to used ring alignment check being too
  strict"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: relax used address alignment
  virtio_ring: document alignment requirements

Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit

Pull audit fix from Paul Moore:
"One audit patch to resolve a panic/oops when recording filenames in
  the audit log, see the mail archive link below.

  The fix isn't as nice as I would like, as it involves an allocate/copy
  of the filename, but it solves the problem and the overhead should
  only affect users who have configured audit rules involving file
  names.

  We'll revisit this issue with future kernels in an attempt to make
  this suck less, but in the meantime I think this fix should go into
  the next release of v3.19-rcX.

  [ https://marc.info/?t=141986927600001&r=1&w=2 ]"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: create private file name copies when auditing inodes

Revert "Input: atmel_mxt_ts - use deep sleep mode when stopped"

This reverts commit 9d469d033d135d80742a4e39e6bbb4519dd5eee1.

It breaks the Chromebook Pixel touchpad (and touchscreen).

Reported-by: Dirk Hohndel <dirk@hohndel.org>
Bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Dyer <nick.dyer@itdev.co.uk>
Cc: Benson Leung <bleung@chromium.org>
Cc: Yufeng Shen <miletus@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'nios2-fixes-v3.19-rc3' of git://git.rocketboards.org/linux-socfpga-next

Pull arch/nios2 fixes from Ley Foon Tan:

- fix compilation error when enable CONFIG_PREEMPT

- initialize cpuinfo.mmu variable supplied by the device tree

* tag 'nios2-fixes-v3.19-rc3' of git://git.rocketboards.org/linux-socfpga-next:
nios2: Use preempt_schedule_irq
nios2: Initialize cpuinfo.mmu

Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"Fix a use-after-free crash in the user-space crypto API"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: af_alg - fix backlog handling

nios2: Use preempt_schedule_irq

Follow aa0d53260596 ("ia64: Use preempt_schedule_irq") and use
preempt_schedule_irq instead of enabling/disabling interrupts and
messing around with PREEMPT_ACTIVE in the nios2 low-level preemption
code ourselves. Also get rid of the now needless re-check for
TIF_NEED_RESCHED, preempt_schedule_irq will already take care of
rescheduling.

This also fixes the following build error when building with
CONFIG_PREEMPT:

arch/nios2/kernel/built-in.o: In function `need_resched':
arch/nios2/kernel/entry.S:374: undefined reference to `PREEMPT_ACTIVE'

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Ley Foon Tan <lftan@altera.com>

nios2: Initialize cpuinfo.mmu

This patch initializes the mmu field of the cpuinfo structure to the
value supplied by the devicetree.

Signed-off-by: Walter Goossens <waltergoossens@home.nl>
Acked-by: Ley Foon Tan <lftan@altera.com>

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
"A very small set of fixes for 3.19, as everyone was out.

  The clocksource patch was something I missed for the merge window
  after the change that broke arm64 was merged through arm-soc.  The
  other two patches are a fix for an undetected merge problem in mvebu
  and a defconfig change to make some exynos boards work with the normal
  multi_v7_defconfig"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  Add USB_EHCI_EXYNOS to multi_v7_defconfig
  ARM: mvebu: Fix pinctrl configuration for Armada 370 DB
  clocksource: arch_timer: Only use the virtual counter (CNTVCT) on arm64

Merge tag 'fbdev-fixes-3.19' of git://git./linux/kernel/git/tomba/linux

Pull fbdev fixes from Tomi Valkeinen:

- Fix regression with Nokia N900 display

- Fix crash on fbdev using freed __initdata logos

- Fix fb_deferred_io_fsync() return value.

* tag 'fbdev-fixes-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  OMAPDSS: SDI: fix output port_num
  video/fbdev: fix defio's fsync
  video/logo: prevent use of logos after they have been freed
  OMAPDSS: pll: NULL dereference in error handling
  OMAPDSS: HDMI: remove double initializer entries

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input layer fixes from Dmitry Torokhov:
"Fixes for v7 protocol for ALPS devices and few other driver fixes.

  Also users can request input events to be stamped with boot time
  timestamps, in addition to real and monotonic timestamps"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: hil_kbd - fix incorrect use of init_completion
  Input: alps - v7: document the v7 touchpad packet protocol
  Input: alps - v7: fix finger counting for > 2 fingers on clickpads
  Input: alps - v7: sometimes a single touch is reported in mt[1]
  Input: alps - v7: ignore new packets
  Input: evdev - add CLOCK_BOOTTIME support
  Input: psmouse - expose drift duration for IBM trackpoints
  Input: stmpe - bias keypad columns properly
  Input: stmpe - enforce device tree only mode
  mfd: stmpe: add pull up/down register offsets for STMPE
  Input: optimize events_per_packet count calculation
  Input: edt-ft5x06 - fixed a macro coding style issue
  Input: gpio_keys - replace timer and workqueue with delayed workqueue
  Input: gpio_keys - allow separating gpio and irq in device tree

Revert "cfg80211: make WEXT compatibility unselectable"

This reverts commit 24a0aa212ee2dbe44360288684478d76a8e20a0a.

It's causing severe userspace breakage. Namely, all the utilities from
wireless-utils which are relying on CONFIG_WEXT (which means tools like
'iwconfig', 'iwlist', etc) are not working anymore. There is a 'iw'
utility in newer wireless-tools, which is supposed to be a replacement
for all the "deprecated" binaries, but it's far away from being
massively adopted.

Please see [1] for example of the userspace breakage this is causing.

In addition to that, Larry Finger reports [2] that this patch is also
causing ipw2200 driver being impossible to build.

To me this clearly shows that CONFIG_WEXT is far, far away from being
"deprecated enough" to be removed.

[1] http://thread.gmane.org/gmane.linux.kernel/1857010
[2] http://thread.gmane.org/gmane.linux.network/343688

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.

2) Fix receive checksum handling in enic driver, from Govindarajulu
    Varadarajan.

3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
    Herbert Xu.  Also, add code to detect drivers that have this mistake
    in the future.

4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.

5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
    input path,f rom Nicolas Dichtel.

6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.

7) Fix double SKB free in vxlan driver, also from Pravin.

8) When we scrub a packet, which happens when we are switching the
    context of the packet (namespace, etc.), we should reset the
    secmark.  From Thomas Graf.

9) ->ndo_gso_check() needs to do more than return true/false, it also
    has to allow the driver to clear netdev feature bits in order for
    the caller to be able to proceed properly.  From Jesse Gross.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
  genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
  netlink/genetlink: pass network namespace to bind/unbind
  ne2k-pci: Add pci_disable_device in error handling
  bonding: change error message to debug message in __bond_release_one()
  genetlink: pass multicast bind/unbind to families
  netlink: call unbind when releasing socket
  netlink: update listeners directly when removing socket
  genetlink: pass only network namespace to genl_has_listeners()
  netlink: rename netlink_unbind() to netlink_undo_bind()
  net: Generalize ndo_gso_check to ndo_features_check
  net: incorrect use of init_completion fixup
  neigh: remove next ptr from struct neigh_table
  net: xilinx: Remove unnecessary temac_property in the driver
  net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
  net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
  openvswitch: fix odd_ptr_err.cocci warnings
  Bluetooth: Fix accepting connections when not using mgmt
  Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
  brcmfmac: Do not crash if platform data is not populated
  ipw2200: select CFG80211_WEXT
  ...

Merge tag 'linux-kselftest-3.19-fixes' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
"Fix exec test compile warnings"

* tag 'linux-kselftest-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/exec: Use %zu to format size_t

SCSI: fix regression in scsi_send_eh_cmnd()

Commit ac61d1955934 (scsi: set correct completion code in
scsi_send_eh_cmnd()) introduced a bug.  It changed the stored return
value from a queuecommand call, but it didn't take into account that
the return value was used again later on.  This patch fixes the bug by
changing the later usage.

There is a big comment in the middle of scsi_send_eh_cmnd() which
does a good job of explaining how the routine works.  But it mentions
a "rtn = FAILURE" value that doesn't exist in the code.  This patch
adjusts the code to match the comment (I assume the comment is right
and the code is wrong).

This fixes Bugzilla #88341.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Андрей Аладьев <aladjev.andrew@gmail.com>
Tested-by: Андрей Аладьев <aladjev.andrew@gmail.com>
Fixes: ac61d19559349e205dad7b5122b281419aa74a82
Acked-by: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

ALSA: pcm: Fix kerneldoc for params_*() functions

Fix a copy and paste error in the kernel doc description for the params_*()
functions.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v3.19-rc2' of git://git./linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v3.19

A few fixes for v3.19, a few driver specifics and one core fix which
fixes a boot crash on OMAP if deferred probing kicks in due to
attempting to modify static data.

Add USB_EHCI_EXYNOS to multi_v7_defconfig

Currently we enable Exynos devices in the multi v7 defconfig, however, when
testing on my ODROID-U3, I noticed that USB was not working. Enabling this
option causes USB to work, which enables networking support as well since the
ODROID-U3 has networking on the USB bus.

[arnd] Support for odroid-u3 was added in 3.10, so it would be nice to
backport this fix at least that far.

Signed-off-by: Steev Klimaszewski <steev@gentoo.org>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'mvebu-fixes-3.19' of git://git.infradead.org/linux-mvebu into fixes

Pull "Fixes for 3.19" from Andrew Lunn:

Jason is taking a back seat this cycle and i'm doing all the patch
wrangling for mvebu.

* tag 'mvebu-fixes-3.19' of git://git.infradead.org/linux-mvebu:
ARM: mvebu: Fix pinctrl configuration for Armada 370 DB

Also update to Linux 3.19-rc1, which this was based on.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

audit: create private file name copies when auditing inodes

Unfortunately, while commit 4a928436 ("audit: correctly record file
names with different path name types") fixed a problem where we were
not recording filenames, it created a new problem by attempting to use
these file names after they had been freed. This patch resolves the
issue by creating a copy of the filename which the audit subsystem
frees after it is done with the string.

At some point it would be nice to resolve this issue with refcounts,
or something similar, instead of having to allocate/copy strings, but
that is almost surely beyond the scope of a -rcX patch so we'll defer
that for later. On the plus side, only audit users should be impacted
by the string copying.

Reported-by: Toralf Foerster <toralf.foerster@gmx.de>
Signed-off-by: Paul Moore <pmoore@redhat.com>

fnic: IOMMU Fault occurs when IO and abort IO is out of order

When I/O is aborted by mid-layer, fnic FW will complete the I/O before
completing the abort task. In some cases abort request is completed before
the I/O, which could lead to inconsistent driver and firmware states.
In this case firmware reset would clear the inconsistent state.

Signed-off-by: Anil Chintalapati <achintal@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Hiral Shah <hishah@cisco.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: tweak discard heuristics to work around QEMU SCSI issue

7985090aa020 changed the discard heuristics to give preference to the
WRITE SAME commands that (unlike UNMAP) guarantee deterministic results.

Ming Lei discovered that QEMU SCSI's WRITE SAME implementation
internally relied on limits that were only communicated for the UNMAP
case. And therefore discard commands backed by WRITE SAME would fail.

Tweak the heuristics so we still pick UNMAP in the LBPRZ=0 case and only
prefer the WRITE SAME variants if the device has the LBPRZ flag set.

Reported-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

OMAPDSS: SDI: fix output port_num

After the commit ef691ff48bc8 (OMAPDSS: DT: Get source endpoint by
matching reg-id) we look for the SDI output using the port number.
However, the SDI driver doesn't set the port number, which causes the
SDI display to not initialize.

Fix this by setting the SDI port number to 1. We use a hardcoded value,
as SDI was used only on OMAP3 and it's always port number 1 there.

Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>

video/fbdev: fix defio's fsync

fb_deferred_io_fsync() returns the value of schedule_delayed_work() as
an error code, but schedule_delayed_work() does not return an error. It
returns true/false depending on whether the work was already queued.

Fix this by ignoring the return value of schedule_delayed_work().

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: stable@vger.kernel.org

Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
"A set of three minor cifs fixes"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: make new inode cache when file type is different
  Fix signed/unsigned pointer warning
  Convert MessageID in smb2_hdr to LE

Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs

Pull UDF & isofs fixes from Jan Kara:
"A couple of UDF fixes of handling of corrupted media and one iso9660
  fix of the same"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Reduce repeated dereferences
  udf: Check component length before reading it
  udf: Check path length when reading symlink
  udf: Verify symlink size before loading it
  udf: Verify i_size when loading inode
  isofs: Fix unchecked printing of ER records

Merge tag 'pm+acpi-3.19-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management and ACPI material from Rafael J Wysocki:
"These are fixes (operating performance points library, cpufreq-dt
  driver, cpufreq core, ACPI backlight, cpupower tool), cleanups
  (cpuidle), new processor IDs for the RAPL (Running Average Power
  Limit) power capping driver, and a modification of the generic power
  domains framework allowing modular drivers to call one of its helper
  functions.

  Specifics:

   - Fix for a potential NULL pointer dereference in the cpufreq core
     due to an initialization race condition (Ethan Zhao).

   - Fixes for abuse of the OPP (Operating Performance Points) API
     related to RCU and other minor issues in the OPP library and the
     cpufreq-dt driver (Dmitry Torokhov).

   - cpuidle governors cleanup making them measure idle duration in a
     better way without using the CPUIDLE_FLAG_TIME_INVALID flag which
     allows that flag to be dropped from the ACPI cpuidle driver and
     from the core too (Len Brown).

   - New ACPI backlight blacklist entries for Samsung machines without a
     working native backlight interface that need to use the ACPI
     backlight instead (Aaron Lu).

   - New CPU IDs of future Intel Xeon CPUs for the Intel RAPL power
     capping driver (Jacob Pan).

   - Generic power domains framework modification to export the
     of_genpd_get_from_provider() function to modular drivers that will
     allow future driver modifications to be based on the mainline (Amit
     Daniel Kachhap).

   - Two fixes for the cpupower tool (Michal Privoznik, Prarit
     Bhargava)"

* tag 'pm+acpi-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: Add some Samsung models to disable_native_backlight list
  tools / cpupower: Fix no idle state information return value
  tools / cpupower: Correctly detect if running as root
  cpufreq: fix a NULL pointer dereference in __cpufreq_governor()
  cpufreq-dt: defer probing if OPP table is not ready
  PM / OPP: take RCU lock in dev_pm_opp_get_opp_count
  PM / OPP: fix warning in of_free_opp_table()
  PM / OPP: add some lockdep annotations
  powercap / RAPL: add IDs for future Xeon CPUs
  PM / Domains: Export of_genpd_get_from_provider function
  cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID
  cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
  cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID

genetlink: A genl_bind() to an out-of-range multicast group should not WARN().

Users can request to bind to arbitrary multicast groups, so warning
when the requested group number is out of range is not appropriate.

And with the warning removed, and the 'err' variable properly given
an initial value, we can remove 'found' altogether.

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'spi-v3.19-rc2' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few driver specific fixes here, the DMA burst size increase in the
  spfi driver is a fix to make the hardware happier in some situations"

* tag 'spi-v3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: img-spfi: Increase DMA burst size
  spi: img-spfi: Enable controller before starting TX DMA
  spi: sh-msiof: Add runtime PM lock in initializing

Merge tag 'regulator-v3.19-rc2' of git://git./linux/kernel/git/broonie/regulator

Pull one regulator fix from Mark Brown:
"One fix here, a fix for the voltage mapping on one of the s2mps11
  regulators which broke systems using it including apparently the
  Gear 2 smartwatches"

* tag 'regulator-v3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: s2mps11: Fix dw_mmc failure on Gear 2

Merge tag 'mmc-v3.19-2' of git://git.linaro.org/people/ulf.hansson/mmc

Pull one MMC fix from Ulf Hansson:
"MMC core:

- Fix selection of buswidth for mmc hosts supporting 1-bit only"

* tag 'mmc-v3.19-2' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: core: stop trying to switch width when only one bit is supported

Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux

Pull thermal management updates from Zhang Rui:
"First of all, the most important change is the thermal cpu cooling
  fixes.  The major fix here is to have proper sequencing between
  cpufreq layer and thermal cpu cooling registration.  A take away of
  this fix is an improvement in the thermal drivers code.  Thermal
  drivers that require cpu cooling do not need to check for cpufreq
  layer.  The requirement now is to propagate the error code, if any,
  while registering cpu cooling device.  Thanks to Viresh for
  implementing the required CPUfreq changes.

  Second, a new driver is introduced for int340x processor thermal
  device.  Given that int340x thermal is disabled by default, and this
  processor thermal device is only available on limited platforms, plus
  the driver does nothing but exposes some thermal limitation
  information for user space to use, thus I think it is safe to include
  it in this pull request after missing 3.19-rc2.

  Specifics:

   - Thermal cpu cooling fixes and cleanups.

   - introduce INT340X processor thermal reporting device driver.

   - several small fixes and cleanups for int340x thermal drivers"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (43 commits)
  Thermal/int340x/int3403: Free acpi notification handler
  Thermal/int340x/processor_thermal: Fix memory leak
  Thermal/int340x/int3403: Fix memory leak
  thermal: int340x: Introduce processor reporting device
  thermal: int340x_thermal: drop owner assignment from platform_drivers
  thermal: drop owner assignment from platform_drivers
  thermal: cpu_cooling: document node in struct cpufreq_cooling_device
  thermal/powerclamp: add ids for future xeon cpus
  Thermal/int340x: Handle properly the case when _trt or _art acpi entry is missing
  thermal: cpu_cooling: return ERR_PTR() for !CPU_THERMAL or !THERMAL_OF
  thermal: cpu_cooling: small memory leak on error
  thermal: ti-soc-thermal: Do not print error message in the EPROBE_DEFER case
  thermal: db8500: Do not print error message in the EPROBE_DEFER case
  thermal: imx: Do not print error message in the EPROBE_DEFER case
  thermal: Fix cdev registration with THERMAL_NO_LIMIT on 64bit
  drivers: thermal: Remove ARCH_HAS_BANDGAP dependency for samsung
  thermal:core:fix: Check return code of the ->get_max_state() callback
  thermal: cpu_cooling: update copyright tags
  thermal: cpu_cooling: Use cpufreq_dev->freq_table for finding level/freq
  thermal: cpu_cooling: Store frequencies in descending order
  ...

mm: get rid of radix tree gfp mask for pagecache_get_page

Commit 2457aec63745 ("mm: non-atomically mark page accessed during page
cache allocation where possible") has added a separate parameter for
specifying gfp mask for radix tree allocations.

Not only this is less than optimal from the API point of view because it
is error prone, it is also buggy currently because
grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if
fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by
AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then
the radix tree allocation wouldn't obey the restriction and might
recurse into filesystem and cause deadlocks.  This is the case for most
filesystems unfortunately because only ext4 and gfs2 are using
AOP_FLAG_NOFS.

Let's simply remove radix_gfp_mask parameter because the allocation
context is same for both page cache and for the radix tree.  Just make
sure that the radix tree gets only the sane subset of the mask (e.g.  do
not pass __GFP_WRITE).

Long term it is more preferable to convert remaining users of
AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
interface even further.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'acpi-video'

* acpi-video:
ACPI / video: Add some Samsung models to disable_native_backlight list

Merge branches 'pm-domains', 'powercap' and 'pm-tools'

* pm-domains:
  PM / Domains: Export of_genpd_get_from_provider function

* powercap:
  powercap / RAPL: add IDs for future Xeon CPUs

* pm-tools:
  tools / cpupower: Fix no idle state information return value
  tools / cpupower: Correctly detect if running as root

Merge branches 'pm-cpufreq' and 'pm-cpuidle'

* pm-cpufreq:
  cpufreq: fix a NULL pointer dereference in __cpufreq_governor()
  cpufreq-dt: defer probing if OPP table is not ready

* pm-cpuidle:
  cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID
  cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
  cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID

Merge branch 'pm-opp'

* pm-opp:
  PM / OPP: take RCU lock in dev_pm_opp_get_opp_count
  PM / OPP: fix warning in of_free_opp_table()
  PM / OPP: add some lockdep annotations

mmc: core: stop trying to switch width when only one bit is supported

mmc_select_bus_width() will try to switch to MMC_BUS_WIDTH_4 even if
MMC_CAP_4_BIT_DATA and MMC_CAP_8_BIT_DATA are not set in host->caps.
Return as soon as possible when those flags are not set

Fixes: 577fb13199b1 (mmc: rework selection of bus speed mode)
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: <stable@vger.kernel.org> # 3.17
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

vhost: relax used address alignment

virtio 1.0 only requires used address to be 4 byte aligned,
vhost required 8 bytes (size of vring_used_elem).
Fix up vhost to match that.

Additionally, while vhost correctly requires 8 byte
alignment for log, it's unconnected to used ring:
it's a consequence that log has u64 entries.
Tweak code to make that clearer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_ring: document alignment requirements

Host needs to know vring element alignment requirements:
simply doing alignof on structures doesn't work reliably: on some
platforms gcc has alignof(uint32_t) == 2.

Add macros for alignment as specified in virtio 1.0 cs01,
export them to userspace as well.

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

video/logo: prevent use of logos after they have been freed

If the probe of an fb driver has been deferred due to missing
dependencies, and the probe is later ran when a module is loaded, the
fbdev framework will try to find a logo to use.

However, the logos are __initdata, and have already been freed. This
causes sometimes page faults, if the logo memory is not mapped,
sometimes other random crashes as the logo data is invalid, and
sometimes nothing, if the fbdev decides to reject the logo (e.g. the
random value depicting the logo's height is too big).

This patch adds a late_initcall function to mark the logos as freed. In
reality the logos are freed later, and fbdev probe may be ran between
this late_initcall and the freeing of the logos. In that case we will
miss drawing the logo, even if it would be possible.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: stable@vger.kernel.org

OMAPDSS: pll: NULL dereference in error handling

The regulator_disable() doesn't accept NULL pointers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>

OMAPDSS: HDMI: remove double initializer entries

HDMI hardware parameters structs for OMAP4 and OMAP5 contained two
initializers for 'clkdco_max'. The first one was a remnant with wrong
value.

Remove the extra initializer entries.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>

Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"

This reverts commit 7c5c92ed56d932b2c19c3f8aea86369509407d33.

Although this did fix the bug it was aimed at, it also broke secondary
startup on platforms that use give/take_timebase(). Unfortunately we
didn't detect that while it was in next.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/kdump: Ignore failure in enabling big endian exception during crash

In LE kernel, we currently have a hack for kexec that resets the exception
endian before starting a new kernel as the kernel that is loaded could be a
big endian or a little endian kernel. In kdump case, resetting exception
endian fails when one or more cpus is disabled. But we can ignore the failure
and still go ahead, as in most cases crashkernel will be of same endianess
as primary kernel and reseting endianess is not even needed in those cases.
This patch adds a new inline function to say if this is kdump path. This
function is used at places where such a check is needed.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Rename to kdump_in_progress(), use bool, and edit comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Wire up sys_execveat() syscall

Wire up sys_execveat(). This passes the selftests for the system call.

Check success of execveat(3, '../execveat', 0)... [OK]
Check success of execveat(5, 'execveat', 0)... [OK]
Check success of execveat(6, 'execveat', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
Check success of execveat(99, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
Check success of execveat(8, '', 4096)... [OK]
Check success of execveat(17, '', 4096)... [OK]
Check success of execveat(9, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(15, '', 4096)... [OK]
Check failure of execveat(8, '', 0) with ENOENT... [OK]
Check failure of execveat(8, '(null)', 4096) with EFAULT... [OK]
Check success of execveat(5, 'execveat.symlink', 0)... [OK]
Check success of execveat(6, 'execveat.symlink', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...xec/execveat.symlink', 0)... [OK]
Check success of execveat(10, '', 4096)... [OK]
Check success of execveat(10, '', 4352)... [OK]
Check failure of execveat(5, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(6, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(-100, '/home/pranith/linux/tools/testing/selftests/exec/execveat.symlink', 256) with ELOOP... [OK]
Check success of execveat(3, '../script', 0)... [OK]
Check success of execveat(5, 'script', 0)... [OK]
Check success of execveat(6, 'script', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...elftests/exec/script', 0)... [OK]
Check success of execveat(13, '', 4096)... [OK]
Check success of execveat(13, '', 4352)... [OK]
Check failure of execveat(18, '', 4096) with ENOENT... [OK]
Check failure of execveat(7, 'script', 0) with ENOENT... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check success of execveat(4, 'script', 0)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check failure of execveat(4, 'script', 0) with ENOENT... [OK]
Check failure of execveat(5, 'execveat', 65535) with EINVAL... [OK]
Check failure of execveat(5, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(6, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(-100, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(5, '', 4096) with EACCES... [OK]
Check failure of execveat(5, 'Makefile', 0) with EACCES... [OK]
Check failure of execveat(11, '', 4096) with EACCES... [OK]
Check failure of execveat(12, '', 4096) with EACCES... [OK]
Check failure of execveat(99, '', 4096) with EBADF... [OK]
Check failure of execveat(99, 'execveat', 0) with EBADF... [OK]
Check failure of execveat(8, 'execveat', 0) with ENOTDIR... [OK]
Invoke copy of 'execveat' via filename of length 4093:
Check success of execveat(19, '', 4096)... [OK]
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]
Invoke copy of 'script' via filename of length 4093:
Check success of execveat(20, '', 4096)... [OK]
/bin/sh: 0: Can't open /dev/fd/5/xxxxxxx(... a long line of x's and y's, 0)... [OK]
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]

Tested on a 32-bit powerpc system.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Linux 3.19-rc2

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"The important fixes are for two bugs introduced by the merge window.

  On top of this, add a couple of WARN_ONs and stop spamming dmesg on
  pretty much every boot of a virtual machine"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: warn on more invariant breakage
  kvm: fix sorting of memslots with base_gfn == 0
  kvm: x86: drop severity of "generation wraparound" message
  kvm: x86: vmx: reorder some msr writing

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
"An embarrassing bug in lustre patches from this cycle ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[regression] braino in "lustre: use is_root_inode()"

kvm: warn on more invariant breakage

Modifying a non-existent slot is not allowed. Also check that the
first loop doesn't move a deleted slot beyond the used part of
the mslots array.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>