Michael Neuling [Mon, 1 Jul 2013 04:19:50 +0000 (14:19 +1000)]
powerpc/hw_brk: Fix off by one error when validating DAWR region end
The Data Address Watchpoint Register (DAWR) on POWER8 can take a 512
byte range but this range must not cross a 512 byte boundary.
Unfortunately we were off by one when calculating the end of the region,
hence we were not allowing some breakpoint regions which were actually
valid. This fixes this error.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aruna Balakrishnaiah [Thu, 27 Jun 2013 08:33:13 +0000 (14:03 +0530)]
powerpc/pseries: Support compression of oops text via pstore
The patch set supports compression of oops messages while writing to NVRAM,
this helps in capturing more of oops data to lnx,oops-log. The pstore file
for oops messages will be in decompressed format making it readable.
In case compression fails, the patch takes care of copying the header added
by pstore and last oops_data_sz bytes of big_oops_buf to NVRAM so that we
have recent oops messages in lnx,oops-log.
In case decompression fails, it will result in absence of oops file but still
have files (in /dev/pstore) for other partitions.
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aruna Balakrishnaiah [Thu, 27 Jun 2013 08:33:04 +0000 (14:03 +0530)]
powerpc/pseries: Re-organise the oops compression code
nvram_compress() and zip_oops() is used by the nvram_pstore_write
API to compress oops messages hence re-organise the functions
accordingly to avoid forward declarations.
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aruna Balakrishnaiah [Thu, 27 Jun 2013 08:32:56 +0000 (14:02 +0530)]
pstore: Pass header size in the pstore write callback
Header size is needed to distinguish between header and the dump data.
Incorporate the addition of new argument (hsize) in the pstore write
callback.
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 1 Jul 2013 07:54:09 +0000 (17:54 +1000)]
powerpc/powernv: Fix iommu initialization again
So because those things always end up in trainwrecks... In
7846de406
we moved back the iommu initialization earlier, essentially undoing
37f02195b which was causing us endless trouble... except that in the
meantime we had merged
959c9bdd58 (to workaround the original breakage)
which is now ... broken :-)
This fixes it by doing a partial revert of the latter (we keep the
ppc_md. path which will be needed in the hotplug case, which happens
also during some EEH error recovery situations).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.10]
Benjamin Herrenschmidt [Mon, 1 Jul 2013 07:57:25 +0000 (17:57 +1000)]
Merge tag 'v3.10' into next
Merge 3.10 in order to get some of the last minute powerpc
changes, resolve conflicts and add additional fixes on top
of them.
Michael Ellerman [Fri, 28 Jun 2013 08:15:18 +0000 (18:15 +1000)]
powerpc/pseries: Inform the hypervisor we are using EBB regs
On LPAR systems we need to inform the hypervisor that we are using the
EBB registers. We do this by setting a bit in the Virtual Processor Area
(VPA) - formerly known as the lppaca.
For now we do this always, ie. we do not dynamically enable/disable.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:17 +0000 (18:15 +1000)]
powerpc/perf: Add power8 EBB support
Add logic to the power8 PMU code to support EBB. Future processors would
also be expected to implement similar constraints. At that time we could
possibly factor these out into common code.
Finally mark the power8 PMU as supporting EBB, which is the actual
enable switch which allows EBBs to be configured.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:16 +0000 (18:15 +1000)]
powerpc/perf: Core EBB support for 64-bit book3s
Add support for EBB (Event Based Branches) on 64-bit book3s. See the
included documentation for more details.
EBBs are a feature which allows the hardware to branch directly to a
specified user space address when a PMU event overflows. This can be
used by programs for self-monitoring with no kernel involvement in the
inner loop.
Most of the logic is in the generic book3s code, primarily to avoid a
proliferation of PMU callbacks.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:15 +0000 (18:15 +1000)]
powerpc/perf: Drop MMCRA from thread_struct
In commit
59affcd "Context switch more PMU related SPRs" I added more
PMU SPRs to thread_struct, later modified in commit
b11ae95. To add
insult to injury it turns out we don't need to switch MMCRA as it's
only user readable, and the value is recomputed by the PMU code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:14 +0000 (18:15 +1000)]
powerpc/perf: Don't enable if we have zero events
In power_pmu_enable() we still enable the PMU even if we have zero
events. This should have no effect but doesn't make much sense. Instead
just return after telling the hypervisor that we are not using the PMCs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:13 +0000 (18:15 +1000)]
powerpc/perf: Use existing out label in power_pmu_enable()
In power_pmu_enable() we can use the existing out label to reduce the
number of return paths.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:12 +0000 (18:15 +1000)]
powerpc/perf: Freeze PMC5/6 if we're not using them
On Power8 we can freeze PMC5 and 6 if we're not using them. Normally they
run all the time.
As noticed by Anshuman, we should unfreeze them when we disable the PMU
as there are legacy tools which expect them to run all the time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:11 +0000 (18:15 +1000)]
powerpc/perf: Rework disable logic in pmu_disable()
In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
Occurred) set. When we write that value back it will cause an interrupt
to occur. We will then end up in the PMU interrupt handler even though
we are supposed to have just disabled the PMU.
We can avoid this by making sure we never write PMAO back. We should not
lose interrupts because when the PMU is re-enabled the overflowed values
will cause another interrupt.
We also reorder the clearing of SAMPLE_ENABLE so that is done after the
PMU is frozen. Otherwise there is a small window between the clearing of
SAMPLE_ENABLE and the setting of FC where we could take an interrupt and
incorrectly see SAMPLE_ENABLE not set. This would for example change the
logic in perf_read_regs().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 28 Jun 2013 08:15:10 +0000 (18:15 +1000)]
powerpc/perf: Check that events only include valid bits on Power8
A mistake we have made in the past is that we pull out the fields we
need from the event code, but don't check that there are no unknown bits
set. This means that we can't ever assign meaning to those unknown bits
in future.
Although we have once again failed to do this at release, it is still
early days for Power8 so I think we can still slip this in and get away
with it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 25 Jun 2013 07:47:57 +0000 (17:47 +1000)]
powerpc: Wire up the HV facility unavailable exception
Similar to the facility unavailble exception, except the facilities are
controlled by HFSCR.
Adapt the facility_unavailable_exception() so it can be called for
either the regular or Hypervisor facility unavailable exceptions.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 25 Jun 2013 07:47:56 +0000 (17:47 +1000)]
powerpc: Rename and flesh out the facility unavailable exception handler
The exception at 0xf60 is not the TM (Transactional Memory) unavailable
exception, it is the "Facility Unavailable Exception", rename it as
such.
Flesh out the handler to acknowledge the fact that it can be called for
many reasons, one of which is TM being unavailable.
Use STD_EXCEPTION_COMMON() for the exception body, for some reason we
had it open-coded, I've checked the generated code is identical.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 25 Jun 2013 07:47:55 +0000 (17:47 +1000)]
powerpc: Remove KVMTEST from RELON exception handlers
KVMTEST is a macro which checks whether we are taking an exception from
guest context, if so we branch out of line and eventually call into the
KVM code to handle the switch.
When running real guests on bare metal (HV KVM) the hardware ensures
that we never take a relocation on exception when transitioning from
guest to host. For PR KVM we disable relocation on exceptions ourself in
kvmppc_core_init_vm(), as of commit
a413f47 "Disable relocation on
exceptions whenever PR KVM is active".
So convert all the RELON macros to use NOTEST, and drop the remaining
KVM_HANDLER() definitions we have for 0xe40 and 0xe80.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.9+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 25 Jun 2013 07:47:54 +0000 (17:47 +1000)]
powerpc: Remove unreachable relocation on exception handlers
We have relocation on exception handlers defined for h_data_storage and
h_instr_storage. However we will never take relocation on exceptions for
these because they can only come from a guest, and we never take
relocation on exceptions when we transition from guest to host.
We also have a handler for hmi_exception (Hypervisor Maintenance) which
is defined in the architecture to never be delivered with relocation on,
see see v2.07 Book III-S section 6.5.
So remove the handlers, leaving a branch to self just to be double extra
paranoid.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.9+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nathan Fontenot [Tue, 25 Jun 2013 03:08:05 +0000 (22:08 -0500)]
powerpc/numa: Do not update sysfs cpu registration from invalid context
The topology update code that updates the cpu node registration in sysfs
should not be called while in stop_machine(). The register/unregister
calls take a lock and may sleep.
This patch moves these calls outside of the call to stop_machine().
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Fri, 28 Jun 2013 13:12:14 +0000 (21:12 +0800)]
powerpc/eeh: Update MAINTAINERS
Update MAINTAINERS to reflect recent changes.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Chen Gang [Wed, 20 Mar 2013 06:30:12 +0000 (14:30 +0800)]
powerpc/smp: Section mismatch from smp_release_cpus to __initdata spinning_secondaries
the smp_release_cpus is a normal funciton and called in normal environments,
but it calls the __initdata spinning_secondaries.
need modify spinning_secondaries to match smp_release_cpus.
the related warning:
(the linker report boot_paca.33377, but it should be spinning_secondaries)
-----------------------------------------------------------------------------
WARNING: arch/powerpc/kernel/built-in.o(.text+0x23176): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
The function .smp_release_cpus() references
the variable __initdata boot_paca.33377.
This is often because .smp_release_cpus lacks a __initdata
annotation or the annotation of boot_paca.33377 is wrong.
WARNING: arch/powerpc/kernel/built-in.o(.text+0x231fe): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
The function .smp_release_cpus() references
the variable __initdata boot_paca.33377.
This is often because .smp_release_cpus lacks a __initdata
annotation or the annotation of boot_paca.33377 is wrong.
-----------------------------------------------------------------------------
Signed-off-by: Chen Gang <gang.chen@asianux.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wolfram Sang [Tue, 21 May 2013 18:45:40 +0000 (20:45 +0200)]
macintosh/windfarm: Remove obsolete cleanup for clientdata
A few new i2c-drivers came into the kernel which clear the clientdata-pointer
on exit or error. This is obsolete meanwhile, the core will do it.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Chen Gang [Tue, 21 May 2013 09:20:50 +0000 (17:20 +0800)]
powerpc/nvram64: Need return the related error code on failure occurs
When error occurs, need return the related error code to let upper
caller know about it.
ppc_md.nvram_size() can return the error code (e.g. core99_nvram_size()
in 'arch/powerpc/platforms/powermac/nvram.c').
Also set ret value when only need it, so can save structions for normal
cases.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Li Zhong [Thu, 16 May 2013 10:20:26 +0000 (18:20 +0800)]
powerpc: Set cpu sibling mask before online cpu
It seems following race is possible:
cpu0 cpux
smp_init->cpu_up->_cpu_up
__cpu_up
kick_cpu(1)
-------------------------------------------------------------------------
waiting online ...
... notify CPU_STARTING
set cpux active
set cpux online
-------------------------------------------------------------------------
finish waiting online
...
sched_init_smp
init_sched_domains(cpu_active_mask)
build_sched_domains
set cpux sibling info
-------------------------------------------------------------------------
Execution of cpu0 and cpux could be concurrent between two separator
lines.
So if the cpux sibling information was set too late (normally
impossible, but could be triggered by adding some delay in
start_secondary, after setting cpu online), build_sched_domains()
running on cpu0 might see cpux active, with an empty sibling mask, then
cause some bad address accessing like following:
[ 0.099855] Unable to handle kernel paging request for data at address 0xc00000038518078f
[ 0.099868] Faulting instruction address: 0xc0000000000b7a64
[ 0.099883] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.099895] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[ 0.099922] Modules linked in:
[ 0.099940] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
3.10.0-rc1-00120-gb973425-dirty #16
[ 0.099956] task:
c0000001fed80000 ti:
c0000001fed7c000 task.ti:
c0000001fed7c000
[ 0.099971] NIP:
c0000000000b7a64 LR:
c0000000000b7a40 CTR:
c0000000000b4934
[ 0.099985] REGS:
c0000001fed7f760 TRAP: 0300 Not tainted (
3.10.0-rc1-00120-gb973425-dirty)
[ 0.099997] MSR:
8000000000009032 <SF,EE,ME,IR,DR,RI> CR:
24272828 XER:
20000003
[ 0.100045] SOFTE: 1
[ 0.100053] CFAR:
c000000000445ee8
[ 0.100064] DAR:
c00000038518078f, DSISR:
40000000
[ 0.100073]
GPR00:
0000000000000080 c0000001fed7f9e0 c000000000c84d48 0000000000000010
GPR04:
0000000000000010 0000000000000000 c0000001fc55e090 0000000000000000
GPR08:
ffffffffffffffff c000000000b80b30 c000000000c962d8 00000003845ffc5f
GPR12:
0000000000000000 c00000000f33d000 c00000000000b9e4 0000000000000000
GPR16:
0000000000000000 0000000000000000 0000000000000001 0000000000000000
GPR20:
c000000000ccf750 0000000000000000 c000000000c94d48 c0000001fc504000
GPR24:
c0000001fc504000 c0000001fecef848 c000000000c94d48 c000000000ccf000
GPR28:
c0000001fc522090 0000000000000010 c0000001fecef848 c0000001fed7fae0
[ 0.100293] NIP [
c0000000000b7a64] .get_group+0x84/0xc4
[ 0.100307] LR [
c0000000000b7a40] .get_group+0x60/0xc4
[ 0.100318] Call Trace:
[ 0.100332] [
c0000001fed7f9e0] [
c0000000000dbce4] .lock_is_held+0xa8/0xd0 (unreliable)
[ 0.100354] [
c0000001fed7fa70] [
c0000000000bf62c] .build_sched_domains+0x728/0xd14
[ 0.100375] [
c0000001fed7fbe0] [
c000000000af67bc] .sched_init_smp+0x4fc/0x654
[ 0.100394] [
c0000001fed7fce0] [
c000000000adce24] .kernel_init_freeable+0x17c/0x30c
[ 0.100413] [
c0000001fed7fdb0] [
c00000000000ba08] .kernel_init+0x24/0x12c
[ 0.100431] [
c0000001fed7fe30] [
c000000000009f74] .ret_from_kernel_thread+0x5c/0x68
[ 0.100445] Instruction dump:
[ 0.100456]
38800010 38a00000 4838e3f5 60000000 7c6307b4 2fbf0000 419e0040 3d220001
[ 0.100496]
78601f24 39491590 e93e0008 7d6a002a <
7d69582a>
f97f0000 7d4a002a e93e0010
[ 0.100559] ---[ end trace
31fd0ba7d8756001 ]---
This patch tries to move the sibling maps updating before
notify_cpu_starting() and cpu online, and a write barrier there to make
sure sibling maps are updated before active and online mask.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Geert Uytterhoeven [Sun, 30 Jun 2013 10:03:23 +0000 (12:03 +0200)]
mac: Make cuda_init_via() __init
cuda_init_via() is called from find_via_cuda() only, which is __init.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Paul Gortmaker [Mon, 24 Jun 2013 19:30:09 +0000 (15:30 -0400)]
powerpc: Delete __cpuinit usage from all users
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit
5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
This removes all the powerpc uses of the __cpuinit macros. There
are no __CPUINIT users in assembly files in powerpc.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joe Perches [Fri, 14 Jun 2013 02:37:37 +0000 (19:37 -0700)]
macintosh: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joe Perches [Fri, 14 Jun 2013 02:37:30 +0000 (19:37 -0700)]
powerpc/idle: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Bjorn Helgaas [Tue, 11 Jun 2013 19:57:05 +0000 (13:57 -0600)]
powerpc/iommu: Remove unused pci_iommu_init() and pci_direct_iommu_init()
pci_iommu_init() and pci_direct_iommu_init() are not referenced anywhere,
so remove them.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kevin Hao [Thu, 27 Jun 2013 01:09:43 +0000 (09:09 +0800)]
powerpc: Don't flush/invalidate the d/icache for an unknown relocation type
For an unknown relocation type since the value of r4 is just the 8bit
relocation type, the sum of r4 and r7 may yield an invalid memory
address. For example:
In normal case:
r4 = c00xxxxx
r7 =
40000000
r4 + r7 = 000xxxxx
For an unknown relocation type:
r4 = 000000xx
r7 =
40000000
r4 + r7 = 400000xx
400000xx is an invalid memory address for a board which has just
512M memory.
And for operations such as dcbst or icbi may cause bus error for an
invalid memory address on some platforms and then cause the board
reset. So we should skip the flush/invalidate the d/icache for
an unknown relocation type.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aaro Koskinen [Sun, 30 Jun 2013 19:00:42 +0000 (22:00 +0300)]
powerpc/windfarm: Fix overtemperature clearing
With pm81/pm91/pm121, when the overtemperature state is entered, and
when it remains on after skipped ticks, the driver will try to leave
it too soon (immediately on the next tick). This is because the active
FAILURE_OVERTEMP state is not visible in "new_failure" variable of the
current tick. Furthermore, the driver will keep trying to clear condition
in subsequent ticks as FAILURE_OVERTEMP remains set in the "last_failure"
variable. These will start to trigger WARNINGS from windfarm core:
[ 100.082735] windfarm: Clamping CPU frequency to minimum !
[ 100.108132] windfarm: Overtemp condition detected !
[ 101.952908] windfarm: Overtemp condition cleared !
[...]
[ 102.980388] WARNING: at drivers/macintosh/windfarm_core.c:463
[...]
[ 103.982227] WARNING: at drivers/macintosh/windfarm_core.c:463
[...]
[ 105.030494] WARNING: at drivers/macintosh/windfarm_core.c:463
[...]
[ 105.973666] WARNING: at drivers/macintosh/windfarm_core.c:463
[...]
[ 106.977913] WARNING: at drivers/macintosh/windfarm_core.c:463
Fix by adding a helper global variable. We leave the overtemp state only
after all failure bits have been cleared.
I saw this error on iMac G5 iSight (pm121). Also pm81/pm91 are fixed
based on the observation that these are almost identical/copy-pasted code.
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 27 Jun 2013 05:46:48 +0000 (13:46 +0800)]
powerpc/powernv: Use dev-node in PCI config accessors
Currently, we're using the combo (PCI bus + devfn) in the PCI
config accessors and PCI config accessors in EEH depends on them.
However, it's not safe to refer the PCI bus which might have been
removed during hotplug. So we're using device node in the PCI
config accessors and the corresponding backends just reuse them.
The patch also fix one potential risk: We possiblly have frozen
PE during the early PCI probe time, but we haven't setup the PE
mapping yet. So the errors should be counted to PE#0.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 27 Jun 2013 05:46:47 +0000 (13:46 +0800)]
powerpc/eeh: Avoid build warnings
The patch is for avoiding following build warnings:
The function .pnv_pci_ioda_fixup() references
the function __init .eeh_init().
This is often because .pnv_pci_ioda_fixup lacks a __init
The function .pnv_pci_ioda_fixup() references
the function __init .eeh_addr_cache_build().
This is often because .pnv_pci_ioda_fixup lacks a __init
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 27 Jun 2013 05:46:46 +0000 (13:46 +0800)]
powerpc/eeh: Refactor the output message
We needn't the the whole backtrace other than one-line message in
the error reporting interrupt handler. For errors triggered by
access PCI config space or MMIO, we replace "WARN(1, ...)" with
pr_err() and dump_stack(). The patch also adds more output messages
to indicate what EEH core is doing. Besides, some printk() are
replaced with pr_warning().
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 27 Jun 2013 05:46:45 +0000 (13:46 +0800)]
powerpc/eeh: Fix address catch for PowerNV
On the PowerNV platform, the EEH address cache isn't built correctly
because we skipped the EEH devices without binding PE. The patch
fixes that.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 27 Jun 2013 05:46:44 +0000 (13:46 +0800)]
powerpc/powernv: Replace variables with flags
We have 2 fields in "struct pnv_phb" to trace the states. The patch
replace the fields with one and introduces flags for that. The patch
doesn't impact the logic.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 27 Jun 2013 05:46:43 +0000 (13:46 +0800)]
powerpc/eeh: Check PCIe link after reset
After reset (e.g. complete reset) in order to bring the fenced PHB
back, the PCIe link might not be ready yet. The patch intends to
make sure the PCIe link is ready before accessing its subordinate
PCI devices. The patch also fixes that wrong values restored to
PCI_COMMAND register for PCI bridges.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 27 Jun 2013 05:46:42 +0000 (13:46 +0800)]
powerpc/eeh: Don't collect PCI-CFG data on PHB
When the PHB is fenced or dead, it's pointless to collect the data
from PCI config space of subordinate PCI devices since it should
return 0xFF's. The patch also fixes overwritten buffer while getting
PCI config data.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 30 Jun 2013 22:13:29 +0000 (15:13 -0700)]
Linux 3.10
Linus Torvalds [Sun, 30 Jun 2013 22:08:15 +0000 (15:08 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull another powerpc fix from Benjamin Herrenschmidt:
"I mentioned that while we had fixed the kernel crashes, EEH error
recovery didn't always recover... It appears that I had a fix for
that already in powerpc-next (with a stable CC).
I cherry-picked it today and did a few tests and it seems that things
now work quite well. The patch is also pretty simple, so I see no
reason to wait before merging it."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/eeh: Fix fetching bus for single-dev-PE
Linus Torvalds [Sun, 30 Jun 2013 22:06:25 +0000 (15:06 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of seven bug fixes. Several fcoe fixes for locking
problems, initiator issues and a VLAN API change, all of which could
eventually lead to data corruption, one fix for a qla2xxx locking
problem which could lead to multiple completions of the same request
(and subsequent data corruption) and a use after free in the ipr
driver. Plus one minor MAINTAINERS file update"
(only six bugfixes in this pull, since I had already pulled the fcoe API
fix directly from Robert Love)
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] ipr: Avoid target_destroy accessing memory after it was freed
[SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines
MAINTAINERS: Fix fcoe mailing list
libfc: extend ex_lock to protect all of fc_seq_send
libfc: Correct check for initiator role
libfcoe: Fix Conflicting FCFs issue in the fabric
Michael Neuling [Fri, 28 Jun 2013 08:17:09 +0000 (18:17 +1000)]
powerpc/tm: Clear MSR RI in non-recoverable TM code
When we treclaim and trecheckpoint there's an unavoidable period when r1
will not be a valid kernel stack pointer.
This patch clears the MSR recoverable interrupt (RI) bit over these
regions to indicate we have an invalid kernel stack pointer.
For treclaim, the region over which we clear MSR RI is larger than
required to avoid the need for an extra costly mtmsrd.
Thanks to Paulus for suggesting this change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
James Yang [Tue, 25 Jun 2013 16:41:05 +0000 (11:41 -0500)]
powerpc: Fix string instr. emulation for 32-bit processes on ppc64
String instruction emulation would erroneously result in a segfault if
the upper bits of the EA are set and is so high that it fails access
check. Truncate the EA to 32 bits if the process is 32-bit.
Signed-off-by: James Yang <James.Yang@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Sebastien Bessiere [Fri, 14 Jun 2013 15:57:03 +0000 (17:57 +0200)]
trivial: powerpc: Fix typo in ioei_interrupt() description
Signed-off-by: Sebastien Bessiere <sebastien.bessiere@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Wed, 5 Jun 2013 07:34:02 +0000 (15:34 +0800)]
powerpc/eeh: Fix fetching bus for single-dev-PE
While running Linux as guest on top of phyp, we possiblly have
PE that includes single PCI device. However, we didn't return
its PCI bus correctly and it leads to failure on recovery from
EEH errors for single-dev-PE. The patch fixes the issue.
Cc: <stable@vger.kernel.org> # v3.7+
Cc: Steve Best <sbest@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 30 Jun 2013 00:02:48 +0000 (17:02 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"We discovered some breakage in our "EEH" (PCI Error Handling) code
while doing error injection, due to a couple of regressions. One of
them is due to a patch (
37f02195bee9 "powerpc/pci: fix PCI-e devices
rescan issue on powerpc platform") that, in hindsight, I shouldn't
have merged considering that it caused more problems than it solved.
Please pull those two fixes. One for a simple EEH address cache
initialization issue. The other one is a patch from Guenter that I
had originally planned to put in 3.11 but which happens to also fix
that other regression (a kernel oops during EEH error handling and
possibly hotplug).
With those two, the couple of test machines I've hammered with error
injection are remaining up now. EEH appears to still fail to recover
on some devices, so there is another problem that Gavin is looking
into but at least it's no longer crashing the kernel."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pci: Improve device hotplug initialization
powerpc/eeh: Add eeh_dev to the cache during boot
Olof Johansson [Sat, 29 Jun 2013 23:25:14 +0000 (16:25 -0700)]
ARM: dt: Only print warning, not WARN() on bad cpu map in device tree
Due to recent changes and expecations of proper cpu bindings, there are
now cases for many of the in-tree devicetrees where a WARN() will hit
on boot due to badly formatted /cpus nodes.
Downgrade this to a pr_warn() to be less alarmist, since it's not a
new problem.
Tested on Arndale, Cubox, Seaboard and Panda ES. Panda hits the WARN
without this, the others do not.
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guenter Roeck [Mon, 10 Jun 2013 17:18:08 +0000 (10:18 -0700)]
powerpc/pci: Improve device hotplug initialization
Commit
37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc
platform) fixes a problem with interrupt and DMA initialization on hot
plugged devices. With this commit, interrupt and DMA initialization for
hot plugged devices is handled in the pci device enable function.
This approach has a couple of drawbacks. First, it creates two code paths
for device initialization, one for hot plugged devices and another for devices
known during the initial PCI scan. Second, the initialization code for hot
plugged devices is only called when the device is enabled, ie typically
in the probe function. Also, the platform specific setup code is called each
time pci_enable_device() is called, not only once during device discovery,
meaning it is actually called multiple times, once for devices discovered
during the initial scan and again each time a driver is re-loaded.
The visible result is that interrupt pins are only assigned to hot plugged
devices when the device driver is loaded. Effectively this changes the PCI
probe API, since pci_dev->irq and the device's dma configuration will now
only be valid after pci_enable() was called at least once. A more subtle
change is that platform specific PCI device setup is moved from device
discovery into the driver's probe function, more specifically into the
pci_enable_device() call.
To fix the inconsistencies, add new function pcibios_add_device.
Call pcibios_setup_device from pcibios_setup_bus_devices if device setup
is not complete, and from pcibios_add_device if bus setup is complete.
With this change, device setup code is moved back into device initialization,
and called exactly once for both static and hot plugged devices.
[ This also fixes a regression introduced by the above patch which
causes dev->irq to be overwritten under some cirumstances after
MSIs have been enabled for the device which leads to crashes due
to the MSI core "hijacking" dev->irq to store the base MSI number
and not the LSI. --BenH
]
Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sat, 29 Jun 2013 18:34:18 +0000 (11:34 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a crash in the crypto layer exposed by an SCTP test tool"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algboss - Hold ref count on larval
Linus Torvalds [Sat, 29 Jun 2013 18:32:05 +0000 (11:32 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm/qxl fix from Dave Airlie:
"Bad me forgot an access check, possible security issue, but since this
is the first kernel with it, should be fine to just put it in now"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/qxl: add missing access check for execbuffer ioctl
Mathieu Desnoyers [Fri, 28 Jun 2013 13:49:46 +0000 (09:49 -0400)]
Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validation
This __put_user() could be used by unprivileged processes to write into
kernel memory. The issue here is that even if copy_siginfo_to_user()
fails, the error code is not checked before __put_user() is executed.
Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle,
so it has not hit a stable release yet.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 29 Jun 2013 17:31:15 +0000 (10:31 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
"This is a recently spotted regression in the snapshot behavior...
It turns out several tests weren't being run in the nightlies so this
took a while to spot"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: send snapshot context with writes
Linus Torvalds [Sat, 29 Jun 2013 17:30:31 +0000 (10:30 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull ubifs fixes from Al Viro:
"A couple of ubifs readdir/lseek race fixes. Stable fodder, really
nasty..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
UBIFS: fix a horrid bug
UBIFS: prepare to fix a horrid bug
Linus Torvalds [Sat, 29 Jun 2013 17:28:52 +0000 (10:28 -0700)]
Merge tag 'for-linus-
20130628' of git://git./linux/kernel/git/dhowells/linux-mn10300
Pull two MN10300 fixes from David Howells:
"The first fixes a problem with passing arrays rather than pointers to
get_user() where __typeof__ then wants to declare and initialise an
array variable which gcc doesn't like.
The second fixes a problem whereby putting mem=xxx into the kernel
command line causes init=xxx to get an incorrect value."
* tag 'for-linus-
20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
mn10300: Use early_param() to parse "mem=" parameter
mn10300: Allow to pass array name to get_user()
Linus Torvalds [Sat, 29 Jun 2013 17:27:19 +0000 (10:27 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"Correct an ordering issue in the tick broadcast code. I really wish
we'd get compensation for pain and suffering for each line of code we
write to work around dysfunctional timer hardware."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Fix tick_broadcast_pending_mask not cleared
Linus Torvalds [Sat, 29 Jun 2013 17:26:50 +0000 (10:26 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"One more fix for a recently discovered bug"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Disable monitoring on setuid processes for regular users
Artem Bityutskiy [Fri, 28 Jun 2013 11:15:15 +0000 (14:15 +0300)]
UBIFS: fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.
This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses
it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage,
but this may corrupt memory and lead to all kinds of problems like crashes an
security holes.
This patch fixes the problem by using the 'file->f_version' field, which
'->llseek()' always unconditionally sets to zero. We set it to 1 in
'ubifs_readdir()' and whenever we detect that it became 0, we know there was a
seek and it is time to clear the state saved in 'file->private_data'.
I tested this patch by writing a user-space program which runds readdir and
seek in parallell. I could easily crash the kernel without these patches, but
could not crash it with these patches.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Fri, 28 Jun 2013 11:15:14 +0000 (14:15 +0300)]
UBIFS: prepare to fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.
First of all, this means that 'file->private_data' can be freed while
'ubifs_readdir()' uses it. But this particular patch does not fix the problem.
This patch is only a preparation, and the fix will follow next.
In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly,
because 'file->f_pos' can be changed by '->llseek()' at any point. This may
lead 'ubifs_readdir()' to returning inconsistent data: directory entry names
may correspond to incorrect file positions.
So here we introduce a local variable 'pos', read 'file->f_pose' once at very
the beginning, and then stick to 'pos'. The result of this is that when
'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of
'ubifs_readdir()', the latter "wins".
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Akira Takeuchi [Fri, 28 Jun 2013 15:53:03 +0000 (16:53 +0100)]
mn10300: Use early_param() to parse "mem=" parameter
This fixes the problem that "init=" options may not be passed to kernel
correctly.
parse_mem_cmdline() of mn10300 arch gets rid of "mem=" string from
redboot_command_line. Then init_setup() parses the "init=" options from
static_command_line, which is a copy of redboot_command_line, and keeps
the pointer to the init options in execute_command variable.
Since the commit
026cee0 upstream (params: <level>_initcall-like kernel
parameters), static_command_line becomes overwritten by saved_command_line at
do_initcall_level(). Notice that saved_command_line is a command line
which includes "mem=" string.
As a result, execute_command may point to weird string by the length of
"mem=" parameter.
I noticed this problem when using the command line like this:
mem=128M console=ttyS0,115200 init=/bin/sh
Here is the processing flow of command line parameters.
start_kernel()
setup_arch(&command_line)
parse_mem_cmdline(cmdline_p)
* strcpy(boot_command_line, redboot_command_line);
* Remove "mem=xxx" from redboot_command_line.
* *cmdline_p = redboot_command_line;
setup_command_line(command_line) <-- command_line is redboot_command_line
* strcpy(saved_command_line, boot_command_line)
* strcpy(static_command_line, command_line)
parse_early_param()
strlcpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
parse_early_options(tmp_cmdline);
parse_args("early options", cmdline, NULL, 0, 0, 0, do_early_param);
parse_args("Booting ..", static_command_line, ...);
init_setup() <-- save the pointer in execute_command
rest_init()
kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
At this point, execute_command points to "/bin/sh" string.
kernel_init()
kernel_init_freeable()
do_basic_setup()
do_initcalls()
do_initcall_level()
(*) strcpy(static_command_line, saved_command_line);
Here, execute_command gets to point to "200" string !!
Signed-off-by: David Howells <dhowells@redhat.com>
Akira Takeuchi [Fri, 28 Jun 2013 15:53:01 +0000 (16:53 +0100)]
mn10300: Allow to pass array name to get_user()
This fixes the following compile error:
CC block/scsi_ioctl.o
block/scsi_ioctl.c: In function 'sg_scsi_ioctl':
block/scsi_ioctl.c:449: error: invalid initializer
Signed-off-by: David Howells <dhowells@redhat.com>
Dave Airlie [Fri, 28 Jun 2013 03:27:40 +0000 (13:27 +1000)]
drm/qxl: add missing access check for execbuffer ioctl
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Thadeu Lima de Souza Cascardo [Thu, 27 Jun 2013 21:00:10 +0000 (18:00 -0300)]
powerpc/eeh: Add eeh_dev to the cache during boot
commit
f8f7d63fd96ead101415a1302035137a866f8998 ("powerpc/eeh: Trace eeh
device from I/O cache") broke EEH on pseries for devices that were
present during boot and have not been hotplugged/DLPARed.
eeh_check_failure will get the eeh_dev from the cache, and will get
NULL. eeh_addr_cache_build adds the addresses to the cache, but eeh_dev
for the giving pci_device is not set yet. Just reordering the call to
eeh_addr_cache_insert_dev works fine. The ordering is similar to the one
in eeh_add_device_late.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Josh Durgin [Wed, 26 Jun 2013 19:56:17 +0000 (12:56 -0700)]
rbd: send snapshot context with writes
Sending the right snapshot context with each write is required for
snapshots to work. Due to the ordering of calls, the snapshot context
is never set for any requests. This causes writes to the current
version of the image to be reflected in all snapshots, which are
supposed to be read-only.
This happens because rbd_osd_req_format_write() sets the snapshot
context based on obj_request->img_request. At this point, however,
obj_request->img_request has not been set yet, to the snapshot context
is set to NULL. Fix this by moving rbd_img_obj_request_add(), which
sets obj_request->img_request, before the osd request formatting
calls.
This resolves:
http://tracker.ceph.com/issues/5465
Reported-by: Karol Jurak <karol.jurak@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
James Bottomley [Thu, 27 Jun 2013 06:08:22 +0000 (23:08 -0700)]
Merge tag 'fcoe1' into fixes
This patch fixes a critical bug that was introduced in 3.9
related to VLAN tagging FCoE frames.
James Bottomley [Thu, 27 Jun 2013 06:07:53 +0000 (23:07 -0700)]
Merge tag 'fcoe' into fixes
3.10 fixes
Linus Torvalds [Thu, 27 Jun 2013 05:24:37 +0000 (19:24 -1000)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Found via trinity:
If you connect up an ipv6 socket to an ipv4 mapped address then an
ipv6 one, sendmsg() can croak because ip6_sk_dst_check() assumes the
route cached in the socket is an ipv6 one. In this case there is an
ipv4 route attached, so it gets stomped on.
Reported by Dave Jones and Hannes Frederic Sowa, fixed by Eric
Dumazet.
2) AF_KEY notifications leak some kernel memory to userspace, fix from
Mathias Krause.
3) DLCI calls __dev_get_by_name() without proper locking, and dlci_del
doesn't validate that the device being deleted is actually a DLCI
one. Fixes from Li Zefan.
4) Length check on bluetooth l2cap information responses is wrong, each
response type has a different lenth, so we should make sure it's in
a given range rather than enforce one single valid length. From
Jaganath Kanakkassery.
5) Receive FIFO overflow is really easy to trigger in stress scenerios
in the sh_eth driver, but the event isn't being handled properly at
all. Specifically, the mask of error interrupts doesn't include the
event so we never clear it, resulting in the driver becomming wedged
processing an interrupt that never gets cleared.
Fix from Sergei Shtylyov.
6) qlcnic sleeps while holding a spinlock, use mdelay() instead of
msleep(). From Shahed Shaikh.
7) Missing curly braces causes SIP netfilter NAT module to always drop
packets. Fix from Balazs Peter Odor.
8) ipt_ULOG in netfilter passes the wrong value to timer setup, causing
the timer to dereference crap when it fires. Fix from Gao Feng.
9) Missing RCU protection around txq->axq_acq traversal in
ath_txq_schedule(). Fix from Felix Fietkau.
10) Idle state transition test in ath9k_htc_config() is reversed, fix
from Sujith Manoharan.
11) IPV6 forwarding handles unicast Router Alert packets incorrectly.
It tests the wrong option state. Previously opt->ra being non-zero
indicated a router alert marking in the SKB, but now it's indicated
by a bit in opt->flags. Fix from YOSHIFUJI Hideaki.
12) SKB leak in GRE tunnel GSO handling, from Eric Dumazet.
13) get_user_pages_fast() error handling in TUN and MACVTAP use the same
local variable for the base index and the loop iterator for page
traversal, oops! Fix from Michael S Tsirkin.
14) ipv6_get_lladdr() can fail, and we must therefore check it's return
value in inet6_set_iftoken(). For from Hannes Frederic Sowa.
15) If you change an interface name and meanwhile can sneak in something
that looks up the name (like SO_BINDTODEVICE or SIOCGIFNAME) we can
deadlock with CONFIG_PREEMPT=n. Fix this by providing a helper
function that properly uses raw_seqcount_begin(). From Nicolas
Schichan.
16) Chain noise calibration test is inverted in iwlwifi, fix from
Nikolay Martynov.
17) Properly set TX iwlwifi descriptor flags for back requests. Fix
from Emmanuel Grumbach.
18) We can't assume skb_transport_header() is set in xt_TCPOPTSTRAP
module, fix from Pablo Neira Ayuso.
19) Some crummy APs don't provide the proper High Throughput info in
association response frames. Add a workaround by assume we'll use
whatever is in the beacon/probe. Fix from Johannes Berg.
20) mac80211 call to rate_idx_match_mask() swaps two arguments (mask and
channel width). Fix from Simon Wunderlich.
21) xt_TCPMSS (like xt_TCPOPTSTRAP) must not try to handle fragmented
frames. Fix from Phil Oester.
22) Fix rate control regression causing iwlwifi/iwlegacy chips to use
1Mbit/s on pre-11n networks. From Moshe Benji and Stanslaw Gruszka.
23) Disable brcmsmac power-save functions, they cause regressions. From
Arend van Spriel.
24) Enforce a sane minimum MTU in l2cap_build_cmd() otherwise we can
easily crash. Fix from Anderson Lizardo.
25) If a learning packet arrives during vxlan_stop() we crash, easily
fixed by checking netif_running(). From Stephen Hemminger.
26) Static vxlan FDB entries should not be migrated, also from Stephen.
27) skb_clone() failures not handled in vxlan_xmit(), oops. Also from
Stephen.
28) Add minimal driver for AR816x/AR817x ethernet chips, from Johannes
Berg.
29) Fix regression in userspace VLAN acceleration control, added by the
802.1ad support changes. Fix from Fernando Luis Vazquez Cao.
30) Interval selection for MLD queries in the bridging code was
reversed. Fix from Linus Lüssing.
31) ipv6's ndisc_send_redirect() erroneously writes to the packet we
received not the packet we are building to send out. Fix from
Matthias Schiffer.
32) Don't free netdev before unregistering it, in usb_8dev can driver.
From Marc Kleine-Budde.
33) Fix nl80211 attribute buffer races, from Johannes Berg.
34) Although netlink_diag.h is under uapi/ it isn't present in Kbuild.
From Stephen Hemminger.
35) Wrong address and family passed to MD5 key lookups in TCP, from
Aydin Arik.
36) phy_type attribute created by SFC driver should not be writable.
From Ben Hutchings.
37) Receive/Transmit queue allocations in pxa168_eth and mv643xx_eth
should use kzalloc(). Otherwise if setup fails half-way, we'll
dereference garbage when trying to teardown the rings. From Lubomir
Rintel.
38) Fix double-allocation of dst (resulting in unfreeable net device) in
ipv6's init_loopback(). From Gao Feng.
39) Fix fragmentation handling SKB leak in netfilter conntrack, we were
freeing the wrong skb pointer. From Phil Oester.
40) Don't report "-1" (SPEED_UNKNOWN) in bond_miimon_commit(), from
Nikolay Aleksandrov.
41) davinci_cpdma doesn't check for DMA mapping errors, letting the
device scribble to random addresses. From Sebastian Siewior.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
dlci: validate the net device in dlci_del()
dlci: acquire rtnl_lock before calling __dev_get_by_name()
af_key: fix info leaks in notify messages
ipv6: ip6_sk_dst_check() must not assume ipv6 dst
net: fix kernel deadlock with interface rename and netdev name retrieval.
net/tg3: Avoid delay during MMIO access
ipv6: check return value of ipv6_get_lladdr
macvtap: fix recovery from gup errors
tun: fix recovery from gup errors
gre: fix a possible skb leak
ipv6: Process unicast packet with Router Alert by checking flag in skb.
ath9k_htc: Handle IDLE state transition properly
ath9k: fix an RCU issue in calling ieee80211_get_tx_rates
netfilter: ipt_ULOG: fix incorrect setting of ulog timer
netfilter: ctnetlink: send event when conntrack label was modified
netfilter: nf_nat_sip: fix mangling
qlcnic: Do not sleep while holding spinlock
drivers: net: cpsw: fix compilation error with cpsw driver
tcp: doc : fix the syncookies default value
sh_eth: fix misreporting of transmit abort
...
Linus Torvalds [Thu, 27 Jun 2013 05:23:15 +0000 (19:23 -1000)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull i915 drm fixes from Dave Airlie:
"These should be the last two fixes for i915, one is for a fence leak
killing X on some older GPUs, and one is a late regression partial
revert for an swiotlb/xen/i915 interaction, Konrad has promised to
figure out the proper answer, and this patch is the best thing to do
at this stage to avoid regressing"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: make compact dma scatter lists creation work with SWIOTLB backend.
drm/i915: Restore fences after resume and GPU resets
Zefan Li [Wed, 26 Jun 2013 07:31:58 +0000 (15:31 +0800)]
dlci: validate the net device in dlci_del()
We triggered an oops while running trinity with 3.4 kernel:
BUG: unable to handle kernel paging request at
0000000100000d07
IP: [<
ffffffffa0109738>] dlci_ioctl+0xd8/0x2d4 [dlci]
PGD
640c0d067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 3
...
Pid: 7302, comm: trinity-child3 Not tainted 3.4.24.09+ 40 Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA
RIP: 0010:[<
ffffffffa0109738>] [<
ffffffffa0109738>] dlci_ioctl+0xd8/0x2d4 [dlci]
...
Call Trace:
[<
ffffffff8137c5c3>] sock_ioctl+0x153/0x280
[<
ffffffff81195494>] do_vfs_ioctl+0xa4/0x5e0
[<
ffffffff8118354a>] ? fget_light+0x3ea/0x490
[<
ffffffff81195a1f>] sys_ioctl+0x4f/0x80
[<
ffffffff81478b69>] system_call_fastpath+0x16/0x1b
...
It's because the net device is not a dlci device.
Reported-by: Li Jinyue <lijinyue@huawei.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Zefan Li [Wed, 26 Jun 2013 07:29:54 +0000 (15:29 +0800)]
dlci: acquire rtnl_lock before calling __dev_get_by_name()
Otherwise the net device returned can be freed at anytime.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Mathias Krause [Wed, 26 Jun 2013 21:52:30 +0000 (23:52 +0200)]
af_key: fix info leaks in notify messages
key_notify_sa_flush() and key_notify_policy_flush() miss to initialize
the sadb_msg_reserved member of the broadcasted message and thereby
leak 2 bytes of heap memory to listeners. Fix that.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 26 Jun 2013 11:15:07 +0000 (04:15 -0700)]
ipv6: ip6_sk_dst_check() must not assume ipv6 dst
It's possible to use AF_INET6 sockets and to connect to an IPv4
destination. After this, socket dst cache is a pointer to a rtable,
not rt6_info.
ip6_sk_dst_check() should check the socket dst cache is IPv6, or else
various corruptions/crashes can happen.
Dave Jones can reproduce immediate crash with
trinity -q -l off -n -c sendmsg -c connect
With help from Hannes Frederic Sowa
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Schichan [Wed, 26 Jun 2013 15:23:42 +0000 (17:23 +0200)]
net: fix kernel deadlock with interface rename and netdev name retrieval.
When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
rename of a network interface, it can end up waiting for a workqueue
to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
the fact that read_secklock_begin() will spin forever waiting for the
writer process (the one doing the interface rename) to update the
devnet_rename_seq sequence.
This patch fixes the problem by adding a helper (netdev_get_name())
and using it in the code handling the SIOCGIFNAME ioctl and
SO_BINDTODEVICE setsockopt.
The netdev_get_name() helper uses raw_seqcount_begin() to avoid
spinning forever, waiting for devnet_rename_seq->sequence to become
even. cond_resched() is used in the contended case, before retrying
the access to give the writer process a chance to finish.
The use of raw_seqcount_begin() will incur some unneeded work in the
reader process in the contended case, but this is better than
deadlocking the system.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 26 Jun 2013 19:18:37 +0000 (09:18 -1000)]
Merge tag 'regulator-v3.10-rc7' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"Fix module loading for tps6586x.
A simple one liner fix to make module loading work for distros
(product specific kernels tend to have things built in)"
* tag 'regulator-v3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
mfd: tps6586x: correct device name of the regulator cell
Linus Torvalds [Wed, 26 Jun 2013 19:08:58 +0000 (09:08 -1000)]
Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux
Pull GPIO regression fix from Grant Likely:
"It took a while to work out the correct solution to this regression.
It is sorted now. This branch was constructed and tested by Tony.
I've verified that it builds and signed the tag"
* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux:
gpio/omap: don't use linear domain mapping for OMAP1
Linus Torvalds [Wed, 26 Jun 2013 18:55:03 +0000 (08:55 -1000)]
Merge tag 'pm+acpi-3.10-late' of git://git./linux/kernel/git/rafael/linux-pm
Pull late power management and ACPI fixes from Rafael Wysocki:
"Sorry about the timing of this, but ACPI-based docking stations with
PCI devices on them and ATA bays would be hardly usable with 3.10
without it. We've been working on these fixes for the last couple of
weeks and everyone involved appears to be reasonably comfortable with
them now.
The PM part is one fix for a cpufreq regression introduced recently
- Fix for an ACPI dock regression introduced by the recent rework of
the ACPI-based PCI hotplug code (acpiphp) that caused it to be
initialized before the ACPI dock driver, which is incorrect (ACPI
dock has to be initialized before acpiphp so that acpiphp can
register PCI devices on docking stations with it for PCI hotplug on
re-dock to work). From Jiang Liu.
- Fix for PCI resources allocation in the ACPI-based PCI hotplug code
(acpiphp) that makes it use the same PCI resources assignment rules
during runtime hotplug that are used during boot (the BIOS' choices
are now respected in both cases). This prevents PCI resource
allocation failures during hotplug from happening in some cases.
From Jiang Liu.
- Fix for ordering and synchronization issues during hot-removal of
PCI devices on docking stations. It makes the ACPI dock code carry
out the PCI devices removal synchronously during undock instead of
spawning a separate asynchronous work item to remove each of them
without even bothering to wait for all those work items to
complete. The hot-addition part is changed analogously.
- Fix for a regression (introduced a few releases ago) that removed
the code to register a hotplug notificaion handler for for ATA
ports/devices inadvertently which prevented ATA bays hotplug from
working. The missing code is added back with some improvements.
From Aaron Lu.
- Fix for a recent cpufreq regression causing a NULL pointer
dereference to trigger in od_set_powersave_bias() in some
situations from Jacob Shin"
* tag 'pm+acpi-3.10-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: fix NULL pointer deference at od_set_powersave_bias()
libata-acpi: add back ACPI based hotplug functionality
ACPI / dock / PCI: Synchronous handling of dock events for PCI devices
PCI / ACPI: Use boot-time resource allocation rules during hotplug
ACPI / dock: Initialize ACPI dock subsystem upfront
Linus Torvalds [Wed, 26 Jun 2013 18:51:44 +0000 (08:51 -1000)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Three small fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
kprobes: Fix arch_prepare_kprobe to handle copy insn failures
Linus Torvalds [Wed, 26 Jun 2013 18:50:39 +0000 (08:50 -1000)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"Another round of ARM fixes. Largest one is the second half of the
PJ4B fix which was pushed in the previous -rc - this one was delayed
because its original caused a build regression while trying to fix a
regression!
As ever, noMMU gets forgotten when fixing problems on MMU, so we have
a noMMU fix for a previous fix included in this set.
A couple of fixes from Lorenzo for problems with the ARM DT CPU code,
and a one liner to remove the buggy 'wait for interrupt' with FA526
cores"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7773/1: PJ4B: Add support for errata 4742
ARM: 7772/1: Fix missing flush_kernel_dcache_page() for noMMU
ARM: 7763/1: kernel: fix __cpu_logical_map default initialization
ARM: 7762/1: kernel: fix arm_dt_init_cpu_maps() to skip non-cpu nodes
ARM: 7760/1: cpu_fa526_do_idle: remove WFI
Linus Torvalds [Wed, 26 Jun 2013 18:48:53 +0000 (08:48 -1000)]
Merge tag 'critical_fix_for_3.9' of git://git./linux/kernel/git/rwlove/fcoe
Pull FCoE fix from Robert W Love:
"This patch fixes a critical bug that was introduced in 3.9 related to
VLAN tagging FCoE frames"
* tag 'critical_fix_for_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rwlove/fcoe:
fcoe: Use correct API to set vlan tag for FCoE Ethertype skbs
Linus Torvalds [Wed, 26 Jun 2013 18:47:46 +0000 (08:47 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
"This fixes another problem with using v2 images on 3.10 due to the
order in which fields are read from the image header.
Hopefully this is the last one"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fetch object order before using it
Stephane Eranian [Thu, 20 Jun 2013 09:36:28 +0000 (11:36 +0200)]
perf: Disable monitoring on setuid processes for regular users
There was a a bug in setup_new_exec(), whereby
the test to disabled perf monitoring was not
correct because the new credentials for the
process were not yet committed and therefore
the get_dumpable() test was never firing.
The patch fixes the problem by moving the
perf_event test until after the credentials
are committed.
Signed-off-by: Stephane Eranian <eranian@google.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Javier Martinez Canillas [Mon, 24 Jun 2013 15:13:23 +0000 (17:13 +0200)]
gpio/omap: don't use linear domain mapping for OMAP1
Commit
ede4d7a5 ("gpio/omap: convert gpio irq domain to linear mapping")
converted the OMAP GPIO driver to use a linear mapping for the GPIO IRQ
domain instead of using a legacy mapping. Not using a legacy mapping has
a number of benefits but it requires the platform to support SPARSE_IRQ
which currently is not supported on OMAP1.
So this change caused a regression on OMAP1 platforms [1].
Since this issue is not present on all OMAP2+ platforms, there is no need to
revert the driver to use legacy domain mapping for all the platforms.
[1]: http://www.mail-archive.com/linux-omap@vger.kernel.org/msg89005.html
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Gavin Shan [Tue, 25 Jun 2013 07:24:32 +0000 (15:24 +0800)]
net/tg3: Avoid delay during MMIO access
When the EEH error is the result of a fenced host bridge, MMIO accesses
can be very slow (milliseconds) to timeout and return all 1's,
thus causing the driver various timeout loops to take way too long and
trigger soft-lockup warnings (in addition to taking minutes to recover).
It might be worthwhile to check if for any of these cases,
ffffffff is
a valid possible value, and if not, bail early since that means the HW
is either gone or isolated. In the meantime, checking that the PCI channel
is offline would be workaround of the problem.
Cc: <stable@vger.kernel.org> # v3.0+
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 24 Jun 2013 19:42:40 +0000 (21:42 +0200)]
ipv6: check return value of ipv6_get_lladdr
We should check the return value of ipv6_get_lladdr in inet6_set_iftoken.
A possible situation, which could leave ll_addr unassigned is, when
the user removed her link-local address but a global scoped address was
already set. In this case the interface would still be IF_READY and not
dead. In that case the RS source address is some value from the stack.
v2: Daniel Borkmann noted a small indent inconstancy; no semantic
changes.
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Sun, 23 Jun 2013 14:26:58 +0000 (17:26 +0300)]
macvtap: fix recovery from gup errors
get user pages might fail partially in macvtap zero copy
mode. To recover we need to put all pages that we got,
but code used a wrong index resulting in double-free
errors.
Reported-by: Brad Hubbard <bhubbard@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Sun, 23 Jun 2013 14:19:03 +0000 (17:19 +0300)]
tun: fix recovery from gup errors
get user pages might fail partially in tun zero copy
mode. To recover we need to put all pages that we got,
but code used a wrong index resulting in double-free
errors.
Reported-by: Brad Hubbard <bhubbard@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill A. Shutemov [Tue, 25 Jun 2013 21:15:36 +0000 (14:15 -0700)]
mm/thp: define HPAGE_PMD_* constants as BUILD_BUG() if !THP
Currently, HPAGE_PMD_* constans rely on PMD_SHIFT regardless of
CONFIG_TRANSPARENT_HUGEPAGE. PMD_SHIFT is not defined everywhere (e.g.
arm nommu case).
It means we can't use anything like this in generic code:
if (PageTransHuge(page))
zero_huge_user(page, 0, HPAGE_PMD_SIZE);
else
clear_highpage(page);
For !THP case, PageTransHuge() is 0 and compiler can eliminate
zero_huge_user() call. But it still need to be valid C expression, means
HPAGE_PMD_SIZE has to expand to something compiler can understand.
Previously, HPAGE_PMD_* were defined to BUILD_BUG() for !THP. Let's come
back to it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Eric Dumazet [Mon, 24 Jun 2013 13:26:00 +0000 (06:26 -0700)]
gre: fix a possible skb leak
commit
68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
added a possible skb leak, because it frees only the head of segment
list, in case a skb_linearize() call fails.
This patch adds a kfree_skb_list() helper to fix the bug.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Jun 2013 23:04:35 +0000 (16:04 -0700)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless
John W. Linville says:
====================
A few more late-breaking fixes hoping for 3.10...
Regarding the Bluetooth fix, Gustavo says:
"A important fix to 3.10, this patch fixes an issues that was preventing
the l2cap info response command to be handled properly."
Also for that Bluetooth fix, Johan adds:
"Once the code gives up parsing this PDU it also gives up essential
parts of the L2CAP connection creation process, i.e. without this
patch the stack will fail to establish connections properly."
Moving onto ath9k, Felix Fietkau fixes an RCU locking issue in
the transmit path. As for ath9k_htc, Sujith Manoharan fixes some
authentication timeouts by ensuring that a chip reset is done when
IDLE is turned off.
I think these are all micro-fixes that shouldn't cause any trouble.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Sat, 22 Jun 2013 02:13:13 +0000 (11:13 +0900)]
ipv6: Process unicast packet with Router Alert by checking flag in skb.
Router Alert option is marked in skb.
Previously, IP6CB(skb)->ra was set to positive value for such packets.
Since commit
dd3332bf ("ipv6: Store Router Alert option in IP6CB
directly."), IP6SKB_ROUTERALERT is set in IP6CB(skb)->flags, and
the value of Router Alert option (in network byte order) is set
to IP6CB(skb)->ra for such packets.
Multicast forwarding path uses that flag and value, but unicast
forwarding path does not use the flag and misuses IP6CB(skb)->ra
value.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Tue, 25 Jun 2013 20:47:07 +0000 (22:47 +0200)]
Merge branch 'pm-fixes'
* pm-fixes:
cpufreq: fix NULL pointer deference at od_set_powersave_bias()
Rafael J. Wysocki [Tue, 25 Jun 2013 20:46:47 +0000 (22:46 +0200)]
Merge branch 'acpi-fixes'
* acpi-fixes:
libata-acpi: add back ACPI based hotplug functionality
ACPI / dock / PCI: Synchronous handling of dock events for PCI devices
PCI / ACPI: Use boot-time resource allocation rules during hotplug
ACPI / dock: Initialize ACPI dock subsystem upfront
Jacob Shin [Tue, 25 Jun 2013 20:42:37 +0000 (22:42 +0200)]
cpufreq: fix NULL pointer deference at od_set_powersave_bias()
When initializing the default powersave_bias value, we need to first
make sure that this policy is running the ondemand governor.
Reported-and-tested-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Josh Durgin [Wed, 12 Jun 2013 21:43:10 +0000 (14:43 -0700)]
rbd: fetch object order before using it
rbd_dev_v2_header_onetime() fetches striping information, and
checks whether the image can be read by compariing the stripe unit
to the object size. It determines the object size by shifting
the object order, which is 0 at this point since it has not been
read yet. Move the call to get the image size and object order
before rbd_dev_v2_header_onetime() so it is set before use.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Robert Love [Wed, 19 Jun 2013 01:56:54 +0000 (01:56 +0000)]
fcoe: Use correct API to set vlan tag for FCoE Ethertype skbs
fcoe_xmit was coded such that it would skip the vlan net device/layer
and instead set some vlan flags and transmit on the real net device.
The real net device has code that would add the vlan tag for fcoe skbs.
This avoids some extra processing for data frames and provides a small
performance improvement.
Since fcoe_xmit was not using the vlan net device, __vlan_put_tag
within the real net device's xmit routine was ultimately being
called to set the vlan tag.
With the below change the behavior of __vlan_put_tag changed slightly,
it now sets the skb->protocol = vlan_proto. vlan_proto was not a field
being set by fcoe_xmit, so the skb->protocol is now not being set to
ETH_P_8021Q, as it should be.
This patch converts fcoe_xmit to use the vlan_put_tag routine which
will tag the skb and fcoe will continue to transmit fcoe skbs on the
real net device.
For reference, the below change was the one that altered the
__vlan_put_tag behavior.
commit
86a9bad3ab6b6f858fd4443b48738cabbb6d094c
Author: Patrick McHardy <kaber@trash.net>
Date: Fri Apr 19 02:04:30 2013 +0000
net: vlan: add protocol argument to packet tagging functions
Add a protocol argument to the VLAN packet tagging functions. In case of HW
tagging, we need that protocol available in the ndo_start_xmit functions,
so it is stored in a new field in the skb. The new field fits into a hole
(on 64 bit) and doesn't increase the sks's size.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Linus Torvalds [Tue, 25 Jun 2013 19:08:07 +0000 (09:08 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"A couple of last-minute fixes: a build regression for !SMP, a recent
memory detection patch caused kdump to break, a regression in regard
to sscanf vs reboot from FCP, and two fixes in the DMA mapping code
for PCI"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ipl: Fix FCP WWPN and LUN format strings for read
s390/mem_detect: fix memory hole handling
s390/dma: support debug_dma_mapping_error
s390/dma: fix mapping_error detection
s390/irq: Only define synchronize_irq() on SMP
Linus Torvalds [Tue, 25 Jun 2013 19:06:48 +0000 (09:06 -1000)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc bugfix from Ben Herrenschmidt:
"This is a fix for a regression causing a freescale "83xx" based
platforms to crash on boot due to some PCI breakage"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pci: Fix boot panic on mpc83xx (regression)
Linus Torvalds [Tue, 25 Jun 2013 19:06:04 +0000 (09:06 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse bugfix from Miklos Szeredi:
"This fixes a race between fallocate() and truncate()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: hold i_mutex in fuse_file_fallocate()
John W. Linville [Tue, 25 Jun 2013 17:24:12 +0000 (13:24 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem
Herbert Xu [Tue, 25 Jun 2013 11:15:17 +0000 (19:15 +0800)]
crypto: algboss - Hold ref count on larval
On Thu, Jun 20, 2013 at 10:00:21AM +0200, Daniel Borkmann wrote:
> After having fixed a NULL pointer dereference in SCTP
1abd165e ("net:
> sctp: fix NULL pointer dereference in socket destruction"), I ran into
> the following NULL pointer dereference in the crypto subsystem with
> the same reproducer, easily hit each time:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<
ffffffff81070321>] __wake_up_common+0x31/0x90
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: padlock_sha(F-) sha256_generic(F) sctp(F) libcrc32c(F) [..]
> CPU: 6 PID: 3326 Comm: cryptomgr_probe Tainted: GF 3.10.0-rc5+ #1
> Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011
> task:
ffff88007b6cf4e0 ti:
ffff88007b7cc000 task.ti:
ffff88007b7cc000
> RIP: 0010:[<
ffffffff81070321>] [<
ffffffff81070321>] __wake_up_common+0x31/0x90
> RSP: 0018:
ffff88007b7cde08 EFLAGS:
00010082
> RAX:
ffffffffffffffe8 RBX:
ffff88003756c130 RCX:
0000000000000000
> RDX:
0000000000000000 RSI:
0000000000000003 RDI:
ffff88003756c130
> RBP:
ffff88007b7cde48 R08:
0000000000000000 R09:
ffff88012b173200
> R10:
0000000000000000 R11:
0000000000000000 R12:
0000000000000282
> R13:
ffff88003756c138 R14:
0000000000000000 R15:
0000000000000000
> FS:
0000000000000000(0000) GS:
ffff88012fc60000(0000) knlGS:
0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
> CR2:
0000000000000000 CR3:
0000000001a0b000 CR4:
00000000000007e0
> DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
> DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
> Stack:
>
ffff88007b7cde28 0000000300000000 ffff88007b7cde28 ffff88003756c130
>
0000000000000282 ffff88003756c128 ffffffff81227670 0000000000000000
>
ffff88007b7cde78 ffffffff810722b7 ffff88007cdcf000 ffffffff81a90540
> Call Trace:
> [<
ffffffff81227670>] ? crypto_alloc_pcomp+0x20/0x20
> [<
ffffffff810722b7>] complete_all+0x47/0x60
> [<
ffffffff81227708>] cryptomgr_probe+0x98/0xc0
> [<
ffffffff81227670>] ? crypto_alloc_pcomp+0x20/0x20
> [<
ffffffff8106760e>] kthread+0xce/0xe0
> [<
ffffffff81067540>] ? kthread_freezable_should_stop+0x70/0x70
> [<
ffffffff815450dc>] ret_from_fork+0x7c/0xb0
> [<
ffffffff81067540>] ? kthread_freezable_should_stop+0x70/0x70
> Code: 41 56 41 55 41 54 53 48 83 ec 18 66 66 66 66 90 89 75 cc 89 55 c8
> 4c 8d 6f 08 48 8b 57 08 41 89 cf 4d 89 c6 48 8d 42 e
> RIP [<
ffffffff81070321>] __wake_up_common+0x31/0x90
> RSP <
ffff88007b7cde08>
> CR2:
0000000000000000
> ---[ end trace
b495b19270a4d37e ]---
>
> My assumption is that the following is happening: the minimal SCTP
> tool runs under ``echo 1 > /proc/sys/net/sctp/auth_enable'', hence
> it's making use of crypto_alloc_hash() via sctp_auth_init_hmacs().
> It forks itself, heavily allocates, binds, listens and waits in
> accept on sctp sockets, and then randomly kills some of them (no
> need for an actual client in this case to hit this). Then, again,
> allocating, binding, etc, and then killing child processes.
>
> The problem that might be happening here is that cryptomgr requests
> the module to probe/load through cryptomgr_schedule_probe(), but
> before the thread handler cryptomgr_probe() returns, we return from
> the wait_for_completion_interruptible() function and probably already
> have cleared up larval, thus we run into a NULL pointer dereference
> when in cryptomgr_probe() complete_all() is being called.
>
> If we wait with wait_for_completion() instead, this panic will not
> occur anymore. This is valid, because in case a signal is pending,
> cryptomgr_probe() returns from probing anyway with properly calling
> complete_all().
The use of wait_for_completion_interruptible is intentional so that
we don't lock up the thread if a bug causes us to never wake up.
This bug is caused by the helper thread using the larval without
holding a reference count on it. If the helper thread completes
after the original thread requesting for help has gone away and
destroyed the larval, then we get the crash above.
So the fix is to hold a reference count on the larval.
Cc: <stable@vger.kernel.org> # 3.6+
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>