ARM: 7587/1: implement optimized percpu variable access
authorRob Herring <rob.herring@calxeda.com>
Thu, 29 Nov 2012 19:39:54 +0000 (20:39 +0100)
committerRussell King <rmk+kernel@arm.linux.org.uk>
Mon, 3 Dec 2012 11:16:36 +0000 (11:16 +0000)
commit14318efb322e2fe1a034c69463d725209eb9d548
treecf99b8a06eca7abdb89ac87c7f35bebb6b3254f7
parent3e99675af1b25a191c467700499b1cbe5585a778
ARM: 7587/1: implement optimized percpu variable access

Use the previously unused TPIDRPRW register to store percpu offsets.
TPIDRPRW is only accessible in PL1, so it can only be used in the kernel.

This replaces 2 loads with a mrc instruction for each percpu variable
access. With hackbench, the performance improvement is 1.4% on Cortex-A9
(highbank). Taking an average of 30 runs of "hackbench -l 1000" yields:

Before: 6.2191
After: 6.1348

Will Deacon reported similar delta on v6 with 11MPCore.

The asm "memory clobber" are needed here to ensure the percpu offset
gets reloaded. Testing by Will found that this would not happen in
__schedule() which is a bit of a special case as preemption is disabled
but the execution can move cores.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
arch/arm/include/asm/Kbuild
arch/arm/include/asm/percpu.h [new file with mode: 0644]
arch/arm/kernel/setup.c
arch/arm/kernel/smp.c