[ARM] mm: change to read-allocate as default SMP cache policy
author Gary King <gking@nvidia.com>
Wed, 15 Sep 2010 16:32:10 +0000 (09:32 -0700)
committer Rebecca Schultz Zavin <rebecca@android.com>
Fri, 8 Oct 2010 22:58:59 +0000 (15:58 -0700)
The "streaming" mode optimization, which skips cacheline allocation
for fully-dirty lines, is frequently defeated when coherent processors
perform stores simultaneously.

As a result, cachelines are allocated on SMP that would not be
allocated on a uniprocessor, which significantly reduces aggregate
write bandwidth. For example, on Tegra 2 systems with 300MHz DDR main
memory, running memset over a large buffer (i.e., one that misses in
the L2) on a single processor achieves ~2GB/sec of write bandwidth,
but if the same operation is run in parallel on both CPUs, the
aggregate write bandwidth is just 500MB/sec.

Changing the SMP cache allocation policy to read-allocate (write-back,
no write-allocate) recovers some of this performance loss.

Change-Id: Ice47ab0a15f2490b7e9a007b4b37800566ed7be1
Signed-off-by: Gary King <gking@nvidia.com>
arch/arm/mm/proc-v7.S

index 7563ff0141bd85cee6d4cc626b69f7210141094c..750175e95ab0ea4180e0e1c5071a30fdd41e1703 100644
@@ -295,7 +295,11 @@ __v7_setup:
         *   NOS = PRRR[24+n] = 1       - not outer shareable
         */
        ldr     r5, =0xff0a81a8                 @ PRRR
-       ldr     r6, =0x40e040e0                 @ NMRR
+#ifdef CONFIG_SMP
+       ldr     r6, =0xc0e0c0e0                 @ NMRR
+#else
+       ldr     r6, =0x40e040e0
+#endif
        mcr     p15, 0, r5, c10, c2, 0          @ write PRRR
        mcr     p15, 0, r6, c10, c2, 1          @ write NMRR
 #endif