[ARM] mm: change to read-allocate as default SMP cache policy

The "streaming" mode optimization, which skips cacheline allocation
for fully-dirty lines, is frequently defeated when coherent processors
perform stores simultaneously.

As a result, cachelines are allocated under SMP that would not be
allocated on a uniprocessor system, which significantly reduces
aggregate write bandwidth. For example, on Tegra 2 systems with
300MHz DDR main memory, running memset over a large buffer (i.e., one
that misses in L2) on a single processor achieves ~2GB/sec of write
bandwidth, but when the same operation runs in parallel on both CPUs,
the aggregate write bandwidth is just 500MB/sec.

Changing the default cache allocation policy to read-allocate, so that
store misses no longer allocate a cacheline, recovers some of this
performance loss on SMP systems.

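The memset comparison described above could be reproduced from user
space with something along these lines. This is only a sketch and not
the test used for the quoted numbers: the names fill_job, fill_worker
and aggregate_write_bandwidth are hypothetical, and the buffer size and
pass count are left to the caller. The compiler barrier assumes GCC or
Clang.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical per-thread work description (not from the patch). */
struct fill_job {
	unsigned char *buf;	/* private buffer for this thread */
	size_t len;		/* buffer size in bytes */
	int passes;		/* number of memset passes over buf */
};

static void *fill_worker(void *arg)
{
	struct fill_job *job = arg;
	int i;

	for (i = 0; i < job->passes; i++) {
		memset(job->buf, i & 0xff, job->len);
		/* compiler barrier so the stores are not optimized away */
		__asm__ __volatile__("" : : : "memory");
	}
	return NULL;
}

/*
 * Run nthreads concurrent memset loops, each over its own buffer, and
 * return the aggregate write bandwidth in bytes/sec. Returns a
 * negative value on failure.
 */
double aggregate_write_bandwidth(int nthreads, size_t len, int passes)
{
	struct timespec t0, t1;
	struct fill_job *jobs = NULL;
	pthread_t *tids = NULL;
	double secs, result = -1.0;
	int i, started = 0;

	if (nthreads <= 0 || len == 0 || passes <= 0)
		return result;

	tids = calloc(nthreads, sizeof(*tids));
	jobs = calloc(nthreads, sizeof(*jobs));
	if (!tids || !jobs)
		goto out;

	for (i = 0; i < nthreads; i++) {
		jobs[i].buf = malloc(len);
		jobs[i].len = len;
		jobs[i].passes = passes;
		if (!jobs[i].buf)
			goto out;
		memset(jobs[i].buf, 0, len);	/* fault pages in before timing */
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, fill_worker, &jobs[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	if (started == nthreads && secs > 0.0)
		result = (double)nthreads * (double)len * passes / secs;
out:
	for (i = 0; jobs && i < nthreads; i++)
		free(jobs[i].buf);	/* free(NULL) is a no-op */
	free(jobs);
	free(tids);
	return result;
}
```

Comparing aggregate_write_bandwidth(1, len, passes) against
aggregate_write_bandwidth(2, len, passes), with len well above the L2
size, gives the single-CPU and dual-CPU figures. The sketch does not
pin threads to CPUs, so the scheduler is assumed to spread them.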
Change-Id: Ice47ab0a15f2490b7e9a007b4b37800566ed7be1
Signed-off-by: Gary King <gking@nvidia.com>