TODO:
* gpr0 allocation
* implement do-loop -> bdnz transform
* Implement __builtin_trap (ISD::TRAP) as 'tw 31, 0, 0' aka 'trap'.
* lmw/stmw pass a la arm load store optimizer for prolog/epilog
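
For the do-loop -> bdnz transform, the idea is to move the trip count into the
CTR register so the decrement-and-branch is a single instruction. A sketch of
the desired shape (register choice and trip count are illustrative):

        li r2, 100
        mtctr r2
LBB_loop:
        ; ...loop body...
        bdnz LBB_loop

bdnz decrements CTR and branches while it is nonzero, eliminating the separate
induction-variable update and compare.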
===-------------------------------------------------------------------------===
Compile this:

int %f1(int %a, int %b) {
        %tmp.1 = and int %a, 15         ; <int> [#uses=1]
        %tmp.3 = and int %b, 240        ; <int> [#uses=1]
        %tmp.4 = or int %tmp.3, %tmp.1  ; <int> [#uses=1]
        ret int %tmp.4
}

without a copy. We make this currently:

_f1:
        rlwinm r2, r4, 0, 24, 27
        rlwimi r2, r3, 0, 28, 31
        or r3, r2, r2
        blr

The two-addr pass or RA needs to learn when it is profitable to commute an
instruction to avoid a copy AFTER the 2-addr instruction. The 2-addr pass
currently only commutes to avoid inserting a copy BEFORE the two addr instr.
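
If the pass learned to commute here, a copy-free sequence (a sketch) would
clobber r3 directly and insert r4's field into it:

_f1:
        rlwinm r3, r3, 0, 28, 31
        rlwimi r3, r4, 0, 24, 27
        blr

The rlwinm keeps only bits 28-31 of %a in r3, then rlwimi inserts bits 24-27
of %b, producing the or of the two masks with the result already in r3.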

===-------------------------------------------------------------------------===

Compile offsets from allocas:
int *%test() {
which is more efficient and can use mfocr. See PR642 for some more context.
//===---------------------------------------------------------------------===//

void foo(float *data, float d) {
  long i;
  for (i = 0; i < 8000; i++)
    data[i] = d;
}
void foo2(float *data, float d) {
  long i;
  data--;
  for (i = 0; i < 8000; i++) {
    data[1] = d;
    data++;
  }
}

These compile to:

_foo:
        li r2, 0
LBB1_1: ; bb
        addi r4, r2, 4
        stfsx f1, r3, r2
        cmplwi cr0, r4, 32000
        mr r2, r4
        bne cr0, LBB1_1 ; bb
        blr
_foo2:
        li r2, 0
LBB2_1: ; bb
        addi r4, r2, 4
        stfsx f1, r3, r2
        cmplwi cr0, r4, 32000
        mr r2, r4
        bne cr0, LBB2_1 ; bb
        blr

The 'mr' could be eliminated by folding the add into the cmp better.

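A sketch of the desired loop, incrementing r2 after the store instead of
computing the next index into a separate register:

_foo:
        li r2, 0
LBB1_1: ; bb
        stfsx f1, r3, r2
        addi r2, r2, 4
        cmplwi cr0, r2, 32000
        bne cr0, LBB1_1 ; bb
        blr

This stores to the same 8000 offsets but keeps the induction variable in r2
throughout, so no mr is needed.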
//===---------------------------------------------------------------------===//