//===---------------------------------------------------------------------===//
-We should add 'unaligned load/store' nodes, and produce them from code like
-this:
+We should produce an unaligned load from code like this:
v4sf example(float *P) {
return (v4sf){P[0], P[1], P[2], P[3] };
Turn this into a single byte store with no load (the other 3 bytes are
unmodified):
-void %test(uint* %P) {
- %tmp = load uint* %P
- %tmp14 = or uint %tmp, 3305111552
- %tmp15 = and uint %tmp14, 3321888767
- store uint %tmp15, uint* %P
+define void @test(i32* %P) {
+ %tmp = load i32* %P
+ %tmp14 = or i32 %tmp, 3305111552
+ %tmp15 = and i32 %tmp14, 3321888767
+ store i32 %tmp15, i32* %P
ret void
}
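
As a sanity check on the constants above: 3305111552 is 0xC5000000 and 3321888767 is 0xC5FFFFFF, so the or/and pair forces the most significant byte to 0xC5 and leaves the low 24 bits alone. The C sketch below is illustrative only. The function names are hypothetical, and the byte index is chosen by a runtime endianness probe rather than by anything the note specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the IR: load the word, or in 0xC5000000, and with 0xC5FFFFFF. */
uint32_t via_load_or_and_store(uint32_t word) {
    word |= 0xC5000000u;   /* 3305111552 */
    word &= 0xC5FFFFFFu;   /* 3321888767 */
    return word;
}

/* The desired form: overwrite only the most significant byte. */
uint32_t via_byte_store(uint32_t word) {
    const uint32_t probe = 1;
    /* On a little-endian host the most significant byte is at offset 3. */
    int little_endian = *(const unsigned char *)&probe == 1;
    unsigned char *bytes = (unsigned char *)&word;
    bytes[little_endian ? 3 : 0] = 0xC5;
    return word;
}

/* Hypothetical helper: both forms must agree for any input word. */
void check_same(uint32_t word) {
    assert(via_byte_store(word) == via_load_or_and_store(word));
}
```

Calling check_same on a handful of words (all zeros, all ones, arbitrary patterns) is a cheap way to confirm that the two forms agree before relying on the transformation.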
//===---------------------------------------------------------------------===//
-We should be able to evaluate this loop:
-
-int test(int x_offs) {
- while (x_offs > 4)
- x_offs -= 4;
- return x_offs;
-}
-
-//===---------------------------------------------------------------------===//
-
Reassociate should turn things like:
int factorial(int X) {
Instcombine should be able to optimize away the loads (and thus the globals).
+//===---------------------------------------------------------------------===//
+
+I saw this constant expression in real code after llvm-g++ -O2:
+
+declare extern_weak i32 @0(i64)
+
+define void @foo() {
+ br i1 icmp eq (i32 zext (i1 icmp ne (i32 (i64)* @0, i32 (i64)* null) to i32),
+i32 0), label %cond_true, label %cond_false
+cond_true:
+ ret void
+cond_false:
+ ret void
+}
+
+That branch expression should be reduced to:
+
+ i1 icmp eq (i32 (i64)* @0, i32 (i64)* null)
+
+It's probably not a perf issue; I just happened to see it while examining
+something else and didn't want to forget about it.
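
The fold is valid because zero-extending an i1 gives 0 or 1, so comparing the widened value against 0 is just the negation of the inner "not equal" test. A rough C analogue is sketched below, under the assumption that a function pointer stands in for the extern_weak symbol. The names weak_fn, sample_fn, original_condition, and reduced_condition are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the type of the extern_weak function, i32 (i64). */
typedef int32_t (*weak_fn)(int64_t);

/* Hypothetical non-null function used only to exercise the checks. */
int32_t sample_fn(int64_t x) { return (int32_t)x; }

/* icmp eq (zext (icmp ne fn, null) to i32), 0 */
int original_condition(weak_fn fn) {
    int32_t widened = (int32_t)(fn != NULL);  /* zext of the i1 result */
    return widened == 0;
}

/* icmp eq fn, null */
int reduced_condition(weak_fn fn) {
    return fn == NULL;
}

/* Both forms must select the same branch for any pointer value. */
void check_fold(weak_fn fn) {
    assert(original_condition(fn) == reduced_condition(fn));
}
```

Since the pointer is either null or not, check_fold(NULL) and check_fold(sample_fn) cover both cases.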
+
//===---------------------------------------------------------------------===//