+We don't fold (icmp (add) (add)) unless the two adds only have a single use.
+There are a lot of cases that we're refusing to fold in (e.g.) 256.bzip2, for
+example:
+
+ %indvar.next90 = add i64 %indvar89, 1 ;; Has 2 uses
+ %tmp96 = add i64 %tmp95, 1 ;; Has 1 use
+ %exitcond97 = icmp eq i64 %indvar.next90, %tmp96
+
+We don't fold this because we don't want to introduce an overlapped live range
+of the ivar. However if we can make this more aggressive without causing
+performance issues in two ways:
+
+1. If *either* the LHS or RHS has a single use, we can definitely do the
+ transformation. In the overlapping liverange case we're trading one register
+ use for one fewer operation, which is a reasonable trade. Before doing this
+ we should verify that the llc output actually shrinks for some benchmarks.
+2. If both ops have multiple uses, we can still fold it if the operations are
+ both sinkable to *after* the icmp (e.g. in a subsequent block) which doesn't
+ increase register pressure.
+
+There are a ton of icmp's we aren't simplifying because of the reg pressure
+concern. Care is warranted here though because many of these are induction
+variables and other cases that matter a lot to performance, like the above.
+Here's a blob of code that you can drop into the bottom of visitICmp to see some
+missed cases:
+
+ { Value *A, *B, *C, *D;
+ if (match(Op0, m_Add(m_Value(A), m_Value(B))) &&
+ match(Op1, m_Add(m_Value(C), m_Value(D))) &&
+ (A == C || A == D || B == C || B == D)) {
+ errs() << "OP0 = " << *Op0 << " U=" << Op0->getNumUses() << "\n";
+ errs() << "OP1 = " << *Op1 << " U=" << Op1->getNumUses() << "\n";
+ errs() << "CMP = " << I << "\n\n";
+ }
+ }