//===---------------------------------------------------------------------===//
-There are serious issues folding loads into "scalar sse" intrinsics. For
-example, this:
-
-float minss4( float x, float *y ) {
- return _mm_cvtss_f32(_mm_min_ss(_mm_set_ss(x),_mm_set_ss(*y)));
-}
-
-compiles to:
-
-_minss4:
- subl $4, %esp
- movl 12(%esp), %eax
-*** movss 8(%esp), %xmm0
-*** movss (%eax), %xmm1
-*** minss %xmm1, %xmm0
- movss %xmm0, (%esp)
- flds (%esp)
- addl $4, %esp
- ret
-
-Each operand of the minss is a load. At least one should be folded!
-
-//===---------------------------------------------------------------------===//
-
Expand libm rounding functions inline: Significant speedups possible.
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00909.html
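
As a rough illustration of what an inline expansion could look like at the
source level, here is a sketch of floor() for doubles built from a truncating
conversion plus a fix-up step. The name floor_inline is hypothetical and is not
taken from the linked patch; this is one possible expansion under the stated
assumptions, not necessarily the sequence any compiler emits.

```c
/* Hypothetical helper: floor() without a libm call.
 * Assumes IEEE-754 doubles and a 64-bit long long. */
static double floor_inline(double x) {
    const double two52 = 4503599627370496.0; /* 2^52 */

    /* NaN, infinities, and anything with magnitude >= 2^52 are returned
     * unchanged: such finite values are already integral. Zero is returned
     * as-is so that the sign of -0.0 is preserved. */
    if (x != x || x >= two52 || x <= -two52 || x == 0.0)
        return x;

    /* Truncate toward zero; on x86 with SSE2 this conversion typically
     * maps to cvttsd2si followed by cvtsi2sd. */
    double t = (double)(long long)x;

    /* Truncation rounds negative non-integers up, so step down by one. */
    if (t > x)
        t -= 1.0;
    return t;
}
```

The range check up front is what makes the integer conversion safe: every
value that reaches the cast fits in a 64-bit integer.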
//===---------------------------------------------------------------------===//
-Should generate min/max for stuff like:
-
-void minf(float a, float b, float *X) {
- *X = a <= b ? a : b;
-}
-
-Make use of floating point min / max instructions. Perhaps introduce ISD::FMIN
-and ISD::FMAX node types?
-
-//===---------------------------------------------------------------------===//
-
Lower memcpy / memset to a series of SSE 128 bit move instructions when it's
feasible.
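
For illustration, the kind of code such a lowering would correspond to can be
written by hand with SSE2 intrinsics. This is only a sketch: copy_sse_blocks is
a hypothetical name, the size is assumed to be a multiple of 16 bytes, and
unaligned loads and stores are used because no alignment is assumed.

```c
#include <emmintrin.h> /* SSE2 intrinsics; x86 targets only */
#include <stddef.h>

/* Hypothetical helper: copy n bytes in 16-byte chunks.
 * Assumes n is a multiple of 16 and that the buffers do not overlap. */
static void copy_sse_blocks(void *dst, const void *src, size_t n) {
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i;

    for (i = 0; i < n; i += 16) {
        /* movdqu load followed by movdqu store */
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), v);
    }
}
```

For a small constant size the loop would be fully unrolled into a straight
series of moves, and aligned moves could be used when the alignment is known.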
//===---------------------------------------------------------------------===//
-Better codegen for:
-
-void f(float a, float b, vector float * out) { *out = (vector float){ a, 0.0, 0.0, b}; }
-void f(float a, float b, vector float * out) { *out = (vector float){ a, b, 0.0, 0}; }
-
-For the latter we generate:
-
-_f:
- pxor %xmm0, %xmm0
- movss 8(%esp), %xmm1
- movaps %xmm0, %xmm2
- unpcklps %xmm1, %xmm2
- movss 4(%esp), %xmm1
- unpcklps %xmm0, %xmm1
- unpcklps %xmm2, %xmm1
- movl 12(%esp), %eax
- movaps %xmm1, (%eax)
- ret
-
-This seems like it should use shufps, one for each of a & b.
-
-//===---------------------------------------------------------------------===//
-
How to decide when to use the "floating point version" of logical ops? Here are
some code fragments:
//===---------------------------------------------------------------------===//
The first BB of this code:
declare bool %foo()