lib/Target/PowerPC/README_ALTIVEC.txt

//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement TargetConstantVec, and set up PPC to custom lower ConstantVec into
TargetConstantVec if it has one of the many forms that are algorithmically
computable using the spiffy altivec instructions.

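As a rough illustration of one such computable form (a sketch only, not LLVM
code): a 4 x i32 constant whose elements are all equal, and whose value fits the
5-bit signed immediate of vspltisw, could be materialized with a single splat
instead of a constant pool load.  The helper name is_vspltisw_splat is
hypothetical and is not an existing LLVM API.

```c
#include <stdint.h>

/* Hypothetical helper, not an LLVM API.  Returns 1 if the four 32-bit
   elements form a splat whose value fits the 5-bit signed immediate of
   vspltisw (-16..15), and stores that immediate in *imm.  Returns 0 and
   leaves *imm untouched otherwise. */
static int is_vspltisw_splat(const int32_t elts[4], int *imm) {
  for (int i = 1; i < 4; ++i)
    if (elts[i] != elts[0])
      return 0;                 /* elements differ: not a splat */
  if (elts[0] < -16 || elts[0] > 15)
    return 0;                   /* value does not fit the immediate field */
  *imm = (int)elts[0];
  return 1;
}
```

A lowering hook could call a check like this first and fall back to the
constant pool when it returns 0.
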
//===----------------------------------------------------------------------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

//===----------------------------------------------------------------------===//

Altivec support.  Of the two functions below, the first should be a single lvx
from the constant pool, and the second should be a xor/stvx:

void bar(int *x);   /* defined elsewhere */

void foo(void) {
  int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 1, 1, 1, 1, 1 };
  bar (x);
}

#include <string.h>
void foo(void) {
  int x[8] __attribute__((aligned(128)));
  memset (x, 0, sizeof (x));
  bar (x);
}

//===----------------------------------------------------------------------===//

Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

Adding +0.0 turns a product of -0.0 into +0.0, while adding -0.0 preserves the
sign of the product.  We need to codegen a -0.0 vector efficiently (no constant
pool load).

When -ffast-math is on, we can use 0.0.

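A scalar sketch of both points, under stated assumptions: it models a single
32-bit lane in plain C rather than using real Altivec code.  It shows why the
addend matters, and one candidate way to build the -0.0 bit pattern without a
load: splat -1 with vspltisw, then shift each word left by itself with vslw
(which uses only the low 5 bits of the shift count, i.e. 31).  The function
names are hypothetical, and this is not necessarily the sequence the code
generator should end up using.

```c
#include <stdint.h>
#include <string.h>

/* One lane of vspltisw: sign-extend a 5-bit immediate to 32 bits. */
static uint32_t lane_vspltisw(int imm5) { return (uint32_t)(int32_t)imm5; }

/* One lane of vslw: shift left by the low 5 bits of the second operand. */
static uint32_t lane_vslw(uint32_t a, uint32_t b) { return a << (b & 31u); }

/* Bit pattern of -0.0f built without a constant pool load. */
static uint32_t neg_zero_bits(void) {
  uint32_t t = lane_vspltisw(-1);   /* 0xFFFFFFFF */
  return lane_vslw(t, t);           /* shift by 31 gives 0x80000000 */
}

static float bits_to_float(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

static uint32_t float_to_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* A multiply expressed as multiply-add with an explicit addend. */
static float mul_with_addend(float a, float b, float addend) {
  return a * b + addend;
}
```

With a = -1.0f and b = 0.0f the exact product is -0.0; an addend of +0.0 loses
the sign, while an addend of bits_to_float(neg_zero_bits()) keeps it (assuming
default rounding and no -ffast-math).
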
//===----------------------------------------------------------------------===//

Consider this:

  v4f32 Vector;
  v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };

Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into a lve*x instruction, instead of doing
a load/store/lve*x sequence.

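To make the "known element offset" point concrete, here is a small model
(a sketch, not compiler code).  lvewx places the loaded word in the lane
selected by the byte offset of the address within its 16-byte block, so for a
16-byte aligned vector the lane to splat afterwards (e.g. with vspltw) is known
at compile time.  The Vec4 layout and the function names are assumptions made
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for a v4f32 with named fields; hypothetical. */
typedef struct { float X, Y, Z, W; } Vec4;

/* Lane that lvewx fills for a word at effective address ea:
   the byte offset within the 16-byte block, divided by 4. */
static unsigned lvewx_lane(uintptr_t ea) {
  return (unsigned)((ea & 15u) >> 2);
}

/* For a 16-byte aligned vector, the lane depends only on the field offset,
   so it can be used directly as the splat index. */
static unsigned splat_lane_for_offset(size_t field_offset) {
  return lvewx_lane((uintptr_t)field_offset);
}
```

For example, splat_lane_for_offset(offsetof(Vec4, X)) selects lane 0, so no
intermediate store is needed to find out where the element landed.
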
//===----------------------------------------------------------------------===//

There is a wide range of vector constants we can generate with combinations of
altivec instructions.  For example, GCC does: t = vsplti*, r = t + t.

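A minimal sketch of the splat-then-add idea for 32-bit elements, assuming the
splat is vspltisw with its 5-bit signed immediate (-16..15).  The planner below
is hypothetical; it only decides whether a splat value is reachable in one
instruction (t) or two (t + t), and is not how GCC or LLVM actually implement
this.

```c
#include <stdint.h>

/* Does c fit the 5-bit signed immediate of vsplti*? */
static int fits_splat_imm(int32_t c) { return c >= -16 && c <= 15; }

/* Hypothetical planner.  Returns 1 if a splat of c can be built as
   t = vsplti* imm (needs_add = 0) or r = t + t (needs_add = 1), storing the
   immediate to use in *imm.  Returns 0 if neither form applies. */
static int plan_splat_constant(int32_t c, int *imm, int *needs_add) {
  if (fits_splat_imm(c)) {
    *imm = (int)c;
    *needs_add = 0;
    return 1;
  }
  if (c % 2 == 0 && fits_splat_imm(c / 2)) {
    *imm = (int)(c / 2);        /* t = splat(c / 2), then r = t + t */
    *needs_add = 1;
    return 1;
  }
  return 0;
}
```

For instance, a splat of 24 is planned as an immediate of 12 followed by one
add, while a splat of 31 is rejected because it is odd and out of range.
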
//===----------------------------------------------------------------------===//
