//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

//===----------------------------------------------------------------------===//

Altivec support.  The first function below should compile to a single lvx from
the constant pool; the second should compile to an xor/stvx:

void bar(int *);

void foo(void) {
  int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 17, 1, 1, 1, 1 };
  bar (x);
}

#include <string.h>
void foo(void) {
  int x[8] __attribute__((aligned(128)));
  memset (x, 0, sizeof (x));
  bar (x);
}

//===----------------------------------------------------------------------===//

Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0, so that a
product of -0.0 keeps its sign:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

When -ffast-math is on, we can use 0.0.
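
The signed-zero issue can be reproduced with a scalar model of one lane.  This
is only a sketch: mul_via_madd, sign_bit and check_signed_zero are hypothetical
names, and the code assumes float is IEEE-754 binary32 and that the compiler is
not running with -ffast-math.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the sign bit of f (1 for negative values, including -0.0). */
static int sign_bit(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return (int)(bits >> 31);
}

/* Scalar model of one lane of a multiply done as a multiply-add. */
static float mul_via_madd(float a, float b, float addend) {
  return a * b + addend;
}

/* Shows why the addend must be -0.0 when the product is -0.0. */
static void check_signed_zero(void) {
  volatile float a = -1.0f, b = 0.0f;   /* exact product is -0.0 */
  /* (-0.0) + (+0.0) is +0.0 in round-to-nearest: the sign is lost. */
  assert(sign_bit(mul_via_madd(a, b, 0.0f)) == 0);
  /* (-0.0) + (-0.0) is -0.0: the sign of the product survives. */
  assert(sign_bit(mul_via_madd(a, b, -0.0f)) == 1);
}
```

For any nonzero product the two addends give the same result, which is why only
the sign of zero is at stake.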

//===----------------------------------------------------------------------===//

Consider this:
  v4f32 Vector;
  v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };

Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into an lve*x instruction, instead of doing
a load/store/lve*x sequence.

//===----------------------------------------------------------------------===//

There is a wide range of vector constants we can generate with combinations of
Altivec instructions.

Examples (these work with all element widths; I is the desired constant):
  Splat(+/- 16,18,20,22,24,28,30):  t = vsplti I/2,  r = t + t
  Splat(+/- 17,19,21,23,25,29):     t = vsplti +/-15, t2 = vsplti I-15, r = t + t2
  Splat(31):                        t = vsplti FB,  r = srl t, t
  Splat(256):                       t = vsplti 1,  r = vsldoi t, t, 1

Lots more are listed here:
http://www.informatik.uni-bremen.de/~hobold/AltiVec.html

This should be added to the ISD::BUILD_VECTOR case in
PPCTargetLowering::LowerOperation.
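
The recipes above can be checked with a scalar model.  This is a sketch under
stated assumptions, not the lowering code itself: all function names are
hypothetical, vsplti is modeled as one 32-bit lane of a splat of a sign-extended
5-bit immediate, and the 256 case is modeled on a 16-byte array in big-endian
byte order.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of a splat-immediate: sign-extend a value in -16..15. */
static uint32_t vsplti(int imm) {
  assert(imm >= -16 && imm <= 15);
  return (uint32_t)(int32_t)imm;
}

/* Even constants: splat I/2, then add the result to itself. */
static uint32_t splat_even(int i) {
  uint32_t t = vsplti(i / 2);
  return t + t;
}

/* Odd constants: splat +/-15 and the remainder, then add the two. */
static uint32_t splat_odd(int i) {
  int base = i > 0 ? 15 : -15;
  return vsplti(base) + vsplti(i - base);
}

/* 31: splat -5 (low byte 0xFB), then shift each lane right by its own
   low five bits, which are 27 here. */
static uint32_t splat_31(void) {
  uint32_t t = vsplti(-5);
  return t >> (t & 31u);
}

/* 256 per halfword: splat halfword 1 (bytes 00 01 ...), then rotate the
   16-byte vector left by one byte, as "vsldoi t, t, 1" does. */
static void splat_256(uint8_t out[16]) {
  uint8_t t[16];
  for (int k = 0; k < 16; k++) t[k] = (uint8_t)(k & 1);
  for (int k = 0; k < 16; k++) out[k] = t[(k + 1) % 16];
}

/* Checks every even value in 16..30 and every odd value in 17..29,
   with both signs. */
static void check_splats(void) {
  for (int i = 16; i <= 30; i += 2) {
    assert(splat_even(i) == (uint32_t)i);
    assert(splat_even(-i) == (uint32_t)(-i));
  }
  for (int i = 17; i <= 29; i += 2) {
    assert(splat_odd(i) == (uint32_t)i);
    assert(splat_odd(-i) == (uint32_t)(-i));
  }
  assert(splat_31() == 31u);
}
```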

//===----------------------------------------------------------------------===//

FABS/FNEG can be codegen'd with the appropriate and/xor of -0.0: FNEG is an xor
with the -0.0 sign-bit mask, and FABS is an and with its complement.
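
A scalar sketch of the bit trick, assuming float is IEEE-754 binary32.  The
names NEG_ZERO_BITS, to_bits, from_bits, fneg_bits and fabs_bits are
hypothetical; a vector version would apply the same mask to every lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NEG_ZERO_BITS 0x80000000u   /* bit pattern of -0.0f: sign bit only */

static uint32_t to_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}

static float from_bits(uint32_t u) {
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

/* FNEG: flip the sign bit by xor'ing with -0.0. */
static float fneg_bits(float x) {
  return from_bits(to_bits(x) ^ NEG_ZERO_BITS);
}

/* FABS: clear the sign bit by and'ing with the complement of -0.0. */
static float fabs_bits(float x) {
  return from_bits(to_bits(x) & ~NEG_ZERO_BITS);
}

static void check_sign_ops(void) {
  assert(fneg_bits(1.5f) == -1.5f);
  assert(fabs_bits(-2.0f) == 2.0f);
  assert(to_bits(fneg_bits(0.0f)) == NEG_ZERO_BITS);   /* +0.0 becomes -0.0 */
}
```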

//===----------------------------------------------------------------------===//

For functions that use Altivec AND have calls, we are VRSAVE'ing all
call-clobbered regs.

//===----------------------------------------------------------------------===//

Implement passing vectors by value.

//===----------------------------------------------------------------------===//

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.

//===----------------------------------------------------------------------===//

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a load/vperm.  We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it.  If the value is already
in memory, this is a huge win.

//===----------------------------------------------------------------------===//

Do not generate the MFCR/RLWINM sequence for predicate compares when the
predicate compare is used immediately by a branch.  Just branch on the right
condition code on CR6.

//===----------------------------------------------------------------------===//

We need a way to teach tblgen that some operands of an intrinsic are required to
be constants.  The verifier should enforce this constraint.

//===----------------------------------------------------------------------===//

Implement multiply for vector integer types, to avoid the horrible scalarized
code produced by legalize.

void test(vector int *X, vector int *Y) {
  *X = *X * *Y;
}
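
The note does not say how the multiply should be lowered.  One possible
approach, shown here only as an assumption, builds each 32-bit product from
16-bit partial products, since those are the pieces a 16-bit vector multiply can
supply.  The scalar sketch below checks the arithmetic for one lane and for a
four-lane array; mul32_from_mul16 and mul_v4i32 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a * b, using only 16x16->32 partial products. */
static uint32_t mul32_from_mul16(uint32_t a, uint32_t b) {
  uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
  uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
  uint32_t low = a_lo * b_lo;                  /* contributes bits 0..31 */
  uint32_t cross = a_lo * b_hi + a_hi * b_lo;  /* contributes bits 16..31 */
  /* a_hi * b_hi only affects bits 32 and up, so it is dropped. */
  return low + (cross << 16);
}

/* Lane-wise multiply of two 4 x 32-bit vectors, stored back into x. */
static void mul_v4i32(uint32_t x[4], const uint32_t y[4]) {
  for (int k = 0; k < 4; k++)
    x[k] = mul32_from_mul16(x[k], y[k]);
}

static void check_mul(void) {
  uint32_t x[4] = { 7u, 0xFFFFFFFFu, 0x12345678u, 0u };
  const uint32_t y[4] = { 6u, 0xFFFFFFFFu, 0x9ABCDEF0u, 123u };
  mul_v4i32(x, y);
  assert(x[0] == 42u);
  assert(x[1] == 1u);            /* (-1) * (-1) modulo 2^32 */
  assert(x[2] == 0x12345678u * 0x9ABCDEF0u);
  assert(x[3] == 0u);
}
```

The low 32 bits are the same for signed and unsigned operands, so the unsigned
model also covers "vector int".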

//===----------------------------------------------------------------------===//

extract_vector_elt of an arbitrary constant vector can be done with the
following instructions:

vTemp = vec_splat(v0,2);    // 2 is the element the src is in.
vec_ste(vTemp,0,&destloc);

We can do an arbitrary non-constant value by using lvsr/vperm/ste.
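
The splat is needed because the element store picks its lane from the
destination address rather than from an operand.  The scalar model below is a
sketch of that behavior for 32-bit elements; v4i32, splat_lane, ste_word and
extract_elt are hypothetical names, and the destination is assumed to be
4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t w[4]; } v4i32;

/* Model of vec_splat: copy one lane into all four lanes. */
static v4i32 splat_lane(v4i32 v, int lane) {
  v4i32 r;
  for (int k = 0; k < 4; k++) r.w[k] = v.w[lane];
  return r;
}

/* Model of an element store with offset 0: the lane that gets stored is
   selected by bits 2..3 of the destination address. */
static void ste_word(v4i32 v, uint32_t *dest) {
  unsigned lane = (unsigned)(((uintptr_t)dest & 0xFu) >> 2);
  *dest = v.w[lane];
}

/* Splat first, so whichever lane the address selects holds the value. */
static uint32_t extract_elt(v4i32 v, int lane) {
  uint32_t destloc;
  ste_word(splat_lane(v, lane), &destloc);
  return destloc;
}

static void check_extract(void) {
  v4i32 v = { { 10u, 20u, 30u, 40u } };
  for (int k = 0; k < 4; k++)
    assert(extract_elt(v, k) == v.w[k]);
}
```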

//===----------------------------------------------------------------------===//

If we want to tie instruction selection into the scheduler, we can do some
constant formation with different instructions.  For example, we can generate
"vsplti -1" with "vcmpequw R,R" and 1,1,1,1 with "vsubcuw R,R", both of which
use a different execution unit than the splat and thus could help scheduling.

This is probably only reasonable for a post-pass scheduler.
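
Both identities hold for any register contents, which a one-lane scalar model
makes easy to check.  This is a sketch with hypothetical function names: the
compare is modeled as all-ones on equality, and the subtract-carry-out as the
carry of a + ~b + 1, which is 1 exactly when no borrow occurs.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of vcmpequw: all ones if the inputs are equal, else zero. */
static uint32_t vcmpequw_lane(uint32_t a, uint32_t b) {
  return a == b ? 0xFFFFFFFFu : 0u;
}

/* One lane of vsubcuw: carry out of a + ~b + 1, computed in 64 bits. */
static uint32_t vsubcuw_lane(uint32_t a, uint32_t b) {
  uint64_t sum = (uint64_t)a + (uint64_t)(uint32_t)~b + 1u;
  return (uint32_t)(sum >> 32);
}

/* With both operands equal to the same register R, the results are the
   constants -1 and 1 regardless of what R holds. */
static void check_constants(void) {
  const uint32_t samples[] = { 0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu };
  for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++) {
    uint32_t r = samples[k];
    assert(vcmpequw_lane(r, r) == 0xFFFFFFFFu);
    assert(vsubcuw_lane(r, r) == 1u);
  }
}
```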

//===----------------------------------------------------------------------===//