//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

//===----------------------------------------------------------------------===//

Altivec support. The first should be a single lvx from the constant pool, the
second should be a xor/stvx:

  int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 17, 1, 1, 1, 1 };

  int x[8] __attribute__((aligned(128)));
  memset (x, 0, sizeof (x));

//===----------------------------------------------------------------------===//

Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

Adding +0.0 would turn a -0.0 product into +0.0, because (-0.0) + (+0.0) is
+0.0 under round-to-nearest. When -ffast-math is on, we can use 0.0.
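
The sign-of-zero issue can be reproduced in scalar C. The sketch below is only
a model: mul_via_madd and sign_bit are hypothetical helper names, and the
multiply-add is written as a separate multiply and add rather than a real
fused vmaddfp. The sign of a zero result is the same either way.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the IEEE sign bit of f is set (distinguishes -0.0f from 0.0f). */
static int sign_bit(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (int)(bits >> 31);
}

/* Hypothetical model of a multiply lowered to "a * b + addend". */
static float mul_via_madd(float a, float b, float addend) {
    volatile float product = a * b; /* volatile keeps the two steps separate */
    return product + addend;
}
```

With a = -1.0f and b = 0.0f the true product is -0.0f. An addend of 0.0f
yields +0.0f, while an addend of -0.0f preserves the sign, and a +0.0f
product is left as +0.0f.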

//===----------------------------------------------------------------------===//

  v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };

Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into a lve*x instruction, instead of doing
a load/store/lve*x sequence.

//===----------------------------------------------------------------------===//

There is a wide range of vector constants we can generate with combinations of
Altivec instructions.

Examples, these work with all widths:
  Splat(+/- 16,18,20,22,24,28,30): t = vsplti I/2, r = t+t
  Splat(+/- 17,19,21,23,25,29):    t = vsplti +/-15, t2 = vsplti I-15, r = t+t2
  Splat(31):                       t = vsplti FB (i.e. -5), r = srl t,t
  Splat(256):                      t = vsplti 1, r = vsldoi t, t, 1

Lots more are listed here:
http://www.informatik.uni-bremen.de/~hobold/AltiVec.html

This should be added to the ISD::BUILD_VECTOR case in
PPCTargetLowering::LowerOperation.
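
The first three patterns above can be checked with scalar arithmetic. This is
a rough sketch, not the lowering code itself: plan_splat, fits_vsplti and
splat31_element are hypothetical names, and the model only assumes that
vsplti* takes a 5-bit signed immediate and that the vector shift-right
instructions use the low log2(width) bits of each element as the shift count.

```c
#include <stdint.h>

/* vsplti* immediates are 5-bit signed values. */
static int fits_vsplti(int v) { return v >= -16 && v <= 15; }

/* Hypothetical planner: returns 1 if one vsplti suffices (*a = value),
   2 if two splats plus an add work (*a + *b == value), 0 otherwise. */
static int plan_splat(int value, int *a, int *b) {
    if (fits_vsplti(value)) { *a = value; *b = 0; return 1; }
    if (value % 2 == 0 && fits_vsplti(value / 2)) {
        *a = value / 2; *b = value / 2;   /* t = vsplti I/2, r = t+t */
        return 2;
    }
    int t = value > 0 ? 15 : -15;         /* t = vsplti +/-15 */
    if (fits_vsplti(value - t)) { *a = t; *b = value - t; return 2; }
    return 0;
}

/* Model of Splat(31) for one element of `width` bits (8, 16 or 32):
   splat -5, then shift each element right by its own low bits. */
static uint32_t splat31_element(unsigned width) {
    uint32_t mask = width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t t = (uint32_t)-5 & mask;     /* 0xFB, 0xFFFB or 0xFFFFFFFB */
    uint32_t shift = t & (width - 1u);    /* 3, 11 or 27 */
    return t >> shift;
}
```

For example, plan_splat(29, &a, &b) picks a = 15 and b = 14, while 31 is
rejected by the add patterns and needs the shift trick, which
splat31_element models for each element width.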

//===----------------------------------------------------------------------===//

FABS/FNEG can be codegen'd with the appropriate and/xor of -0.0.
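
As a scalar illustration of the bit manipulation involved: -0.0f has only the
sign bit set, so xor with it flips the sign (FNEG) and and-ing with its
complement clears the sign (FABS). The helper names below are hypothetical,
and the sketch works on one float rather than a vector register.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_MASK 0x80000000u /* bit pattern of -0.0f */

static uint32_t to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* FNEG: xor with -0.0 toggles the sign bit. */
static float fneg_bits(float x) { return from_bits(to_bits(x) ^ SIGN_MASK); }

/* FABS: and with the complement of -0.0 clears the sign bit. */
static float fabs_bits(float x) { return from_bits(to_bits(x) & ~SIGN_MASK); }
```

On a vector this would correspond to a vxor with a splat of -0.0 and a vandc
(and-with-complement) with the same splat.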

//===----------------------------------------------------------------------===//

For functions that use altivec AND have calls, we are VRSAVE'ing all
call-clobbered registers.

//===----------------------------------------------------------------------===//

Implement passing vectors by value.

//===----------------------------------------------------------------------===//

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.

//===----------------------------------------------------------------------===//

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a load/vperm. We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it. If the value is already
in memory, this is a huge win.

//===----------------------------------------------------------------------===//

Do not generate the MFCR/RLWINM sequence for predicate compares when the
predicate compare is used immediately by a branch. Just branch on the right
condition code.

//===----------------------------------------------------------------------===//

We need a way to teach tblgen that some operands of an intrinsic are required to
be constants. The verifier should enforce this constraint.

//===----------------------------------------------------------------------===//

Implement multiply for vector integer types, to avoid the horrible scalarized
code produced by legalize.

void test(vector int *X, vector int *Y) {
  *X = *X * *Y;
}
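
One possible way to build a 32-bit element multiply, sketched here in scalar C
under the assumption that only 16-bit multiplies are available, is to combine
the partial products of the 16-bit halves. This is an illustration of the
arithmetic identity only, not a statement of how the lowering is or should be
implemented; mul32_from_16 and vmul4 are hypothetical names.

```c
#include <stdint.h>

/* Low 32 bits of a * b, built from products of 16-bit halves:
   a*b = a_lo*b_lo + ((a_hi*b_lo + a_lo*b_hi) << 16)   (mod 2^32) */
static uint32_t mul32_from_16(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
    uint32_t low = a_lo * b_lo;                 /* 16x16 -> 32 */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi; /* only low 16 bits survive */
    return low + (cross << 16);
}

/* Apply the same decomposition to each of four 32-bit lanes. */
static void vmul4(uint32_t out[4], const uint32_t x[4], const uint32_t y[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = mul32_from_16(x[i], y[i]);
}
```

The a_hi*b_hi term is dropped because it only affects bits 32 and above. The
low 32 bits are the same for signed and unsigned operands, so the identity
covers "vector int" as well.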

//===----------------------------------------------------------------------===//

extract_vector_elt of an arbitrary constant vector can be done with the
following instructions:

  vTemp = vec_splat(v0,2);    // 2 is the element the src is in.
  vec_ste(vTemp,0,&destloc);

We can do an arbitrary non-constant value by using lvsr/vperm/ste.
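
The splat is what makes the store safe: a word vec_ste picks the lane to store
from the destination address, so after the splat every lane it could pick
holds the wanted element. The scalar model below illustrates this with a
simulated memory of 32-bit words; model_splat, model_ste and extract_elt are
hypothetical names, and destination byte addresses are assumed to be below 32.

```c
#include <stdint.h>

/* Model of vec_splat on four 32-bit lanes. */
static void model_splat(uint32_t out[4], const uint32_t in[4], unsigned lane) {
    uint32_t v = in[lane & 3u];
    for (int i = 0; i < 4; ++i) out[i] = v;
}

/* Model of a word vec_ste: the effective address is rounded down to a
   word boundary, and the lane stored is chosen by that address. */
static void model_ste(uint32_t *mem, uint32_t byte_addr, const uint32_t v[4]) {
    uint32_t ea = byte_addr & ~3u;
    unsigned lane = (ea >> 2) & 3u;
    mem[ea >> 2] = v[lane];
}

/* Extract element `lane` of v through a store to dest_byte_addr (< 32). */
static uint32_t extract_elt(const uint32_t v[4], unsigned lane,
                            uint32_t dest_byte_addr) {
    uint32_t mem[8] = {0};
    uint32_t tmp[4];
    model_splat(tmp, v, lane);
    model_ste(mem, dest_byte_addr, tmp);
    return mem[(dest_byte_addr & ~3u) >> 2];
}
```

Whatever the alignment of the destination, extract_elt(v, 2, addr) returns
v[2] in this model.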

//===----------------------------------------------------------------------===//

If we want to tie instruction selection into the scheduler, we can do some
constant formation with different instructions. For example, we can generate
"vsplti -1" with "vcmpequw R,R" and 1,1,1,1 with "vsubcuw R,R", both of which
use different execution units, thus could help scheduling.

This is probably only reasonable for a post-pass scheduler.
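
A scalar model shows why both idioms produce a constant regardless of what R
holds. The function names are hypothetical. The model assumes vcmpequw writes
all ones to each lane whose operands compare equal, and vsubcuw writes the
carry out of the unsigned subtract, which is 1 when no borrow occurs.

```c
#include <stdint.h>

/* Model of vcmpequw: all ones where lanes are equal, zero elsewhere. */
static void model_vcmpequw(uint32_t out[4], const uint32_t a[4],
                           const uint32_t b[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Model of vsubcuw: carry out of a + ~b + 1, i.e. 1 when a >= b. */
static void model_vsubcuw(uint32_t out[4], const uint32_t a[4],
                          const uint32_t b[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = (a[i] >= b[i]) ? 1u : 0u;
}

/* Return 1 if every lane of v equals value. */
static int all_lanes_equal(const uint32_t v[4], uint32_t value) {
    for (int i = 0; i < 4; ++i)
        if (v[i] != value) return 0;
    return 1;
}
```

Passing the same register for both operands gives a splat of -1 from the
compare and a splat of 1 from the carry-out subtract.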

//===----------------------------------------------------------------------===//