From 3ee9ffb0e543328bea4052fba0f47ebc16314f79 Mon Sep 17 00:00:00 2001
From: Chris Lattner
Date: Mon, 27 Mar 2006 07:41:00 +0000
Subject: [PATCH] Add a bunch of notes from my journey thus far.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27170 91177308-0d34-0410-b5e6-96231b3b80d8
---
 lib/Target/PowerPC/README_ALTIVEC.txt | 112 +++++++++++++++++++++++---
 1 file changed, 103 insertions(+), 9 deletions(-)

diff --git a/lib/Target/PowerPC/README_ALTIVEC.txt b/lib/Target/PowerPC/README_ALTIVEC.txt
index 5144590142a..56fd2cb29f2 100644
--- a/lib/Target/PowerPC/README_ALTIVEC.txt
+++ b/lib/Target/PowerPC/README_ALTIVEC.txt
@@ -1,11 +1,5 @@
 //===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//
 
-Implement TargetConstantVec, and set up PPC to custom lower ConstantVec into
-TargetConstantVec's if it's one of the many forms that are algorithmically
-computable using the spiffy altivec instructions.
-
-//===----------------------------------------------------------------------===//
-
 Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
 registers, to generate better spill code.
 
@@ -31,8 +25,6 @@ void foo(void) {
 Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763
 
-We need to codegen -0.0 vector efficiently (no constant pool load).
-
 When -ffast-math is on, we can use 0.0.
 
 //===----------------------------------------------------------------------===//
@@ -48,7 +40,109 @@ a load/store/lve*x sequence.
 //===----------------------------------------------------------------------===//
 
 There are a wide range of vector constants we can generate with combinations of
-altivec instructions. For example, GCC does: t=vsplti*, r = t+t.
+altivec instructions. Examples:
+ GCC does: "t=vsplti*, r = t+t" for constants it can't generate with one vsplti
+
+ -0.0 (sign bit): vspltisw v0,-1 / vslw v0,v0,v0
+
+//===----------------------------------------------------------------------===//
+
+Missing intrinsics:
+
+ds*
+lve*
+lvs*
+lvx*
+mf*
+st*
+vavg*
+vexptefp
+vlogefp
+vmax*
+vmhaddshs/vmhraddshs
+vmin*
+vmladduhm
+vmr*
+vmsum*
+vmul*
+vperm
+vpk*
+vr*
+vsel (some aliases only accessible using builtins)
+vsl* (except vsldoi)
+vsr*
+vsum*
+vup*
+
+//===----------------------------------------------------------------------===//
+
+FABS/FNEG can be codegen'd with the appropriate and/xor of -0.0.
 
 //===----------------------------------------------------------------------===//
 
+For functions that use altivec AND have calls, we are VRSAVE'ing all call
+clobbered regs.
+
+//===----------------------------------------------------------------------===//
+
+VSPLTW and friends are expanded by the FE into insert/extract element ops. Make
+sure that the dag combiner puts them back together in the appropriate
+vector_shuffle node and that this gets pattern matched appropriately.
+
+//===----------------------------------------------------------------------===//
+
+Implement passing/returning vectors by value.
+
+//===----------------------------------------------------------------------===//
+
+GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
+of C1/C2/C3, then a load and vperm of Variable.
+
+//===----------------------------------------------------------------------===//
+
+We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
+aligned stack slot, followed by a lve*x/vperm. We should probably just store it
+to a scalar stack slot, then use lvsl/vperm to load it. If the value is already
+in memory, this is a huge win.
+
+//===----------------------------------------------------------------------===//
+
+Do not generate the MFCR/RLWINM sequence for predicate compares when the
+predicate compare is used immediately by a branch. Just branch on the right
+cond code on CR6.
+
+//===----------------------------------------------------------------------===//
+
+SROA should turn "vector unions" into the appropriate insert/extract element
+instructions.
+
+//===----------------------------------------------------------------------===//
+
+We need an LLVM 'shuffle' instruction that corresponds to the VECTOR_SHUFFLE
+node.
+
+//===----------------------------------------------------------------------===//
+
+We need a way to teach tblgen that some operands of an intrinsic are required to
+be constants. The verifier should enforce this constraint.
+
+//===----------------------------------------------------------------------===//
+
+We should instcombine the lvx/stvx intrinsics into loads/stores if we know that
+the loaded address is 16-byte aligned.
+
+//===----------------------------------------------------------------------===//
+
+Instead of writing a pattern for type-agnostic operations (e.g. gen-zero, load,
+store, and, ...) in every supported type, make legalize do the work. We should
+have a canonical type that we want operations changed to (e.g. v4i32 for
+build_vector) and legalize should change non-identical types to these. This is
+similar to what it does for operations that are only supported in some types,
+e.g. x86 cmov (not supported on bytes).
+
+This would fix two problems:
+1. Writing patterns multiple times.
+2. Identical operations in different types are not getting CSE'd (e.g.
+   { 0U, 0U, 0U, 0U } and {0.0, 0.0, 0.0, 0.0}).
+
+
-- 
2.34.1
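
//===----------------------------------------------------------------------===//

Addendum: the vsplti tricks recorded in the patch are easy to sanity-check from
C. Below is a minimal sketch using the standard altivec.h intrinsics; it
assumes a PowerPC compiler invoked with -maltivec, and the function names
(splat_of_20, neg_zero_v4f32) are illustrative only, not part of the patch.

#include <altivec.h>

/* "t=vsplti*, r = t+t": materialize a splat of 20, which is outside the
   -16..15 immediate range of a single vspltisw. */
vector signed int splat_of_20(void) {
  vector signed int t = vec_splat_s32(10); /* vspltisw vT, 10        */
  return vec_add(t, t);                    /* vadduwm: {20,20,20,20} */
}

/* "-0.0 (sign bit): vspltisw v0,-1 / vslw v0,v0,v0": splatting -1 makes
   every word 0xFFFFFFFF; vslw shifts each word left by the low 5 bits of
   the corresponding word (31), leaving only the sign bit 0x80000000,
   i.e. -0.0f in every lane, with no constant pool load. */
vector float neg_zero_v4f32(void) {
  vector unsigned int t = (vector unsigned int)vec_splat_s32(-1);
  return (vector float)vec_sl(t, t); /* vslw vT, vT, vT */
}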