X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FProgrammersManual.html;h=716d364ed5699a2c73ca2c84ca93dbf3aee094be;hb=9eb698b96d8b753b2f5025baae0712167cf7fb03;hp=022d50bb68f7dc05f15d2ae18090e33d620c1fab;hpb=a4a264de3ba9df2c9043fed89e08da9f8f92217e;p=oota-llvm.git diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html index 022d50bb68f..716d364ed56 100644 --- a/docs/ProgrammersManual.html +++ b/docs/ProgrammersManual.html @@ -62,6 +62,7 @@ option
  • A sorted 'vector'
  • "llvm/ADT/SmallSet.h"
  • "llvm/ADT/SmallPtrSet.h"
  • +
  • "llvm/ADT/DenseSet.h"
  • "llvm/ADT/FoldingSet.h"
  • <set>
  • "llvm/ADT/SetVector.h"
  • @@ -71,12 +72,17 @@ option
  • Map-Like Containers (std::map, DenseMap, etc)
  • +
  • BitVector-like containers +
  • Helpful Hints for Common Operations @@ -97,6 +103,8 @@ complex example
  • the same way
  • Iterating over def-use & use-def chains
  • +
  • Iterating over predecessors & +successors of blocks
  • Making simple changes @@ -106,6 +114,7 @@ use-def chains
  • Deleting Instructions
  • Replacing an Instruction with another Value
  • +
  • Deleting GlobalVariables
  • @@ -912,7 +953,7 @@ efficiently queried with a standard binary or radix search.

    -

    If you have a set-like datastructure that is usually small and whose elements +

    If you have a set-like data structure that is usually small and whose elements are reasonably small, a SmallSet<Type, N> is a good choice. This set has space for N elements in place (thus, if the set is dynamically smaller than N, no malloc traffic is required) and accesses them with a simple linear search. @@ -936,7 +977,7 @@ and erasing, but does not support iteration.

    SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is -transparently implemented with a SmallPtrSet), but also suports iterators. If +transparently implemented with a SmallPtrSet), but also supports iterators. If more than 'N' insertions are performed, a single quadratically probed hash table is allocated and grows as needed, providing extremely efficient access (constant time insertion/deleting/queries with low constant @@ -948,6 +989,25 @@ visited in sorted order.

    + +
    + "llvm/ADT/DenseSet.h" +
    + +
    + +

    +DenseSet is a simple quadratically probed hash table. It excels at supporting +small values: it uses a single allocation to hold all of the pairs that +are currently inserted in the set. DenseSet is a great way to unique small +values that are not simple pointers (use SmallPtrSet for pointers). Note that DenseSet has +the same requirements for the value type that DenseMap has. +

    + +
    +
    "llvm/ADT/FoldingSet.h" @@ -1016,8 +1076,9 @@ std::set is almost never a good choice.

    -

    LLVM's SetVector<Type> is actually a combination of a set along with -a Sequential Container. The important property +

    LLVM's SetVector<Type> is an adapter class that combines your choice of +a set-like container along with a Sequential +Container. The important property that this provides is efficient insertion with uniquing (duplicate elements are ignored) with iteration support. It implements this by inserting elements into both a set-like container and the sequential container, using the set-like @@ -1028,7 +1089,7 @@ container for uniquing and the sequential container for iteration. iteration is guaranteed to match the order of insertion into the SetVector. This property is really important for things like sets of pointers. Because pointer values are non-deterministic (e.g. vary across runs of the program on -different machines), iterating over the pointers in a std::set or other set will +different machines), iterating over the pointers in the set will not be in a well-defined order.

    @@ -1036,9 +1097,17 @@ The drawback of SetVector is that it requires twice as much space as a normal set and has the sum of constant factors from the set-like container and the sequential container that it uses. Use it *only* if you need to iterate over the elements in a deterministic order. SetVector is also expensive to delete -elements out of (linear time). +elements out of (linear time), unless you use its "pop_back" method, which is +faster.

    +

    SetVector is an adapter class that defaults to using std::vector and std::set +for the underlying containers, so it is quite expensive. However, +"llvm/ADT/SetVector.h" also provides a SmallSetVector class, which +defaults to using a SmallVector and SmallSet of a specified size. If you use +this, and if your sets are dynamically smaller than N, you will save a lot of +heap traffic.

    +
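The adapter idea described for SetVector can be sketched in standard C++. This is a minimal illustration only, assuming a std::set for uniquing and a std::vector for ordered iteration; the names TinySetVector and insertionOrderDemo are hypothetical, and the code is not the llvm::SetVector implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical sketch of a set/sequence adapter (NOT llvm::SetVector).
template <typename T>
class TinySetVector {
  std::set<T> Set;     // used only to reject duplicates
  std::vector<T> Vec;  // records insertion order for iteration

public:
  // Returns true if V was newly inserted, false if it was a duplicate.
  bool insert(const T &V) {
    if (!Set.insert(V).second)
      return false;
    Vec.push_back(V);
    return true;
  }

  // Removes the most recently inserted element. No linear scan of the
  // vector is needed, unlike erasing an arbitrary element.
  void pop_back() {
    assert(!Vec.empty() && "pop_back on an empty TinySetVector");
    Set.erase(Vec.back());
    Vec.pop_back();
  }

  bool contains(const T &V) const { return Set.count(V) != 0; }
  std::size_t size() const { return Vec.size(); }

  // Iteration always follows insertion order.
  typename std::vector<T>::const_iterator begin() const { return Vec.begin(); }
  typename std::vector<T>::const_iterator end() const { return Vec.end(); }
};

// Inserts 30, 10, 30, 20 and returns the elements in iteration order.
inline std::vector<int> insertionOrderDemo() {
  TinySetVector<int> SV;
  SV.insert(30);
  SV.insert(10);
  SV.insert(30);  // duplicate, ignored
  SV.insert(20);
  return std::vector<int>(SV.begin(), SV.end());
}
```

Every element is stored twice, once per underlying container, which is the space cost mentioned above.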
    @@ -1116,7 +1185,7 @@ vectors for sets.
    - "llvm/ADT/CStringMap.h" + "llvm/ADT/StringMap.h"
    @@ -1124,12 +1193,11 @@ vectors for sets.

    Strings are commonly used as keys in maps, and they are difficult to support efficiently: they are variable length, inefficient to hash and compare when -long, expensive to copy, etc. CStringMap is a specialized container designed to -cope with these issues. It supports mapping an arbitrary range of bytes that -does not have an embedded nul character in it ("C strings") to an arbitrary -other object.

    +long, expensive to copy, etc. StringMap is a specialized container designed to +cope with these issues. It supports mapping an arbitrary range of bytes to an +arbitrary other object.

    -

    The CStringMap implementation uses a quadratically-probed hash table, where +

    The StringMap implementation uses a quadratically-probed hash table, where the buckets store a pointer to the heap allocated entries (and some other stuff). The entries in the map must be heap allocated because the strings are variable length. The string data (key) and the element object (value) are @@ -1137,15 +1205,15 @@ stored in the same allocation with the string data immediately after the element object. This container guarantees the "(char*)(&Value+1)" points to the key string for a value.

    -

    The CStringMap is very fast for several reasons: quadratic probing is very +

    The StringMap is very fast for several reasons: quadratic probing is very cache efficient for lookups, the hash value of strings in buckets is not -recomputed when lookup up an element, CStringMap rarely has to touch the +recomputed when looking up an element, StringMap rarely has to touch the memory for unrelated objects when looking up a value (even when hash collisions happen), hash table growth does not recompute the hash values for strings already in the table, and each pair in the map is stored in a single allocation (the string data is stored in the same allocation as the Value of a pair).

    -

    CStringMap also provides query methods that take byte ranges, so it only ever +

    StringMap also provides query methods that take byte ranges, so it only ever copies a string if a value is inserted into the table.

    @@ -1189,7 +1257,7 @@ iterators in a densemap are invalidated whenever an insertion occurs, unlike map. Also, because DenseMap allocates space for a large number of key/value pairs (it starts with 64 by default), it will waste a lot of space if your keys or values are large. Finally, you must implement a partial specialization of -DenseMapKeyInfo for the key that you want, if it isn't already supported. This +DenseMapInfo for the key that you want, if it isn't already supported. This is required to tell DenseMap about two special marker values (which can never be inserted into the map) that it needs internally.
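To illustrate why a quadratically probed table needs two reserved marker values, here is a minimal sketch of a fixed-capacity integer set. It is an assumption-laden illustration, not the DenseMap implementation: IntKeyInfo, TinyDenseIntSet, the hash function, and the fixed 16 buckets are all hypothetical, and IntKeyInfo does not reproduce the real DenseMapInfo interface.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical key-info trait supplying the two marker values.
struct IntKeyInfo {
  static int emptyKey() { return std::numeric_limits<int>::max(); }
  static int tombstoneKey() { return std::numeric_limits<int>::max() - 1; }
  static std::size_t hash(int K) { return static_cast<unsigned>(K) * 37u; }
};

// Hypothetical fixed-capacity set (no growth), for illustration only.
class TinyDenseIntSet {
  std::vector<int> Buckets;  // power-of-two number of buckets
  std::size_t NumEmpty;      // buckets still holding the empty marker

  static bool isMarker(int K) {
    return K == IntKeyInfo::emptyKey() || K == IntKeyInfo::tombstoneKey();
  }

  // Returns the bucket holding Key, or the bucket where Key should go.
  std::size_t findBucket(int Key, bool &Found) const {
    const std::size_t None = Buckets.size();
    const std::size_t Mask = Buckets.size() - 1;
    std::size_t Idx = IntKeyInfo::hash(Key) & Mask;
    std::size_t FirstTombstone = None;
    for (std::size_t Probe = 1;; ++Probe) {
      int B = Buckets[Idx];
      if (B == Key) { Found = true; return Idx; }
      if (B == IntKeyInfo::emptyKey()) {  // end of the probe chain
        Found = false;
        return FirstTombstone != None ? FirstTombstone : Idx;
      }
      if (B == IntKeyInfo::tombstoneKey() && FirstTombstone == None)
        FirstTombstone = Idx;             // reusable slot, keep probing
      Idx = (Idx + Probe) & Mask;         // quadratic (triangular) step
    }
  }

public:
  TinyDenseIntSet() : Buckets(16, IntKeyInfo::emptyKey()), NumEmpty(16) {}

  bool insert(int Key) {
    assert(!isMarker(Key) && "marker values can never be inserted");
    bool Found = false;
    std::size_t Idx = findBucket(Key, Found);
    if (Found) return false;
    if (Buckets[Idx] == IntKeyInfo::emptyKey()) {
      assert(NumEmpty > 1 && "this sketch has a fixed capacity");
      --NumEmpty;  // one empty bucket must remain so probing terminates
    }
    Buckets[Idx] = Key;
    return true;
  }

  bool erase(int Key) {
    assert(!isMarker(Key) && "marker values are never stored");
    bool Found = false;
    std::size_t Idx = findBucket(Key, Found);
    if (!Found) return false;
    Buckets[Idx] = IntKeyInfo::tombstoneKey();  // keeps probe chains intact
    return true;
  }

  bool contains(int Key) const {
    assert(!isMarker(Key) && "marker values are never stored");
    bool Found = false;
    findBucket(Key, Found);
    return Found;
  }
};
```

The empty marker ends a probe sequence, while the tombstone marks an erased slot that later lookups must probe past. Because both values have a reserved meaning inside the table, neither can ever be used as a real key.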

    @@ -1240,6 +1308,52 @@ expensive. Element iteration does not visit elements in a useful order.

    + +
    + Bit storage containers (BitVector, SparseBitVector) +
    + +
    +

    Unlike the other containers, there are only two bit storage containers, and +choosing when to use each is relatively straightforward.

    + +

    One additional option is +std::vector<bool>: we discourage its use for two reasons: 1) the +implementation in many common compilers (e.g. commonly available versions of +GCC) is extremely inefficient and 2) the C++ standards committee is likely to +deprecate this container and/or change it significantly somehow. In any case, +please don't use it.

    +
    + + +
    + BitVector +
    + +
    +

    The BitVector container provides a fixed size set of bits for manipulation. +It supports individual bit setting/testing, as well as set operations. The set +operations take time O(size of bitvector), but operations are performed one word +at a time, instead of one bit at a time. This makes the BitVector very fast for +set operations compared to other containers. Use the BitVector when you expect +the number of set bits to be high (i.e., a dense set). +

    +
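The word-at-a-time behaviour can be sketched as follows. This is a minimal illustration assuming 64-bit words; TinyBitVector is a hypothetical name and the code is not the llvm::BitVector implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size bit set (NOT llvm::BitVector).
class TinyBitVector {
  std::vector<std::uint64_t> Words;  // 64 bits packed into each word
  std::size_t NumBits;

public:
  explicit TinyBitVector(std::size_t N)
      : Words((N + 63) / 64, 0), NumBits(N) {}

  void set(std::size_t I) {
    assert(I < NumBits && "bit index out of range");
    Words[I / 64] |= std::uint64_t(1) << (I % 64);
  }

  bool test(std::size_t I) const {
    assert(I < NumBits && "bit index out of range");
    return ((Words[I / 64] >> (I % 64)) & 1) != 0;
  }

  // Set union: one OR per 64-bit word rather than one per bit.
  TinyBitVector &operator|=(const TinyBitVector &RHS) {
    assert(NumBits == RHS.NumBits && "size mismatch");
    for (std::size_t W = 0; W != Words.size(); ++W)
      Words[W] |= RHS.Words[W];
    return *this;
  }

  // Set intersection: one AND per 64-bit word.
  TinyBitVector &operator&=(const TinyBitVector &RHS) {
    assert(NumBits == RHS.NumBits && "size mismatch");
    for (std::size_t W = 0; W != Words.size(); ++W)
      Words[W] &= RHS.Words[W];
    return *this;
  }

  std::size_t size() const { return NumBits; }
};
```

Both set operations still visit every word, so their cost is proportional to the size of the vector no matter how many bits are set.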
    + + +
    + SparseBitVector +
    + +
    +

    The SparseBitVector container is much like BitVector, with one major +difference: only the bits that are set are stored. This makes the +SparseBitVector much more space efficient than BitVector when the set is sparse, +as well as making set operations O(number of set bits) instead of O(size of +universe). The downside to the SparseBitVector is that setting and testing of random bits is O(N), and on large SparseBitVectors, this can be slower than BitVector. In our implementation, setting or testing bits in sorted order +(either forwards or reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends on size) of the current bit is also O(1). As a general statement, testing/setting bits in a SparseBitVector is O(distance away from last set bit). +

    +
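The idea of storing only the set bits can be sketched as below. Note the swapped-in technique: this sketch keeps non-zero 64-bit words in a std::map, so lookups are logarithmic in the number of stored words. It therefore does not show the distance-from-last-bit behaviour described above, and it is not the llvm::SparseBitVector implementation. TinySparseBitVector and storedWords are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical sparse bit set backed by an ordered map
// (NOT llvm::SparseBitVector).
class TinySparseBitVector {
  // Word index -> 64-bit word. All-zero words are never stored.
  std::map<std::size_t, std::uint64_t> Words;

public:
  void set(std::size_t I) {
    Words[I / 64] |= std::uint64_t(1) << (I % 64);
  }

  bool test(std::size_t I) const {
    auto It = Words.find(I / 64);
    return It != Words.end() && ((It->second >> (I % 64)) & 1) != 0;
  }

  // Set union: visits only the words stored in RHS, so the cost depends
  // on how many bits are set rather than on the size of the universe.
  TinySparseBitVector &operator|=(const TinySparseBitVector &RHS) {
    for (const auto &Entry : RHS.Words)
      Words[Entry.first] |= Entry.second;
    return *this;
  }

  // Number of words actually held in memory.
  std::size_t storedWords() const { return Words.size(); }
};
```

Setting bit 3 and bit 1000000 stores two words here, whereas a dense bit vector covering the same range would need more than fifteen thousand 64-bit words.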
    @@ -1369,21 +1483,24 @@ small example that shows how to dump all instructions in a function to the stand
     #include "llvm/Support/InstIterator.h"
     
    -// F is a ptr to a Function instance
    -for (inst_iterator i = inst_begin(F), e = inst_end(F); i != e; ++i)
    -  llvm::cerr << *i << "\n";
    +// F is a pointer to a Function instance
    +for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
    +  llvm::cerr << *I << "\n";
     

    Easy, isn't it? You can also use InstIterators to fill a -worklist with its initial contents. For example, if you wanted to -initialize a worklist to contain all instructions in a Function +work list with its initial contents. For example, if you wanted to +initialize a work list to contain all instructions in a Function F, all you would need to do is something like:

     std::set<Instruction*> worklist;
    -worklist.insert(inst_begin(F), inst_end(F));
    +// or better yet, SmallPtrSet<Instruction*, 64> worklist;
    +
    +for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
    +   worklist.insert(&*I);
     
    @@ -1424,7 +1541,7 @@ the last line of the last example,

    -Instruction* pinst = &*i;
    +Instruction *pinst = &*i;
     
    @@ -1432,7 +1549,7 @@ Instruction* pinst = &*i;
    -Instruction* pinst = i;
    +Instruction *pinst = i;
     
    @@ -1467,7 +1584,7 @@ locations in the entire module (that is, across every Function) where a certain function (i.e., some Function*) is already in scope. As you'll learn later, you may want to use an InstVisitor to accomplish this in a much more straight-forward manner, but this example will allow us to explore how -you'd do it if you didn't have InstVisitor around. In pseudocode, this +you'd do it if you didn't have InstVisitor around. In pseudo-code, this is what we want to do:

    @@ -1500,8 +1617,7 @@ class OurFunctionPass : public FunctionPass { href="#CallInst">CallInst>(&*i)) { // We know we've encountered a call instruction, so we // need to determine if it's a call to the - // function pointed to by m_func or not - + // function pointed to by m_func or not. if (callInst->getCalledFunction() == targetFunc) ++callCounter; } @@ -1510,7 +1626,7 @@ class OurFunctionPass : public FunctionPass { } private: - unsigned callCounter; + unsigned callCounter; };
    @@ -1562,7 +1678,7 @@ of F:

    -Function* F = ...;
    +Function *F = ...;
     
     for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i)
       if (Instruction *Inst = dyn_cast<Instruction>(*i)) {
    @@ -1582,10 +1698,10 @@ the particular Instruction):

    -Instruction* pi = ...;
    +Instruction *pi = ...;
     
     for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i) {
    -  Value* v = *i;
    +  Value *v = *i;
       // ...
     }
     
    @@ -1598,6 +1714,36 @@ for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i)
    + + + +
    + +

    Iterating over the predecessors and successors of a block is quite easy +with the routines defined in "llvm/Support/CFG.h". Just use code like +this to iterate over all predecessors of BB:

    + +
    +
    +#include "llvm/Support/CFG.h"
    +BasicBlock *BB = ...;
    +
    +for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
    +  BasicBlock *Pred = *PI;
    +  // ...
    +}
    +
    +
    + +

    Similarly, to iterate over successors use +succ_iterator/succ_begin/succ_end.

    + +
    + +
    Making simple changes @@ -1630,12 +1776,12 @@ parameters. For example, an AllocaInst only requires a
    -AllocaInst* ai = new AllocaInst(Type::IntTy);
    +AllocaInst* ai = new AllocaInst(Type::Int32Ty);
     

    will create an AllocaInst instance that represents the allocation of -one integer in the current stack frame, at runtime. Each Instruction +one integer in the current stack frame, at run time. Each Instruction subclass is likely to have varying default parameters which change the semantics of the instruction, so refer to the doxygen documentation for the subclass of @@ -1649,7 +1795,7 @@ at generated LLVM machine code, you definitely want to have logical names associated with the results of instructions! By supplying a value for the Name (default) parameter of the Instruction constructor, you associate a logical name with the result of the instruction's execution at -runtime. For example, say that I'm writing a transformation that dynamically +run time. For example, say that I'm writing a transformation that dynamically allocates space for an integer on the stack, and that integer is going to be used as some kind of index by some other code. To accomplish this, I place an AllocaInst at the first point in the first BasicBlock of some @@ -1658,12 +1804,12 @@ used as some kind of index by some other code. To accomplish this, I place an

    where indexLoc is now the logical name of the instruction's -execution value, which is a pointer to an integer on the runtime stack.

    +execution value, which is a pointer to an integer on the run time stack.

    Inserting instructions

    @@ -1771,9 +1917,7 @@ erase function to remove your instruction. For example:

     Instruction *I = .. ;
    -BasicBlock *BB = I->getParent();
    -
    -BB->getInstList().erase(I);
    +I->eraseFromParent();
     
    @@ -1810,7 +1954,7 @@ AllocaInst* instToReplace = ...; BasicBlock::iterator ii(instToReplace); ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii, - Constant::getNullValue(PointerType::get(Type::IntTy))); + Constant::getNullValue(PointerType::get(Type::Int32Ty)));
  • ReplaceInstWithInst @@ -1825,7 +1969,7 @@ AllocaInst* instToReplace = ...; BasicBlock::iterator ii(instToReplace); ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii, - new AllocaInst(Type::IntTy, 0, "ptrToReplacedInt")); + new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
  • @@ -1843,6 +1987,28 @@ ReplaceInstWithValue, ReplaceInstWithInst --> + +
    + Deleting GlobalVariables +
    + +
    + +

    Deleting a global variable from a module is just as easy as deleting an +Instruction. First, you must have a pointer to the global variable that you wish + to delete. You use this pointer to erase it from its parent, the module. + For example:

    + +
    +
    +GlobalVariable *GV = .. ;
    +
    +GV->eraseFromParent();
    +
    +
    + +
    +
    Advanced Topics @@ -1877,7 +2043,7 @@ recursive types and late resolution of opaque types makes the situation very difficult to handle. Fortunately, for the most part, our implementation makes most clients able to be completely unaware of the nasty internal details. The primary case where clients are exposed to the inner workings of it are when -building a recursive type. In addition to this case, the LLVM bytecode reader, +building a recursive type. In addition to this case, the LLVM bitcode reader, assembly parser, and linker also have to be aware of the inner workings of this system.

    @@ -1922,7 +2088,7 @@ To build this, use the following LLVM APIs: PATypeHolder StructTy = OpaqueType::get(); std::vector<const Type*> Elts; Elts.push_back(PointerType::get(StructTy)); -Elts.push_back(Type::IntTy); +Elts.push_back(Type::Int32Ty); StructType *NewSTy = StructType::get(Elts); // At this point, NewSTy = "{ opaque*, i32 }". Tell VMCore that @@ -2010,12 +2176,8 @@ Type is maintained by PATypeHolder objects.

    Some data structures need to perform more complex updates when types get -resolved. The SymbolTable class, for example, needs -move and potentially merge type planes in its representation when a pointer -changes.

    - -

    -To support this, a class can derive from the AbstractTypeUser class. This class +resolved. To support this, a class can derive from the AbstractTypeUser class. +This class allows it to get callbacks when certain types are resolved. To register to get callbacks for a particular type, the DerivedType::{add/remove}AbstractTypeUser methods can be called on a type. Note that these methods only work for @@ -2027,16 +2189,19 @@ objects) can never be refined.

    - The SymbolTable class + The ValueSymbolTable and + TypeSymbolTable classes
    -

    This class provides a symbol table that the The +ValueSymbolTable class provides a symbol table that the Function and -Module classes use for naming definitions. The symbol table can -provide a name for any Value. -SymbolTable is an abstract data type. It hides the data it contains -and provides access to it through a controlled interface.

    +Module classes use for naming value definitions. The symbol table +can provide a name for any Value. +The +TypeSymbolTable class is used by the Module class to store +names for types.

    Note that the SymbolTable class should not be directly accessed by most clients. It should only be used when iteration over the symbol table @@ -2046,159 +2211,14 @@ all LLVM an empty name) do not exist in the symbol table.

    -

    To use the SymbolTable well, you need to understand the -structure of the information it holds. The class contains two -std::map objects. The first, pmap, is a map of -Type* to maps of name (std::string) to Value*. -Thus, Values are stored in two-dimensions and accessed by Type and -name.

    - -

    The interface of this class provides three basic types of operations: -

      -
    1. Accessors. Accessors provide read-only access to information - such as finding a value for a name with the - lookup method.
    2. -
    3. Mutators. Mutators allow the user to add information to the - SymbolTable with methods like - insert.
    4. -
    5. Iterators. Iterators allow the user to traverse the content - of the symbol table in well defined ways, such as the method - plane_begin.
    6. -
    - -

    Accessors

    -
    -
    Value* lookup(const Type* Ty, const std::string& name) const: -
    -
    The lookup method searches the type plane given by the - Ty parameter for a Value with the provided name. - If a suitable Value is not found, null is returned.
    - -
    bool isEmpty() const:
    -
    This function returns true if both the value and types maps are - empty
    -
    - -

    Mutators

    -
    -
    void insert(Value *Val):
    -
    This method adds the provided value to the symbol table. The Value must - have both a name and a type which are extracted and used to place the value - in the correct type plane under the value's name.
    - -
    void insert(const std::string& Name, Value *Val):
    -
    Inserts a constant or type into the symbol table with the specified - name. There can be a many to one mapping between names and constants - or types.
    - -
    void remove(Value* Val):
    -
    This method removes a named value from the symbol table. The - type and name of the Value are extracted from \p N and used to - lookup the Value in the correct type plane. If the Value is - not in the symbol table, this method silently ignores the - request.
    - -
    Value* remove(const std::string& Name, Value *Val):
    -
    Remove a constant or type with the specified name from the - symbol table.
    - -
    Value *remove(const value_iterator& It):
    -
    Removes a specific value from the symbol table. - Returns the removed value.
    - -
    bool strip():
    -
    This method will strip the symbol table of its names leaving - the type and values.
    - -
    void clear():
    -
    Empty the symbol table completely.
    -
    - -

    Iteration

    -

    The following functions describe three types of iterators you can obtain -the beginning or end of the sequence for both const and non-const. It is -important to keep track of the different kinds of iterators. There are -three idioms worth pointing out:

    - - - - - - - - - - - -
    UnitsIteratorIdiom
    Planes Of name/Value mapsPI
    
    -for (SymbolTable::plane_const_iterator PI = ST.plane_begin(),
    -     PE = ST.plane_end(); PI != PE; ++PI ) {
    -  PI->first  // This is the Type* of the plane
    -  PI->second // This is the SymbolTable::ValueMap of name/Value pairs
    -}
    -    
    name/Value pairs in a planeVI
    
    -for (SymbolTable::value_const_iterator VI = ST.value_begin(SomeType),
    -     VE = ST.value_end(SomeType); VI != VE; ++VI ) {
    -  VI->first  // This is the name of the Value
    -  VI->second // This is the Value* value associated with the name
    -}
    -    
    - -

    Using the recommended iterator names and idioms will help you avoid -making mistakes. Of particular note, make sure that whenever you use -value_begin(SomeType) that you always compare the resulting iterator -with value_end(SomeType) not value_end(SomeOtherType) or else you -will loop infinitely.

    - -
    - -
    plane_iterator plane_begin():
    -
    Get an iterator that starts at the beginning of the type planes. - The iterator will iterate over the Type/ValueMap pairs in the - type planes.
    - -
    plane_const_iterator plane_begin() const:
    -
    Get a const_iterator that starts at the beginning of the type - planes. The iterator will iterate over the Type/ValueMap pairs - in the type planes.
    - -
    plane_iterator plane_end():
    -
    Get an iterator at the end of the type planes. This serves as - the marker for end of iteration over the type planes.
    - -
    plane_const_iterator plane_end() const:
    -
    Get a const_iterator at the end of the type planes. This serves as - the marker for end of iteration over the type planes.
    +

    These symbol tables support iteration over the values/types in the symbol +table with begin/end/iterator and support querying to see if a +specific name is in the symbol table (with lookup). The +ValueSymbolTable class exposes no public mutator methods; instead, +simply call setName on a value, which will automatically insert it into the +appropriate symbol table. For types, use the Module::addTypeName method to +insert entries into the symbol table.

    -
    value_iterator value_begin(const Type *Typ):
    -
    Get an iterator that starts at the beginning of a type plane. - The iterator will iterate over the name/value pairs in the type plane. - Note: The type plane must already exist before using this.
    - -
    value_const_iterator value_begin(const Type *Typ) const:
    -
    Get a const_iterator that starts at the beginning of a type plane. - The iterator will iterate over the name/value pairs in the type plane. - Note: The type plane must already exist before using this.
    - -
    value_iterator value_end(const Type *Typ):
    -
    Get an iterator to the end of a type plane. This serves as the marker - for end of iteration of the type plane. - Note: The type plane must already exist before using this.
    - -
    value_const_iterator value_end(const Type *Typ) const:
    -
    Get a const_iterator to the end of a type plane. This serves as the - marker for end of iteration of the type plane. - Note: the type plane must already exist before using this.
    - -
    plane_const_iterator find(const Type* Typ ) const:
    -
    This method returns a plane_const_iterator for iteration over - the type planes starting at a specific plane, given by \p Ty.
    - -
    plane_iterator find( const Type* Typ :
    -
    This method returns a plane_iterator for iteration over the - type planes starting at a specific plane, given by \p Ty.
    - -
    @@ -2298,15 +2318,15 @@ the lib/VMCore directory.

    PointerType
    Subclass of SequentialType for pointer types.
    -
    PackedType
    -
    Subclass of SequentialType for packed (vector) types. A - packed type is similar to an ArrayType but is distinguished because it is - a first class type wherease ArrayType is not. Packed types are used for +
    VectorType
    +
    Subclass of SequentialType for vector types. A + vector type is similar to an ArrayType but is distinguished because it is + a first class type whereas ArrayType is not. Vector types are used for vector operations and are usually small vectors of an integer or floating point type.
    StructType
    Subclass of DerivedTypes for struct types.
    -
    FunctionType
    +
    FunctionType
    Subclass of DerivedTypes for function types.
    • bool isVarArg() const: Returns true if it's a vararg @@ -2500,7 +2520,7 @@ method. In addition, all LLVM values can be named. The "name" of the
    -

    The name of this instruction is "foo". NOTE +

    The name of this instruction is "foo". NOTE that the name of any value may be missing (an empty string), so names should ONLY be used for debugging (making the source code easier to read, debugging printouts), they should not be used to keep track of values or map @@ -2732,10 +2752,20 @@ a subclass, which represents the address of a global variable or function.

  • ConstantInt : This subclass of Constant represents an integer constant of any width.