X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FProgrammersManual.html;h=f6854078b6c95c51ba142307ac2d24c07f7a7f89;hb=a75ce9f5d2236d93c117e861e60e6f3f748c9555;hp=fad5edf1801bf2ab8cbfc4369345dc145d6219e8;hpb=d41720a2d79d2b8b587ebd3d97588aced40e2d9f;p=oota-llvm.git diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html index fad5edf1801..f6854078b6c 100644 --- a/docs/ProgrammersManual.html +++ b/docs/ProgrammersManual.html @@ -2,6 +2,7 @@ "http://www.w3.org/TR/html4/strict.dtd"> + LLVM Programmer's Manual @@ -28,6 +29,13 @@ +
  • String-like containers +
  • BitVector-like containers
  • @@ -117,6 +132,7 @@ with another Value
  • Deleting GlobalVariables
  • +
  • How to Create Types
  • @@ -242,15 +270,14 @@ can get, so it will not be discussed in this document.

      -
    1. Dinkumware C++ Library -reference - an excellent reference for the STL and other parts of the -standard C++ library.
    2. +
    3. Dinkumware +C++ Library reference - an excellent reference for the STL and other parts +of the standard C++ library.
    4. C++ In a Nutshell - This is an -O'Reilly book in the making. It has a decent -Standard Library -Reference that rivals Dinkumware's, and is unfortunately no longer free since the book has been -published.
    5. +O'Reilly book in the making. It has a decent Standard Library +Reference that rivals Dinkumware's, and is unfortunately no longer free since the +book has been published.
    6. C++ Frequently Asked Questions
    7. @@ -283,8 +310,6 @@ to write maintainable code more than where to put your curly braces.

        -
      1. CVS -Branch and Tag Primer
      2. Using static and shared libraries across platforms
      @@ -413,6 +438,106 @@ are lots of examples in the LLVM source base.

      + + +
      + Passing strings (the StringRef +and Twine classes) +
      + +
      + +

      Although LLVM generally does not do much string manipulation, we do have +several important APIs which take strings. Two important examples are the +Value class -- which has names for instructions, functions, etc. -- and the +StringMap class which is used extensively in LLVM and Clang.

      + +

      These are generic classes, and they need to be able to accept strings which +may have embedded null characters. Therefore, they cannot simply take +a const char *, and taking a const std::string& requires +clients to perform a heap allocation which is usually unnecessary. Instead, +many LLVM APIs use a StringRef or a const Twine& for +passing strings efficiently.

      + +
      + + +
      + The StringRef class +
      + +
      + +

      The StringRef data type represents a reference to a constant string +(a character array and a length) and supports the common operations available +on std:string, but does not require heap allocation.

      + +

      It can be implicitly constructed using a C style null-terminated string, +an std::string, or explicitly with a character pointer and length. +For example, the StringRef find function is declared as:

      + +
      +  iterator find(StringRef Key);
      +
      + +

      and clients can call it using any one of:

      + +
      +  Map.find("foo");                 // Lookup "foo"
      +  Map.find(std::string("bar"));    // Lookup "bar"
      +  Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
      +
      + +

      Similarly, APIs which need to return a string may return a StringRef +instance, which can be used directly or converted to an std::string +using the str member function. See +"llvm/ADT/StringRef.h" +for more information.

      + +

      You should rarely use the StringRef class directly, because it contains +pointers to external memory it is not generally safe to store an instance of the +class (unless you know that the external storage will not be freed). StringRef is +small and pervasive enough in LLVM that it should always be passed by value.

      + +
      + + +
      + The Twine class +
      + +
      + +

      The Twine class is an efficient way for APIs to accept concatenated +strings. For example, a common LLVM paradigm is to name one instruction based on +the name of another instruction with a suffix, for example:

      + +
      +
      +    New = CmpInst::Create(..., SO->getName() + ".cmp");
      +
      +
      + +

      The Twine class is effectively a +lightweight rope +which points to temporary (stack allocated) objects. Twines can be implicitly +constructed as the result of the plus operator applied to strings (i.e., a C +strings, an std::string, or a StringRef). The twine delays the +actual concatenation of strings until it is actually required, at which point +it can be efficiently rendered directly into a character array. This avoids +unnecessary heap allocation involved in constructing the temporary results of +string concatenation. See +"llvm/ADT/Twine.h" +for more information.

      + +

      As with a StringRef, Twine objects point to external memory +and should almost never be stored or mentioned directly. They are intended +solely for use when defining a function which should be able to efficiently +accept concatenated strings.

      + +
      + +
      The DEBUG() macro and -debug option @@ -437,7 +562,7 @@ tool) is run with the '-debug' command line argument:

      -DOUT << "I am here!\n";
      +DEBUG(errs() << "I am here!\n");
       
      @@ -482,16 +607,16 @@ option as follows:

      -DOUT << "No debug type\n";
       #undef  DEBUG_TYPE
      +DEBUG(errs() << "No debug type\n");
       #define DEBUG_TYPE "foo"
      -DOUT << "'foo' debug type\n";
      +DEBUG(errs() << "'foo' debug type\n");
       #undef  DEBUG_TYPE
       #define DEBUG_TYPE "bar"
      -DOUT << "'bar' debug type\n";
      +DEBUG(errs() << "'bar' debug type\n"));
       #undef  DEBUG_TYPE
       #define DEBUG_TYPE ""
      -DOUT << "No debug type (2)\n";
      +DEBUG(errs() << "No debug type (2)\n");
       
      @@ -523,6 +648,21 @@ on when the name is specified. This allows, for example, all debug information for instruction scheduling to be enabled with -debug-type=InstrSched, even if the source lives in multiple files.

      +

      The DEBUG_WITH_TYPE macro is also available for situations where you +would like to set DEBUG_TYPE, but only for one specific DEBUG +statement. It takes an additional first parameter, which is the type to use. For +example, the preceding example could be written as:

      + + +
      +
      +DEBUG_WITH_TYPE("", errs() << "No debug type\n");
      +DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
      +DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
      +DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
      +
      +
      +
      @@ -715,6 +855,10 @@ access the container. Based on that, you should use:

      iteration, but do not support efficient look-up based on a key. +
    8. a string container is a specialized sequential + container or reference structure that is used for character or byte + arrays.
    9. +
    10. a bit container provides an efficient way to store and perform set operations on sets of numeric id's, while automatically eliminating duplicates. Bit containers require a maximum of 1 bit for each @@ -878,7 +1022,7 @@ not invalidate iterator or pointers to other elements in the list.

      @@ -886,15 +1030,102 @@ not invalidate iterator or pointers to other elements in the list.

      intrusive, because it requires the element to store and provide access to the prev/next pointers for the list.

      -

      ilist has the same drawbacks as std::list, and additionally requires an -ilist_traits implementation for the element type, but it provides some novel -characteristics. In particular, it can efficiently store polymorphic objects, -the traits class is informed when an element is inserted or removed from the -list, and ilists are guaranteed to support a constant-time splice operation. -

      +

      ilist has the same drawbacks as std::list, and additionally +requires an ilist_traits implementation for the element type, but it +provides some novel characteristics. In particular, it can efficiently store +polymorphic objects, the traits class is informed when an element is inserted or +removed from the list, and ilists are guaranteed to support a +constant-time splice operation.

      -

      These properties are exactly what we want for things like Instructions and -basic blocks, which is why these are implemented with ilists.

      +

      These properties are exactly what we want for things like +Instructions and basic blocks, which is why these are implemented with +ilists.

      + +Related classes of interest are explained in the following subsections: + +
      + + + + +
      +

      ilist_traits<T> is ilist<T>'s customization +mechanism. iplist<T> (and consequently ilist<T>) +publicly derive from this traits class.

      +
      + + +
      + iplist +
      + +
      +

      iplist<T> is ilist<T>'s base and as such +supports a slightly narrower interface. Notably, inserters from +T& are absent.

      + +

      ilist_traits<T> is a public base of this class and can be +used for a wide variety of customizations.

      +
      + + + + +
      +

      ilist_node<T> implements a the forward and backward links +that are expected by the ilist<T> (and analogous containers) +in the default manner.

      + +

      ilist_node<T>s are meant to be embedded in the node type +T, usually T publicly derives from +ilist_node<T>.

      +
      + + + + +
      +

      ilists have another specialty that must be considered. To be a good +citizen in the C++ ecosystem, it needs to support the standard container +operations, such as begin and end iterators, etc. Also, the +operator-- must work correctly on the end iterator in the +case of non-empty ilists.

      + +

      The only sensible solution to this problem is to allocate a so-called +sentinel along with the intrusive list, which serves as the end +iterator, providing the back-link to the last element. However conforming to the +C++ convention it is illegal to operator++ beyond the sentinel and it +also must not be dereferenced.

      + +

      These constraints allow for some implementation freedom to the ilist +how to allocate and store the sentinel. The corresponding policy is dictated +by ilist_traits<T>. By default a T gets heap-allocated +whenever the need for a sentinel arises.

      + +

      While the default policy is sufficient in most cases, it may break down when +T does not provide a default constructor. Also, in the case of many +instances of ilists, the memory overhead of the associated sentinels +is wasted. To alleviate the situation with numerous and voluminous +T-sentinels, sometimes a trick is employed, leading to ghostly +sentinels.

      + +

      Ghostly sentinels are obtained by specially-crafted ilist_traits<T> +which superpose the sentinel with the ilist instance in memory. Pointer +arithmetic is used to obtain the sentinel, which is relative to the +ilist's this pointer. The ilist is augmented by an +extra pointer, which serves as the back-link of the sentinel. This is the only +field in the ghostly sentinel which can be legally accessed.

      @@ -978,14 +1209,14 @@ and erasing, but does not support iteration.

      -

      SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is -transparently implemented with a SmallPtrSet), but also supports iterators. If +

      SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is +transparently implemented with a SmallPtrSet), but also supports iterators. If more than 'N' insertions are performed, a single quadratically probed hash table is allocated and grows as needed, providing extremely efficient access (constant time insertion/deleting/queries with low constant factors) and is very stingy with malloc traffic.

      -

      Note that, unlike std::set, the iterators of SmallPtrSet are invalidated +

      Note that, unlike std::set, the iterators of SmallPtrSet are invalidated whenever an insertion occurs. Also, the values visited by the iterators are not visited in sorted order.

      @@ -1141,21 +1372,16 @@ factors, and produces a lot of malloc traffic. It should be avoided.

      The STL provides several other options, such as std::multiset and the various -"hash_set" like containers (whether from C++ TR1 or from the SGI library).

      +"hash_set" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_set and unordered_set because they are generally very expensive +(each insertion requires a malloc) and very non-portable. +

      std::multiset is useful if you're not interested in elimination of duplicates, but has all the drawbacks of std::set. A sorted vector (where you don't delete duplicate entries) or some other approach is almost always better.

      -

      The various hash_set implementations (exposed portably by -"llvm/ADT/hash_set") is a simple chained hashtable. This algorithm is as malloc -intensive as std::set (performing an allocation for each element inserted, -thus having really high constant factors) but (usually) provides O(1) -insertion/deletion of elements. This can be useful if your elements are large -(thus making the constant-factor cost relatively low) or if comparisons are -expensive. Element iteration does not visit elements in a useful order.

      -
      @@ -1209,7 +1435,7 @@ to the key string for a value.

      The StringMap is very fast for several reasons: quadratic probing is very cache efficient for lookups, the hash value of strings in buckets is not -recomputed when lookup up an element, StringMap rarely has to touch the +recomputed when looking up an element, StringMap rarely has to touch the memory for unrelated objects when looking up a value (even when hash collisions happen), hash table growth does not recompute the hash values for strings already in the table, and each pair in the map is store in a single allocation @@ -1265,6 +1491,40 @@ inserted into the map) that it needs internally.

      + + + +
      + +

      +ValueMap is a wrapper around a DenseMap mapping +Value*s (or subclasses) to another type. When a Value is deleted or RAUW'ed, +ValueMap will update itself so the new version of the key is mapped to the same +value, just as if the key were a WeakVH. You can configure exactly how this +happens, and what else happens on these two events, by passing +a Config parameter to the ValueMap template.

      + +
      + + + + +
      + +

      IntervalMap is a compact map for small keys and values. It maps key +intervals instead of single keys, and it will automatically coalesce adjacent +intervals. When then map only contains a few intervals, they are stored in the +map object itself to avoid allocations.

      + +

      The IntervalMap iterators are quite big, so they should not be passed around +as STL iterators. The heavyweight iterators allow a smaller data structure.

      + +
      +
      <map> @@ -1294,19 +1554,27 @@ another element takes place).

      The STL provides several other options, such as std::multimap and the various -"hash_map" like containers (whether from C++ TR1 or from the SGI library).

      +"hash_map" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_set and unordered_set because they are generally very expensive +(each insertion requires a malloc) and very non-portable.

      std::multimap is useful if you want to map a key to multiple values, but has all the drawbacks of std::map. A sorted vector or some other approach is almost always better.

      -

      The various hash_map implementations (exposed portably by -"llvm/ADT/hash_map") are simple chained hash tables. This algorithm is as -malloc intensive as std::map (performing an allocation for each element -inserted, thus having really high constant factors) but (usually) provides O(1) -insertion/deletion of elements. This can be useful if your elements are large -(thus making the constant-factor cost relatively low) or if comparisons are -expensive. Element iteration does not visit elements in a useful order.

      +
      + + + + +
      + +

      +TODO: const char* vs stringref vs smallstring vs std::string. Describe twine, +xref to #string_apis. +

      @@ -1333,7 +1601,7 @@ please don't use it.

      -

      The BitVector container provides a fixed size set of bits for manipulation. +

      The BitVector container provides a dynamic size set of bits for manipulation. It supports individual bit setting/testing, as well as set operations. The set operations take time O(size of bitvector), but operations are performed one word at a time, instead of one bit at a time. This makes the BitVector very fast for @@ -1342,6 +1610,25 @@ the number of set bits to be high (IE a dense set).

      + + + +
      +

      The SmallBitVector container provides the same interface as BitVector, but +it is optimized for the case where only a small number of bits, less than +25 or so, are needed. It also transparently supports larger bit counts, but +slightly less efficiently than a plain BitVector, so SmallBitVector should +only be used when larger counts are rare. +

      + +

      +At this time, SmallBitVector does not support set operations (and, or, xor), +and its operator[] does not provide an assignable lvalue. +

      +
      +
      SparseBitVector @@ -1421,7 +1708,7 @@ an example that prints the name of a BasicBlock and the number of for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i) // Print out the name of the basic block if it has one, and then the // number of instructions that it contains - llvm::cerr << "Basic block (name=" << i->getName() << ") has " + errs() << "Basic block (name=" << i->getName() << ") has " << i->size() << " instructions.\n";
      @@ -1454,14 +1741,14 @@ a BasicBlock:

      for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i) // The next statement works since operator<<(ostream&,...) // is overloaded for Instruction& - llvm::cerr << *i << "\n"; + errs() << *i << "\n";

      However, this isn't really the best way to print out the contents of a BasicBlock! Since the ostream operators are overloaded for virtually anything you'll care about, you could have just invoked the print routine on the -basic block itself: llvm::cerr << *blk << "\n";.

      +basic block itself: errs() << *blk << "\n";.

      @@ -1487,7 +1774,7 @@ small example that shows how to dump all instructions in a function to the stand // F is a pointer to a Function instance for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) - llvm::cerr << *I << "\n"; + errs() << *I << "\n"; @@ -1566,11 +1853,26 @@ without actually obtaining it via iteration over some structure:

      void printNextInstruction(Instruction* inst) { BasicBlock::iterator it(inst); ++it; // After this line, it refers to the instruction after *inst - if (it != inst->getParent()->end()) llvm::cerr << *it << "\n"; + if (it != inst->getParent()->end()) errs() << *it << "\n"; } +

      Unfortunately, these implicit conversions come at a cost; they prevent +these iterators from conforming to standard iterator conventions, and thus +from being usable with standard algorithms and containers. For example, they +prevent the following code, where B is a BasicBlock, +from compiling:

      + +
      +
      +  llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
      +
      +
      + +

      Because of this, these implicit conversions may be removed some day, +and operator* changed to return a pointer instead of a reference.

      + @@ -1614,7 +1916,7 @@ class OurFunctionPass : public FunctionPass { virtual runOnFunction(Function& F) { for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) { - for (BasicBlock::iterator i = b->begin(); ie = b->end(); i != ie; ++i) { + for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) { if (CallInst* callInst = dyn_cast<CallInst>(&*i)) { // We know we've encountered a call instruction, so we @@ -1684,13 +1986,17 @@ Function *F = ...; for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i) if (Instruction *Inst = dyn_cast<Instruction>(*i)) { - llvm::cerr << "F is used in instruction:\n"; - llvm::cerr << *Inst << "\n"; + errs() << "F is used in instruction:\n"; + errs() << *Inst << "\n"; } -

      Alternately, it's common to have an instance of the Note that dereferencing a Value::use_iterator is not a very cheap +operation. Instead of performing *i above several times, consider +doing it only once in the loop body and reusing its result.

      + +

      Alternatively, it's common to have an instance of the User Class and need to know what Values are used by it. The list of all Values used by a User is known as a use-def chain. Instances of class @@ -1709,10 +2015,13 @@ for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i) - +

      Declaring objects as const is an important tool of enforcing +mutation free algorithms (such as analyses, etc.). For this purpose above +iterators come in constant flavors as Value::const_use_iterator +and Value::const_op_iterator. They automatically arise when +calling use/op_begin() on const Value*s or +const User*s respectively. Upon dereferencing, they return +const Use*s. Otherwise the above patterns remain unchanged.

      @@ -1944,9 +2253,9 @@ and ReplaceInstWithInst.