X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FProgrammersManual.html;h=e8d81a25061daa2db8d873767113226f2c1b5b2e;hb=2bf4b54a800c2dd44c0a5939fe629ea120bee2ad;hp=7bec1b964dc85a9ca95caedfe790947e2f8d4e6d;hpb=2e438ca03b69ce199f9d567fc5e4ca028d1018c4;p=oota-llvm.git diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html index 7bec1b964dc..e8d81a25061 100644 --- a/docs/ProgrammersManual.html +++ b/docs/ProgrammersManual.html @@ -2,6 +2,7 @@ "http://www.w3.org/TR/html4/strict.dtd"> + LLVM Programmer's Manual @@ -28,6 +29,13 @@ +
  • String-like containers +
  • BitVector-like containers
  • @@ -117,6 +131,7 @@ with another Value
  • Deleting GlobalVariables
  • +
  • How to Create Types
  • @@ -240,15 +269,14 @@ can get, so it will not be discussed in this document.

      -
    1. Dinkumware C++ Library -reference - an excellent reference for the STL and other parts of the -standard C++ library.
    2. +
    3. Dinkumware +C++ Library reference - an excellent reference for the STL and other parts +of the standard C++ library.
    4. C++ In a Nutshell - This is an -O'Reilly book in the making. It has a decent -Standard Library -Reference that rivals Dinkumware's, and is unfortunately no longer free since the book has been -published.
    5. +O'Reilly book in the making. It has a decent Standard Library +Reference that rivals Dinkumware's, and is unfortunately no longer free since the +book has been published.
    6. C++ Frequently Asked Questions
    7. @@ -281,8 +309,6 @@ to write maintainable code more than where to put your curly braces.

        -
      1. CVS -Branch and Tag Primer
      2. Using static and shared libraries across platforms
      @@ -331,7 +357,7 @@ file (note that you very rarely have to include this file directly).

      cast<>:

      The cast<> operator is a "checked cast" operation. It - converts a pointer or reference from a base class to a derived cast, causing + converts a pointer or reference from a base class to a derived class, causing an assertion failure if it is not really an instance of the right type. This should be used in cases where you have some information that makes you believe that something is of the right type. An example of the isa<> @@ -411,6 +437,106 @@ are lots of examples in the LLVM source base.

      + + +
      + Passing strings (the StringRef +and Twine classes) +
      + +
      + +

      Although LLVM generally does not do much string manipulation, we do have +several important APIs which take strings. Two important examples are the +Value class -- which has names for instructions, functions, etc. -- and the +StringMap class which is used extensively in LLVM and Clang.

      + +

      These are generic classes, and they need to be able to accept strings which +may have embedded null characters. Therefore, they cannot simply take +a const char *, and taking a const std::string& requires +clients to perform a heap allocation which is usually unnecessary. Instead, +many LLVM APIs use a StringRef or a const Twine& for +passing strings efficiently.

      + +
      + + +
      + The StringRef class +
      + +
      + +

      The StringRef data type represents a reference to a constant string +(a character array and a length) and supports the common operations available +on std:string, but does not require heap allocation.

      + +

      It can be implicitly constructed using a C style null-terminated string, +an std::string, or explicitly with a character pointer and length. +For example, the StringRef find function is declared as:

      + +
      +  iterator find(StringRef Key);
      +
      + +

      and clients can call it using any one of:

      + +
      +  Map.find("foo");                 // Lookup "foo"
      +  Map.find(std::string("bar"));    // Lookup "bar"
      +  Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
      +
      + +

      Similarly, APIs which need to return a string may return a StringRef +instance, which can be used directly or converted to an std::string +using the str member function. See +"llvm/ADT/StringRef.h" +for more information.

      + +

      You should rarely use the StringRef class directly, because it contains +pointers to external memory it is not generally safe to store an instance of the +class (unless you know that the external storage will not be freed). StringRef is +small and pervasive enough in LLVM that it should always be passed by value.

      + +
      + + +
      + The Twine class +
      + +
      + +

      The Twine class is an efficient way for APIs to accept concatenated +strings. For example, a common LLVM paradigm is to name one instruction based on +the name of another instruction with a suffix, for example:

      + +
      +
      +    New = CmpInst::Create(..., SO->getName() + ".cmp");
      +
      +
      + +

      The Twine class is effectively a +lightweight rope +which points to temporary (stack allocated) objects. Twines can be implicitly +constructed as the result of the plus operator applied to strings (i.e., a C +strings, an std::string, or a StringRef). The twine delays the +actual concatenation of strings until it is actually required, at which point +it can be efficiently rendered directly into a character array. This avoids +unnecessary heap allocation involved in constructing the temporary results of +string concatenation. See +"llvm/ADT/Twine.h" +for more information.

      + +

      As with a StringRef, Twine objects point to external memory +and should almost never be stored or mentioned directly. They are intended +solely for use when defining a function which should be able to efficiently +accept concatenated strings.

      + +
      + +
      The DEBUG() macro and -debug option @@ -435,7 +561,7 @@ tool) is run with the '-debug' command line argument:

      -DOUT << "I am here!\n";
      +DEBUG(errs() << "I am here!\n");
       
      @@ -480,16 +606,16 @@ option as follows:

      -DOUT << "No debug type\n";
       #undef  DEBUG_TYPE
      +DEBUG(errs() << "No debug type\n");
       #define DEBUG_TYPE "foo"
      -DOUT << "'foo' debug type\n";
      +DEBUG(errs() << "'foo' debug type\n");
       #undef  DEBUG_TYPE
       #define DEBUG_TYPE "bar"
      -DOUT << "'bar' debug type\n";
      +DEBUG(errs() << "'bar' debug type\n"));
       #undef  DEBUG_TYPE
       #define DEBUG_TYPE ""
      -DOUT << "No debug type (2)\n";
      +DEBUG(errs() << "No debug type (2)\n");
       
      @@ -521,6 +647,21 @@ on when the name is specified. This allows, for example, all debug information for instruction scheduling to be enabled with -debug-type=InstrSched, even if the source lives in multiple files.

      +

      The DEBUG_WITH_TYPE macro is also available for situations where you +would like to set DEBUG_TYPE, but only for one specific DEBUG +statement. It takes an additional first parameter, which is the type to use. For +example, the preceding example could be written as:

      + + +
      +
      +DEBUG_WITH_TYPE("", errs() << "No debug type\n");
      +DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
      +DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
      +DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
      +
      +
      +
      @@ -713,6 +854,10 @@ access the container. Based on that, you should use:

      iteration, but do not support efficient look-up based on a key. +
    8. a string container is a specialized sequential + container or reference structure that is used for character or byte + arrays.
    9. +
    10. a bit container provides an efficient way to store and perform set operations on sets of numeric id's, while automatically eliminating duplicates. Bit containers require a maximum of 1 bit for each @@ -876,7 +1021,7 @@ not invalidate iterator or pointers to other elements in the list.

      @@ -884,15 +1029,102 @@ not invalidate iterator or pointers to other elements in the list.

      intrusive, because it requires the element to store and provide access to the prev/next pointers for the list.

      -

      ilist has the same drawbacks as std::list, and additionally requires an -ilist_traits implementation for the element type, but it provides some novel -characteristics. In particular, it can efficiently store polymorphic objects, -the traits class is informed when an element is inserted or removed from the -list, and ilists are guaranteed to support a constant-time splice operation. -

      +

      ilist has the same drawbacks as std::list, and additionally +requires an ilist_traits implementation for the element type, but it +provides some novel characteristics. In particular, it can efficiently store +polymorphic objects, the traits class is informed when an element is inserted or +removed from the list, and ilists are guaranteed to support a +constant-time splice operation.

      + +

      These properties are exactly what we want for things like +Instructions and basic blocks, which is why these are implemented with +ilists.

      + +Related classes of interest are explained in the following subsections: + +
      + + + + +
      +

      ilist_traits<T> is ilist<T>'s customization +mechanism. iplist<T> (and consequently ilist<T>) +publicly derive from this traits class.

      +
      + + +
      + iplist +
      + +
      +

      iplist<T> is ilist<T>'s base and as such +supports a slightly narrower interface. Notably, inserters from +T& are absent.

      + +

      ilist_traits<T> is a public base of this class and can be +used for a wide variety of customizations.

      +
      + + + + +
      +

      ilist_node<T> implements a the forward and backward links +that are expected by the ilist<T> (and analogous containers) +in the default manner.

      -

      These properties are exactly what we want for things like Instructions and -basic blocks, which is why these are implemented with ilists.

      +

      ilist_node<T>s are meant to be embedded in the node type +T, usually T publicly derives from +ilist_node<T>.

      +
      + + + + +
      +

      ilists have another specialty that must be considered. To be a good +citizen in the C++ ecosystem, it needs to support the standard container +operations, such as begin and end iterators, etc. Also, the +operator-- must work correctly on the end iterator in the +case of non-empty ilists.

      + +

      The only sensible solution to this problem is to allocate a so-called +sentinel along with the intrusive list, which serves as the end +iterator, providing the back-link to the last element. However conforming to the +C++ convention it is illegal to operator++ beyond the sentinel and it +also must not be dereferenced.

      + +

      These constraints allow for some implementation freedom to the ilist +how to allocate and store the sentinel. The corresponding policy is dictated +by ilist_traits<T>. By default a T gets heap-allocated +whenever the need for a sentinel arises.

      + +

      While the default policy is sufficient in most cases, it may break down when +T does not provide a default constructor. Also, in the case of many +instances of ilists, the memory overhead of the associated sentinels +is wasted. To alleviate the situation with numerous and voluminous +T-sentinels, sometimes a trick is employed, leading to ghostly +sentinels.

      + +

      Ghostly sentinels are obtained by specially-crafted ilist_traits<T> +which superpose the sentinel with the ilist instance in memory. Pointer +arithmetic is used to obtain the sentinel, which is relative to the +ilist's this pointer. The ilist is augmented by an +extra pointer, which serves as the back-link of the sentinel. This is the only +field in the ghostly sentinel which can be legally accessed.

      @@ -976,14 +1208,14 @@ and erasing, but does not support iteration.

      -

      SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is -transparently implemented with a SmallPtrSet), but also supports iterators. If +

      SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is +transparently implemented with a SmallPtrSet), but also supports iterators. If more than 'N' insertions are performed, a single quadratically probed hash table is allocated and grows as needed, providing extremely efficient access (constant time insertion/deleting/queries with low constant factors) and is very stingy with malloc traffic.

      -

      Note that, unlike std::set, the iterators of SmallPtrSet are invalidated +

      Note that, unlike std::set, the iterators of SmallPtrSet are invalidated whenever an insertion occurs. Also, the values visited by the iterators are not visited in sorted order.

      @@ -1139,21 +1371,16 @@ factors, and produces a lot of malloc traffic. It should be avoided.

      The STL provides several other options, such as std::multiset and the various -"hash_set" like containers (whether from C++ TR1 or from the SGI library).

      +"hash_set" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_set and unordered_set because they are generally very expensive +(each insertion requires a malloc) and very non-portable. +

      std::multiset is useful if you're not interested in elimination of duplicates, but has all the drawbacks of std::set. A sorted vector (where you don't delete duplicate entries) or some other approach is almost always better.

      -

      The various hash_set implementations (exposed portably by -"llvm/ADT/hash_set") is a simple chained hashtable. This algorithm is as malloc -intensive as std::set (performing an allocation for each element inserted, -thus having really high constant factors) but (usually) provides O(1) -insertion/deletion of elements. This can be useful if your elements are large -(thus making the constant-factor cost relatively low) or if comparisons are -expensive. Element iteration does not visit elements in a useful order.

      -
      @@ -1207,7 +1434,7 @@ to the key string for a value.

      The StringMap is very fast for several reasons: quadratic probing is very cache efficient for lookups, the hash value of strings in buckets is not -recomputed when lookup up an element, StringMap rarely has to touch the +recomputed when looking up an element, StringMap rarely has to touch the memory for unrelated objects when looking up a value (even when hash collisions happen), hash table growth does not recompute the hash values for strings already in the table, and each pair in the map is store in a single allocation @@ -1263,6 +1490,23 @@ inserted into the map) that it needs internally.

      + + + +
      + +

      +ValueMap is a wrapper around a DenseMap mapping +Value*s (or subclasses) to another type. When a Value is deleted or RAUW'ed, +ValueMap will update itself so the new version of the key is mapped to the same +value, just as if the key were a WeakVH. You can configure exactly how this +happens, and what else happens on these two events, by passing +a Config parameter to the ValueMap template.

      + +
      +
      <map> @@ -1292,19 +1536,27 @@ another element takes place).

      The STL provides several other options, such as std::multimap and the various -"hash_map" like containers (whether from C++ TR1 or from the SGI library).

      +"hash_map" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_set and unordered_set because they are generally very expensive +(each insertion requires a malloc) and very non-portable.

      std::multimap is useful if you want to map a key to multiple values, but has all the drawbacks of std::map. A sorted vector or some other approach is almost always better.

      -

      The various hash_map implementations (exposed portably by -"llvm/ADT/hash_map") are simple chained hash tables. This algorithm is as -malloc intensive as std::map (performing an allocation for each element -inserted, thus having really high constant factors) but (usually) provides O(1) -insertion/deletion of elements. This can be useful if your elements are large -(thus making the constant-factor cost relatively low) or if comparisons are -expensive. Element iteration does not visit elements in a useful order.

      +
      + + + + +
      + +

      +TODO: const char* vs stringref vs smallstring vs std::string. Describe twine, +xref to #string_apis. +

      @@ -1331,7 +1583,7 @@ please don't use it.

      -

      The BitVector container provides a fixed size set of bits for manipulation. +

      The BitVector container provides a dynamic size set of bits for manipulation. It supports individual bit setting/testing, as well as set operations. The set operations take time O(size of bitvector), but operations are performed one word at a time, instead of one bit at a time. This makes the BitVector very fast for @@ -1340,6 +1592,25 @@ the number of set bits to be high (IE a dense set).

      + + + +
      +

      The SmallBitVector container provides the same interface as BitVector, but +it is optimized for the case where only a small number of bits, less than +25 or so, are needed. It also transparently supports larger bit counts, but +slightly less efficiently than a plain BitVector, so SmallBitVector should +only be used when larger counts are rare. +

      + +

      +At this time, SmallBitVector does not support set operations (and, or, xor), +and its operator[] does not provide an assignable lvalue. +

      +
      +
      SparseBitVector @@ -1419,7 +1690,7 @@ an example that prints the name of a BasicBlock and the number of for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i) // Print out the name of the basic block if it has one, and then the // number of instructions that it contains - llvm::cerr << "Basic block (name=" << i->getName() << ") has " + errs() << "Basic block (name=" << i->getName() << ") has " << i->size() << " instructions.\n";
      @@ -1452,14 +1723,14 @@ a BasicBlock:

      for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i) // The next statement works since operator<<(ostream&,...) // is overloaded for Instruction& - llvm::cerr << *i << "\n"; + errs() << *i << "\n";

      However, this isn't really the best way to print out the contents of a BasicBlock! Since the ostream operators are overloaded for virtually anything you'll care about, you could have just invoked the print routine on the -basic block itself: llvm::cerr << *blk << "\n";.

      +basic block itself: errs() << *blk << "\n";.

      @@ -1484,8 +1755,8 @@ small example that shows how to dump all instructions in a function to the stand #include "llvm/Support/InstIterator.h" // F is a pointer to a Function instance -for (inst_iterator i = inst_begin(F), e = inst_end(F); i != e; ++i) - llvm::cerr << *i << "\n"; +for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) + errs() << *I << "\n"; @@ -1497,7 +1768,10 @@ F, all you would need to do is something like:

       std::set<Instruction*> worklist;
      -worklist.insert(inst_begin(F), inst_end(F));
      +// or better yet, SmallPtrSet<Instruction*, 64> worklist;
      +
      +for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
      +   worklist.insert(&*I);
       
      @@ -1561,11 +1835,26 @@ without actually obtaining it via iteration over some structure:

      void printNextInstruction(Instruction* inst) { BasicBlock::iterator it(inst); ++it; // After this line, it refers to the instruction after *inst - if (it != inst->getParent()->end()) llvm::cerr << *it << "\n"; + if (it != inst->getParent()->end()) errs() << *it << "\n"; } +

      Unfortunately, these implicit conversions come at a cost; they prevent +these iterators from conforming to standard iterator conventions, and thus +from being usable with standard algorithms and containers. For example, they +prevent the following code, where B is a BasicBlock, +from compiling:

      + +
      +
      +  llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
      +
      +
      + +

      Because of this, these implicit conversions may be removed some day, +and operator* changed to return a pointer instead of a reference.

      + @@ -1609,7 +1898,7 @@ class OurFunctionPass : public FunctionPass { virtual runOnFunction(Function& F) { for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) { - for (BasicBlock::iterator i = b->begin(); ie = b->end(); i != ie; ++i) { + for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) { if (CallInst* callInst = dyn_cast<CallInst>(&*i)) { // We know we've encountered a call instruction, so we @@ -1679,13 +1968,17 @@ Function *F = ...; for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i) if (Instruction *Inst = dyn_cast<Instruction>(*i)) { - llvm::cerr << "F is used in instruction:\n"; - llvm::cerr << *Inst << "\n"; + errs() << "F is used in instruction:\n"; + errs() << *Inst << "\n"; } -

      Alternately, it's common to have an instance of the Note that dereferencing a Value::use_iterator is not a very cheap +operation. Instead of performing *i above several times, consider +doing it only once in the loop body and reusing its result.

      + +

      Alternatively, it's common to have an instance of the User Class and need to know what Values are used by it. The list of all Values used by a User is known as a use-def chain. Instances of class @@ -1704,10 +1997,13 @@ for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i) - +

      Declaring objects as const is an important tool of enforcing +mutation free algorithms (such as analyses, etc.). For this purpose above +iterators come in constant flavors as Value::const_use_iterator +and Value::const_op_iterator. They automatically arise when +calling use/op_begin() on const Value*s or +const User*s respectively. Upon dereferencing, they return +const Use*s. Otherwise the above patterns remain unchanged.

      @@ -1914,9 +2210,7 @@ erase function to remove your instruction. For example:

       Instruction *I = .. ;
      -BasicBlock *BB = I->getParent();
      -
      -BB->getInstList().erase(I);
      +I->eraseFromParent();
       
      @@ -1941,9 +2235,9 @@ and ReplaceInstWithInst.