X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FProgrammersManual.html;h=46b82ddc21fb6f23c335d58a578b248562ae68c7;hb=95df6b3603e228cea714be21997fec82cb03011e;hp=1ac08b42f4e92e21f90e53254c4f18255d536391;hpb=261efe953b14da0446ba5bcafa7f01f247106e9f;p=oota-llvm.git diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html index 1ac08b42f4e..46b82ddc21f 100644 --- a/docs/ProgrammersManual.html +++ b/docs/ProgrammersManual.html @@ -1,47 +1,90 @@ - + LLVM Programmer's Manual + - - - - - - - -
  LLVM Programmer's Manual
+ + +
+ LLVM Programmer's Manual +
+
    -
  1. Introduction
  2. +
  3. Introduction
  4. General Information
  5. Important and useful LLVM APIs
  6. +
  7. Picking the Right Data Structure for a Task + +
  8. Helpful Hints for Common Operations
  9. Making simple changes @@ -69,7 +114,9 @@ use-def chains
  10. Deleting Instructions
  11. Replacing an Instruction with another Value
  12. +
  13. Deleting GlobalVariables
  14. + +--> + +
  15. Advanced Topics +
  16. +
  17. The Core LLVM Class Hierarchy Reference -

    Written by Chris Lattner,Dinakar Dhurjati, and Joel Stanley

    -

+ +
+

Written by Chris Lattner, + Dinakar Dhurjati, + Gabor Greif, + Joel Stanley and + Reid Spencer

+
+ - - - - - - -
Introduction
- - - - - - - -
General Information
- - - - - - - - -
   The C++ Standard Template -Library
- - - - - - - - -
   Other useful references
- - - - - - - -
Important and useful LLVM -APIs
- - - - - - - - -
   The isa<>, -cast<> and dyn_cast<> templates
- - - - - - - - -
   The DEBUG() macro -& -debug option
- -

-
Fine grained debug info with DEBUG_TYPE() and the -debug-only -option

- - - - - - - - -
   The Statistic -template & -stats option
- - - - - - - -
Helpful Hints for Common -Operations
- - - - - - - - -
   Basic Inspection and -Traversal Routines
+ +
+
+   7646 bitcodewriter   - Number of normal instructions
+    725 bitcodewriter   - Number of oversized instructions
+ 129996 bitcodewriter   - Number of bitcode bytes written
+   2817 raise           - Number of insts DCEd or constprop'd
+   3213 raise           - Number of cast-of-self removed
+   5046 raise           - Number of expression trees converted
+     75 raise           - Number of other getelementptr's formed
+    138 raise           - Number of load/store peepholes
+     42 deadtypeelim    - Number of unused typenames removed from symtab
+    392 funcresolve     - Number of varargs functions resolved
+     27 globaldce       - Number of global variables removed
+      2 adce            - Number of basic blocks removed
+    134 cee             - Number of branches revectored
+     49 cee             - Number of setcc instruction eliminated
+    532 gcse            - Number of loads removed
+   2919 gcse            - Number of instructions removed
+     86 indvars         - Number of canonical indvars added
+     87 indvars         - Number of aux indvars removed
+     25 instcombine     - Number of dead inst eliminate
+    434 instcombine     - Number of insts combined
+    248 licm            - Number of load insts hoisted
+   1298 licm            - Number of insts hoisted to a loop pre-header
+      3 licm            - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
+     75 mem2reg         - Number of alloca's promoted
+   1444 cfgsimplify     - Number of blocks simplified
+
+
+ +

Obviously, with so many optimizations, having a unified framework for this +stuff is very nice. Making your pass fit well into the framework makes it more +maintainable and useful.

+ + + + +
+ Viewing graphs while debugging code +
+ +
+ +

Several of the important data structures in LLVM are graphs: for example +CFGs made out of LLVM BasicBlocks, CFGs made out of +LLVM MachineBasicBlocks, and +Instruction Selection +DAGs. In many cases, while debugging various parts of the compiler, it is +nice to instantly visualize these graphs.

+ +

LLVM provides several callbacks that are available in a debug build to do +exactly that. If you call the Function::viewCFG() method, for example, +the current LLVM tool will pop up a window containing the CFG for the function +where each basic block is a node in the graph, and each node contains the +instructions in the block. Similarly, there also exists +Function::viewCFGOnly() (does not include the instructions), the +MachineFunction::viewCFG() and MachineFunction::viewCFGOnly(), +and the SelectionDAG::viewGraph() methods. Within GDB, for example, +you can usually use something like call DAG.viewGraph() to pop +up a window. Alternatively, you can sprinkle calls to these functions in your +code in places you want to debug.

+ +

Getting this to work requires a small amount of configuration. On Unix +systems with X11, install the graphviz +toolkit, and make sure 'dot' and 'gv' are in your path. If you are running on +Mac OS/X, download and install the Mac OS/X Graphviz program, and add +/Applications/Graphviz.app/Contents/MacOS/ (or wherever you install +it) to your path. Once in your system and path are set up, rerun the LLVM +configure script and rebuild LLVM to enable this functionality.

+ +

SelectionDAG has been extended to make it easier to locate +interesting nodes in large complex graphs. From gdb, if you +call DAG.setGraphColor(node, "color"), then the +next call DAG.viewGraph() would highlight the node in the +specified color (choices of colors can be found at colors.) More +complex node attributes can be provided with call +DAG.setGraphAttrs(node, "attributes") (choices can be +found at Graph +Attributes.) If you want to restart and clear all the current graph +attributes, then you can call DAG.clearGraphAttrs().

+ +
+ + +
+ Picking the Right Data Structure for a Task +
+ + +
+ +

LLVM has a plethora of data structures in the llvm/ADT/ directory, + and we commonly use STL data structures. This section describes the trade-offs + you should consider when you pick one.

+ +

+The first step is a choose your own adventure: do you want a sequential +container, a set-like container, or a map-like container? The most important +thing when choosing a container is the algorithmic properties of how you plan to +access the container. Based on that, you should use:

+ -

-
Iterating over the BasicBlocks in a Function

-
+ + +
+ Sequential Containers (std::vector, std::list, etc) +
+ +
+There are a variety of sequential containers available for you, based on your +needs. Pick the first in this section that will do what you want. +
+ + +
+ Fixed Size Arrays +
+ +
+

Fixed size arrays are very simple and very fast. They are good if you know +exactly how many elements you have, or you have a (low) upper bound on how many +you have.

+
+ + +
+ Heap Allocated Arrays +
+ +
+

Heap allocated arrays (new[] + delete[]) are also simple. They are good if +the number of elements is variable, if you know how many elements you will need +before the array is allocated, and if the array is usually large (if not, +consider a SmallVector). The cost of a heap +allocated array is the cost of the new/delete (aka malloc/free). Also note that +if you are allocating an array of a type with a constructor, the constructor and +destructors will be run for every element in the array (re-sizable vectors only +construct those elements actually used).

+
+ + +
+ "llvm/ADT/SmallVector.h" +
+ +
+

SmallVector<Type, N> is a simple class that looks and smells +just like vector<Type>: +it supports efficient iteration, lays out elements in memory order (so you can +do pointer arithmetic between elements), supports efficient push_back/pop_back +operations, supports efficient random access to its elements, etc.

+ +

The advantage of SmallVector is that it allocates space for +some number of elements (N) in the object itself. Because of this, if +the SmallVector is dynamically smaller than N, no malloc is performed. This can +be a big win in cases where the malloc/free call is far more expensive than the +code that fiddles around with the elements.

+ +

This is good for vectors that are "usually small" (e.g. the number of +predecessors/successors of a block is usually less than 8). On the other hand, +this makes the size of the SmallVector itself large, so you don't want to +allocate lots of them (doing so will waste a lot of space). As such, +SmallVectors are most useful when on the stack.

+ +

SmallVector also provides a nice portable and efficient replacement for +alloca.

+ +
+ + +
+ <vector> +
+ +
+

+std::vector is well loved and respected. It is useful when SmallVector isn't: +when the size of the vector is often large (thus the small optimization will +rarely be a benefit) or if you will be allocating many instances of the vector +itself (which would waste space for elements that aren't in the container). +vector is also useful when interfacing with code that expects vectors :). +

+ +

One worthwhile note about std::vector: avoid code like this:

+ +
+
+for ( ... ) {
+   std::vector<foo> V;
+   use V;
+}
+
+
+ +

Instead, write this as:

+ +
+
+std::vector<foo> V;
+for ( ... ) {
+   use V;
+   V.clear();
+}
+
+
+ +

Doing so will save (at least) one heap allocation and free per iteration of +the loop.

+ +
+ + +
+ <deque> +
+ +
+

std::deque is, in some senses, a generalized version of std::vector. Like +std::vector, it provides constant time random access and other similar +properties, but it also provides efficient access to the front of the list. It +does not guarantee continuity of elements within memory.

+ +

In exchange for this extra flexibility, std::deque has significantly higher +constant factor costs than std::vector. If possible, use std::vector or +something cheaper.

+
+ + +
+ <list> +
+ +
+

std::list is an extremely inefficient class that is rarely useful. +It performs a heap allocation for every element inserted into it, thus having an +extremely high constant factor, particularly for small data types. std::list +also only supports bidirectional iteration, not random access iteration.

+ +

In exchange for this high cost, std::list supports efficient access to both +ends of the list (like std::deque, but unlike std::vector or SmallVector). In +addition, the iterator invalidation characteristics of std::list are stronger +than that of a vector class: inserting or removing an element into the list does +not invalidate iterator or pointers to other elements in the list.

+
+ + +
+ llvm/ADT/ilist +
+ +
+

ilist<T> implements an 'intrusive' doubly-linked list. It is +intrusive, because it requires the element to store and provide access to the +prev/next pointers for the list.

+ +

ilist has the same drawbacks as std::list, and additionally requires an +ilist_traits implementation for the element type, but it provides some novel +characteristics. In particular, it can efficiently store polymorphic objects, +the traits class is informed when an element is inserted or removed from the +list, and ilists are guaranteed to support a constant-time splice operation. +

+ +

These properties are exactly what we want for things like Instructions and +basic blocks, which is why these are implemented with ilists.

+
+ + +
+ Other Sequential Container options +
+ +
+

Other STL containers are available, such as std::string.

+ +

There are also various STL adapter classes such as std::queue, +std::priority_queue, std::stack, etc. These provide simplified access to an +underlying container but don't affect the cost of the container itself.

+ +
+ + + +
+ Set-Like Containers (std::set, SmallSet, SetVector, etc) +
+ +
+ +

Set-like containers are useful when you need to canonicalize multiple values +into a single representation. There are several different choices for how to do +this, providing various trade-offs.

+ +
+ + + +
+ A sorted 'vector' +
+ +
+ +

If you intend to insert a lot of elements, then do a lot of queries, a +great approach is to use a vector (or other sequential container) with +std::sort+std::unique to remove duplicates. This approach works really well if +your usage pattern has these two distinct phases (insert then query), and can be +coupled with a good choice of sequential container. +

+ +

+This combination provides the several nice properties: the result data is +contiguous in memory (good for cache locality), has few allocations, is easy to +address (iterators in the final vector are just indices or pointers), and can be +efficiently queried with a standard binary or radix search.

+ +
+ + +
+ "llvm/ADT/SmallSet.h" +
+ +
+ +

If you have a set-like data structure that is usually small and whose elements +are reasonably small, a SmallSet<Type, N> is a good choice. This set +has space for N elements in place (thus, if the set is dynamically smaller than +N, no malloc traffic is required) and accesses them with a simple linear search. +When the set grows beyond 'N' elements, it allocates a more expensive representation that +guarantees efficient access (for most types, it falls back to std::set, but for +pointers it uses something far better, SmallPtrSet).

+ +

The magic of this class is that it handles small sets extremely efficiently, +but gracefully handles extremely large sets without loss of efficiency. The +drawback is that the interface is quite small: it supports insertion, queries +and erasing, but does not support iteration.

+ +
+ + +
+ "llvm/ADT/SmallPtrSet.h" +
+ +
+ +

SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is +transparently implemented with a SmallPtrSet), but also supports iterators. If +more than 'N' insertions are performed, a single quadratically +probed hash table is allocated and grows as needed, providing extremely +efficient access (constant time insertion/deleting/queries with low constant +factors) and is very stingy with malloc traffic.

+ +

Note that, unlike std::set, the iterators of SmallPtrSet are invalidated +whenever an insertion occurs. Also, the values visited by the iterators are not +visited in sorted order.

+ +
+ + +
+ "llvm/ADT/DenseSet.h" +
+ +
+ +

+DenseSet is a simple quadratically probed hash table. It excels at supporting +small values: it uses a single allocation to hold all of the pairs that +are currently inserted in the set. DenseSet is a great way to unique small +values that are not simple pointers (use SmallPtrSet for pointers). Note that DenseSet has +the same requirements for the value type that DenseMap has. +

+ +
+ + +
+ "llvm/ADT/FoldingSet.h" +
+ +
+ +

+FoldingSet is an aggregate class that is really good at uniquing +expensive-to-create or polymorphic objects. It is a combination of a chained +hash table with intrusive links (uniqued objects are required to inherit from +FoldingSetNode) that uses SmallVector as part of +its ID process.

+ +

Consider a case where you want to implement a "getOrCreateFoo" method for +a complex object (for example, a node in the code generator). The client has a +description of *what* it wants to generate (it knows the opcode and all the +operands), but we don't want to 'new' a node, then try inserting it into a set +only to find out it already exists, at which point we would have to delete it +and return the node that already exists. +

+ +

To support this style of client, FoldingSet perform a query with a +FoldingSetNodeID (which wraps SmallVector) that can be used to describe the +element that we want to query for. The query either returns the element +matching the ID or it returns an opaque ID that indicates where insertion should +take place. Construction of the ID usually does not require heap traffic.

+ +

Because FoldingSet uses intrusive links, it can support polymorphic objects +in the set (for example, you can have SDNode instances mixed with LoadSDNodes). +Because the elements are individually allocated, pointers to the elements are +stable: inserting or removing elements does not invalidate any pointers to other +elements. +

+ +
+ + +
+ <set> +
+ +
+ +

std::set is a reasonable all-around set class, which is decent at +many things but great at nothing. std::set allocates memory for each element +inserted (thus it is very malloc intensive) and typically stores three pointers +per element in the set (thus adding a large amount of per-element space +overhead). It offers guaranteed log(n) performance, which is not particularly +fast from a complexity standpoint (particularly if the elements of the set are +expensive to compare, like strings), and has extremely high constant factors for +lookup, insertion and removal.

+ +

The advantages of std::set are that its iterators are stable (deleting or +inserting an element from the set does not affect iterators or pointers to other +elements) and that iteration over the set is guaranteed to be in sorted order. +If the elements in the set are large, then the relative overhead of the pointers +and malloc traffic is not a big deal, but if the elements of the set are small, +std::set is almost never a good choice.

+ +
+ + +
+ "llvm/ADT/SetVector.h" +
+ +
+

LLVM's SetVector<Type> is an adapter class that combines your choice of +a set-like container along with a Sequential +Container. The important property +that this provides is efficient insertion with uniquing (duplicate elements are +ignored) with iteration support. It implements this by inserting elements into +both a set-like container and the sequential container, using the set-like +container for uniquing and the sequential container for iteration. +

+ +

The difference between SetVector and other sets is that the order of +iteration is guaranteed to match the order of insertion into the SetVector. +This property is really important for things like sets of pointers. Because +pointer values are non-deterministic (e.g. vary across runs of the program on +different machines), iterating over the pointers in the set will +not be in a well-defined order.

+ +

+The drawback of SetVector is that it requires twice as much space as a normal +set and has the sum of constant factors from the set-like container and the +sequential container that it uses. Use it *only* if you need to iterate over +the elements in a deterministic order. SetVector is also expensive to delete +elements out of (linear time), unless you use it's "pop_back" method, which is +faster. +

+ +

SetVector is an adapter class that defaults to using std::vector and std::set +for the underlying containers, so it is quite expensive. However, +"llvm/ADT/SetVector.h" also provides a SmallSetVector class, which +defaults to using a SmallVector and SmallSet of a specified size. If you use +this, and if your sets are dynamically smaller than N, you will save a lot of +heap traffic.

+ +
+ + +
+ "llvm/ADT/UniqueVector.h" +
+ +
+ +

+UniqueVector is similar to SetVector, but it +retains a unique ID for each element inserted into the set. It internally +contains a map and a vector, and it assigns a unique ID for each value inserted +into the set.

+ +

UniqueVector is very expensive: its cost is the sum of the cost of +maintaining both the map and vector, it has high complexity, high constant +factors, and produces a lot of malloc traffic. It should be avoided.

+ +
+ + + +
+ Other Set-Like Container Options +
+ +
+ +

+The STL provides several other options, such as std::multiset and the various +"hash_set" like containers (whether from C++ TR1 or from the SGI library).

+ +

std::multiset is useful if you're not interested in elimination of +duplicates, but has all the drawbacks of std::set. A sorted vector (where you +don't delete duplicate entries) or some other approach is almost always +better.

+ +

The various hash_set implementations (exposed portably by +"llvm/ADT/hash_set") is a simple chained hashtable. This algorithm is as malloc +intensive as std::set (performing an allocation for each element inserted, +thus having really high constant factors) but (usually) provides O(1) +insertion/deletion of elements. This can be useful if your elements are large +(thus making the constant-factor cost relatively low) or if comparisons are +expensive. Element iteration does not visit elements in a useful order.

+ +
+ + +
+ Map-Like Containers (std::map, DenseMap, etc) +
+ +
+Map-like containers are useful when you want to associate data to a key. As +usual, there are a lot of different ways to do this. :) +
+ + +
+ A sorted 'vector' +
+ +
+ +

+If your usage pattern follows a strict insert-then-query approach, you can +trivially use the same approach as sorted vectors +for set-like containers. The only difference is that your query function +(which uses std::lower_bound to get efficient log(n) lookup) should only compare +the key, not both the key and value. This yields the same advantages as sorted +vectors for sets. +

+
+ + +
+ "llvm/ADT/StringMap.h" +
+ +
+ +

+Strings are commonly used as keys in maps, and they are difficult to support +efficiently: they are variable length, inefficient to hash and compare when +long, expensive to copy, etc. StringMap is a specialized container designed to +cope with these issues. It supports mapping an arbitrary range of bytes to an +arbitrary other object.

+ +

The StringMap implementation uses a quadratically-probed hash table, where +the buckets store a pointer to the heap allocated entries (and some other +stuff). The entries in the map must be heap allocated because the strings are +variable length. The string data (key) and the element object (value) are +stored in the same allocation with the string data immediately after the element +object. This container guarantees the "(char*)(&Value+1)" points +to the key string for a value.

+ +

The StringMap is very fast for several reasons: quadratic probing is very +cache efficient for lookups, the hash value of strings in buckets is not +recomputed when lookup up an element, StringMap rarely has to touch the +memory for unrelated objects when looking up a value (even when hash collisions +happen), hash table growth does not recompute the hash values for strings +already in the table, and each pair in the map is store in a single allocation +(the string data is stored in the same allocation as the Value of a pair).

+ +

StringMap also provides query methods that take byte ranges, so it only ever +copies a string if a value is inserted into the table.

+
+ + +
+ "llvm/ADT/IndexedMap.h" +
+ +
+

+IndexedMap is a specialized container for mapping small dense integers (or +values that can be mapped to small dense integers) to some other type. It is +internally implemented as a vector with a mapping function that maps the keys to +the dense integer range. +

+ +

+This is useful for cases like virtual registers in the LLVM code generator: they +have a dense mapping that is offset by a compile-time constant (the first +virtual register ID).

+ +
+ + +
+ "llvm/ADT/DenseMap.h" +
+ +
+ +

+DenseMap is a simple quadratically probed hash table. It excels at supporting +small keys and values: it uses a single allocation to hold all of the pairs that +are currently inserted in the map. DenseMap is a great way to map pointers to +pointers, or map other small types to each other. +

+ +

+There are several aspects of DenseMap that you should be aware of, however. The +iterators in a densemap are invalidated whenever an insertion occurs, unlike +map. Also, because DenseMap allocates space for a large number of key/value +pairs (it starts with 64 by default), it will waste a lot of space if your keys +or values are large. Finally, you must implement a partial specialization of +DenseMapInfo for the key that you want, if it isn't already supported. This +is required to tell DenseMap about two special marker values (which can never be +inserted into the map) that it needs internally.

+ +
+ + +
+ <map> +
+ +
+ +

+std::map has similar characteristics to std::set: it uses +a single allocation per pair inserted into the map, it offers log(n) lookup with +an extremely large constant factor, imposes a space penalty of 3 pointers per +pair in the map, etc.

+ +

std::map is most useful when your keys or values are very large, if you need +to iterate over the collection in sorted order, or if you need stable iterators +into the map (i.e. they don't get invalidated if an insertion or deletion of +another element takes place).

+ +
+ + +
+ Other Map-Like Container Options +
+ +
+ +

+The STL provides several other options, such as std::multimap and the various +"hash_map" like containers (whether from C++ TR1 or from the SGI library).

+ +

std::multimap is useful if you want to map a key to multiple values, but has +all the drawbacks of std::map. A sorted vector or some other approach is almost +always better.

+ +

The various hash_map implementations (exposed portably by +"llvm/ADT/hash_map") are simple chained hash tables. This algorithm is as +malloc intensive as std::map (performing an allocation for each element +inserted, thus having really high constant factors) but (usually) provides O(1) +insertion/deletion of elements. This can be useful if your elements are large +(thus making the constant-factor cost relatively low) or if comparisons are +expensive. Element iteration does not visit elements in a useful order.

+ +
+ + +
+ Bit storage containers (BitVector, SparseBitVector) +
+ +
+

Unlike the other containers, there are only two bit storage containers, and +choosing when to use each is relatively straightforward.

+ +

One additional option is +std::vector<bool>: we discourage its use for two reasons 1) the +implementation in many common compilers (e.g. commonly available versions of +GCC) is extremely inefficient and 2) the C++ standards committee is likely to +deprecate this container and/or change it significantly somehow. In any case, +please don't use it.

+
+ + +
+ BitVector +
+ +
+

The BitVector container provides a fixed size set of bits for manipulation. +It supports individual bit setting/testing, as well as set operations. The set +operations take time O(size of bitvector), but operations are performed one word +at a time, instead of one bit at a time. This makes the BitVector very fast for +set operations compared to other containers. Use the BitVector when you expect +the number of set bits to be high (IE a dense set). +

+
+ + +
+ SparseBitVector +
+ +
+

The SparseBitVector container is much like BitVector, with one major +difference: Only the bits that are set, are stored. This makes the +SparseBitVector much more space efficient than BitVector when the set is sparse, +as well as making set operations O(number of set bits) instead of O(size of +universe). The downside to the SparseBitVector is that setting and testing of random bits is O(N), and on large SparseBitVectors, this can be slower than BitVector. In our implementation, setting or testing bits in sorted order +(either forwards or reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends on size) of the current bit is also O(1). As a general statement, testing/setting bits in a SparseBitVector is O(distance away from last set bit). +

+
+ + +
+ Helpful Hints for Common Operations +
+ + +
+ +

This section describes how to perform some very simple transformations of +LLVM code. This is meant to give examples of common idioms used, showing the +practical side of LLVM transformations.

Because this is a "how-to" section, +you should also read about the main classes that you will be working with. The +Core LLVM Class Hierarchy Reference contains details +and descriptions of the main classes that you should know about.

+ +
+ + + +
+ Basic Inspection and Traversal Routines +
+ +
+ +

The LLVM compiler infrastructure have many different data structures that may +be traversed. Following the example of the C++ standard template library, the +techniques used to traverse these various data structures are all basically the +same. For a enumerable sequence of values, the XXXbegin() function (or +method) returns an iterator to the start of the sequence, the XXXend() +function returns an iterator pointing to one past the last valid element of the +sequence, and there is some XXXiterator data type that is common +between the two operations.

+ +

Because the pattern for iteration is common across many different aspects of +the program representation, the standard template library algorithms may be used +on them, and it is easier to remember how to iterate. First we show a few common +examples of the data structures that need to be traversed. Other data +structures are traversed in very similar ways.

+ +
+ + +
+ Iterating over the BasicBlocks in a Function +
+ +
+ +

It's quite common to have a Function instance that you'd like to +transform in some way; in particular, you'd like to manipulate its +BasicBlocks. To facilitate this, you'll need to iterate over all of +the BasicBlocks that constitute the Function. The following is +an example that prints the name of a BasicBlock and the number of +Instructions it contains:

+ +
+
+// func is a pointer to a Function instance
+for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i)
+  // Print out the name of the basic block if it has one, and then the
+  // number of instructions that it contains
+  llvm::cerr << "Basic block (name=" << i->getName() << ") has "
+             << i->size() << " instructions.\n";
+
+
+ +

Note that i can be used as if it were a pointer for the purposes of invoking member functions of the Instruction class. This is because the indirection operator is overloaded for the iterator classes. In the above code, the expression i->size() is -exactly equivalent to (*i).size() just like you'd expect. - -

-
Iterating over the Instructions in a BasicBlock

- -

-
Iterating over the Instructions in a Function

- -

-
Turning an iterator into a class -pointer (and vice-versa)

-
+ + +
+ Iterating over the Instructions in a BasicBlock +
+ +
+ +

Just like when dealing with BasicBlocks in Functions, it's +easy to iterate over the individual instructions that make up +BasicBlocks. Here's a code snippet that prints out each instruction in +a BasicBlock:

+ +
+
+// blk is a pointer to a BasicBlock instance
+for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i)
+   // The next statement works since operator<<(ostream&,...)
+   // is overloaded for Instruction&
+   llvm::cerr << *i << "\n";
+
+
+ +

However, this isn't really the best way to print out the contents of a +BasicBlock! Since the ostream operators are overloaded for virtually +anything you'll care about, you could have just invoked the print routine on the +basic block itself: llvm::cerr << *blk << "\n";.

+ +
+ + +
+ Iterating over the Instructions in a Function +
+ +
+ +

If you're finding that you commonly iterate over a Function's +BasicBlocks and then that BasicBlock's Instructions, +InstIterator should be used instead. You'll need to include llvm/Support/InstIterator.h, +and then instantiate InstIterators explicitly in your code. Here's a +small example that shows how to dump all instructions in a function to the standard error stream:

+ +

+
+#include "llvm/Support/InstIterator.h"
+
+// F is a pointer to a Function instance
+for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
+  llvm::cerr << *I << "\n";
+
+
+ +

Easy, isn't it? You can also use InstIterators to fill a +work list with its initial contents. For example, if you wanted to +initialize a work list to contain all instructions in a Function +F, all you would need to do is something like:

+ +
+
+std::set<Instruction*> worklist;
+// or better yet, SmallPtrSet<Instruction*, 64> worklist;
+
+for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
+   worklist.insert(&*I);
+
+
+ +

The STL set worklist would now contain all instructions in the +Function pointed to by F.

+ +
+ + +
+ Turning an iterator into a class pointer (and + vice-versa) +
+ +
+ +

Sometimes, it'll be useful to grab a reference (or pointer) to a class instance when all you've got at hand is an iterator. Well, extracting -a reference or a pointer from an iterator is very straightforward. +a reference or a pointer from an iterator is very straight-forward. Assuming that i is a BasicBlock::iterator and j -is a BasicBlock::const_iterator: -

    Instruction& inst = *i;   // grab reference to instruction reference
Instruction* pinst = &*i; // grab pointer to instruction reference
const Instruction& inst = *j;
-However, the iterators you'll be working with in the LLVM framework are -special: they will automatically convert to a ptr-to-instance type -whenever they need to. Instead of dereferencing the iterator and then -taking the address of the result, you can simply assign the iterator to -the proper pointer type and you get the dereference and address-of -operation as a result of the assignment (behind the scenes, this is a -result of overloading casting mechanisms). Thus the last line of the -last example, -
Instruction* pinst = &*i;
-is semantically equivalent to -
Instruction* pinst = i;
-It's also possible to turn a class pointer into the corresponding -iterator. Usually, this conversion is quite inexpensive. The -following code snippet illustrates use of the conversion constructors -provided by LLVM iterators. By using these, you can explicitly grab -the iterator of something without actually obtaining it via iteration -over some structure: -
void printNextInstruction(Instruction* inst) {
BasicBlock::iterator it(inst);
++it; // after this line, it refers to the instruction after *inst.
if (it != inst->getParent()->end()) cerr << *it << "\n";
}
-Of course, this example is strictly pedagogical, because it'd be much -better to explicitly grab the next instruction directly from inst. - -

-
Finding call sites: a slightly -more complex example

-
+ - -

-
Treating calls and invokes the -same way

- -

-
Iterating over def-use & -use-def chains

- - - - - - - - -
   Making simple -changes
- -

-
Important Public Members of the GlobalValue -class

+Superclasses: Constant, +User, Value

+ +

Global values (GlobalVariables or Functions) are the only LLVM values that are +visible in the bodies of all Functions. +Because they are visible at global scope, they are also subject to linking with +other globals defined in different translation units. To control the linking +process, GlobalValues know their linkage rules. Specifically, +GlobalValues know whether they have internal or external linkage, as +defined by the LinkageTypes enumeration.

+ +

If a GlobalValue has internal linkage (equivalent to being +static in C), it is not visible to code outside the current translation +unit, and does not participate in linking. If it has external linkage, it is +visible to external code, and does participate in linking. In addition to +linkage information, GlobalValues keep track of which Module they are currently part of.

+ +

Because GlobalValues are memory objects, they are always referred to +by their address. As such, the Type of a +global is always a pointer to its contents. It is important to remember this +when using the GetElementPtrInst instruction because this pointer must +be dereferenced first. For example, if you have a GlobalVariable (a +subclass of GlobalValue) that is an array of 24 ints, type [24 x +i32], then the GlobalVariable is a pointer to that array. Although +the address of the first element of this array and the value of the +GlobalVariable are the same, they have different types. The +GlobalVariable's type is [24 x i32]. The first element's type +is i32. Because of this, accessing a global value requires you to +dereference the pointer with GetElementPtrInst first, then its elements +can be accessed. This is explained in the LLVM +Language Reference Manual.

+ + + + +
+ Important Public Members of the GlobalValue + class +
+ +
+ - - - - - - - -
   The Function -class
- -

-
Important Public Members of the Function -class

+ +
+ + +
+ The Function class +
+ +
+ +

#include "llvm/Function.h"
doxygen +info: Function Class
+Superclasses: GlobalValue, +Constant, +User, +Value

+ +

The Function class represents a single procedure in LLVM. It is +actually one of the more complex classes in the LLVM heirarchy because it must +keep track of a large amount of data. The Function class keeps track +of a list of BasicBlocks, a list of formal +Arguments, and a +SymbolTable.

+ +

The list of BasicBlocks is the most +commonly used part of Function objects. The list imposes an implicit +ordering of the blocks in the function, which indicate how the code will be +layed out by the backend. Additionally, the first BasicBlock is the implicit entry node for the +Function. It is not legal in LLVM to explicitly branch to this initial +block. There are no implicit exit nodes, and in fact there may be multiple exit +nodes from a single Function. If the BasicBlock list is empty, this indicates that +the Function is actually a function declaration: the actual body of the +function hasn't been linked in yet.

+ +

In addition to a list of BasicBlocks, the +Function class also keeps track of the list of formal Arguments that the function receives. This +container manages the lifetime of the Argument +nodes, just like the BasicBlock list does for +the BasicBlocks.

+ +

The SymbolTable is a very rarely used +LLVM feature that is only used when you have to look up a value by name. Aside +from that, the SymbolTable is used +internally to make sure that there are not conflicts between the names of Instructions, BasicBlocks, or Arguments in the function body.

+ +

Note that Function is a GlobalValue +and therefore also a Constant. The value of the function +is its address (after linking) which is guaranteed to be constant.

+
+ + +
+ Important Public Members of the Function + class +
+ +
+ - - - - - - - -
   The GlobalVariable -class
- -

-
Important Public Members of the GlobalVariable -class

+ +
+ + +
+ The GlobalVariable class +
+ +
+ +

#include "llvm/GlobalVariable.h" +
+doxygen info: GlobalVariable + Class
+Superclasses: GlobalValue, +Constant, +User, +Value

+ +

Global variables are represented with the (suprise suprise) +GlobalVariable class. Like functions, GlobalVariables are also +subclasses of GlobalValue, and as such are +always referenced by their address (global values must live in memory, so their +"name" refers to their constant address). See +GlobalValue for more on this. Global +variables may have an initial value (which must be a +Constant), and if they have an initializer, +they may be marked as "constant" themselves (indicating that their contents +never change at runtime).

+
+ + +
+ Important Public Members of the + GlobalVariable class +
+ +
+ - - - - - - - -
   The Module class
- -

-
Important Public Members of the Module -class

- -

Constructing a Module -is easy. You can optionally provide a name for it (probably based on the -name of the translation unit).

- - - - - - - - -
   The Constant -class and subclasses
- -

-
Important Public Methods

- - - - - - - - -
   The Type class and -Derived Types
- -

-
Important Public Methods

- - - - - - - - -
   The Argument -class
+ +
+ + + +
+ The BasicBlock class +
+ +
+ +

#include "llvm/BasicBlock.h"
+doxygen info: BasicBlock +Class
+Superclass: Value

+ +

This class represents a single entry multiple exit section of the code, +commonly known as a basic block by the compiler community. The +BasicBlock class maintains a list of Instructions, which form the body of the block. +Matching the language definition, the last element of this list of instructions +is always a terminator instruction (a subclass of the TerminatorInst class).

+ +

In addition to tracking the list of instructions that make up the block, the +BasicBlock class also keeps track of the Function that it is embedded into.

+ +

Note that BasicBlocks themselves are Values, because they are referenced by instructions +like branches and can go in the switch tables. BasicBlocks have type +label.

+ +
+ + +
+ Important Public Members of the BasicBlock + class +
+ +
+ +
+ + + +
+ The Argument class +
+ +
+ +

This subclass of Value defines the interface for incoming formal +arguments to a function. A Function maintains a list of its formal +arguments. An argument has a pointer to the parent Function.

+ +
+ -
-
By: Dinakar Dhurjati -and Chris Lattner
-
The LLVM -Compiler Infrastructure
- Last -modified: Fri Nov 7 13:24:22 CST 2003
+
+
+ Valid CSS! + Valid HTML 4.01 Strict + + Dinakar Dhurjati and + Chris Lattner
+ The LLVM Compiler Infrastructure
+ Last modified: $Date$ +
+