X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FStacker.html;h=81b623efa9a15eca20c51a6d4b78b7f755147b7c;hb=1a203571ca94c4770a8cada8ace7fbeb0e65799a;hp=df6aedfceab9135523270ce84c3d8d3cd3106cc9;hpb=45ab10cb6ebe1e4b7efc5ee6aecc3ebe24829d70;p=oota-llvm.git diff --git a/docs/Stacker.html b/docs/Stacker.html index df6aedfceab..81b623efa9a 100644 --- a/docs/Stacker.html +++ b/docs/Stacker.html @@ -1,12 +1,14 @@ - + - Stacker: An Example Of Using LLVM + Stacker: An Example Of Using LLVM +
Stacker: An Example Of Using LLVM
-
+
  1. Abstract
  2. Introduction
  3. @@ -19,19 +21,17 @@
  4. The Wily GetElementPtrInst
  5. Getting Linkage Types Right
  6. Constants Are Easier Than That!
  7. -
- +
  • The Stacker Lexicon
      -
    1. The Stack -
    2. Punctuation -
    3. Comments -
    4. Literals -
    5. Words -
    6. Standard Style -
    7. Built-Ins -
    -
  • +
  • The Stack
  • +
  • Punctuation
  • +
  • Comments
  • +
  • Literals
  • +
  • Words
  • +
  • Standard Style
  • +
  • Built-Ins
  • +
  • Prime: A Complete Example
  • Internal Code Details
      @@ -44,16 +44,15 @@
    1. Test Programs
    2. Exercise
    3. Things Remaining To Be Done
    4. -
    -
  • + -
    -

    Written by Reid Spencer

    -

    + +
    +

    Written by Reid Spencer

    -
    + -
    Abstract
    +
    Abstract

    This document is another way to learn about LLVM. Unlike the LLVM Reference Manual or @@ -61,7 +60,7 @@ about LLVM through the experience of creating a simple programming language named Stacker. Stacker was invented specifically as a demonstration of LLVM. The emphasis in this document is not on describing the -intricacies of LLVM itself, but on how to use it to build your own +intricacies of LLVM itself but on how to use it to build your own compiler system.

    @@ -77,11 +76,11 @@ language running when using LLVM. Furthermore, this was the first language the author ever created using LLVM. The learning curve is included in that four days.

    The language described here, Stacker, is Forth-like. Programs -are simple collections of word definitions and the only thing definitions +are simple collections of word definitions, and the only thing definitions can do is manipulate a stack or generate I/O. Stacker is not a "real" -programming language; its very simple. Although it is computationally +programming language; it's very simple. Although it is computationally complete, you wouldn't use it for your next big project. However, -the fact that it is complete, its simple, and it doesn't have +the fact that it is complete, it's simple, and it doesn't have a C-like syntax make it useful for demonstration purposes. It shows that LLVM could be applied to a wide variety of languages.

    The basic notion behind Stacker is very simple. There's a stack of @@ -95,11 +94,11 @@ program in Stacker:

    : MAIN hello_world ;

    This has two "definitions" (Stacker manipulates words, not functions and words have definitions): MAIN and -hello_world. The MAIN definition is standard, it +hello_world. The MAIN definition is standard; it tells Stacker where to start. Here, MAIN is defined to simply invoke the word hello_world. The hello_world definition tells stacker to push the -"Hello, World!" string onto the stack, print it out +"Hello, World!" string on to the stack, print it out (>s), pop it off the stack (DROP), and finally print a carriage return (CR). Although hello_world uses the stack, its net effect is null. Well @@ -123,36 +122,33 @@ learned. Those lessons are described in the following subsections.

    Although I knew that LLVM uses a Static Single Assignment (SSA) format, it wasn't obvious to me how prevalent this idea was in LLVM until I really started using it. Reading the -Programmer's Manual and Language Reference +Programmer's Manual and Language Reference, I noted that most of the important LLVM IR (Intermediate Representation) C++ classes were derived from the Value class. The full power of that simple design only became fully understood once I started constructing executable expressions for Stacker.

    +

    This really makes your programming go faster. Think about compiling code for the following C/C++ expression: (a|b)*((x+1)/(y+1)). Assuming the values are on the stack in the order a, b, x, y, this could be expressed in stacker as: 1 + SWAP 1 + / ROT2 OR *. -You could write a function using LLVM that computes this expression like this:

    -
    
    +You could write a function using LLVM that computes this expression like 
    +this: 

    + +
     Value* 
     expression(BasicBlock* bb, Value* a, Value* b, Value* x, Value* y )
     {
    -    Instruction* tail = bb->getTerminator();
    -    ConstantSInt* one = ConstantSInt::get( Type::IntTy, 1);
    -    BinaryOperator* or1 = 
    -	BinaryOperator::create( Instruction::Or, a, b, "", tail );
    -    BinaryOperator* add1 = 
    -	BinaryOperator::create( Instruction::Add, x, one, "", tail );
    -    BinaryOperator* add2 =
    -	BinaryOperator::create( Instruction::Add, y, one, "", tail );
    -    BinaryOperator* div1 = 
    -	BinaryOperator::create( Instruction::Div, add1, add2, "", tail);
    -    BinaryOperator* mult1 = 
    -	BinaryOperator::create( Instruction::Mul, or1, div1, "", tail );
    -
    +    ConstantInt* one = ConstantInt::get(Type::IntTy, 1);
    +    BinaryOperator* or1 = BinaryOperator::createOr(a, b, "", bb);
    +    BinaryOperator* add1 = BinaryOperator::createAdd(x, one, "", bb);
    +    BinaryOperator* add2 = BinaryOperator::createAdd(y, one, "", bb);
    +    BinaryOperator* div1 = BinaryOperator::createDiv(add1, add2, "", bb);
    +    BinaryOperator* mult1 = BinaryOperator::createMul(or1, div1, "", bb);
         return mult1;
     }
    -
    +
    +

    "Okay, big deal," you say? It is a big deal. Here's why. Note that I didn't have to tell this function which kinds of Values are being passed in. They could be Instructions, Constants, GlobalVariables, or @@ -201,8 +197,8 @@ should be constructed. In general, here's what I learned:

    1. Create your blocks early. While writing your compiler, you will encounter several situations where you know a priori that you will - need several blocks. For example, if-then-else, switch, while and for - statements in C/C++ all need multiple blocks for expression in LVVM. + need several blocks. For example, if-then-else, switch, while, and for + statements in C/C++ all need multiple blocks for expression in LLVM. The rule is, create them early.
    2. Terminate your blocks early. This just reduces the chances that you forget to terminate your blocks which is required (go @@ -222,30 +218,30 @@ should be constructed. In general, here's what I learned: before. This makes for some very clean compiler design.

    The foregoing is such an important principle, it's worth making an idiom:

    -
    
    -BasicBlock* bb = new BasicBlock();
    -bb->getInstList().push_back( new Branch( ... ) );
    +
    +BasicBlock* bb = BasicBlock::Create();
    +bb->getInstList().push_back( BranchInst::Create( ... ) );
     new Instruction(..., bb->getTerminator() );
    -
    +

    To make this clear, consider the typical if-then-else statement (see StackerCompiler::handle_if() method). We can set this up in a single function using LLVM in the following way:

     using namespace llvm;
     BasicBlock*
    -MyCompiler::handle_if( BasicBlock* bb, SetCondInst* condition )
    +MyCompiler::handle_if( BasicBlock* bb, ICmpInst* condition )
     {
         // Create the blocks to contain code in the structure of if/then/else
    -    BasicBlock* then_bb = new BasicBlock(); 
    -    BasicBlock* else_bb = new BasicBlock();
    -    BasicBlock* exit_bb = new BasicBlock();
    +    BasicBlock* then_bb = BasicBlock::Create(); 
    +    BasicBlock* else_bb = BasicBlock::Create();
    +    BasicBlock* exit_bb = BasicBlock::Create();
     
         // Insert the branch instruction for the "if"
    -    bb->getInstList().push_back( new BranchInst( then_bb, else_bb, condition ) );
    +    bb->getInstList().push_back( BranchInst::Create( then_bb, else_bb, condition ) );
     
         // Set up the terminating instructions
    -    then->getInstList().push_back( new BranchInst( exit_bb ) );
    -    else->getInstList().push_back( new BranchInst( exit_bb ) );
    +    then_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );
    +    else_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );
     
         // Fill in the then part .. details excised for brevity
         this->fill_in( then_bb );
    @@ -262,7 +258,7 @@ MyCompiler::handle_if( BasicBlock* bb, SetCondInst* condition )
     the instructions for the "then" and "else" parts. They would use the third part
     of the idiom almost exclusively (inserting new instructions before the 
     terminator). Furthermore, they could even recurse back to handle_if 
    -should they encounter another if/then/else statement and it will just work.

    +should they encounter another if/then/else statement, and it will just work.

    Note how cleanly this all works out. In particular, the push_back methods on the BasicBlock's instruction list. These are lists of type Instruction (which is also of type Value). To create @@ -271,7 +267,8 @@ arguments the blocks to branch to and the condition to branch on. The BasicBlock objects act like branch labels! This new BranchInst terminates the BasicBlock provided as an argument. To give the caller a way to keep inserting after calling -handle_if we create an exit_bb block which is returned +handle_if, we create an exit_bb block which is +returned to the caller. Note that the exit_bb block is used as the terminator for both the then_bb and the else_bb blocks. This guarantees that no matter what else handle_if @@ -286,7 +283,7 @@ One of the first things I noticed is the frequent use of the "push_back" method on the various lists. This is so common that it is worth mentioning. The "push_back" inserts a value into an STL list, vector, array, etc. at the end. The method might have also been named "insert_tail" or "append". -Althought I've used STL quite frequently, my use of push_back wasn't very +Although I've used STL quite frequently, my use of push_back wasn't very high in other programs. In LLVM, you'll use it all the time.
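
    The push_back habit described above, together with the earlier "terminate early, then insert before the terminator" idiom, can be illustrated with nothing but the STL. The following is a minimal sketch under stated assumptions: ToyBlock, terminate, insert_before_terminator, and build_demo_block are hypothetical names invented for this illustration and are not part of LLVM. The sketch only models the ordering behaviour of the instruction list.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical toy model of a basic block: an ordered list of instruction
// names. This is an illustration only, not the LLVM API.
struct ToyBlock {
    std::list<std::string> insts;

    // Terminate early: push_back appends the terminator at the tail.
    void terminate(const std::string& terminator) {
        insts.push_back(terminator);
    }

    // Build the body later: insert each new instruction just before the
    // terminator, which is always the last element of the list.
    void insert_before_terminator(const std::string& inst) {
        assert(!insts.empty() && "terminate the block before filling it in");
        insts.insert(std::prev(insts.end()), inst);
    }
};

// Builds a three-instruction block in the order the idiom prescribes.
inline ToyBlock build_demo_block() {
    ToyBlock bb;
    bb.terminate("br exit");             // terminator first
    bb.insert_before_terminator("add");  // body instructions afterwards
    bb.insert_before_terminator("mul");
    return bb;
}
```

    Because the terminator is appended first and everything else is inserted in front of it, the block is well-formed at every step, and body instructions still end up in the order they were generated.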

    @@ -295,25 +292,26 @@ high in other programs. In LLVM, you'll use it all the time.

    It took a little getting used to and several rounds of postings to the LLVM -mail list to wrap my head around this instruction correctly. Even though I had +mailing list to wrap my head around this instruction correctly. Even though I had read the Language Reference and Programmer's Manual a couple times each, I still missed a few very key points:

    This means that when you look up an element in the global variable (assuming -its a struct or array), you must deference the pointer first! For many +it's a struct or array), you must dereference the pointer first! For many things, this leads to the idiom:

    -
    
    -std::vector index_vector;
    -index_vector.push_back( ConstantSInt::get( Type::LongTy, 0 );
    +
    +std::vector<Value*> index_vector;
    +index_vector.push_back( ConstantInt::get( Type::LongTy, 0 ) );
     // ... push other indices ...
    -GetElementPtrInst* gep = new GetElementPtrInst( ptr, index_vector );
    -
    +GetElementPtrInst* gep = GetElementPtrInst::Create( ptr, index_vector ); +

    For example, suppose we have a global variable whose type is [24 x int]. The variable itself represents a pointer to that array. To subscript the array, we need two indices, not just one. The first index (0) dereferences the @@ -322,13 +320,13 @@ will run against your grain because you'll naturally think of the global array variable and the address of its first element as the same. That tripped me up for a while until I realized that they really do differ .. by type. Remember that LLVM is strongly typed. Everything has a type. -The "type" of the global variable is [24 x int]*. That is, its +The "type" of the global variable is [24 x int]*. That is, it's a pointer to an array of 24 ints. When you dereference that global variable with a single (0) index, you now have a "[24 x int]" type. Although the pointer value of the dereferenced global and the address of the zero'th element in the array will be the same, they differ in their type. The zero'th element has type "int" while the pointer value has type "[24 x int]".

    -

    Get this one aspect of LLVM right in your head and you'll save yourself +

    Get this one aspect of LLVM right in your head, and you'll save yourself a lot of compiler writing headaches down the road.
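
    The "same address, different type" point has a close analogue in plain C++, which may help it stick. The sketch below is an analogy only and uses no LLVM API: table, element_address, and same_address are hypothetical names chosen for this illustration. A pointer to the whole array stands in for the global variable of type [24 x int]*, and the two subscripts mirror the two GEP indices.

```cpp
#include <cassert>
#include <type_traits>

// C++ analogy only; nothing here is LLVM API.
// 'table' plays the role of a global whose LLVM type would be [24 x int].
static int table[24];

// Address of element i, written with two explicit subscripts to mirror a
// GEP with indices (0, i).
inline int* element_address(int (*array_ptr)[24], int i) {
    // array_ptr[0] steps through the pointer and yields the array itself;
    // [i] then selects one element inside that array.
    return &array_ptr[0][i];
}

// The pointer to the array and the pointer to its first element have
// different types, even though they hold the same address.
static_assert(std::is_same<decltype(&table), int (*)[24]>::value,
              "&table is a pointer to an array of 24 ints");
static_assert(std::is_same<decltype(&table[0]), int*>::value,
              "&table[0] is a pointer to a single int");

// Compares the two addresses after erasing their types.
inline bool same_address() {
    return static_cast<void*>(&table) == static_cast<void*>(&table[0]);
}
```

    Calling element_address(&table, 5) yields the same pointer as &table[5]. Dropping the first subscript would not compile against an int* return type, which is the C++ counterpart of the type mismatch described above.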

    @@ -337,7 +335,7 @@ a lot of compiler writing headaches down the road.

    Linkage types in LLVM can be a little confusing, especially if your compiler writing mind has affixed firm concepts to particular words like "weak", "external", "global", "linkonce", etc. LLVM does not use the precise -definitions of say ELF or GCC even though they share common terms. To be fair, +definitions of, say, ELF or GCC, even though they share common terms. To be fair, the concepts are related and similar but not precisely the same. This can lead you to think you know what a linkage type represents but in fact it is slightly different. I recommend you read the @@ -345,11 +343,11 @@ different. I recommend you read the carefully. Then, read it again.

    Here are some handy tips that I discovered along the way: