X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;ds=sidebyside;f=docs%2FStacker.html;h=81b623efa9a15eca20c51a6d4b78b7f755147b7c;hb=1a203571ca94c4770a8cada8ace7fbeb0e65799a;hp=df6aedfceab9135523270ce84c3d8d3cd3106cc9;hpb=45ab10cb6ebe1e4b7efc5ee6aecc3ebe24829d70;p=oota-llvm.git diff --git a/docs/Stacker.html b/docs/Stacker.html index df6aedfceab..81b623efa9a 100644 --- a/docs/Stacker.html +++ b/docs/Stacker.html @@ -1,12 +1,14 @@ - +
-Written by Reid Spencer
-+ + -
This document is another way to learn about LLVM. Unlike the LLVM Reference Manual or @@ -61,7 +60,7 @@ about LLVM through the experience of creating a simple programming language named Stacker. Stacker was invented specifically as a demonstration of LLVM. The emphasis in this document is not on describing the -intricacies of LLVM itself, but on how to use it to build your own +intricacies of LLVM itself but on how to use it to build your own compiler system.
The language described here, Stacker, is Forth-like. Programs -are simple collections of word definitions and the only thing definitions +are simple collections of word definitions, and the only thing definitions can do is manipulate a stack or generate I/O. Stacker is not a "real" -programming language; its very simple. Although it is computationally +programming language; it's very simple. Although it is computationally complete, you wouldn't use it for your next big project. However, -the fact that it is complete, its simple, and it doesn't have +the fact that it is complete, it's simple, and it doesn't have a C-like syntax make it useful for demonstration purposes. It shows that LLVM could be applied to a wide variety of languages.
The basic notion behind Stacker is very simple. There's a stack of @@ -95,11 +94,11 @@ program in Stacker:
: MAIN hello_world ;
This has two "definitions" (Stacker manipulates words, not
functions and words have definitions): MAIN
and
-hello_world
. The MAIN
definition is standard, it
+hello_world. The MAIN
definition is standard; it
tells Stacker where to start. Here, MAIN
is defined to
simply invoke the word hello_world
. The
hello_world
definition tells stacker to push the
-"Hello, World!"
string onto the stack, print it out
+"Hello, World!"
string on to the stack, print it out
(>s
), pop it off the stack (DROP
), and
finally print a carriage return (CR
). Although
hello_world
uses the stack, its net effect is null. Well
@@ -123,36 +122,33 @@ learned. Those lessons are described in the following subsections.
Although I knew that LLVM uses a Static Single Assignment (SSA) format, it wasn't obvious to me how prevalent this idea was in LLVM until I really started using it. Reading the
-Programmer's Manual and Language Reference
+Programmer's Manual and Language Reference,
I noted that most of the important LLVM IR (Intermediate Representation) C++ classes were derived from the Value class. The full power of that simple design only became fully understood once I started constructing executable expressions for Stacker.
+This really makes your programming go faster. Think about compiling code
for the following C/C++ expression: (a|b)*((x+1)/(y+1))
. Assuming
the values are on the stack in the order a, b, x, y, this could be
expressed in stacker as: 1 + SWAP 1 + / ROT2 OR *
.
-You could write a function using LLVM that computes this expression like this:
+You could write a function using LLVM that computes this expression like
+this:
+
+
Value*
expression(BasicBlock* bb, Value* a, Value* b, Value* x, Value* y )
{
- Instruction* tail = bb->getTerminator();
- ConstantSInt* one = ConstantSInt::get( Type::IntTy, 1);
- BinaryOperator* or1 =
- BinaryOperator::create( Instruction::Or, a, b, "", tail );
- BinaryOperator* add1 =
- BinaryOperator::create( Instruction::Add, x, one, "", tail );
- BinaryOperator* add2 =
- BinaryOperator::create( Instruction::Add, y, one, "", tail );
- BinaryOperator* div1 =
- BinaryOperator::create( Instruction::Div, add1, add2, "", tail);
- BinaryOperator* mult1 =
- BinaryOperator::create( Instruction::Mul, or1, div1, "", tail );
-
+ ConstantInt* one = ConstantInt::get(Type::IntTy, 1);
+ BinaryOperator* or1 = BinaryOperator::createOr(a, b, "", bb);
+ BinaryOperator* add1 = BinaryOperator::createAdd(x, one, "", bb);
+ BinaryOperator* add2 = BinaryOperator::createAdd(y, one, "", bb);
+ BinaryOperator* div1 = BinaryOperator::createDiv(add1, add2, "", bb);
+ BinaryOperator* mult1 = BinaryOperator::createMul(or1, div1, "", bb);
return mult1;
}
-
+
+
"Okay, big deal," you say? It is a big deal. Here's why. Note that I didn't
have to tell this function which kinds of Values are being passed in. They could be
Instruction
s, Constant
s, GlobalVariable
s, or
@@ -201,8 +197,8 @@ should be constructed. In general, here's what I learned:
- Create your blocks early. While writing your compiler, you
 will encounter several situations where you know a priori that you will
- need several blocks. For example, if-then-else, switch, while and for
- statements in C/C++ all need multiple blocks for expression in LVVM.
+ need several blocks. For example, if-then-else, switch, while, and for
+ statements in C/C++ all need multiple blocks for expression in LLVM.
The rule is, create them early.
- Terminate your blocks early. This just reduces the chances
that you forget to terminate your blocks which is required (go
@@ -222,30 +218,30 @@ should be constructed. In general, here's what I learned:
before. This makes for some very clean compiler design.
The foregoing is such an important principle, it's worth making an idiom:
-
-BasicBlock* bb = new BasicBlock();
-bb->getInstList().push_back( new Branch( ... ) );
+
+BasicBlock* bb = BasicBlock::Create();
+bb->getInstList().push_back( BranchInst::Create( ... ) );
new Instruction(..., bb->getTerminator() );
-
+
To make this clear, consider the typical if-then-else statement (see StackerCompiler::handle_if() method). We can set this up in a single function using LLVM in the following way:
 using namespace llvm;
 BasicBlock*
-MyCompiler::handle_if( BasicBlock* bb, SetCondInst* condition )
+MyCompiler::handle_if( BasicBlock* bb, ICmpInst* condition )
 {
     // Create the blocks to contain code in the structure of if/then/else
-    BasicBlock* then_bb = new BasicBlock();
-    BasicBlock* else_bb = new BasicBlock();
-    BasicBlock* exit_bb = new BasicBlock();
+    BasicBlock* then_bb = BasicBlock::Create();
+    BasicBlock* else_bb = BasicBlock::Create();
+    BasicBlock* exit_bb = BasicBlock::Create();

     // Insert the branch instruction for the "if"
-    bb->getInstList().push_back( new BranchInst( then_bb, else_bb, condition ) );
+    bb->getInstList().push_back( BranchInst::Create( then_bb, else_bb, condition ) );

     // Set up the terminating instructions
-    then->getInstList().push_back( new BranchInst( exit_bb ) );
-    else->getInstList().push_back( new BranchInst( exit_bb ) );
+    then->getInstList().push_back( BranchInst::Create( exit_bb ) );
+    else->getInstList().push_back( BranchInst::Create( exit_bb ) );

     // Fill in the then part .. details excised for brevity
     this->fill_in( then_bb );
@@ -262,7 +258,7 @@ MyCompiler::handle_if( BasicBlock* bb, SetCondInst* condition )
 the instructions for the "then" and "else" parts. They would use the third part
 of the idiom almost exclusively (inserting new instructions before the
 terminator). Furthermore, they could even recurse back to handle_if
-should they encounter another if/then/else statement and it will just work.
+should they encounter another if/then/else statement, and it will just work.
 Note how cleanly this all works out. In particular, the push_back methods on the
BasicBlock
's instruction list. These are lists of typeInstruction
(which is also of typeValue
). To create @@ -271,7 +267,8 @@ arguments the blocks to branch to and the condition to branch on. TheBasicBlock
objects act like branch labels! This newBranchInst
terminates theBasicBlock
provided as an argument. To give the caller a way to keep inserting after calling -handle_if
we create anexit_bb
block which is returned +handle_if
, we create anexit_bb
block which is +returned to the caller. Note that theexit_bb
block is used as the terminator for both thethen_bb
and theelse_bb
blocks. This guarantees that no matter what elsehandle_if
@@ -286,7 +283,7 @@ One of the first things I noticed is the frequent use of the "push_back" method on the various lists. This is so common that it is worth mentioning. The "push_back" inserts a value into an STL list, vector, array, etc. at the end. The method might have also been named "insert_tail" or "append". -Althought I've used STL quite frequently, my use of push_back wasn't very +Although I've used STL quite frequently, my use of push_back wasn't very high in other programs. In LLVM, you'll use it all the time.
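The append behavior described above can be sketched with nothing but the standard library. This is a minimal illustration, not LLVM code: the `Block` alias, the `build_block` helper, and the instruction strings are hypothetical stand-ins. It assumes only that `push_back` places an element at the tail and that `insert` places an element in front of the given iterator, so the terminator stays last.

```cpp
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a block's instruction list; not an LLVM type.
using Block = std::list<std::string>;

// Appends a terminator first, then inserts two instructions in front of it.
Block build_block() {
    Block block;
    block.push_back("br exit");                  // append at the tail
    auto terminator = std::prev(block.end());    // iterator to the last element
    block.insert(terminator, "add");             // lands before the terminator
    block.insert(terminator, "mul");             // also lands before the terminator
    return block;
}
```

Because list iterators stay valid across insertions, the terminator can be located once and reused as the insertion point, which mirrors the "terminate early, insert before the terminator" idiom.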
It took a little getting used to and several rounds of postings to the LLVM -mail list to wrap my head around this instruction correctly. Even though I had +mailing list to wrap my head around this instruction correctly. Even though I had read the Language Reference and Programmer's Manual a couple times each, I still missed a few very key points:
This means that when you look up an element in the global variable (assuming -its a struct or array), you must deference the pointer first! For many +it's a struct or array), you must deference the pointer first! For many things, this leads to the idiom:
-
-std::vector index_vector;
-index_vector.push_back( ConstantSInt::get( Type::LongTy, 0 );
+
+std::vector<Value*> index_vector;
+index_vector.push_back( ConstantInt::get( Type::LongTy, 0 );
// ... push other indices ...
-GetElementPtrInst* gep = new GetElementPtrInst( ptr, index_vector );
-
+GetElementPtrInst* gep = GetElementPtrInst::Create( ptr, index_vector );
+
For example, suppose we have a global variable whose type is [24 x int]. The variable itself represents a pointer to that array. To subscript the array, we need two indices, not just one. The first index (0) dereferences the @@ -322,13 +320,13 @@ will run against your grain because you'll naturally think of the global array variable and the address of its first element as the same. That tripped me up for a while until I realized that they really do differ .. by type. Remember that LLVM is strongly typed. Everything has a type. -The "type" of the global variable is [24 x int]*. That is, its +The "type" of the global variable is [24 x int]*. That is, it's a pointer to an array of 24 ints. When you dereference that global variable with a single (0) index, you now have a "[24 x int]" type. Although the pointer value of the dereferenced global and the address of the zero'th element in the array will be the same, they differ in their type. The zero'th element has type "int" while the pointer value has type "[24 x int]".
-Get this one aspect of LLVM right in your head and you'll save yourself +
Get this one aspect of LLVM right in your head, and you'll save yourself a lot of compiler writing headaches down the road.
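The same distinction exists in plain C++, which may help it stick. The sketch below is only an analogy and uses no LLVM API: `global_array`, `ArrayPtr`, and both helper functions are hypothetical names introduced for illustration. A pointer to a 24-element array and a pointer to its first element hold the same address but have different types, and it takes two index steps to get from the first to an element.

```cpp
#include <type_traits>

// Hypothetical stand-in for a global whose LLVM type would be [24 x int].
static int global_array[24];

// Loosely analogous to the global's value: a pointer to the whole array.
using ArrayPtr = int (*)[24];

// The first step (index 0) goes through the pointer to reach the array;
// the second step selects an element inside that array.
int* element_address(ArrayPtr base, int index) {
    return &(*base)[index];   // equivalent to &base[0][index]
}

// The array pointer and the first element's address compare equal as addresses.
bool same_address_as_first_element(ArrayPtr base) {
    return static_cast<void*>(base) ==
           static_cast<void*>(element_address(base, 0));
}

// One step yields the array type; two steps yield the element type.
static_assert(std::is_same<decltype(*ArrayPtr{}), int (&)[24]>::value,
              "one index step yields the array");
static_assert(std::is_same<decltype((*ArrayPtr{})[0]), int&>::value,
              "two index steps yield an int");
```

Calling `element_address(&global_array, 5)` gives the address of the sixth element, while `&global_array` itself can only be used as a pointer to the whole array.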
Linkage types in LLVM can be a little confusing, especially if your compiler writing mind has affixed firm concepts to particular words like "weak", "external", "global", "linkonce", etc. LLVM does not use the precise -definitions of say ELF or GCC even though they share common terms. To be fair, +definitions of, say, ELF or GCC, even though they share common terms. To be fair, the concepts are related and similar but not precisely the same. This can lead you to think you know what a linkage type represents but in fact it is slightly different. I recommend you read the @@ -345,11 +343,11 @@ different. I recommend you read the carefully. Then, read it again.
Here are some handy tips that I discovered along the way:
Manipulating the stack can be quite hazardous. There is no distinction given and no checking for the various types of values that can be placed on the stack. Automatic coercion between types is performed. In many -cases this is useful. For example, a boolean value placed on the stack +cases, this is useful. For example, a boolean value placed on the stack can be interpreted as an integer with good results. However, using a word that interprets that boolean value as a pointer to a string to print out will almost always yield a crash. Stacker simply leaves it @@ -412,9 +410,9 @@ is terminated by a semi-colon.
So, your typical definition will have the form:
: name ... ;
The name
is up to you but it must start with a letter and contain
-only letters numbers and underscore. Names are case sensitive and must not be
+only letters, numbers, and underscore. Names are case sensitive and must not be
the same as the name of a built-in word. The ...
is replaced by
-the stack manipulting words that you wish define name
as.
+the stack manipulating words that you wish to define name
as.
@@ -435,12 +433,12 @@ a real program.
There are three kinds of literal values in Stacker. Integer, Strings, +
There are three kinds of literal values in Stacker: Integers, Strings,
and Booleans. In each case, the stack operation is to simply push the
- value onto the stack. So, for example:
+ value on to the stack. So, for example:
42 " is the answer." TRUE
- will push three values onto the stack: the integer 42, the
- string " is the answer." and the boolean TRUE.
FORWARD
is an external symbol for
linking.
+
+TODO
+The built-in words of the Stacker language are put in several groups depending on what they do. The groups are as follows:
Definition Of Operation Of Built In Words
LOGICAL OPERATIONS
Word | -Name | -Operation | -Description
< | -LT | -w1 w2 -- b | -Two values (w1 and w2) are popped off the stack and
+
The following fully documented program highlights many features of both the Stacker language and what is possible with LLVM. The program has two modes -of operations. If you provide numeric arguments to the program, it checks to see +of operation. If you provide numeric arguments to the program, it checks to see if those arguments are prime numbers and prints out the results. Without any -aruments, the program prints out any prime numbers it finds between 1 and one +arguments, the program prints out any prime numbers it finds between 1 and one million (there's a lot of them!). The source code comments below tell the remainder of the story. @@ -1050,20 +1061,20 @@ remainder of the story. ################################################################################ # Utility definitions ################################################################################ -: print >d CR ; +: print >d CR ; : it_is_a_prime TRUE ; : it_is_not_a_prime FALSE ; : continue_loop TRUE ; : exit_loop FALSE; ################################################################################ -# This definition tryies an actual division of a candidate prime number. It +# This definition tries an actual division of a candidate prime number. It # determines whether the division loop on this candidate should continue or # not. -# STACK<: +# STACK<: # div - the divisor to try # p - the prime number we are working on -# STACK>: +# STACK>: # cont - should we continue the loop ? # div - the next divisor to try # p - the prime number we are working on @@ -1089,7 +1100,7 @@ remainder of the story. # cont - should we continue the loop (ignored)? # div - the divisor to try # p - the prime number we are working on -# STACK>: +# STACK>: # cont - should we continue the loop ? # div - the next divisor to try # p - the prime number we are working on @@ -1114,10 +1125,10 @@ remainder of the story. # definition which returns a loop continuation value (which we also seed with # the value 1). 
After the loop, we check the divisor. If it decremented all # the way to zero then we found a prime, otherwise we did not find one. -# STACK<: +# STACK<: # p - the prime number to check -# STACK>: -# yn - boolean indiating if its a prime or not +# STACK>: +# yn - boolean indicating if its a prime or not # p - the prime number checked ################################################################################ : try_harder @@ -1137,18 +1148,18 @@ remainder of the story. ################################################################################ # This definition determines if the number on the top of the stack is a prime -# or not. It does this by testing if the value is degenerate (<= 3) and +# or not. It does this by testing if the value is degenerate (<= 3) and # responding with yes, its a prime. Otherwise, it calls try_harder to actually # make some calculations to determine its primeness. -# STACK<: +# STACK<: # p - the prime number to check -# STACK>: +# STACK>: # yn - boolean indicating if its a prime or not # p - the prime number checked ################################################################################ : is_prime DUP ( save the prime number ) - 3 >= IF ( see if its <= 3 ) + 3 >= IF ( see if its <= 3 ) it_is_a_prime ( its <= 3 just indicate its prime ) ELSE try_harder ( have to do a little more work ) @@ -1158,11 +1169,11 @@ remainder of the story. ################################################################################ # This definition is called when it is time to exit the program, after we have # found a sufficiently large number of primes. -# STACK<: ignored -# STACK>: exits +# STACK<: ignored +# STACK>: exits ################################################################################ : done - "Finished" >s CR ( say we are finished ) + "Finished" >s CR ( say we are finished ) 0 EXIT ( exit nicely ) ; @@ -1173,14 +1184,14 @@ remainder of the story. # If it is a prime, it prints it. 
Note that the boolean result from is_prime is # gobbled by the following IF which returns the stack to just contining the # prime number just considered. -# STACK<: +# STACK<: # p - one less than the prime number to consider -# STACK> +# STAC>K # p+1 - the prime number considered ################################################################################ : consider_prime DUP ( save the prime number to consider ) - 1000000 < IF ( check to see if we are done yet ) + 1000000 < IF ( check to see if we are done yet ) done ( we are done, call "done" ) ENDIF ++ ( increment to next prime number ) @@ -1194,11 +1205,11 @@ remainder of the story. # This definition starts at one, prints it out and continues into a loop calling # consider_prime on each iteration. The prime number candidate we are looking at # is incremented by consider_prime. -# STACK<: empty -# STACK>: empty +# STACK<: empty +# STACK>: empty ################################################################################ : find_primes - "Prime Numbers: " >s CR ( say hello ) + "Prime Numbers: " >s CR ( say hello ) DROP ( get rid of that pesky string ) 1 ( stoke the fires ) print ( print the first one, we know its prime ) @@ -1211,17 +1222,17 @@ remainder of the story. # ################################################################################ : say_yes - >d ( Print the prime number ) + >d ( Print the prime number ) " is prime." ( push string to output ) - >s ( output it ) + >s ( output it ) CR ( print carriage return ) DROP ( pop string ) ; : say_no - >d ( Print the prime number ) + >d ( Print the prime number ) " is NOT prime." ( push string to put out ) - >s ( put out the string ) + >s ( put out the string ) CR ( print carriage return ) DROP ( pop string ) ; @@ -1229,10 +1240,10 @@ remainder of the story. ################################################################################ # This definition processes a single command line argument and determines if it # is a prime number or not. 
-# STACK<: +# STACK<: # n - number of arguments # arg1 - the prime numbers to examine -# STACK>: +# STACK>: # n-1 - one less than number of arguments # arg2 - we processed one argument ################################################################################ @@ -1249,7 +1260,7 @@ remainder of the story. ################################################################################ # The MAIN program just prints a banner and processes its arguments. -# STACK<: +# STACK<: # n - number of arguments # ... - the arguments ################################################################################ @@ -1261,13 +1272,13 @@ remainder of the story. ################################################################################ # The MAIN program just prints a banner and processes its arguments. -# STACK<: arguments +# STACK<: arguments ################################################################################ : MAIN NIP ( get rid of the program name ) -- ( reduce number of arguments ) DUP ( save the arg counter ) - 1 <= IF ( See if we got an argument ) + 1 <= IF ( See if we got an argument ) process_arguments ( tell user if they are prime ) ELSE find_primes ( see how many we can find ) @@ -1285,13 +1296,26 @@ remainder of the story.The source code, test programs, and sample programs can all be found -under the LLVM "projects" directory. You will need to obtain the LLVM sources -to find it (either via anonymous CVS or a tarball. See the -Getting Started document). -Under the "projects" directory there is a directory named "stacker". That -directory contains everything, as follows: +in the LLVM repository named llvm-stacker This should be checked out to +the projects directory so that it will auto-configure. To do that, make +sure you have the llvm sources in llvm +(see Getting Started) and then use these +commands: + +
+
+
+
+% svn co http://llvm.org/svn/llvm-project/llvm-top/trunk llvm-top
+% cd llvm-top
+% make build MODULE=stacker
+
+ Under the projects/llvm-stacker directory you will find the
+implementation of the Stacker compiler, as follows:
+
-
+See projects/Stacker/lib/compiler/Lexer.l -See projects/llvm-stacker/lib/compiler/Lexer.l + +
-
+See projects/Stacker/lib/compiler/StackerParser.y -See projects/llvm-stacker/lib/compiler/StackerParser.y +
-
+See projects/Stacker/lib/compiler/StackerCompiler.cpp -See projects/llvm-stacker/lib/compiler/StackerCompiler.cpp +
-
+See projects/Stacker/lib/runtime/stacker_rt.c -See projects/llvm-stacker/lib/runtime/stacker_rt.c +
-
+See projects/Stacker/tools/stkrc/stkrc.cpp -See projects/llvm-stacker/tools/stkrc/stkrc.cpp +
-
+See projects/Stacker/test/*.st -See projects/llvm-stacker/test/*.st +
@@ -1343,7 +1370,7 @@ directory contains everything, as follows:
definitions, the ROLL word is not implemented. This word was left out of
Stacker on purpose so that it can be an exercise for the student. The exercise
is to implement the ROLL functionality (in your own workspace) and build a test
-program for it. If you can implement ROLL you understand Stacker and probably
+program for it. If you can implement ROLL, you understand Stacker and probably
a fair amount about LLVM since this is one of the more complicated Stacker
operations. The work will almost be completely limited to the
compiler.
@@ -1351,13 +1378,13 @@ operations. The work will almost be completely limited to the
by the compiler. That means you don't have to futz around with figuring out how
to get the keyword recognized. It already is. The part of the compiler that
you need to implement is the
-
+
ROLL case in the
-StackerCompiler::handle_word(int) method. See the implementations
-of PICk and SELECT in the same method to get some hints about how to complete
-this exercise.
+
Good luck! The initial implementation of Stacker has several deficiencies. If you're interested, here are some things that could be implemented better: @@ -1365,22 +1392,15 @@ interested, here are some things that could be implemented better:- + + + + + Reid Spencer + LLVM Compiler Infrastructure + Last modified: $Date$ + + |