From: Chris Lattner
Date: Mon, 11 Aug 2008 06:13:31 +0000 (+0000)
Subject: the stacker doc is way out of date.
X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=4630e4ddcf70d22e231a2f7f30774aecfe15c3a0;p=oota-llvm.git

the stacker doc is way out of date.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54631 91177308-0d34-0410-b5e6-96231b3b80d8
---

diff --git a/docs/GettingStarted.html b/docs/GettingStarted.html
index d057390d83c..3110ac6db4d 100644
--- a/docs/GettingStarted.html
+++ b/docs/GettingStarted.html
@@ -1291,8 +1291,7 @@ different tools.

 This directory contains projects that are not strictly part of LLVM but are shipped with LLVM. This is also the directory where you should create your own LLVM-based projects. See llvm/projects/sample for an example of how
- to set up your own project. See llvm/projects/Stacker for a fully
- functional example of a compiler front end.
+ to set up your own project.

diff --git a/docs/Stacker.html b/docs/Stacker.html
deleted file mode 100644
index 81b623efa9a..00000000000
--- a/docs/Stacker.html
+++ /dev/null
@@ -1,1428 +0,0 @@
- Stacker: An Example Of Using LLVM
Stacker: An Example Of Using LLVM
- -
  1. Abstract
  2. Introduction
  3. Lessons I Learned About LLVM
     1. Everything's a Value!
     2. Terminate Those Blocks!
     3. Concrete Blocks
     4. push_back Is Your Friend
     5. The Wily GetElementPtrInst
     6. Getting Linkage Types Right
     7. Constants Are Easier Than That!
  4. The Stacker Lexicon
     1. The Stack
     2. Punctuation
     3. Comments
     4. Literals
     5. Words
     6. Standard Style
     7. Built-Ins
  5. Prime: A Complete Example
  6. Internal Code Details
     1. The Directory Structure
     2. The Lexer
     3. The Parser
     4. The Compiler
     5. The Runtime
     6. Compiler Driver
     7. Test Programs
     8. Exercise
     9. Things Remaining To Be Done
- -
-

Written by Reid Spencer

-
- - -
Abstract
-
-

This document is another way to learn about LLVM. Unlike the LLVM Reference Manual or LLVM Programmer's Manual, here we learn about LLVM through the experience of creating a simple programming language named Stacker. Stacker was invented specifically as a demonstration of LLVM. The emphasis in this document is not on describing the intricacies of LLVM itself but on how to use it to build your own compiler system.

-
- -
Introduction
-
-

Amongst other things, LLVM is a platform for compiler writers. Because of its exceptionally clean and small IR (intermediate representation), compiler writing with LLVM is much easier than with other systems. As proof, I wrote the entire compiler (language definition, lexer, parser, code generator, etc.) in about four days! That's important to know because it shows how quickly you can get a new language running when using LLVM. Furthermore, this was the first language the author ever created using LLVM. The learning curve is included in those four days.

-

The language described here, Stacker, is Forth-like. Programs are simple collections of word definitions, and the only thing definitions can do is manipulate a stack or generate I/O. Stacker is not a "real" programming language; it's very simple. Although it is computationally complete, you wouldn't use it for your next big project. However, the facts that it is complete, that it is simple, and that it doesn't have a C-like syntax make it useful for demonstration purposes. It shows that LLVM could be applied to a wide variety of languages.

-

The basic notion behind Stacker is very simple. There's a stack of integers (or character pointers) that the program manipulates. Pretty much the only thing the program can do is manipulate the stack and do some limited I/O operations. The language provides you with several built-in words that manipulate the stack in interesting ways. To get your feet wet, here's how you write the traditional "Hello, World" program in Stacker:

-

: hello_world "Hello, World!" >s DROP CR ;
-: MAIN hello_world ;

-

This has two "definitions" (Stacker manipulates words, not functions, and words have definitions): MAIN and hello_world. The MAIN definition is standard; it tells Stacker where to start. Here, MAIN is defined to simply invoke the word hello_world. The hello_world definition tells Stacker to push the "Hello, World!" string on to the stack, print it out (>s), pop it off the stack (DROP), and finally print a carriage return (CR). Although hello_world uses the stack, its net effect is null. Well-written Stacker definitions have that characteristic.

-

Exercise for the reader: how could you make this a one line program?

-
- -
Lessons I Learned About LLVM
-
-

Stacker was written for two purposes:

-
  1. to get the author over the learning curve, and
  2. to provide a simple example of how to write a compiler using LLVM.
-

During the development of Stacker, many lessons about LLVM were learned. Those lessons are described in the following subsections.

-

- -
Everything's a Value!
-
-

Although I knew that LLVM uses a Static Single Assignment (SSA) format, it wasn't obvious to me how prevalent this idea was in LLVM until I really started using it. Reading the Programmer's Manual and Language Reference, I noted that most of the important LLVM IR (Intermediate Representation) C++ classes were derived from the Value class. I only fully understood the power of that simple design once I started constructing executable expressions for Stacker.

- -

This really makes your programming go faster. Think about compiling code for the following C/C++ expression: (a|b)*((x+1)/(y+1)). Assuming the values are on the stack in the order a, b, x, y, this could be expressed in Stacker as: 1 + SWAP 1 + / ROT2 OR *. You could write a function using LLVM that computes this expression like this:

- -
-Value* 
-expression(BasicBlock* bb, Value* a, Value* b, Value* x, Value* y )
-{
-    ConstantInt* one = ConstantInt::get(Type::IntTy, 1);
-    BinaryOperator* or1 = BinaryOperator::createOr(a, b, "", bb);
-    BinaryOperator* add1 = BinaryOperator::createAdd(x, one, "", bb);
-    BinaryOperator* add2 = BinaryOperator::createAdd(y, one, "", bb);
-    BinaryOperator* div1 = BinaryOperator::createDiv(add1, add2, "", bb);
-    BinaryOperator* mult1 = BinaryOperator::createMul(or1, div1, "", bb);
-    return mult1;
-}
-
- -

"Okay, big deal," you say? It is a big deal. Here's why. Note that I didn't have to tell this function which kinds of Values are being passed in. They could be Instructions, Constants, GlobalVariables, or any of the other subclasses of Value that LLVM supports. Furthermore, if you specify Values that are incorrect for this sequence of operations, LLVM will either notice right away (at compilation time) or the LLVM Verifier will pick up the inconsistency when the compiler runs. In either case LLVM prevents you from making a type error that gets passed through to the generated program. This really helps you write a compiler that always generates correct code!

-

The second point is that we don't have to worry about branching, registers, stack variables, saving partial results, etc. The instructions we create are the values we use. Note that all that was created in the above code is a Constant value and five operators. Each of the instructions is the resulting value of that instruction. This saves a lot of time.

-

The lesson is this: SSA form is very powerful: there is no difference between a value and the instruction that created it. This is fully enforced by the LLVM IR. Use it to your best advantage.
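The same idea can be seen without LLVM at all. What follows is a minimal, self-contained C++ sketch and not the LLVM API: ToyValue, ToyConstant, ToyBinaryOp, constant, binary, and expression are hypothetical names invented for this illustration. It mirrors the shape of the function above: every operation node is itself the value it produces, so it can be handed directly to the next operation as an operand.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-ins, NOT the LLVM classes.
struct ToyValue {
    virtual ~ToyValue() = default;
    virtual int eval() const = 0;
};
using ToyValuePtr = std::shared_ptr<ToyValue>;

struct ToyConstant : ToyValue {
    int value;
    explicit ToyConstant(int v) : value(v) {}
    int eval() const override { return value; }
};

// An operation is a value too: it can be used wherever a ToyValue is expected.
struct ToyBinaryOp : ToyValue {
    std::function<int(int, int)> op;
    ToyValuePtr lhs, rhs;
    ToyBinaryOp(std::function<int(int, int)> f, ToyValuePtr l, ToyValuePtr r)
        : op(std::move(f)), lhs(std::move(l)), rhs(std::move(r)) {
        assert(lhs && rhs);  // both operands must exist
    }
    int eval() const override { return op(lhs->eval(), rhs->eval()); }
};

ToyValuePtr constant(int v) { return std::make_shared<ToyConstant>(v); }

ToyValuePtr binary(std::function<int(int, int)> f, ToyValuePtr l, ToyValuePtr r) {
    return std::make_shared<ToyBinaryOp>(std::move(f), std::move(l), std::move(r));
}

// Builds (a|b)*((x+1)/(y+1)). The operands may be constants or the results
// of earlier operations; this function cannot tell and does not care.
// Evaluating with y == -1 divides by zero, exactly as the C expression would.
ToyValuePtr expression(ToyValuePtr a, ToyValuePtr b, ToyValuePtr x, ToyValuePtr y) {
    ToyValuePtr one = constant(1);
    ToyValuePtr or1 = binary(std::bit_or<int>(), a, b);
    ToyValuePtr add1 = binary(std::plus<int>(), x, one);
    ToyValuePtr add2 = binary(std::plus<int>(), y, one);
    ToyValuePtr div1 = binary(std::divides<int>(), add1, add2);
    return binary(std::multiplies<int>(), or1, div1);
}
```

Because the result of expression is itself a ToyValuePtr, it can be passed straight back in as the a, b, x, or y operand of another call.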

-
- -
Terminate Those Blocks!
-
-

I had to learn about terminating blocks the hard way: using the debugger to figure out what the LLVM verifier was trying to tell me and begging for help on the LLVMdev mailing list. I hope you avoid this experience.

-

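Emblazon this rule in your mind: every BasicBlock in your compiler must be terminated with a terminating instruction (branch, return, etc.).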

- -

Terminating instructions are a semantic requirement of the LLVM IR. There is no facility for implicitly chaining together blocks placed into a function in the order they occur. Indeed, in the general case, blocks will not be added to the function in the order of execution because of the recursive way compilers are written.

-

Furthermore, if you don't terminate your blocks, your compiler code will compile just fine. You won't find out about the problem until you're running the compiler and the module you just created fails on the LLVM Verifier.
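To make the rule concrete, here is a small self-contained C++ sketch of the kind of check involved. It is a toy model and an assumption made for illustration, not the LLVM Verifier: ToyOp, ToyBlock, isTerminator, isWellFormed, and findBrokenBlocks are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR, just rich enough to express the rule.
enum class ToyOp { Add, Load, Store, Branch, Return };

struct ToyBlock {
    std::string name;
    std::vector<ToyOp> insts;
};

bool isTerminator(ToyOp op) {
    return op == ToyOp::Branch || op == ToyOp::Return;
}

// A block passes only if it is non-empty, ends in a terminator, and has no
// terminator anywhere before the end.
bool isWellFormed(const ToyBlock& block) {
    if (block.insts.empty()) return false;
    for (std::size_t i = 0; i + 1 < block.insts.size(); ++i) {
        if (isTerminator(block.insts[i])) return false;
    }
    return isTerminator(block.insts.back());
}

// Returns the names of all blocks that break the rule, in function order.
std::vector<std::string> findBrokenBlocks(const std::vector<ToyBlock>& function) {
    std::vector<std::string> broken;
    for (const ToyBlock& block : function) {
        if (!isWellFormed(block)) broken.push_back(block.name);
    }
    return broken;
}
```

Note that nothing stops you from building a broken ToyBlock; as in the text, the mistake only shows up when the check is run over the finished function.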

-
- -
Concrete Blocks
-
-

After a little initial fumbling around, I quickly caught on to how blocks should be constructed. In general, here's what I learned:

  1. Create your blocks early. While writing your compiler, you will encounter several situations where you know a priori that you will need several blocks. For example, if-then-else, switch, while, and for statements in C/C++ all need multiple blocks for expression in LLVM. The rule is, create them early.
  2. Terminate your blocks early. This just reduces the chances that you forget to terminate your blocks, which is required (see Terminate Those Blocks! for more).
  3. Use getTerminator() for instruction insertion. I noticed early on that many of the constructors for the Instruction classes take an optional insert_before argument. At first, I thought this was a mistake because clearly the normal mode of inserting instructions would be one at a time after some other instruction, not before. However, if you hold on to your terminating instruction (or use the handy dandy getTerminator() method on a BasicBlock), it can always be used as the insert_before argument to your instruction constructors. This causes the instruction to automatically be inserted in the RightPlace™, just before the terminating instruction. The nice thing about this design is that you can pass blocks around and insert new instructions into them without ever knowing what instructions came before. This makes for some very clean compiler design.
-

The foregoing is such an important principle, it's worth making an idiom:

-
-BasicBlock* bb = BasicBlock::Create();
-bb->getInstList().push_back( BranchInst::Create( ... ) );
-new Instruction(..., bb->getTerminator() );
-
-

To make this clear, consider the typical if-then-else statement (see the StackerCompiler::handle_if() method). We can set this up in a single function using LLVM in the following way:

-
-using namespace llvm;
-BasicBlock*
-MyCompiler::handle_if( BasicBlock* bb, ICmpInst* condition )
-{
-    // Create the blocks to contain code in the structure of if/then/else
-    BasicBlock* then_bb = BasicBlock::Create(); 
-    BasicBlock* else_bb = BasicBlock::Create();
-    BasicBlock* exit_bb = BasicBlock::Create();
-
-    // Insert the branch instruction for the "if"
-    bb->getInstList().push_back( BranchInst::Create( then_bb, else_bb, condition ) );
-
-    // Set up the terminating instructions
-    then_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );
-    else_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );
-
-    // Fill in the then part .. details excised for brevity
-    this->fill_in( then_bb );
-
-    // Fill in the else part .. details excised for brevity
-    this->fill_in( else_bb );
-
-    // Return a block to the caller that can be filled in with the code
-    // that follows the if/then/else construct.
-    return exit_bb;
-}
-
-

Presumably in the foregoing, the calls to the "fill_in" method would add the instructions for the "then" and "else" parts. They would use the third part of the idiom almost exclusively (inserting new instructions before the terminator). Furthermore, they could even recurse back to handle_if should they encounter another if/then/else statement, and it will just work.

-

Note how cleanly this all works out. In particular, look at the push_back calls on each BasicBlock's instruction list. These are lists of type Instruction (which is also of type Value). To create the "if" branch we merely instantiate a BranchInst that takes as arguments the blocks to branch to and the condition to branch on. The BasicBlock objects act like branch labels! This new BranchInst terminates the BasicBlock provided as an argument. To give the caller a way to keep inserting after calling handle_if, we create an exit_bb block which is returned to the caller. Note that the exit_bb block is the branch target of the terminators of both the then_bb and the else_bb blocks. This guarantees that no matter what else handle_if or fill_in does, they end up at the exit_bb block.
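The "terminate early, then insert before the terminator" idiom can be tried out in isolation. The sketch below is a toy model using only the C++ standard library; ToyBlock, insertBeforeTerminator, and fillIn are hypothetical names and instructions are plain strings, so this illustrates the ordering behavior only, not the LLVM classes.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical toy block, not llvm::BasicBlock. The last entry in the list
// is always the terminator.
struct ToyBlock {
    std::list<std::string> insts;

    // Terminate the block at the moment it is created.
    explicit ToyBlock(const std::string& terminator) {
        insts.push_back(terminator);
    }

    const std::string& terminator() const {
        assert(!insts.empty());
        return insts.back();
    }

    // Insert just before the terminator, whatever else is already there.
    void insertBeforeTerminator(const std::string& inst) {
        assert(!insts.empty());
        insts.insert(std::prev(insts.end()), inst);
    }
};

// A caller can add code to any terminated block without inspecting it first.
void fillIn(ToyBlock& block, const std::string& what) {
    block.insertBeforeTerminator("load " + what);
    block.insertBeforeTerminator("store " + what);
}
```

Calling fillIn repeatedly keeps appending in program order while the terminator stays last, which is the property that lets blocks be passed around freely.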

-
- -
push_back Is Your Friend
-
-

One of the first things I noticed is the frequent use of the "push_back" method on the various lists. This is so common that it is worth mentioning. The "push_back" method appends a value to the end of an STL container such as a list or vector. The method might have also been named "insert_tail" or "append". Although I've used STL quite frequently, my use of push_back wasn't very high in other programs. In LLVM, you'll use it all the time.

-
- -
The Wily GetElementPtrInst
-
-

It took a little getting used to and several rounds of postings to the LLVM mailing list to wrap my head around this instruction correctly. Even though I had read the Language Reference and Programmer's Manual a couple times each, I still missed a few very key points:

- -

This means that when you look up an element in the global variable (assuming it's a struct or array), you must dereference the pointer first! For many things, this leads to the idiom:

-
-std::vector<Value*> index_vector;
-index_vector.push_back( ConstantInt::get( Type::LongTy, 0 ) );
-// ... push other indices ...
-GetElementPtrInst* gep = GetElementPtrInst::Create( ptr, index_vector );
-
-

For example, suppose we have a global variable whose type is [24 x int]. The variable itself represents a pointer to that array. To subscript the array, we need two indices, not just one. The first index (0) dereferences the pointer. The second index subscripts the array. If you're a "C" programmer, this will run against your grain because you'll naturally think of the global array variable and the address of its first element as the same. That tripped me up for a while until I realized that they really do differ .. by type. Remember that LLVM is strongly typed. Everything has a type. The "type" of the global variable is [24 x int]*. That is, it's a pointer to an array of 24 ints. When you dereference that global variable with a single (0) index, you now have a "[24 x int]" type. Although the pointer value of the dereferenced global and the address of the zero'th element in the array will be the same, they differ in their type. The zero'th element has type "int" while the pointer value has type "[24 x int]".
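C++ itself has the same distinction, which may help a "C" programmer see it. The following self-contained sketch is an analogy in plain C++, not LLVM code: table, wholeArray, firstElement, elementAddress, and sameAddress are names made up for this illustration. A pointer to the whole array and a pointer to its first element hold the same address but have different types, and reaching an element through the array pointer takes two steps: dereference, then subscript.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

int table[24];  // plays the role of the [24 x int] global

int (*wholeArray)[24] = &table;         // pointer to the array as a whole
int* firstElement = &(*wholeArray)[0];  // dereference first, then subscript

// Same address, different types: the compiler keeps them apart.
static_assert(!std::is_same<decltype(wholeArray), decltype(firstElement)>::value,
              "pointer-to-array and pointer-to-element are distinct types");

// Two "indices": the dereference of the pointer, then the array subscript.
int* elementAddress(int (*array)[24], std::size_t index) {
    assert(index < 24);
    return &(*array)[index];
}

// Compares the raw addresses with the type information stripped away.
bool sameAddress() {
    return static_cast<void*>(wholeArray) == static_cast<void*>(firstElement);
}
```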

-

Get this one aspect of LLVM right in your head, and you'll save yourself a lot of compiler writing headaches down the road.

-
- -
Getting Linkage Types Right
-
-

Linkage types in LLVM can be a little confusing, especially if your compiler writing mind has affixed firm concepts to particular words like "weak", "external", "global", "linkonce", etc. LLVM does not use the precise definitions of, say, ELF or GCC, even though they share common terms. To be fair, the concepts are related and similar but not precisely the same. This can lead you to think you know what a linkage type represents but in fact it is slightly different. I recommend you read the Language Reference on this topic very carefully. Then, read it again.

-

Here are some handy tips that I discovered along the way:

- -
- -
Constants Are Easier Than That!
-
-

Constants in LLVM took a little getting used to until I discovered a few utility functions in the LLVM IR that make things easier. Here's what I learned:

- -
- -
The Stacker Lexicon
-

This section describes the Stacker language.

-
The Stack
-
-

Stacker definitions define what they do to the global stack. Before proceeding, a few words about the stack are in order. The stack is simply a global array of 32-bit integers or pointers. A global index keeps track of the location of the top of the stack. All of this is hidden from the programmer, but it needs to be noted because it is the foundation of the conceptual programming model for Stacker. When you write a definition, you are, essentially, saying how you want that definition to manipulate the global stack.

-

Manipulating the stack can be quite hazardous. There is no distinction given and no checking for the various types of values that can be placed on the stack. Automatic coercion between types is performed. In many cases, this is useful. For example, a boolean value placed on the stack can be interpreted as an integer with good results. However, using a word that interprets that boolean value as a pointer to a string to print out will almost always yield a crash. Stacker simply leaves it to the programmer to get it right without any interference or hindering on interpretation of the stack values. You've been warned. :)
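The model described above, a global array plus a global index with no type information per slot, can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the Stacker runtime: theStack, theIndex, kStackSize, push, pop, pushString, and popString are hypothetical names, and std::intptr_t is used (rather than a 32-bit integer) only so that a slot can hold a pointer on the host machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the global stack; the size is an arbitrary choice.
const std::size_t kStackSize = 256;
std::intptr_t theStack[kStackSize];
std::size_t theIndex = 0;  // number of slots currently in use

void push(std::intptr_t value) {
    assert(theIndex < kStackSize);  // overflow check for the sketch only
    theStack[theIndex++] = value;
}

std::intptr_t pop() {
    assert(theIndex > 0);  // underflow check for the sketch only
    return theStack[--theIndex];
}

// A string is stored as its address; the slot does not remember that.
void pushString(const char* text) {
    push(reinterpret_cast<std::intptr_t>(text));
}

// Reinterprets whatever is on top as a string pointer. If the slot actually
// held an integer or a boolean, using the result would likely crash.
const char* popString() {
    return reinterpret_cast<const char*>(pop());
}
```

Popping a boolean as an integer is harmless here, while popping it with popString and printing the result is exactly the hazard the text warns about.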

-
- -
Punctuation
-
-

Punctuation in Stacker is very simple. The colon and semi-colon characters are used to introduce and terminate a definition (respectively). Except for FORWARD declarations, definitions are all you can specify in Stacker. Definitions are read left to right. Immediately after the colon comes the name of the word being defined. The remaining words in the definition specify what the word does. The definition is terminated by a semi-colon.

-

So, your typical definition will have the form:

-
: name ... ;
-

The name is up to you but it must start with a letter and contain only letters, numbers, and underscores. Names are case sensitive and must not be the same as the name of a built-in word. The ... is replaced by the stack manipulating words that you wish to define name as.
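The character rules for names are simple enough to check mechanically. The function below is a minimal sketch of such a check; isValidWordName is a hypothetical helper and is not taken from the Stacker lexer. It assumes the default "C" locale and deliberately does not test for collisions with built-in words.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if the name starts with a letter and contains only letters,
// digits, and underscores. Built-in name collisions are NOT checked here.
bool isValidWordName(const std::string& name) {
    if (name.empty()) return false;
    if (!std::isalpha(static_cast<unsigned char>(name[0]))) return false;
    for (char c : name) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalnum(uc) && c != '_') return false;
    }
    return true;
}
```

Under these rules hello_world and MAIN are accepted, while a name that starts with a digit or an underscore is rejected.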

-

- -
Comments
-
-

Stacker supports two types of comments. A hash mark (#) starts a comment that extends to the end of the line. It is identical to the kind of comments commonly used in shell scripts. A pair of parentheses also surrounds a comment. In both cases, the content of the comment is ignored by the Stacker compiler. The following does nothing in Stacker.

-

-# This is a comment to end of line
-( This is an enclosed comment )
-
-

See the example program to see comments in use in a real program.

-
- -
Literals
-
-

There are three kinds of literal values in Stacker: Integers, Strings, and Booleans. In each case, the stack operation is to simply push the value on to the stack. So, for example:
42 " is the answer." TRUE
will push three values on to the stack: the integer 42, the string " is the answer.", and the boolean TRUE.

-
- -
Words
-
-

Each definition in Stacker is composed of a set of words. Words are read and executed in order from left to right. There is very little checking in Stacker to make sure you're doing the right thing with the stack. It is assumed that the programmer knows how the stack transformation he applies will affect the program.

-

Words in a definition come in two flavors: built-in and programmer defined. Simply mentioning the name of a previously defined or declared programmer-defined word causes that word's stack actions to be invoked. It is somewhat like a function call in other languages. The built-in words have various effects, described below.

-

Sometimes you need to call a word before it is defined. For this, you can use the FORWARD declaration. It looks like this:

-

FORWARD name ;

-

This simply states to Stacker that "name" is the name of a definition that is defined elsewhere. Generally it means the definition can be found "forward" in the file. But, it doesn't have to be in the current compilation unit. Anything declared with FORWARD is an external symbol for linking.

-
- -
Standard Style
-
-

TODO

-
- -
Built In Words
-
-

The built-in words of the Stacker language are put in several groups depending on what they do. The groups are as follows:

-
  1. Logical: These words provide the logical operations for comparing stack operands.
     The words are: < > <= >= = <> true false.
  2. Bitwise: These words perform bitwise computations on their operands.
     The words are: << >> XOR AND NOT
  3. Arithmetic: These words perform arithmetic computations on their operands.
     The words are: ABS NEG + - * / MOD */ ++ -- MIN MAX
  4. Stack: These words manipulate the stack directly by moving its elements around.
     The words are: DROP DROP2 NIP NIP2 DUP DUP2 SWAP SWAP2 OVER OVER2 ROT ROT2 RROT RROT2 TUCK TUCK2 PICK SELECT ROLL
  5. Memory: These words allocate, free, and manipulate memory areas outside the stack.
     The words are: MALLOC FREE GET PUT
  6. Control: These words alter the normal left to right flow of execution.
     The words are: IF ELSE ENDIF WHILE END RETURN EXIT RECURSE
  7. I/O: These words perform output on the standard output and input on the standard input. No other I/O is possible in Stacker.
     The words are: SPACE TAB CR >s >d >c <s <d <c.
-

While you may be familiar with many of these operations from other programming languages, a careful review of their semantics is important for correct programming in Stacker. Of most importance is the effect that each of these built-in words has on the global stack. The effect is not always intuitive. To better describe the effects, we'll borrow from Forth the idiom of describing the effect on the stack with:

-

BEFORE -- AFTER

-

That is, to the left of the -- is a representation of the stack before the operation. To the right of the -- is a representation of the stack after the operation. In the table below that describes the operation of each of the built in words, we will denote the elements of the stack using the following construction:

-
  1. b - a boolean truth value
  2. w - a normal integer valued word.
  3. s - a pointer to a string value
  4. p - a pointer to a malloc'd memory block
-
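The BEFORE -- AFTER notation maps directly onto operations on a sequence whose last element is the top of the stack. The sketch below models a handful of the stack words this way in standard C++. It is only an illustration of the notation: Stack and the word* functions are hypothetical names, the stack effects are copied from the Operation column of the table that follows, and none of this is the Stacker compiler's own code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Stack = std::vector<int>;  // back() is the top of the stack

// DUP: w1 -- w1 w1
void wordDup(Stack& s) {
    assert(s.size() >= 1);
    int top = s.back();
    s.push_back(top);
}

// SWAP: w1 w2 -- w2 w1
void wordSwap(Stack& s) {
    assert(s.size() >= 2);
    std::swap(s[s.size() - 1], s[s.size() - 2]);
}

// OVER: w1 w2 -- w1 w2 w1
void wordOver(Stack& s) {
    assert(s.size() >= 2);
    int second = s[s.size() - 2];
    s.push_back(second);
}

// ROT: w1 w2 w3 -- w2 w3 w1
void wordRot(Stack& s) {
    assert(s.size() >= 3);
    std::rotate(s.end() - 3, s.end() - 2, s.end());
}

// NIP: w1 w2 -- w2
void wordNip(Stack& s) {
    assert(s.size() >= 2);
    s.erase(s.end() - 2);
}

// TUCK: w1 w2 -- w2 w1 w2
void wordTuck(Stack& s) {
    assert(s.size() >= 2);
    int top = s.back();
    s.insert(s.end() - 2, top);
}
```

Reading each comment left to right gives the vector contents before and after the call, with the rightmost item being the top of the stack.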
-
Definition Of Operation Of Built In Words

LOGICAL OPERATIONS
Word | Name | Operation | Description
< | LT | w1 w2 -- b | Two values (w1 and w2) are popped off the stack and compared. If w1 is less than w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
> | GT | w1 w2 -- b | Two values (w1 and w2) are popped off the stack and compared. If w1 is greater than w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
>= | GE | w1 w2 -- b | Two values (w1 and w2) are popped off the stack and compared. If w1 is greater than or equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
<= | LE | w1 w2 -- b | Two values (w1 and w2) are popped off the stack and compared. If w1 is less than or equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
= | EQ | w1 w2 -- b | Two values (w1 and w2) are popped off the stack and compared. If w1 is equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
<> | NE | w1 w2 -- b | Two values (w1 and w2) are popped off the stack and compared. If w1 is not equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
FALSE | FALSE | -- b | The boolean value FALSE (0) is pushed on to the stack.
TRUE | TRUE | -- b | The boolean value TRUE (-1) is pushed on to the stack.
BITWISE OPERATORS
Word | Name | Operation | Description
<< | SHL | w1 w2 -- w1<<w2 | Two values (w1 and w2) are popped off the stack. The w2 operand is shifted left by the number of bits given by the w1 operand. The result is pushed back to the stack.
>> | SHR | w1 w2 -- w1>>w2 | Two values (w1 and w2) are popped off the stack. The w2 operand is shifted right by the number of bits given by the w1 operand. The result is pushed back to the stack.
OR | OR | w1 w2 -- w2|w1 | Two values (w1 and w2) are popped off the stack. The values are bitwise OR'd together and pushed back on the stack. This is not a logical OR. The sequence 1 2 OR yields 3 not 1.
AND | AND | w1 w2 -- w2&w1 | Two values (w1 and w2) are popped off the stack. The values are bitwise AND'd together and pushed back on the stack. This is not a logical AND. The sequence 1 2 AND yields 0 not 1.
XOR | XOR | w1 w2 -- w2^w1 | Two values (w1 and w2) are popped off the stack. The values are bitwise exclusive OR'd together and pushed back on the stack. For example, the sequence 1 3 XOR yields 2.
ARITHMETIC OPERATORS
Word | Name | Operation | Description
ABS | ABS | w -- |w| | One value is popped off the stack; its absolute value is computed and then pushed on to the stack. If w is -1 then the result is 1. If w is 1 then the result is also 1.
NEG | NEG | w -- -w | One value is popped off the stack which is negated and then pushed back on to the stack. If w is -1 then the result is 1. If w is 1 then the result is -1.
+ | ADD | w1 w2 -- w2+w1 | Two values are popped off the stack. Their sum is pushed back on to the stack.
- | SUB | w1 w2 -- w2-w1 | Two values are popped off the stack. Their difference is pushed back on to the stack.
* | MUL | w1 w2 -- w2*w1 | Two values are popped off the stack. Their product is pushed back on to the stack.
/ | DIV | w1 w2 -- w2/w1 | Two values are popped off the stack. Their quotient is pushed back on to the stack.
MOD | MOD | w1 w2 -- w2%w1 | Two values are popped off the stack. Their remainder after division of w1 by w2 is pushed back on to the stack.
*/ | STAR_SLASH | w1 w2 w3 -- (w3*w2)/w1 | Three values are popped off the stack. The product of w1 and w2 is divided by w3. The result is pushed back on to the stack.
++ | INCR | w -- w+1 | One value is popped off the stack. It is incremented by one and then pushed back on to the stack.
-- | DECR | w -- w-1 | One value is popped off the stack. It is decremented by one and then pushed back on to the stack.
MIN | MIN | w1 w2 -- (w2<w1?w2:w1) | Two values are popped off the stack. The smaller value is pushed back on to the stack.
MAX | MAX | w1 w2 -- (w2>w1?w2:w1) | Two values are popped off the stack. The larger value is pushed back on to the stack.
STACK MANIPULATION OPERATORS
Word | Name | Operation | Description
DROP | DROP | w -- | One value is popped off the stack.
DROP2 | DROP2 | w1 w2 -- | Two values are popped off the stack.
NIP | NIP | w1 w2 -- w2 | The second value on the stack is removed from the stack. That is, a value is popped off the stack and retained. Then a second value is popped and the retained value is pushed.
NIP2 | NIP2 | w1 w2 w3 w4 -- w3 w4 | The third and fourth values on the stack are removed from it. That is, two values are popped and retained. Then two more values are popped and the two retained values are pushed back on.
DUP | DUP | w1 -- w1 w1 | One value is popped off the stack. That value is then pushed on to the stack twice to duplicate the top stack value.
DUP2 | DUP2 | w1 w2 -- w1 w2 w1 w2 | The top two values on the stack are duplicated. That is, two values are popped off the stack. They are alternately pushed back on the stack twice each.
SWAP | SWAP | w1 w2 -- w2 w1 | The top two stack items are reversed in their order. That is, two values are popped off the stack and pushed back on to the stack in the opposite order they were popped.
SWAP2 | SWAP2 | w1 w2 w3 w4 -- w3 w4 w2 w1 | The top four stack items are swapped in pairs. That is, two values are popped and retained. Then, two more values are popped and retained. The values are pushed back on to the stack in the reverse order but in pairs.
OVER | OVER | w1 w2 -- w1 w2 w1 | Two values are popped from the stack. They are pushed back on to the stack in the order w1 w2 w1. This seems to cause the top stack element to be duplicated "over" the next value.
OVER2 | OVER2 | w1 w2 w3 w4 -- w1 w2 w3 w4 w1 w2 | The third and fourth values on the stack are replicated on to the top of the stack.
ROT | ROT | w1 w2 w3 -- w2 w3 w1 | The top three values are rotated. That is, three values are popped off the stack. They are pushed back on to the stack in the order w2 w3 w1.
ROT2 | ROT2 | w1 w2 w3 w4 w5 w6 -- w3 w4 w5 w6 w1 w2 | Like ROT but the rotation is done using three pairs instead of three singles.
RROT | RROT | w1 w2 w3 -- w3 w1 w2 | Reverse rotation. Like ROT, but it rotates the other way around. Essentially, the third element on the stack is moved to the top of the stack.
RROT2 | RROT2 | w1 w2 w3 w4 w5 w6 -- w3 w4 w5 w6 w1 w2 | Double reverse rotation. Like RROT but the rotation is done using three pairs instead of three singles. The fifth and sixth stack elements are moved to the first and second positions.
TUCK | TUCK | w1 w2 -- w2 w1 w2 | Similar to OVER except that the second operand is being replicated. Essentially, the first operand is being "tucked" in between two instances of the second operand. Logically, two values are popped off the stack. They are placed back on the stack in the order w2 w1 w2.
TUCK2 | TUCK2 | w1 w2 w3 w4 -- w3 w4 w1 w2 w3 w4 | Like TUCK but a pair of elements is tucked over two pairs. That is, the top two elements of the stack are duplicated and inserted into the stack at the fifth and sixth positions.
PICK | PICK | x0 ... Xn n -- x0 ... Xn x0 | The top of the stack is used as an index into the remainder of the stack. The element at the nth position replaces the index (top of stack). This is useful for cycling through a set of values. Note that indexing is zero based. So, if n=0 then you get the second item on the stack. If n=1 you get the third, etc. Note also that the index is replaced by the n'th value.
SELECT | SELECT | m n X0..Xm Xm+1 .. Xn -- Xm | This is like PICK but the list is removed and you need to specify both the index and the size of the list. Careful with this one, the wrong value for n can blow away a huge amount of the stack.
ROLL | ROLL | x0 x1 .. xn n -- x1 .. xn x0 | Not Implemented. This one has been left as an exercise to the student. See Exercise. ROLL requires a value, "n", to be on the top of the stack. This value specifies how far into the stack to "roll". The n'th value is moved (not copied) from its location and replaces the "n" value on the top of the stack. In this way, all the values between "n" and x0 roll up the stack. The operation of ROLL is a generalized ROT. The "n" value specifies how much to rotate. That is, ROLL with n=1 is the same as ROT and ROLL with n=2 is the same as ROT2.
MEMORY OPERATORS
Word | Name | Operation | Description
MALLOC | MALLOC | w1 -- p | One value is popped off the stack. The value is used as the size of a memory block to allocate. The size is in bytes, not words. The memory allocation is completed and the address of the memory block is pushed on to the stack.
FREE | FREE | p -- | One pointer value is popped off the stack. The value should be the address of a memory block created by the MALLOC operation. The associated memory block is freed. Nothing is pushed back on the stack. Many bugs can be created by attempting to FREE something that isn't a pointer to a MALLOC allocated memory block. Make sure you know what's on the stack. One way to do this is with the following idiom: 64 MALLOC DUP DUP (use ptr) DUP (use ptr) ... FREE. This ensures that an extra copy of the pointer is placed on the stack (for the FREE at the end) and that every use of the pointer is preceded by a DUP to retain the copy for FREE.
GET | GET | w1 p -- w2 p | An integer index and a pointer to a memory block are popped off the stack. The index is used to index one byte from the memory block. That byte value is retained, the pointer is pushed again and the retained value is pushed. Note that the pointer value is essentially retained in its position so this doesn't count as a "use ptr" in the FREE idiom.
PUT | PUT | w1 w2 p -- p | An integer value is popped off the stack. This is the value to be put into a memory block. Another integer value is popped off the stack. This is the indexed byte in the memory block. A pointer to the memory block is popped off the stack. The first value (w1) is then converted to a byte and written to the element of the memory block (p) at the index given by the second value (w2). The pointer to the memory block is pushed back on the stack so this doesn't count as a "use ptr" in the FREE idiom.
CONTROL FLOW OPERATORS
Word | Name | Operation | Description
RETURN | RETURN | -- | The currently executing definition returns immediately to its caller. Note that there is an implicit RETURN at the end of each definition, logically located at the semicolon. The sequence RETURN ; is valid but redundant.
EXIT | EXIT | w1 -- | A return value for the program is popped off the stack. The program is then immediately terminated. This is normally used for an abnormal exit from the program. For a normal exit (when MAIN finishes), the exit code will always be zero in accordance with UNIX conventions.
RECURSE | RECURSE | -- | The currently executing definition is called again. This operation is needed since the definition of a word doesn't exist until the semicolon is reached. Attempting something like:
    : recurser recurser ;
will yield an error saying that "recurser" is not defined yet. To accomplish the same thing, change this to:
    : recurser RECURSE ;
IF (words...) ENDIF | IF (words...) ENDIF | b -- | A boolean value is popped off the stack. If it is non-zero then the "words..." are executed. Otherwise, execution continues immediately following the ENDIF.
IF (words...) ELSE (words...) ENDIF | IF (words...) ELSE (words...) ENDIF | b -- | A boolean value is popped off the stack. If it is non-zero then the "words..." between IF and ELSE are executed. Otherwise the words between ELSE and ENDIF are executed. In either case, after the (words...) have executed, execution continues immediately following the ENDIF.
WHILE word END | WHILE word END | b -- b | The boolean value on the top of the stack is examined (not popped). If it is non-zero then the "word" between WHILE and END is executed. Execution then begins again at the WHILE where the boolean on the top of the stack is examined again. The stack is not modified by the WHILE...END loop, only examined. It is imperative that the "word" in the body of the loop ensure that the top of the stack contains the next boolean to examine when it completes. Note that since booleans and integers can be coerced you can use the following "for loop" idiom:
    (push count) WHILE word -- END
For example:
    10 WHILE >d -- END
This will print the numbers from 10 down to 1. 10 is pushed on the stack. Since that is non-zero, the while loop is entered. The top of the stack (10) is printed out with >d. The top of the stack is decremented, yielding 9 and control is transferred back to the WHILE keyword. The process starts all over again and repeats until the top of stack is decremented to 0 at which point the WHILE test fails and control is transferred to the word after the END.
INPUT & OUTPUT OPERATORS
Word | Name | Operation | Description
SPACE | SPACE | -- | A space character is put out. There is no stack effect.
TAB | TAB | -- | A tab character is put out. There is no stack effect.
CR | CR | -- | A carriage return character is put out. There is no stack effect.
>s | OUT_STR | -- | A string pointer is popped from the stack. It is put out.
>d | OUT_STR | -- | A value is popped from the stack. It is put out as a decimal integer.
>c | OUT_CHR | -- | A value is popped from the stack. It is put out as an ASCII character.
<s | IN_STR | -- s | A string is read from the input via the scanf(3) format string " %as". The resulting string is pushed on to the stack.
<d | IN_STR | -- w | An integer is read from the input via the scanf(3) format string " %d". The resulting value is pushed on to the stack.
<c | IN_CHR | -- w | A single character is read from the input via the scanf(3) format string " %c". The value is converted to an integer and pushed on to the stack.
DUMP | DUMP | -- | The stack contents are dumped to standard output. This is useful for debugging your definitions. Put DUMP at the beginning and end of a definition to see instantly the net effect of the definition.
- -
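The stack effects of PICK and the WHILE ... END countdown idiom described in the table can be modelled outside Stacker. The following is only a rough Python sketch, assuming a plain list as the stack with the top at the end. The names pick, run_while and countdown_body are hypothetical and are not part of Stacker or its runtime. The loop body follows the narrated trace of "10 WHILE >d -- END" (print the top value, then decrement it in place), which is an assumption about how >d and -- combine.

```python
def pick(stack):
    """Model of PICK: pop the index n, then push a copy of the n'th
    item below it (zero based, so n=0 copies the item just below n)."""
    n = stack.pop()
    if n < 0 or n >= len(stack):
        raise IndexError("PICK index out of range")
    stack.append(stack[-1 - n])


def run_while(stack, body):
    """Model of WHILE body END: the top of the stack is examined,
    not popped, and the body runs for as long as it is non-zero."""
    while stack and stack[-1] != 0:
        body(stack)


def countdown_body(stack, printed):
    """Model of the loop body in the countdown example, following the
    narrated trace: record the top value, then decrement it in place."""
    printed.append(stack[-1])
    stack[-1] -= 1


if __name__ == "__main__":
    s = [10, 20, 30, 0]
    pick(s)
    print(s)            # the item just below the index was copied

    printed = []
    s = [10]
    run_while(s, lambda st: countdown_body(st, printed))
    print(printed)      # values seen by the loop body, in order
```

The loop leaves the final zero on the stack, which matches the table's note that WHILE only examines the top of the stack.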
- -
Prime: A Complete Example
-
-

The following fully documented program highlights many features of both -the Stacker language and what is possible with LLVM. The program has two modes -of operation. If you provide numeric arguments to the program, it checks to see -if those arguments are prime numbers and prints out the results. Without any -arguments, the program prints out any prime numbers it finds between 1 and one -million (there's a lot of them!). The source code comments below tell the -remainder of the story. -

-
-
-

-################################################################################
-#
-# Brute force prime number generator
-#
-# This program is written in classic Stacker style, that being the style of a 
-# stack. Start at the bottom and read your way up !
-#
-# Reid Spencer - Nov 2003 
-################################################################################
-# Utility definitions
-################################################################################
-: print >d CR ;
-: it_is_a_prime TRUE ;
-: it_is_not_a_prime FALSE ;
-: continue_loop TRUE ;
-: exit_loop FALSE ;
-    
-################################################################################
-# This definition tries an actual division of a candidate prime number. It
-# determines whether the division loop on this candidate should continue or
-# not.
-# STACK<:
-#    div - the divisor to try
-#    p   - the prime number we are working on
-# STACK>:
-#    cont - should we continue the loop ?
-#    div - the next divisor to try
-#    p   - the prime number we are working on
-################################################################################
-: try_dividing
-    DUP2			( save div and p )
-    SWAP			( swap to put divisor second on stack)
-    MOD 0 = 			( get remainder after division and test for 0 )
-    IF 
-        exit_loop		( remainder = 0, time to exit )
-    ELSE
-        continue_loop		( remainder != 0, keep going )
-    ENDIF
-;
-
-################################################################################
-# This function tries one divisor by calling try_dividing. But, before doing
-# that it checks to see if the value is 1. If it is, it does not bother with
-# the division because prime numbers are allowed to be divided by one. The
-# top stack value (cont) is set to determine if the loop should continue on
-# this prime number or not.
-# STACK<:
-#    cont - should we continue the loop (ignored)?
-#    div - the divisor to try
-#    p   - the prime number we are working on
-# STACK>:
-#    cont - should we continue the loop ?
-#    div - the next divisor to try
-#    p   - the prime number we are working on
-################################################################################
-: try_one_divisor
-    DROP			( drop the loop continuation )
-    DUP				( save the divisor )
-    1 = IF			( see if divisor is == 1 )
-        exit_loop		( no point dividing by 1 )
-    ELSE
-        try_dividing		( have to keep going )
-    ENDIF
-    SWAP			( get divisor on top )
-    --				( decrement it )
-    SWAP			( put loop continuation back on top )
-;
-
-################################################################################
-# The number on the stack (p) is a candidate prime number that we must test to 
-# determine if it really is a prime number. To do this, we divide it by every 
-# number from p-1 down to 1. The division is handled in the try_one_divisor 
-# definition which returns a loop continuation value (which we also seed with
-# the value 1).  After the loop, we check the divisor. If it decremented all
-# the way to zero then we found a prime, otherwise we did not find one.
-# STACK<:
-#   p - the prime number to check
-# STACK>:
-#   yn - boolean indicating if its a prime or not
-#   p - the prime number checked
-################################################################################
-: try_harder
-    DUP 			( duplicate to get divisor value )
-    --				( first divisor is one less than p )
-    1				( continue the loop )
-    WHILE
-       try_one_divisor		( see if its prime )
-    END
-    DROP			( drop the continuation value )
-    0 = IF			( test for divisor == 0 )
-       it_is_a_prime		( we found one )
-    ELSE
-       it_is_not_a_prime	( nope, this one is not a prime )
-    ENDIF
-;
-
-################################################################################
-# This definition determines if the number on the top of the stack is a prime 
-# or not. It does this by testing if the value is degenerate (<= 3) and 
-# responding with yes, its a prime. Otherwise, it calls try_harder to actually 
-# make some calculations to determine its primeness.
-# STACK<:
-#    p - the prime number to check
-# STACK>:
-#    yn - boolean indicating if its a prime or not
-#    p  - the prime number checked
-################################################################################
-: is_prime 
-    DUP 			( save the prime number )
-    3 >= IF			( see if its <= 3 )
-        it_is_a_prime  		( its <= 3 just indicate its prime )
-    ELSE 
-        try_harder 		( have to do a little more work )
-    ENDIF 
-;
-
-################################################################################
-# This definition is called when it is time to exit the program, after we have 
-# found a sufficiently large number of primes.
-# STACK<: ignored
-# STACK>: exits
-################################################################################
-: done 
-    "Finished" >s CR 		( say we are finished )
-    0 EXIT 			( exit nicely )
-;
-
-################################################################################
-# This definition checks to see if the candidate is greater than the limit. If 
-# it is, it terminates the program by calling done. Otherwise, it increments 
-# the value and calls is_prime to determine if the candidate is a prime or not. 
-# If it is a prime, it prints it. Note that the boolean result from is_prime is
-# gobbled by the following IF which returns the stack to just containing the
-# prime number just considered.
-# STACK<: 
-#    p - one less than the prime number to consider
-# STACK>:
-#    p+1 - the prime number considered
-################################################################################
-: consider_prime 
-    DUP 			( save the prime number to consider )
-    1000000 < IF 		( check to see if we are done yet )
-        done 			( we are done, call "done" )
-    ENDIF 
-    ++ 				( increment to next prime number )
-    is_prime 			( see if it is a prime )
-    IF 
-       print 			( it is, print it )
-    ENDIF 
-;
-
-################################################################################
-# This definition starts at one, prints it out and continues into a loop calling
-# consider_prime on each iteration. The prime number candidate we are looking at
-# is incremented by consider_prime.
-# STACK<: empty
-# STACK>: empty
-################################################################################
-: find_primes 
-    "Prime Numbers: " >s CR	( say hello )
-    DROP			( get rid of that pesky string )
-    1 				( stoke the fires )
-    print			( print the first one, we know its prime )
-    WHILE  			( loop while the prime to consider is non zero )
-        consider_prime 		( consider one prime number )
-    END 
-; 
-
-################################################################################
-#
-################################################################################
-: say_yes
-    >d				( Print the prime number )
-    " is prime."		( push string to output )
-    >s				( output it )
-    CR				( print carriage return )
-    DROP			( pop string )
-;
-
-: say_no
-    >d				( Print the prime number )
-    " is NOT prime."		( push string to put out )
-    >s				( put out the string )
-    CR				( print carriage return )
-    DROP			( pop string )
-;
-
-################################################################################
-# This definition processes a single command line argument and determines if it
-# is a prime number or not.
-# STACK<:
-#    n - number of arguments
-#    arg1 - the prime numbers to examine
-# STACK>:
-#    n-1 - one less than number of arguments
-#    arg2 - we processed one argument
-################################################################################
-: do_one_argument
-    --				( decrement loop counter )
-    SWAP			( get the argument value  )
-    is_prime IF			( determine if its prime )
-        say_yes			( uhuh )
-    ELSE
-        say_no			( nope )
-    ENDIF
-    DROP			( done with that argument )
-;
-
-################################################################################
-# This definition processes each command line argument in turn.
-# STACK<:
-#    n - number of arguments
-#    ... - the arguments
-################################################################################
-: process_arguments
-    WHILE			( while there are more arguments )
-       do_one_argument		( process one argument )
-    END
-;
-    
-################################################################################
-# The MAIN program just prints a banner and processes its arguments.
-# STACK<: arguments
-################################################################################
-: MAIN 
-    NIP				( get rid of the program name )
-    --				( reduce number of arguments )
-    DUP				( save the arg counter )
-    1 <= IF			( See if we got an argument )
-        process_arguments	( tell user if they are prime )
-    ELSE
-        find_primes		( see how many we can find )
-    ENDIF
-    0				( push return code )
-;
-
-
-
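For readers who do not want to trace the stack by hand, the test performed by is_prime and try_harder can be restated in a conventional language. This is a minimal Python sketch of the logic as the program's comments describe it, not a translation of the generated code. The function name is reused only for readability, and the treatment of values <= 3 as prime (including 1) follows the comment in is_prime rather than the mathematical definition.

```python
def is_prime(p):
    """Brute-force test modelled on the Stacker example.

    Values <= 3 are reported as prime without any division, as the
    program's comments state. Otherwise every divisor from p-1 down
    to 2 is tried; reaching 1 without a zero remainder means prime.
    """
    if p <= 3:
        return True
    div = p - 1                 # first divisor is one less than p
    while div > 1:              # dividing by 1 proves nothing
        if p % div == 0:
            return False        # exact division: not a prime
        div -= 1                # next divisor to try
    return True


if __name__ == "__main__":
    for candidate in (7, 9, 13, 15):
        verdict = "is prime." if is_prime(candidate) else "is NOT prime."
        print(candidate, verdict)
```

Trying every divisor down from p-1 is deliberately naive, as in the original, which describes itself as a brute force generator.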
- -
Internals
-
-

This section is under construction. -

In the meantime, you can always read the code! It has comments!

-
- -
Directory Structure
- -
-

The source code, test programs, and sample programs can all be found in the LLVM repository named llvm-stacker. This should be checked out to the projects directory so that it will auto-configure. To do that, make sure you have the llvm sources in llvm (see Getting Started) and then use these commands:

- -
-
-% svn co http://llvm.org/svn/llvm-project/llvm-top/trunk llvm-top
-% cd llvm-top
-% make build MODULE=stacker
-
-
- -

Under the projects/llvm-stacker directory you will find the -implementation of the Stacker compiler, as follows:

- -
- - -
The Lexer
- -
-

See projects/llvm-stacker/lib/compiler/Lexer.l

-
- - -
The Parser
-
-

See projects/llvm-stacker/lib/compiler/StackerParser.y

-
- -
The Compiler
-
-

See projects/llvm-stacker/lib/compiler/StackerCompiler.cpp

-
- -
The Runtime
-
-

See projects/llvm-stacker/lib/runtime/stacker_rt.c

-
- -
Compiler Driver
-
-

See projects/llvm-stacker/tools/stkrc/stkrc.cpp

-
- -
Test Programs
-
-

See projects/llvm-stacker/test/*.st

-
- -
Exercise
-
-

As you may have noted from a careful inspection of the Built-In word definitions, the ROLL word is not implemented. This word was left out of Stacker on purpose so that it can be an exercise for the student. The exercise is to implement the ROLL functionality (in your own workspace) and build a test program for it. If you can implement ROLL, you understand Stacker and probably a fair amount about LLVM since this is one of the more complicated Stacker operations. The work will be almost completely limited to the compiler.

The ROLL word is already recognized by both the lexer and parser but ignored -by the compiler. That means you don't have to futz around with figuring out how -to get the keyword recognized. It already is. The part of the compiler that -you need to implement is the ROLL case in the -StackerCompiler::handle_word(int) method.

See the -implementations of PICK and SELECT in the same method to get some hints about -how to complete this exercise.

-

Good luck!

-
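Before touching the compiler, it may help to pin down the intended stack effect in a throwaway model and use it as the oracle for a test program. The sketch below is one possible reading, in Python, of the stack diagram "x0 x1 .. xn n -- x1 .. xn x0" taken literally, with a list as the stack and the top at the end. The function name roll is hypothetical, and the sketch does not settle how the index relates to ROT and ROT2, so check that against the built-in definitions before relying on it.

```python
def roll(stack):
    """Model of ROLL following the stack diagram literally:
    pop n, then move (not copy) the item n places below the new
    top of the stack up to the top."""
    n = stack.pop()
    if n < 0 or n >= len(stack):
        raise IndexError("ROLL index out of range")
    value = stack.pop(len(stack) - 1 - n)   # remove from its location
    stack.append(value)                      # and place it on top


if __name__ == "__main__":
    s = [10, 20, 30, 2]      # x0=10 x1=20 x2=30 n=2
    roll(s)
    print(s)                 # x1 x2 x0
```

Unlike the PICK case, the value is removed from its original position, so the stack ends up one element shorter than before the call.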
- -
Things Remaining To Be Done
-
-

The initial implementation of Stacker has several deficiencies. If you're -interested, here are some things that could be implemented better:

-
  1. Write an LLVM pass to compute the correct stack depth needed by the program. Currently the stack is set to a fixed number which means programs with large numbers of definitions might fail.
  2. Write an LLVM pass to optimize the use of the global stack. The code emitted currently is somewhat wasteful. It gets cleaned up a lot by existing passes but more could be done.
  3. Make the compiler driver use the LLVM linking facilities (with IPO) before depending on GCC to do the final link.
  4. Clean up parsing. It doesn't handle errors very well.
  5. Rearrange the StackerCompiler.cpp code to make better use of inserting instructions before a block's terminating instruction. I didn't figure this technique out until I was nearly done with LLVM. As it is, it's a bad example of how to insert instructions!
  6. Provide for I/O to arbitrary files instead of just stdin/stdout.
  7. Write additional built-in words, with inspiration from FORTH.
  8. Write additional sample Stacker programs.
  9. Add your own compiler writing experiences and tips in the Lessons I Learned About LLVM section.
-
- - - -
-
- Reid Spencer
- LLVM Compiler Infrastructure
- Last modified: $Date$ -
- - - diff --git a/docs/index.html b/docs/index.html index f3dcb18500e..28a56eb1801 100644 --- a/docs/index.html +++ b/docs/index.html @@ -195,10 +195,6 @@ generator. on how to write a new alias analysis implementation or how to use existing analyses. -
  • The Stacker Chronicles - This document -describes both the Stacker language and LLVM frontend, but also some details -about LLVM useful for those writing front-ends.
  • -
  • Accurate Garbage Collection with LLVM - The interfaces source-language compilers should use for compiling GC'd programs.