From: Chris Lattner
This directory contains projects that are not strictly part of LLVM but are shipped with LLVM. This is also the directory where you should create your own LLVM-based projects. See llvm/projects/sample for an example of how - to set up your own project. See llvm/projects/Stacker for a fully - functional example of a compiler front end.
+ to set up your own project. diff --git a/docs/Stacker.html b/docs/Stacker.html deleted file mode 100644 index 81b623efa9a..00000000000 --- a/docs/Stacker.html +++ /dev/null @@ -1,1428 +0,0 @@ - - - -This document is another way to learn about LLVM. Unlike the -LLVM Reference Manual or -LLVM Programmer's Manual, here we learn -about LLVM through the experience of creating a simple programming language -named Stacker. Stacker was invented specifically as a demonstration of -LLVM. The emphasis in this document is not on describing the -intricacies of LLVM itself but on how to use it to build your own -compiler system.
-Amongst other things, LLVM is a platform for compiler writers. -Because of its exceptionally clean and small IR (intermediate -representation), compiler writing with LLVM is much easier than with -other systems. As proof, I wrote the entire compiler (language definition, -lexer, parser, code generator, etc.) in about four days! -That's important to know because it shows how quickly you can get a new -language running when using LLVM. Furthermore, this was the first -language the author ever created using LLVM. The learning curve is -included in those four days.
-The language described here, Stacker, is Forth-like. Programs -are simple collections of word definitions, and the only thing definitions -can do is manipulate a stack or generate I/O. Stacker is not a "real" -programming language; it's very simple. Although it is computationally -complete, you wouldn't use it for your next big project. However, -the fact that it is complete, it's simple, and it doesn't have -a C-like syntax make it useful for demonstration purposes. It shows -that LLVM could be applied to a wide variety of languages.
-The basic notion behind Stacker is very simple. There's a stack of -integers (or character pointers) that the program manipulates. Pretty -much the only thing the program can do is manipulate the stack and do -some limited I/O operations. The language provides you with several -built-in words that manipulate the stack in interesting ways. To get -your feet wet, here's how you write the traditional "Hello, World" -program in Stacker:
-: hello_world "Hello, World!" >s DROP CR ;
-: MAIN hello_world ;
This has two "definitions" (Stacker manipulates words, not
-functions and words have definitions): MAIN and
-hello_world. The MAIN definition is standard; it
-tells Stacker where to start. Here, MAIN is defined to
-simply invoke the word hello_world. The
-hello_world definition tells Stacker to push the
-"Hello, World!" string on to the stack, print it out
-(>s), pop it off the stack (DROP), and
-finally print a carriage return (CR). Although
-hello_world uses the stack, its net effect is null. Well
-written Stacker definitions have that characteristic.
Exercise for the reader: how could you make this a one line program?
-Stacker was written for two purposes:
-During the development of Stacker, many lessons about LLVM were -learned. Those lessons are described in the following subsections.
-
Although I knew that LLVM uses a Static Single Assignment (SSA) format, -it wasn't obvious to me how prevalent this idea was in LLVM until I really -started using it. Reading the -Programmer's Manual and Language Reference, -I noted that most of the important LLVM IR (Intermediate Representation) C++ -classes were derived from the Value class. I only understood the full power of that simple -design once I started constructing executable -expressions for Stacker.
- -This really makes your programming go faster. Think about compiling code
-for the following C/C++ expression: (a|b)*((x+1)/(y+1)). Assuming
-the values are on the stack in the order a, b, x, y, this could be
-expressed in Stacker as: 1 + SWAP 1 + / ROT2 OR *.
-You could write a function using LLVM that computes this expression like
-this:
-Value*
-expression(BasicBlock* bb, Value* a, Value* b, Value* x, Value* y )
-{
-  ConstantInt* one = ConstantInt::get(Type::IntTy, 1);
-  BinaryOperator* or1 = BinaryOperator::createOr(a, b, "", bb);
-  BinaryOperator* add1 = BinaryOperator::createAdd(x, one, "", bb);
-  BinaryOperator* add2 = BinaryOperator::createAdd(y, one, "", bb);
-  BinaryOperator* div1 = BinaryOperator::createDiv(add1, add2, "", bb);
-  BinaryOperator* mult1 = BinaryOperator::createMul(or1, div1, "", bb);
-  return mult1;
-}
-
"Okay, big deal," you say? It is a big deal. Here's why. Note that I didn't
-have to tell this function which kinds of Values are being passed in. They could be
-Instructions, Constants, GlobalVariables, or
-any of the other subclasses of Value that LLVM supports.
-Furthermore, if you specify Values that are incorrect for this sequence of
-operations, LLVM will either notice right away (at compilation time) or the LLVM
-Verifier will pick up the inconsistency when the compiler runs. In either case
-LLVM prevents you from making a type error that gets passed through to the
-generated program. This really helps you write a compiler that
-always generates correct code!
-
The second point is that we don't have to worry about branching, registers, -stack variables, saving partial results, etc. The instructions we create -are the values we use. Note that all that was created in the above -code is a Constant value and five operators. Each of the instructions is -the resulting value of that instruction. This saves a lot of time.
-The lesson is this: SSA form is very powerful: there is no difference -between a value and the instruction that created it. This is fully -enforced by the LLVM IR. Use it to your best advantage.
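The "instruction is the value" idea can be tried out without LLVM at all. The following is a minimal sketch in plain C++, assuming a toy `Value` hierarchy; the names `Constant`, `BinaryOp`, and `Arena` are hypothetical and are not the LLVM classes. It builds the same expression, (a|b)*((x+1)/(y+1)), by passing each node directly as the operand of the next one.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy model only: these types imitate the idea behind LLVM's Value
// hierarchy and are NOT the LLVM API.
struct Value {
    virtual ~Value() = default;
    virtual int eval() const = 0;
};

struct Constant : Value {
    int v;
    explicit Constant(int v) : v(v) {}
    int eval() const override { return v; }
};

// An "instruction" is itself a Value, so it can be used directly as an
// operand of another instruction without naming a temporary.
struct BinaryOp : Value {
    char op;
    const Value* lhs;
    const Value* rhs;
    BinaryOp(char op, const Value* l, const Value* r) : op(op), lhs(l), rhs(r) {}
    int eval() const override {
        int a = lhs->eval();
        int b = rhs->eval();
        switch (op) {
            case '|': return a | b;
            case '+': return a + b;
            case '*': return a * b;
            case '/': assert(b != 0); return a / b;
        }
        assert(false && "unknown operator");
        return 0;
    }
};

// Owns every node so that the raw operand pointers stay valid.
struct Arena {
    std::vector<std::unique_ptr<Value>> nodes;
    template <class T, class... Args>
    const T* make(Args... args) {
        nodes.push_back(std::make_unique<T>(args...));
        return static_cast<const T*>(nodes.back().get());
    }
};

// Builds (a|b)*((x+1)/(y+1)) from any mix of Value subclasses.
const Value* expression(Arena& ar, const Value* a, const Value* b,
                        const Value* x, const Value* y) {
    const Value* one  = ar.make<Constant>(1);
    const Value* or1  = ar.make<BinaryOp>('|', a, b);
    const Value* add1 = ar.make<BinaryOp>('+', x, one);
    const Value* add2 = ar.make<BinaryOp>('+', y, one);
    const Value* div1 = ar.make<BinaryOp>('/', add1, add2);
    return ar.make<BinaryOp>('*', or1, div1);
}
```

The caller never says whether `a` or `x` is a constant or the result of an earlier operation; `expression` only sees `Value` pointers, which is the property the lesson above describes.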
-I had to learn about terminating blocks the hard way: using the debugger -to figure out what the LLVM verifier was trying to tell me and begging for -help on the LLVMdev mailing list. I hope you avoid this experience.
-Emblazon this rule in your mind:
-BasicBlocks in your compiler must be
- terminated with a terminating instruction (branch, return, etc.).
- Terminating instructions are a semantic requirement of the LLVM IR. There -is no facility for implicitly chaining together blocks placed into a function -in the order they occur. Indeed, in the general case, blocks will not be -added to the function in the order of execution because of the recursive -way compilers are written.
-Furthermore, if you don't terminate your blocks, your compiler code will -compile just fine. You won't find out about the problem until you're running -the compiler and the module you just created fails on the LLVM Verifier.
-After a little initial fumbling around, I quickly caught on to how blocks -should be constructed. In general, here's what I learned: -
insert_before argument. At first, I thought this was a mistake
- because clearly the normal mode of inserting instructions would be one at
- a time after some other instruction, not before. However,
- if you hold on to your terminating instruction (or use the handy dandy
- getTerminator() method on a BasicBlock), it can
- always be used as the insert_before argument to your instruction
- constructors. This causes the instruction to automatically be inserted in
- the RightPlace™ place, just before the terminating instruction. The
- nice thing about this design is that you can pass blocks around and insert
- new instructions into them without ever knowing what instructions came
- before. This makes for some very clean compiler design.
The foregoing is such an important principle, it's worth making an idiom:
-BasicBlock* bb = BasicBlock::Create();
-bb->getInstList().push_back( BranchInst::Create( ... ) );
-new Instruction(..., bb->getTerminator() );
-
To make this clear, consider the typical if-then-else statement -(see StackerCompiler::handle_if() method). We can set this up -in a single function using LLVM in the following way:
-using namespace llvm;
-BasicBlock*
-MyCompiler::handle_if( BasicBlock* bb, ICmpInst* condition )
-{
-  // Create the blocks to contain code in the structure of if/then/else
-  BasicBlock* then_bb = BasicBlock::Create();
-  BasicBlock* else_bb = BasicBlock::Create();
-  BasicBlock* exit_bb = BasicBlock::Create();
-
-  // Insert the branch instruction for the "if"
-  bb->getInstList().push_back( BranchInst::Create( then_bb, else_bb, condition ) );
-
-  // Set up the terminating instructions
-  then_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );
-  else_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );
-
-  // Fill in the then part .. details excised for brevity
-  this->fill_in( then_bb );
-
-  // Fill in the else part .. details excised for brevity
-  this->fill_in( else_bb );
-
-  // Return a block to the caller that can be filled in with the code
-  // that follows the if/then/else construct.
-  return exit_bb;
-}
-
Presumably in the foregoing, the calls to the "fill_in" method would add
-the instructions for the "then" and "else" parts. They would use the third part
-of the idiom almost exclusively (inserting new instructions before the
-terminator). Furthermore, they could even recurse back to handle_if
-should they encounter another if/then/else statement, and it will just work.
Note how cleanly this all works out. In particular, note the push_back methods on
-the BasicBlock's instruction list. These are lists of type
-Instruction (which is also of type Value). To create
-the "if" branch we merely instantiate a BranchInst that takes as
-arguments the blocks to branch to and the condition to branch on. The
-BasicBlock objects act like branch labels! This new
-BranchInst terminates the BasicBlock provided
-as an argument. To give the caller a way to keep inserting after calling
-handle_if, we create an exit_bb block which is
-returned
-to the caller. Note that the exit_bb block is the branch target of the
-terminators of both the then_bb and the else_bb
-blocks. This guarantees that no matter what else handle_if
-or fill_in does, they end up at the exit_bb block.
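The "terminate first, then insert before the terminator" idiom can be illustrated with a toy model. This is only a sketch under stated assumptions: `Block` is a hypothetical stand-in that stores opcode names as strings, and none of these functions are LLVM calls. It shows why holding on to the terminator lets you add instructions without knowing what came before.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a basic block: just a list of opcode names.
// Hypothetical type; not the LLVM API.
struct Block {
    std::vector<std::string> insts;
};

// In this toy model only "br" and "ret" count as terminators.
bool is_terminator(const std::string& op) {
    return op == "br" || op == "ret";
}

// A block is well formed here if it is non-empty, ends in a terminator,
// and has no terminator anywhere earlier.
bool is_well_formed(const Block& b) {
    if (b.insts.empty() || !is_terminator(b.insts.back())) return false;
    for (std::size_t i = 0; i + 1 < b.insts.size(); ++i)
        if (is_terminator(b.insts[i])) return false;
    return true;
}

// Insert a non-terminator just before the terminator, so the block
// stays well formed no matter how many instructions are added.
void insert_before_terminator(Block& b, const std::string& op) {
    assert(is_well_formed(b));
    assert(!is_terminator(op));
    b.insts.insert(b.insts.end() - 1, op);
}
```

A block that gets its "br" pushed first can then be handed to any helper, and every later `insert_before_terminator` call leaves it in a state a verifier-style check would accept.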
-
-One of the first things I noticed is the frequent use of the "push_back" -method on the various lists. This is so common that it is worth mentioning. -The "push_back" inserts a value into an STL list, vector, array, etc. at the -end. The method might have also been named "insert_tail" or "append". -Although I've used STL quite frequently, my use of push_back wasn't very -high in other programs. In LLVM, you'll use it all the time. -
--It took a little getting used to and several rounds of postings to the LLVM -mailing list to wrap my head around this instruction correctly. Even though I had -read the Language Reference and Programmer's Manual a couple times each, I still -missed a few very key points: -
-This means that when you look up an element in the global variable (assuming -it's a struct or array), you must dereference the pointer first! For many -things, this leads to the idiom: -
-std::vector<Value*> index_vector;
-index_vector.push_back( ConstantInt::get( Type::LongTy, 0 ) );
-// ... push other indices ...
-GetElementPtrInst* gep = GetElementPtrInst::Create( ptr, index_vector );
-
For example, suppose we have a global variable whose type is [24 x int]. The -variable itself represents a pointer to that array. To subscript the -array, we need two indices, not just one. The first index (0) dereferences the -pointer. The second index subscripts the array. If you're a "C" programmer, this -will run against your grain because you'll naturally think of the global array -variable and the address of its first element as the same. That tripped me up -for a while until I realized that they really do differ .. by type. -Remember that LLVM is strongly typed. Everything has a type. -The "type" of the global variable is [24 x int]*. That is, it's -a pointer to an array of 24 ints. When you dereference that global variable with -a single (0) index, you now have a "[24 x int]" type. Although -the pointer value of the dereferenced global and the address of the zero'th element -in the array will be the same, they differ in their type. The zero'th element has -type "int" while the pointer value has type "[24 x int]".
-Get this one aspect of LLVM right in your head, and you'll save yourself -a lot of compiler writing headaches down the road.
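The same "equal address, different type" distinction exists in C++ itself, which may make the two-index rule easier to picture. The sketch below is an analogy only, not LLVM code; `stack_storage`, `whole_array`, and `element` are hypothetical names. A pointer to the whole array plays the role of the global's [24 x int]* type, and reaching an element takes two steps: go through the pointer, then subscript.

```cpp
#include <cassert>
#include <type_traits>

using Array24 = int[24];

// Hypothetical global, analogous to an LLVM global of type [24 x int].
Array24 stack_storage;

// Pointer to the whole array: analogous to the global's type [24 x int]*.
Array24* whole_array() { return &stack_storage; }

// Pointer to one element, reached in two steps like the index pair (0, i):
// first go through the array pointer, then subscript the array.
int* element(int i) {
    assert(i >= 0 && i < 24);
    return &(*whole_array())[i];
}

// The two pointers may hold the same address, but their types differ.
static_assert(!std::is_same<decltype(whole_array()), decltype(element(0))>::value,
              "pointer-to-array and pointer-to-element are distinct types");
```

Comparing `whole_array()` and `element(0)` after casting both to `void*` shows that they hold the same address, while the `static_assert` shows that the compiler still treats them as unrelated types.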
-Linkage types in LLVM can be a little confusing, especially if your compiler -writing mind has affixed firm concepts to particular words like "weak", -"external", "global", "linkonce", etc. LLVM does not use the precise -definitions of, say, ELF or GCC, even though they share common terms. To be fair, -the concepts are related and similar but not precisely the same. This can lead -you to think you know what a linkage type represents but in fact it is slightly -different. I recommend you read the - Language Reference on this topic very -carefully. Then, read it again.
-
Here are some handy tips that I discovered along the way:
--Constants in LLVM took a little getting used to until I discovered a few utility -functions in the LLVM IR that make things easier. Here's what I learned:
-This section describes the Stacker language
Stacker definitions define what they do to the global stack. Before -proceeding, a few words about the stack are in order. The stack is simply -a global array of 32-bit integers or pointers. A global index keeps track -of the location of the top of the stack. All of this is hidden from the -programmer, but it needs to be noted because it is the foundation of the -conceptual programming model for Stacker. When you write a definition, -you are, essentially, saying how you want that definition to manipulate -the global stack.
-Manipulating the stack can be quite hazardous. There is no distinction -given and no checking for the various types of values that can be placed -on the stack. Automatic coercion between types is performed. In many -cases, this is useful. For example, a boolean value placed on the stack -can be interpreted as an integer with good results. However, using a -word that interprets that boolean value as a pointer to a string to -print out will almost always yield a crash. Stacker simply leaves it -to the programmer to get it right without any interference or hindering -on interpretation of the stack values. You've been warned. :)
-Punctuation in Stacker is very simple. The colon and semi-colon -characters are used to introduce and terminate a definition -(respectively). Except for FORWARD declarations, definitions -are all you can specify in Stacker. Definitions are read left to right. -Immediately after the colon comes the name of the word being defined. -The remaining words in the definition specify what the word does. The definition -is terminated by a semi-colon.
-So, your typical definition will have the form:
-: name ... ;
-The name is up to you but it must start with a letter and contain
-only letters, numbers, and underscore. Names are case sensitive and must not be
-the same as the name of a built-in word. The ... is replaced by
-the stack manipulating words that you wish to define name as.
-
Stacker supports two types of comments. A hash mark (#) starts a comment - that extends to the end of the line. It is identical to the kind of comments - commonly used in shell scripts. A pair of parentheses also surrounds a comment. - In both cases, the content of the comment is ignored by the Stacker compiler. The - following does nothing in Stacker. -
-
-# This is a comment to end of line
-( This is an enclosed comment )
-
-See the example program to see comments in use in -a real program.
-There are three kinds of literal values in Stacker: Integers, Strings,
- and Booleans. In each case, the stack operation is to simply push the
- value on to the stack. So, for example:
- 42 " is the answer." TRUE
- will push three values on to the stack: the integer 42, the
- string " is the answer.", and the boolean TRUE.
Each definition in Stacker is composed of a set of words. Words are -read and executed in order from left to right. There is very little -checking in Stacker to make sure you're doing the right thing with -the stack. It is assumed that the programmer knows how the stack -transformation he applies will affect the program.
-Words in a definition come in two flavors: built-in and programmer -defined. Simply mentioning the name of a previously defined or declared -programmer-defined word causes that word's stack actions to be invoked. It -is somewhat like a function call in other languages. The built-in -words have various effects, described below.
-Sometimes you need to call a word before it is defined. For this, you can
-use the FORWARD declaration. It looks like this:
FORWARD name ;
This simply states to Stacker that "name" is the name of a definition
-that is defined elsewhere. Generally it means the definition can be found
-"forward" in the file. But, it doesn't have to be in the current compilation
-unit. Anything declared with FORWARD is an external symbol for
-linking.
TODO
-The built-in words of the Stacker language are put in several groups -depending on what they do. The groups are as follows:
-While you may be familiar with many of these operations from other -programming languages, a careful review of their semantics is important -for correct programming in Stacker. Of most importance is the effect -that each of these built-in words has on the global stack. The effect is -not always intuitive. To better describe the effects, we'll borrow from Forth the idiom of -describing the effect on the stack with:
- BEFORE -- AFTER
That is, to the left of the -- is a representation of the stack before -the operation. To the right of the -- is a representation of the stack -after the operation. In the table below that describes the operation of -each of the built in words, we will denote the elements of the stack -using the following construction:
-Definition Of Operation Of Built In Words | |||
---|---|---|---|
LOGICAL OPERATIONS | |||
Word | -Name | -Operation | -Description | -
< | -LT | -w1 w2 -- b | -Two values (w1 and w2) are popped off the stack and - compared. If w1 is less than w2, TRUE is pushed back on - the stack, otherwise FALSE is pushed back on the stack. | -
> | -GT | -w1 w2 -- b | -Two values (w1 and w2) are popped off the stack and - compared. If w1 is greater than w2, TRUE is pushed back on - the stack, otherwise FALSE is pushed back on the stack. | -
>= | -GE | -w1 w2 -- b | -Two values (w1 and w2) are popped off the stack and - compared. If w1 is greater than or equal to w2, TRUE is - pushed back on the stack, otherwise FALSE is pushed back - on the stack. | -
<= | -LE | -w1 w2 -- b | -Two values (w1 and w2) are popped off the stack and - compared. If w1 is less than or equal to w2, TRUE is - pushed back on the stack, otherwise FALSE is pushed back - on the stack. | -
= | -EQ | -w1 w2 -- b | -Two values (w1 and w2) are popped off the stack and - compared. If w1 is equal to w2, TRUE is - pushed back on the stack, otherwise FALSE is pushed back - on the stack. | -
<> | -NE | -w1 w2 -- b | -Two values (w1 and w2) are popped off the stack and - compared. If w1 is not equal to w2, TRUE is - pushed back on the stack, otherwise FALSE is pushed back - on the stack. | -
FALSE | -FALSE | --- b | -The boolean value FALSE (0) is pushed on to the stack. | -
TRUE | -TRUE | --- b | -The boolean value TRUE (-1) is pushed on to the stack. | -
BITWISE OPERATORS | |||
Word | -Name | -Operation | -Description | -
<< | -SHL | -w1 w2 -- w1<<w2 | -Two values (w1 and w2) are popped off the stack. The w2 - operand is shifted left by the number of bits given by the - w1 operand. The result is pushed back to the stack. | -
>> | -SHR | -w1 w2 -- w1>>w2 | -Two values (w1 and w2) are popped off the stack. The w2 - operand is shifted right by the number of bits given by the - w1 operand. The result is pushed back to the stack. | -
OR | -OR | -w1 w2 -- w2|w1 | -Two values (w1 and w2) are popped off the stack. The values - are bitwise OR'd together and pushed back on the stack. This is - not a logical OR. The sequence 1 2 OR yields 3 not 1. | -
AND | -AND | -w1 w2 -- w2&w1 | -Two values (w1 and w2) are popped off the stack. The values - are bitwise AND'd together and pushed back on the stack. This is - not a logical AND. The sequence 1 2 AND yields 0 not 1. | -
XOR | -XOR | -w1 w2 -- w2^w1 | -Two values (w1 and w2) are popped off the stack. The values - are bitwise exclusive OR'd together and pushed back on the stack. - For example, The sequence 1 3 XOR yields 2. | -
ARITHMETIC OPERATORS | |||
Word | -Name | -Operation | -Description | -
ABS | -ABS | -w -- |w| | -One value is popped off the stack; its absolute value is computed - and then pushed on to the stack. If w1 is -1 then w2 is 1. If w1 is - 1 then w2 is also 1. | -
NEG | -NEG | -w -- -w | -One value is popped off the stack which is negated and then - pushed back on to the stack. If w1 is -1 then w2 is 1. If w1 is - 1 then w2 is -1. | -
+ | -ADD | -w1 w2 -- w2+w1 | -Two values are popped off the stack. Their sum is pushed back - on to the stack | -
- | -SUB | -w1 w2 -- w2-w1 | -Two values are popped off the stack. Their difference is pushed back - on to the stack | -
* | -MUL | -w1 w2 -- w2*w1 | -Two values are popped off the stack. Their product is pushed back - on to the stack | -
/ | -DIV | -w1 w2 -- w2/w1 | -Two values are popped off the stack. Their quotient is pushed back - on to the stack | -
MOD | -MOD | -w1 w2 -- w2%w1 | -Two values are popped off the stack. Their remainder after division - of w1 by w2 is pushed back on to the stack | -
*/ | -STAR_SLAH | -w1 w2 w3 -- (w3*w2)/w1 | -Three values are popped off the stack. The product of w1 and w2 is - divided by w3. The result is pushed back on to the stack. | -
++ | -INCR | -w -- w+1 | -One value is popped off the stack. It is incremented by one and then - pushed back on to the stack. | -
-- | -DECR | -w -- w-1 | -One value is popped off the stack. It is decremented by one and then - pushed back on to the stack. | -
MIN | -MIN | -w1 w2 -- (w2<w1?w2:w1) | -Two values are popped off the stack. The smaller one is pushed back - on to the stack. | -
MAX | -MAX | -w1 w2 -- (w2>w1?w2:w1) | -Two values are popped off the stack. The larger value is pushed back - on to the stack. | -
STACK MANIPULATION OPERATORS | |||
Word | -Name | -Operation | -Description | -
DROP | -DROP | -w -- | -One value is popped off the stack. | -
DROP2 | -DROP2 | -w1 w2 -- | -Two values are popped off the stack. | -
NIP | -NIP | -w1 w2 -- w2 | -The second value on the stack is removed from the stack. That is, - a value is popped off the stack and retained. Then a second value is - popped and the retained value is pushed. | -
NIP2 | -NIP2 | -w1 w2 w3 w4 -- w3 w4 | -The third and fourth values on the stack are removed from it. That is, - two values are popped and retained. Then two more values are popped and - the two retained values are pushed back on. | -
DUP | -DUP | -w1 -- w1 w1 | -One value is popped off the stack. That value is then pushed on to - the stack twice to duplicate the top stack value. | -
DUP2 | -DUP2 | -w1 w2 -- w1 w2 w1 w2 | -The top two values on the stack are duplicated. That is, two values - are popped off the stack. They are alternately pushed back on the - stack twice each. | -
SWAP | -SWAP | -w1 w2 -- w2 w1 | -The top two stack items are reversed in their order. That is, two - values are popped off the stack and pushed back on to the stack in - the opposite order they were popped. | -
SWAP2 | -SWAP2 | -w1 w2 w3 w4 -- w3 w4 w2 w1 | -The top four stack items are swapped in pairs. That is, two values - are popped and retained. Then, two more values are popped and retained. - The values are pushed back on to the stack in the reverse order but - in pairs. | -
OVER | -OVER | -w1 w2-- w1 w2 w1 | -Two values are popped from the stack. They are pushed back - on to the stack in the order w1 w2 w1. This seems to cause the - top stack element to be duplicated "over" the next value. | -
OVER2 | -OVER2 | -w1 w2 w3 w4 -- w1 w2 w3 w4 w1 w2 | -The third and fourth values on the stack are replicated on to the - top of the stack | -
ROT | -ROT | -w1 w2 w3 -- w2 w3 w1 | -The top three values are rotated. That is, three value are popped - off the stack. They are pushed back on to the stack in the order - w1 w3 w2. | -
ROT2 | -ROT2 | -w1 w2 w3 w4 w5 w6 -- w3 w4 w5 w6 w1 w2 | -Like ROT but the rotation is done using three pairs instead of - three singles. | -
RROT | -RROT | -w1 w2 w3 -- w3 w1 w2 | -Reverse rotation. Like ROT, but it rotates the other way around. - Essentially, the third element on the stack is moved to the top - of the stack. | -
RROT2 | -RROT2 | -w1 w2 w3 w4 w5 w6 -- w3 w4 w5 w6 w1 w2 | -Double reverse rotation. Like RROT but the rotation is done using - three pairs instead of three singles. The fifth and sixth stack - elements are moved to the first and second positions | -
TUCK | -TUCK | -w1 w2 -- w2 w1 w2 | -Similar to OVER except that the second operand is being - replicated. Essentially, the first operand is being "tucked" - in between two instances of the second operand. Logically, two - values are popped off the stack. They are placed back on the - stack in the order w2 w1 w2. | -
TUCK2 | -TUCK2 | -w1 w2 w3 w4 -- w3 w4 w1 w2 w3 w4 | -Like TUCK but a pair of elements is tucked over two pairs. - That is, the top two elements of the stack are duplicated and - inserted into the stack at the fifth and sixth positions. | -
PICK | -PICK | -x0 ... Xn n -- x0 ... Xn x0 | -The top of the stack is used as an index into the remainder of - the stack. The element at the nth position replaces the index - (top of stack). This is useful for cycling through a set of - values. Note that indexing is zero based. So, if n=0 then you - get the second item on the stack. If n=1 you get the third, etc. - Note also that the index is replaced by the n'th value. | -
SELECT | -SELECT | -m n X0..Xm Xm+1 .. Xn -- Xm | -This is like PICK but the list is removed and you need to specify - both the index and the size of the list. Careful with this one, - the wrong value for n can blow away a huge amount of the stack. | -
ROLL | -ROLL | -x0 x1 .. xn n -- x1 .. xn x0 | -Not Implemented. This one has been left as an exercise to - the student. See Exercise. ROLL requires - a value, "n", to be on the top of the stack. This value specifies how - far into the stack to "roll". The n'th value is moved (not - copied) from its location and replaces the "n" value on the top of the - stack. In this way, all the values between "n" and x0 roll up the stack. - The operation of ROLL is a generalized ROT. The "n" value specifies - how much to rotate. That is, ROLL with n=1 is the same as ROT and - ROLL with n=2 is the same as ROT2. | -
MEMORY OPERATORS | |||
Word | -Name | -Operation | -Description | -
MALLOC | -MALLOC | -w1 -- p | -One value is popped off the stack. The value is used as the size - of a memory block to allocate. The size is in bytes, not words. - The memory allocation is completed and the address of the memory - block is pushed on to the stack. | -
FREE | -FREE | -p -- | -One pointer value is popped off the stack. The value should be
- the address of a memory block created by the MALLOC operation. The
- associated memory block is freed. Nothing is pushed back on the
- stack. Many bugs can be created by attempting to FREE something
- that isn't a pointer to a MALLOC allocated memory block. Make
- sure you know what's on the stack. One way to do this is with
- the following idiom: - 64 MALLOC DUP DUP (use ptr) DUP (use ptr) ... FREE
- This ensures that an extra copy of the pointer is placed on - the stack (for the FREE at the end) and that every use of the - pointer is preceded by a DUP to retain the copy for FREE. |
-
GET | -GET | -w1 p -- w2 p | -An integer index and a pointer to a memory block are popped off - the stack. The index is used to index one byte from the memory - block. That byte value is retained, the pointer is pushed again - and the retained value is pushed. Note that the pointer value - is essentially retained in its position so this doesn't count - as a "use ptr" in the FREE idiom. | -
PUT | -PUT | -w1 w2 p -- p | -An integer value is popped off the stack. This is the value to - be put into a memory block. Another integer value is popped off - the stack. This is the indexed byte in the memory block. A - pointer to the memory block is popped off the stack. The - first value (w1) is then converted to a byte and written - to the element of the memory block(p) at the index given - by the second value (w2). The pointer to the memory block is - pushed back on the stack so this doesn't count as a "use ptr" - in the FREE idiom. | -
CONTROL FLOW OPERATORS | |||
Word | -Name | -Operation | -Description | -
RETURN | -RETURN | --- | -The currently executing definition returns immediately to its caller.
- Note that there is an implicit RETURN at the end of each
- definition, logically located at the semi-colon. The sequence
- RETURN ; is valid but redundant. |
-
EXIT | -EXIT | -w1 -- | -A return value for the program is popped off the stack. The program is
- then immediately terminated. This is normally an abnormal exit from the
- program. For a normal exit (when MAIN finishes), the exit
- code will always be zero in accordance with UNIX conventions. |
-
RECURSE | -RECURSE | --- | -The currently executed definition is called again. This operation is
- needed since the definition of a word doesn't exist until the semi colon
 - is reached. Attempting something like: - : recurser recurser ; will yield an error saying that - "recurser" is not defined yet. To accomplish the same thing, change this - to: - : recurser RECURSE ; |
-
IF (words...) ENDIF | -IF (words...) ENDIF | -b -- | -A boolean value is popped off the stack. If it is non-zero then the "words..." - are executed. Otherwise, execution continues immediately following the ENDIF. | -
IF (words...) ELSE (words...) ENDIF | -IF (words...) ELSE (words...) ENDIF | -b -- | -A boolean value is popped off the stack. If it is non-zero then the "words..." - between IF and ELSE are executed. Otherwise the words between ELSE and ENDIF are - executed. In either case, after the (words....) have executed, execution continues - immediately following the ENDIF. | -
WHILE word END | -WHILE word END | -b -- b | -The boolean value on the top of the stack is examined (not popped). If
- it is non-zero then the "word" between WHILE and END is executed.
- Execution then begins again at the WHILE where the boolean on the top of
- the stack is examined again. The stack is not modified by the WHILE...END
- loop, only examined. It is imperative that the "word" in the body of the
- loop ensure that the top of the stack contains the next boolean to examine
- when it completes. Note that since booleans and integers can be coerced
 - you can use the following "for loop" idiom: - (push count) WHILE word -- END - For example: - 10 WHILE >d -- END - This will print the numbers from 10 down to 1. 10 is pushed on the - stack. Since that is non-zero, the while loop is entered. The top of - the stack (10) is printed out with >d. The top of the stack is - decremented, yielding 9 and control is transferred back to the WHILE - keyword. The process starts all over again and repeats until - the top of stack is decremented to 0 at which point the WHILE test - fails and control is transferred to the word after the END. - |
-
INPUT & OUTPUT OPERATORS | |||
Word | -Name | -Operation | -Description | -
SPACE | -SPACE | --- | -A space character is put out. There is no stack effect. | -
TAB | -TAB | --- | -A tab character is put out. There is no stack effect. | -
CR | -CR | --- | -A carriage return character is put out. There is no stack effect. | -
>s | -OUT_STR | --- | -A string pointer is popped from the stack. It is put out. | -
>d | -OUT_STR | --- | -A value is popped from the stack. It is put out as a decimal - integer. | -
>c | -OUT_CHR | --- | -A value is popped from the stack. It is put out as an ASCII - character. | -
<s | -IN_STR | --- s | -A string is read from the input via the scanf(3) format string " %as". - The resulting string is pushed on to the stack. | -
<d | -IN_STR | --- w | -An integer is read from the input via the scanf(3) format string " %d". - The resulting value is pushed on to the stack. | -
<c | -IN_CHR | --- w | -A single character is read from the input via the scanf(3) format string - " %c". The value is converted to an integer and pushed on to the stack. | -
DUMP | -DUMP | --- | -The stack contents are dumped to standard output. This is useful for - debugging your definitions. Put DUMP at the beginning and end of a definition - to see instantly the net effect of the definition. | -
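The WHILE "for loop" idiom described in the table can be modeled outside of Stacker. The following is a minimal Python sketch, not part of the Stacker sources: `run_while` and `countdown` are hypothetical names, the stack is a plain list, and `>d` is modeled as printing the top of the stack without removing it, which is how the `10 WHILE >d -- END` walkthrough describes it.

```python
def run_while(stack, body):
    """Model WHILE ... END: examine (do not pop) the top of the stack and
    run the body for as long as that value is non-zero."""
    while stack[-1] != 0:
        body(stack)


def countdown(start):
    """Model `start WHILE >d -- END` and collect what would be printed."""
    stack = [start]
    printed = []

    def body(s):
        printed.append(s[-1])  # stands in for >d (assumed not to pop here)
        s[-1] -= 1             # stands in for --

    run_while(stack, body)
    return printed, stack


if __name__ == "__main__":
    values, final_stack = countdown(10)
    print(values)
    print(final_stack)
```

The body is responsible for leaving the next value to test on top of the stack, which is why the decrement comes last in the loop body.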
The following fully documented program highlights many features of both -the Stacker language and what is possible with LLVM. The program has two modes -of operation. If you provide numeric arguments to the program, it checks to see -if those arguments are prime numbers and prints out the results. Without any -arguments, the program prints out any prime numbers it finds between 1 and one -million (there are a lot of them!). The source code comments below tell the -remainder of the story. -
-
-################################################################################
-#
-# Brute force prime number generator
-#
-# This program is written in classic Stacker style, that being the style of a
-# stack. Start at the bottom and read your way up !
-#
-# Reid Spencer - Nov 2003
-################################################################################
-# Utility definitions
-################################################################################
-: print >d CR ;
-: it_is_a_prime TRUE ;
-: it_is_not_a_prime FALSE ;
-: continue_loop TRUE ;
-: exit_loop FALSE ;
-
-################################################################################
-# This definition tries an actual division of a candidate prime number. It
-# determines whether the division loop on this candidate should continue or
-# not.
-# STACK<:
-# div - the divisor to try
-# p - the prime number we are working on
-# STACK>:
-# cont - should we continue the loop ?
-# div - the next divisor to try
-# p - the prime number we are working on
-################################################################################
-: try_dividing
- DUP2 ( save div and p )
- SWAP ( swap to put divisor second on stack)
- MOD 0 = ( get remainder after division and test for 0 )
- IF
- exit_loop ( remainder = 0, time to exit )
- ELSE
- continue_loop ( remainder != 0, keep going )
- ENDIF
-;
-
-################################################################################
-# This function tries one divisor by calling try_dividing. But, before doing
-# that it checks to see if the value is 1. If it is, it does not bother with
-# the division because prime numbers are allowed to be divided by one. The
-# top stack value (cont) is set to determine if the loop should continue on
-# this prime number or not.
-# STACK<:
-# cont - should we continue the loop (ignored)?
-# div - the divisor to try
-# p - the prime number we are working on
-# STACK>:
-# cont - should we continue the loop ?
-# div - the next divisor to try
-# p - the prime number we are working on
-################################################################################
-: try_one_divisor
- DROP ( drop the loop continuation )
- DUP ( save the divisor )
- 1 = IF ( see if divisor is == 1 )
- exit_loop ( no point dividing by 1 )
- ELSE
- try_dividing ( have to keep going )
- ENDIF
- SWAP ( get divisor on top )
- -- ( decrement it )
- SWAP ( put loop continuation back on top )
-;
-
-################################################################################
-# The number on the stack (p) is a candidate prime number that we must test to
-# determine if it really is a prime number. To do this, we divide it by every
-# number from p-1 to 1. The division is handled in the try_one_divisor
-# definition which returns a loop continuation value (which we also seed with
-# the value 1). After the loop, we check the divisor. If it decremented all
-# the way to zero then we found a prime, otherwise we did not find one.
-# STACK<:
-# p - the prime number to check
-# STACK>:
-# yn - boolean indicating if its a prime or not
-# p - the prime number checked
-################################################################################
-: try_harder
- DUP ( duplicate to get divisor value )
- -- ( first divisor is one less than p )
- 1 ( continue the loop )
- WHILE
- try_one_divisor ( see if its prime )
- END
- DROP ( drop the continuation value )
- 0 = IF ( test for divisor == 0 )
- it_is_a_prime ( we found one )
- ELSE
- it_is_not_a_prime ( nope, this one is not a prime )
- ENDIF
-;
-
-################################################################################
-# This definition determines if the number on the top of the stack is a prime
-# or not. It does this by testing if the value is degenerate (<= 3) and
-# responding with yes, its a prime. Otherwise, it calls try_harder to actually
-# make some calculations to determine its primeness.
-# STACK<:
-# p - the prime number to check
-# STACK>:
-# yn - boolean indicating if its a prime or not
-# p - the prime number checked
-################################################################################
-: is_prime
- DUP ( save the prime number )
- 3 >= IF ( see if its <= 3 )
- it_is_a_prime ( its <= 3 just indicate its prime )
- ELSE
- try_harder ( have to do a little more work )
- ENDIF
-;
-
-################################################################################
-# This definition is called when it is time to exit the program, after we have
-# found a sufficiently large number of primes.
-# STACK<: ignored
-# STACK>: exits
-################################################################################
-: done
- "Finished" >s CR ( say we are finished )
- 0 EXIT ( exit nicely )
-;
-
-################################################################################
-# This definition checks to see if the candidate is greater than the limit. If
-# it is, it terminates the program by calling done. Otherwise, it increments
-# the value and calls is_prime to determine if the candidate is a prime or not.
-# If it is a prime, it prints it. Note that the boolean result from is_prime is
-# gobbled by the following IF which returns the stack to just containing the
-# prime number just considered.
-# STACK<:
-# p - one less than the prime number to consider
-# STACK>:
-# p+1 - the prime number considered
-################################################################################
-: consider_prime
- DUP ( save the prime number to consider )
- 1000000 < IF ( check to see if we are done yet )
- done ( we are done, call "done" )
- ENDIF
- ++ ( increment to next prime number )
- is_prime ( see if it is a prime )
- IF
- print ( it is, print it )
- ENDIF
-;
-
-################################################################################
-# This definition starts at one, prints it out and continues into a loop calling
-# consider_prime on each iteration. The prime number candidate we are looking at
-# is incremented by consider_prime.
-# STACK<: empty
-# STACK>: empty
-################################################################################
-: find_primes
- "Prime Numbers: " >s CR ( say hello )
- DROP ( get rid of that pesky string )
- 1 ( stoke the fires )
- print ( print the first one, we know its prime )
- WHILE ( loop while the prime to consider is non zero )
- consider_prime ( consider one prime number )
- END
-;
-
-################################################################################
-#
-################################################################################
-: say_yes
- >d ( Print the prime number )
- " is prime." ( push string to output )
- >s ( output it )
- CR ( print carriage return )
- DROP ( pop string )
-;
-
-: say_no
- >d ( Print the prime number )
- " is NOT prime." ( push string to put out )
- >s ( put out the string )
- CR ( print carriage return )
- DROP ( pop string )
-;
-
-################################################################################
-# This definition processes a single command line argument and determines if it
-# is a prime number or not.
-# STACK<:
-# n - number of arguments
-# arg1 - the prime numbers to examine
-# STACK>:
-# n-1 - one less than number of arguments
-# arg2 - we processed one argument
-################################################################################
-: do_one_argument
- -- ( decrement loop counter )
- SWAP ( get the argument value )
- is_prime IF ( determine if its prime )
- say_yes ( uhuh )
- ELSE
- say_no ( nope )
- ENDIF
- DROP ( done with that argument )
-;
-
-################################################################################
-# This definition processes each command line argument in turn.
-# STACK<:
-# n - number of arguments
-# ... - the arguments
-################################################################################
-: process_arguments
- WHILE ( while there are more arguments )
- do_one_argument ( process one argument )
- END
-;
-
-################################################################################
-# The MAIN program just prints a banner and processes its arguments.
-# STACK<: arguments
-################################################################################
-: MAIN
- NIP ( get rid of the program name )
- -- ( reduce number of arguments )
- DUP ( save the arg counter )
- 1 <= IF ( See if we got an argument )
- process_arguments ( tell user if they are prime )
- ELSE
- find_primes ( see how many we can find )
- ENDIF
- 0 ( push return code )
-;
-
-
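The control flow of `is_prime`, `try_harder`, and `try_one_divisor` can be hard to follow through the stack manipulation. The sketch below restates the same brute-force test in Python as an aid to reading. It is an illustration, not a translation produced by the Stacker compiler: the function name `is_prime_brute_force` is hypothetical, and local variables stand in for the stack slots `p`, `div`, and `cont`. Like the Stacker program, it treats every value up to 3 (including 1) as prime.

```python
def is_prime_brute_force(p):
    """Mirror is_prime / try_harder: divide p by p-1 down to 2 and report
    whether the divisor counted all the way down to zero."""
    if p <= 3:
        # Degenerate case, as in is_prime: answer "prime" without dividing.
        return True

    div = p - 1   # first divisor is one less than p
    cont = True   # loop continuation value, seeded as "continue"
    while cont:
        if div == 1:
            cont = False            # no point dividing by 1
        else:
            cont = (p % div != 0)   # stop as soon as a divisor is found
        div -= 1                    # try_one_divisor always decrements

    # The loop ends with div == 0 only if no divisor above 1 was found.
    return div == 0


if __name__ == "__main__":
    print([n for n in range(1, 30) if is_prime_brute_force(n)])
```

The final `div == 0` test corresponds to the `0 = IF` in `try_harder`: an early exit leaves the divisor at 1 or more, so only a full countdown reports a prime.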
-This section is under construction. -
In the meantime, you can always read the code! It has comments!
-The source code, test programs, and sample programs can all be found -in the LLVM repository named llvm-stacker. This should be checked out to -the projects directory so that it will auto-configure. To do that, make -sure you have the llvm sources in llvm -(see Getting Started) and then use these -commands:
-
-% svn co http://llvm.org/svn/llvm-project/llvm-top/trunk llvm-top
-% cd llvm-top
-% make build MODULE=stacker
-
Under the projects/llvm-stacker directory you will find the -implementation of the Stacker compiler, as follows:
- -See projects/llvm-stacker/lib/compiler/Lexer.l
-See projects/llvm-stacker/lib/compiler/StackerParser.y
-See projects/llvm-stacker/lib/compiler/StackerCompiler.cpp
-See projects/llvm-stacker/lib/runtime/stacker_rt.c
-See projects/llvm-stacker/tools/stkrc/stkrc.cpp
-See projects/llvm-stacker/test/*.st
-As you may have noted from a careful inspection of the Built-In word -definitions, the ROLL word is not implemented. This word was left out of -Stacker on purpose so that it can be an exercise for the student. The exercise -is to implement the ROLL functionality (in your own workspace) and build a test -program for it. If you can implement ROLL, you understand Stacker and probably -a fair amount about LLVM since this is one of the more complicated Stacker -operations. The work will be almost completely limited to the -compiler. -
The ROLL word is already recognized by both the lexer and parser but ignored
-by the compiler. That means you don't have to futz around with figuring out how
-to get the keyword recognized. It already is. The part of the compiler that
-you need to implement is the ROLL
case in the
-StackerCompiler::handle_word(int)
method.
-
Good luck!
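Before writing the code generation for ROLL, it can help to pin down the stack effect you are aiming for and to have a reference to compare your test program against. The Python sketch below is only such a reference model. It assumes ROLL follows the conventional Forth meaning (pop a count n, then move the item n places below the top to the top); the definition in the Built-In words table takes precedence if it differs. The function name `roll` and the list-based stack are hypothetical, and nothing here shows how `StackerCompiler::handle_word(int)` should emit LLVM code.

```python
def roll(stack):
    """Reference model of ROLL under the assumed Forth semantics.

    The count n is popped from the top of the stack. The item n places
    below the new top is then removed and pushed back on top, so that
    0 ROLL does nothing, 1 ROLL acts like SWAP, and 2 ROLL rotates
    the top three items.
    """
    n = stack.pop()
    if n < 0 or n >= len(stack):
        raise IndexError("ROLL count out of range for current stack depth")
    item = stack.pop(len(stack) - 1 - n)  # remove the n-th item from the top
    stack.append(item)                    # and place it on top
    return stack


if __name__ == "__main__":
    print(roll([1, 2, 3, 2]))
    print(roll([1, 2, 3, 1]))
    print(roll([1, 2, 3, 0]))
```

A Stacker test program for the exercise could push the same inputs, apply ROLL, and use DUMP to compare the resulting stack with this model.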
-The initial implementation of Stacker has several deficiencies. If you're -interested, here are some things that could be implemented better:
-