Stacker: An Example Of Using LLVM

Date: Mon, 11 Aug 2008 06:13:31 +0000 (+0000)
Subject: the stacker doc is way out of date.
X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=4630e4ddcf70d22e231a2f7f30774aecfe15c3a0;p=oota-llvm.git

the stacker doc is way out of date.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54631 91177308-0d34-0410-b5e6-96231b3b80d8
---
diff --git a/docs/GettingStarted.html b/docs/GettingStarted.html
index d057390d83c..3110ac6db4d 100644
--- a/docs/GettingStarted.html
+++ b/docs/GettingStarted.html
@@ -1291,8 +1291,7 @@ different tools.

This directory contains projects that are not strictly part of LLVM but are shipped with LLVM. This is also the directory where you should create your own LLVM-based projects. See llvm/projects/sample for an example of how
-  to set up your own project. See llvm/projects/Stacker for a fully
-  functional example of a compiler front end.
+  to set up your own project.

diff --git a/docs/Stacker.html b/docs/Stacker.html
deleted file mode 100644
index 81b623efa9a..00000000000
--- a/docs/Stacker.html
+++ /dev/null
@@ -1,1428 +0,0 @@


Abstract
Introduction
Lessons I Learned About LLVM
The Stacker Lexicon
1. The Stack
2. Punctuation
3. Comments
4. Literals
5. Words
6. Standard Style
7. Built-Ins
Prime: A Complete Example
Internal Code Details
1. The Directory Structure
2. The Lexer
3. The Parser
4. The Compiler
5. The Runtime
6. Compiler Driver
7. Test Programs
8. Exercise
9. Things Remaining To Be Done


Written by Reid Spencer


Abstract

This document is another way to learn about LLVM. Unlike the LLVM Reference Manual or LLVM Programmer's Manual, here we learn about LLVM through the experience of creating a simple programming language named Stacker. Stacker was invented specifically as a demonstration of LLVM. The emphasis in this document is not on describing the intricacies of LLVM itself but on how to use it to build your own compiler system.


Introduction

Amongst other things, LLVM is a platform for compiler writers. Because of its exceptionally clean and small IR (intermediate representation), compiler writing with LLVM is much easier than with other systems. As evidence, I wrote the entire compiler (language definition, lexer, parser, code generator, etc.) in about four days! That's important to know because it shows how quickly you can get a new language running when using LLVM. Furthermore, this was the first language the author ever created using LLVM, so the learning curve is included in those four days.

The language described here, Stacker, is Forth-like. Programs are simple collections of word definitions, and the only thing definitions can do is manipulate a stack or generate I/O. Stacker is not a "real" programming language; it's very simple. Although it is computationally complete, you wouldn't use it for your next big project. However, the facts that it is complete, that it is simple, and that it doesn't have a C-like syntax make it useful for demonstration purposes. It shows that LLVM can be applied to a wide variety of languages.

The basic notion behind Stacker is very simple. There's a stack of integers (or character pointers) that the program manipulates. Pretty much the only thing the program can do is manipulate the stack and do some limited I/O operations. The language provides you with several built-in words that manipulate the stack in interesting ways. To get your feet wet, here's how you write the traditional "Hello, World" program in Stacker:

: hello_world "Hello, World!" >s DROP CR ;
: MAIN hello_world ;

This has two "definitions" (Stacker manipulates words, not functions, and words have definitions): MAIN and hello_world. The MAIN definition is standard; it tells Stacker where to start. Here, MAIN is defined to simply invoke the word hello_world. The hello_world definition tells Stacker to push the "Hello, World!" string onto the stack, print it out (>s), pop it off the stack (DROP), and finally print a carriage return (CR). Although hello_world uses the stack, its net effect on the stack is null. Well-written Stacker definitions have that characteristic.

Exercise for the reader: how could you make this a one line program?


Lessons I Learned About LLVM

Stacker was written for two purposes:

to get the author over the learning curve, and
to provide a simple example of how to write a compiler using LLVM.

During the development of Stacker, many lessons about LLVM were learned. Those lessons are described in the following subsections.


Everything's a Value!

Although I knew that LLVM uses a Static Single Assignment (SSA) format, it wasn't obvious to me how prevalent this idea was in LLVM until I really started using it. Reading the Programmer's Manual and Language Reference, I noted that most of the important LLVM IR (Intermediate Representation) C++ classes were derived from the Value class. I only fully understood the power of that simple design once I started constructing executable expressions for Stacker.


This really makes your programming go faster. Think about compiling code for the following C/C++ expression: (a|b)*((x+1)/(y+1)). Assuming the values are on the stack in the order a, b, x, y (with y on top), this could be expressed in Stacker as: 1 + SWAP 1 + / RROT OR *. You could write a function using LLVM that computes this expression like this:


Value*
expression(BasicBlock* bb, Value* a, Value* b, Value* x, Value* y)
{
    // The constant 1 used by both additions (32-bit integer type).
    ConstantInt* one = ConstantInt::get(Type::Int32Ty, 1);
    // Each create* call appends a new instruction to the end of bb and
    // returns it; the instruction is the value it computes.
    BinaryOperator* or1 = BinaryOperator::createOr(a, b, "", bb);
    BinaryOperator* add1 = BinaryOperator::createAdd(x, one, "", bb);
    BinaryOperator* add2 = BinaryOperator::createAdd(y, one, "", bb);
    // Signed division (Div was split into SDiv, UDiv and FDiv in LLVM 2.0).
    BinaryOperator* div1 = BinaryOperator::createSDiv(add1, add2, "", bb);
    BinaryOperator* mult1 = BinaryOperator::createMul(or1, div1, "", bb);
    return mult1;
}

"Okay, big deal," you say? It is a big deal. Here's why. Note that I didn't have to tell this function which kinds of Values are being passed in. They could be Instructions, Constants, GlobalVariables, or any of the other subclasses of Value that LLVM supports. Furthermore, if you specify Values that are incorrect for this sequence of operations, LLVM will either notice right away (at compilation time) or the LLVM Verifier will pick up the inconsistency when the compiler runs. In either case LLVM prevents you from making a type error that gets passed through to the generated program. This goes a long way toward making sure your compiler generates well-formed code!

The second point is that we don't have to worry about branching, registers, stack variables, saving partial results, etc. The instructions we create are the values we use. Note that all that was created in the above code is a Constant value and five operators. Each instruction is itself the value it computes. This saves a lot of time.

The lesson is this: SSA form is very powerful, because there is no difference between a value and the instruction that created it. This is fully enforced by the LLVM IR. Use it to your best advantage.
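The Stacker sequence for (a|b)*((x+1)/(y+1)) can be checked with a small model. The following is a minimal C++ sketch, not Stacker's runtime: the names Stack, pop and run are hypothetical, the stack is a std::vector<int> whose back is the top, and each word follows the Operation column of the built-in table in this document (for example / is w1 w2 -- w2/w1 and RROT is w1 w2 w3 -- w3 w1 w2). Under those semantics, after the division the stack holds a, b and the quotient, and RROT is the rotation that brings a and b back to the top for OR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy Stacker stack: the back of the vector is the top (hypothetical model).
using Stack = std::vector<int>;

// Pops and returns the top value.
int pop(Stack& s) {
    assert(!s.empty() && "stack underflow");
    int v = s.back();
    s.pop_back();
    return v;
}

// Executes a few built-in words and integer literals, left to right.
// w2 is the top of the stack and w1 the item beneath it.
void run(Stack& s, const std::vector<std::string>& words) {
    for (const std::string& w : words) {
        if (w == "+") {
            int w2 = pop(s), w1 = pop(s);
            s.push_back(w2 + w1);
        } else if (w == "*") {
            int w2 = pop(s), w1 = pop(s);
            s.push_back(w2 * w1);
        } else if (w == "/") {                 // w1 w2 -- w2/w1
            int w2 = pop(s), w1 = pop(s);
            s.push_back(w2 / w1);
        } else if (w == "OR") {                // bitwise, not logical
            int w2 = pop(s), w1 = pop(s);
            s.push_back(w2 | w1);
        } else if (w == "SWAP") {              // w1 w2 -- w2 w1
            int w2 = pop(s), w1 = pop(s);
            s.push_back(w2);
            s.push_back(w1);
        } else if (w == "RROT") {              // w1 w2 w3 -- w3 w1 w2
            int w3 = pop(s), w2 = pop(s), w1 = pop(s);
            s.push_back(w3);
            s.push_back(w1);
            s.push_back(w2);
        } else {
            s.push_back(std::stoi(w));         // anything else: integer literal
        }
    }
}
```

For example, starting from a = 12, b = 5, x = 9, y = 2 and running 1 + SWAP 1 + / RROT OR * leaves the single value 39, which is (12|5)*((9+1)/(2+1)) with integer division.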


Terminate Those Blocks!

I had to learn about terminating blocks the hard way: using the debugger to figure out what the LLVM verifier was trying to tell me and begging for help on the LLVMdev mailing list. I hope you avoid this experience.

Emblazon this rule in your mind:

All BasicBlocks in your compiler must be terminated with a terminating instruction (branch, return, etc.).

Terminating instructions are a semantic requirement of the LLVM IR. There is no facility for implicitly chaining together blocks placed into a function in the order they occur. Indeed, in the general case, blocks will not be added to the function in the order of execution because of the recursive way compilers are written.

Furthermore, if you don't terminate your blocks, your compiler code will still compile just fine. You won't find out about the problem until you're running the compiler and the module you just created fails the LLVM Verifier.
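The rule is easy to state as a check. Below is a toy C++ model, not LLVM's API: a hypothetical Block holds opcode names, and unterminated_blocks reports every block that is empty or whose last instruction is not a terminator, which is the kind of problem the LLVM Verifier reports. The terminator set used here is a small assumed subset of LLVM's.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block: a label plus opcode names.
struct Block {
    std::string name;
    std::vector<std::string> insts;
};

// Opcodes treated as terminators in this toy model (a subset of LLVM's).
bool is_terminator(const std::string& op) {
    static const std::set<std::string> terms = {"br", "ret", "switch", "unreachable"};
    return terms.count(op) != 0;
}

// Returns the names of blocks that are empty or do not end in a terminator,
// i.e. the blocks a verifier would reject.
std::vector<std::string> unterminated_blocks(const std::vector<Block>& fn) {
    std::vector<std::string> bad;
    for (const Block& b : fn)
        if (b.insts.empty() || !is_terminator(b.insts.back()))
            bad.push_back(b.name);
    return bad;
}
```

For a function whose blocks are entry {add, br}, then {mul} and exit {ret}, the check reports then, the block that was never given a terminator.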


Concrete Blocks

After a little initial fumbling around, I quickly caught on to how blocks should be constructed. In general, here's what I learned:

Create your blocks early. While writing your compiler, you will encounter several situations where you know a priori that you will need several blocks. For example, if-then-else, switch, while, and for statements in C/C++ all need multiple blocks for expression in LLVM. The rule is, create them early.
Terminate your blocks early. This just reduces the chances that you forget to terminate your blocks, which is required (see Terminate Those Blocks!).
Use getTerminator() for instruction insertion. I noticed early on that many of the constructors for the Instruction classes take an optional insert_before argument. At first, I thought this was a mistake because clearly the normal mode of inserting instructions would be one at a time after some other instruction, not before. However, if you hold on to your terminating instruction (or use the handy dandy getTerminator() method on a BasicBlock), it can always be used as the insert_before argument to your instruction constructors. This causes the instruction to automatically be inserted in the RightPlace™, just before the terminating instruction. The nice thing about this design is that you can pass blocks around and insert new instructions into them without ever knowing what instructions came before. This makes for some very clean compiler design.

The foregoing is such an important principle that it's worth making an idiom:

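// 1. Create the block early.
BasicBlock* bb = BasicBlock::Create();
// 2. Terminate it right away.
bb->getInstList().push_back( BranchInst::Create( ... ) );
// 3. Insert every later instruction just before the terminator
//    (Instruction stands for any concrete instruction class).
new Instruction(..., bb->getTerminator() );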

To make this clear, consider the typical if-then-else statement (see the StackerCompiler::handle_if() method). We can set this up in a single function using LLVM in the following way:

using namespace llvm;
BasicBlock*
MyCompiler::handle_if( BasicBlock* bb, ICmpInst* condition )
{
    // Create the blocks to contain code in the structure of if/then/else
    BasicBlock* then_bb = BasicBlock::Create();
    BasicBlock* else_bb = BasicBlock::Create();
    BasicBlock* exit_bb = BasicBlock::Create();

    // Insert the branch instruction for the "if"; this terminates bb
    bb->getInstList().push_back( BranchInst::Create( then_bb, else_bb, condition ) );

    // Set up the terminating instructions: both arms branch to exit_bb
    then_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );
    else_bb->getInstList().push_back( BranchInst::Create( exit_bb ) );

    // Fill in the then part .. details excised for brevity
    this->fill_in( then_bb );

    // Fill in the else part .. details excised for brevity
    this->fill_in( else_bb );

    // Return a block to the caller that can be filled in with the code
    // that follows the if/then/else construct.
    return exit_bb;
}

Presumably in the foregoing, the calls to the "fill_in" method would add the instructions for the "then" and "else" parts. They would use the third part of the idiom almost exclusively (inserting new instructions before the terminator). Furthermore, they could even recurse back to handle_if should they encounter another if/then/else statement, and it will just work.

Note how cleanly this all works out, in particular the push_back methods on the BasicBlock's instruction list. These are lists of type Instruction (which is also of type Value). To create the "if" branch we merely instantiate a BranchInst that takes as arguments the blocks to branch to and the condition to branch on. The BasicBlock objects act like branch labels! This new BranchInst terminates the BasicBlock provided as an argument. To give the caller a way to keep inserting after calling handle_if, we create an exit_bb block which is returned to the caller. Note that a branch to the exit_bb block terminates both the then_bb and the else_bb blocks. This guarantees that no matter what else handle_if or fill_in does, they end up at the exit_bb block.
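Here is a runnable C++ toy model of the same if/then/else shape, with no LLVM dependency. ToyBlock, ToyFunction, insert_before_terminator and this handle_if are hypothetical stand-ins: instructions are plain strings, and insert_before_terminator plays the role of passing bb->getTerminator() as the insert_before argument. Because each arm is terminated as soon as it is created, the code that fills in the arms can only land in front of the branch to exit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical block: a label plus instruction text; once terminated,
// the last entry is always the terminator.
struct ToyBlock {
    std::string label;
    std::vector<std::string> insts;
};

// Mirrors the "insert before the terminator" idiom.
void insert_before_terminator(ToyBlock& bb, const std::string& inst) {
    assert(!bb.insts.empty() && "block must already be terminated");
    bb.insts.insert(bb.insts.end() - 1, inst);
}

// Owns every block; unique_ptr keeps ToyBlock addresses stable as it grows.
struct ToyFunction {
    std::vector<std::unique_ptr<ToyBlock>> blocks;
    ToyBlock* create(const std::string& label) {
        blocks.push_back(std::make_unique<ToyBlock>(ToyBlock{label, {}}));
        return blocks.back().get();
    }
};

// bb branches on cond to then/else; both arms branch to exit.
// Returns exit so the caller can keep emitting code after the if.
ToyBlock* handle_if(ToyFunction& fn, ToyBlock& bb, const std::string& cond) {
    ToyBlock* then_bb = fn.create("then");
    ToyBlock* else_bb = fn.create("else");
    ToyBlock* exit_bb = fn.create("exit");
    // Terminate the current block and both arms immediately.
    bb.insts.push_back("br " + cond + ", then, else");
    then_bb->insts.push_back("br exit");
    else_bb->insts.push_back("br exit");
    // "Fill in" each arm; the new code lands before the branch to exit.
    insert_before_terminator(*then_bb, "then_work");
    insert_before_terminator(*else_bb, "else_work");
    return exit_bb;
}
```

After handle_if returns, the caller terminates the exit block itself (for example with a ret) and carries on emitting code there.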


push_back Is Your Friend

One of the first things I noticed is the frequent use of the "push_back" method on the various lists. This is so common that it is worth mentioning. The "push_back" method inserts a value at the end of an STL list, vector, deque, etc. The method might have also been named "insert_tail" or "append". Although I've used STL quite frequently, my use of push_back wasn't very high in other programs. In LLVM, you'll use it all the time.


The Wily GetElementPtrInst

It took a little getting used to and several rounds of postings to the LLVM mailing list to wrap my head around this instruction correctly. Even though I had read the Language Reference and Programmer's Manual a couple times each, I still missed a few very key points:

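GetElementPtrInst gives you back a pointer Value to the last thing indexed; it computes an address and does not load anything.
All global variables in LLVM are pointers.
Pointers must also be dereferenced with the GetElementPtrInst instruction: the first index steps through the pointer itself.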

This means that when you look up an element in the global variable (assuming it's a struct or array), you must dereference the pointer first! For many things, this leads to the idiom:

std::vector<Value*> index_vector;
// The first index (0) steps through the pointer itself.
index_vector.push_back( ConstantInt::get( Type::Int64Ty, 0 ) );
// ... push other indices ...
GetElementPtrInst* gep =
    GetElementPtrInst::Create( ptr, index_vector.begin(), index_vector.end() );

For example, suppose we have a global variable whose type is [24 x int]. The variable itself represents a pointer to that array. To subscript the array, we need two indices, not just one. The first index (0) dereferences the pointer. The second index subscripts the array. If you're a "C" programmer, this will run against your grain because you'll naturally think of the global array variable and the address of its first element as the same. That tripped me up for a while until I realized that they really do differ: by type. Remember that LLVM is strongly typed. Everything has a type. The "type" of the global variable is [24 x int]*. That is, it's a pointer to an array of 24 ints. When you dereference that global variable with a single (0) index, you reach a "[24 x int]" array, and the GetElementPtrInst gives you a "[24 x int]*" pointer to it. Although the address of the array and the address of its zero'th element are the same, they differ in their type: a pointer to the zero'th element has type "int*" while a pointer to the array has type "[24 x int]*".

Get this one aspect of LLVM right in your head, and you'll save yourself -a lot of compiler writing headaches down the road.
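C++ pointer types make a useful analogy for this distinction. In the sketch below, which is an analogy rather than LLVM code (the function names are hypothetical), &g has type int (*)[24], the counterpart of [24 x int]*. Indexing it with a single 0 gives the same type back, and only a second index produces an int*. All three addresses compare equal once converted to void*, so the only difference is the type.

```cpp
#include <cassert>
#include <type_traits>

// An analogy in C++, not LLVM: g plays the role of a global of type [24 x int].
int g[24];

using Int24 = int[24];

// The global itself: a pointer to the whole array
// ([24 x int]* in LLVM, int (*)[24] in C++).
Int24* global_ptr() { return &g; }

// A single index of 0 steps through the pointer; the type is unchanged.
Int24* index_0() { return &global_ptr()[0]; }

// A second index selects an element; the result is a pointer to int.
int* index_0_i(int i) { return &global_ptr()[0][i]; }

static_assert(std::is_same<decltype(global_ptr()), int (*)[24]>::value,
              "pointer to the array");
static_assert(std::is_same<decltype(index_0_i(0)), int*>::value,
              "pointer to an element");
```

The static_asserts document the types at compile time; comparing the addresses at run time shows they coincide.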


Getting Linkage Types Right

Linkage types in LLVM can be a little confusing, especially if your compiler-writing mind has affixed firm concepts to particular words like "weak", "external", "global", "linkonce", etc. LLVM does not use the precise definitions of, say, ELF or GCC, even though they share common terms. To be fair, the concepts are related and similar but not precisely the same. This can lead you to think you know what a linkage type represents when in fact it is slightly different. I recommend you read the Language Reference on this topic very carefully. Then, read it again.

Here are some handy tips that I discovered along the way:

Uninitialized means external. That is, the symbol is declared in the current module and can be used by that module, but it is not defined by that module.
Setting an initializer changes a global's linkage type. Setting an initializer changes a global's linkage type from whatever it was to a normal, defined global (not external). You'll need to call the setLinkage() method to reset it if you specify the initializer after the GlobalValue has been constructed. This is important for LinkOnce and Weak linkage types.
Appending linkage can keep track of things. Appending linkage can be used to keep track of compilation information at runtime. It could be used, for example, to build a full table of all the C++ virtual tables or hold the C++ RTTI data, or whatever. Appending linkage can only be applied to arrays. All arrays with the same name in each module are concatenated together at link time.


Constants Are Easier Than That!

Constants in LLVM took a little getting used to until I discovered a few utility functions in the LLVM IR that make things easier. Here's what I learned:

Constants are Values like anything else and can be operands of instructions.
Integer constants, frequently needed, can be created using the static "get" methods of the ConstantInt class. The nice thing about these is that you can "get" any kind of integer quickly.
There's a special static method on the Constant class, getNullValue(), which allows you to get the null constant for any type. This is really handy for initializing large arrays or structures, etc.


The Stacker Lexicon

This section describes the Stacker language.

The Stack

Stacker definitions define what they do to the global stack. Before proceeding, a few words about the stack are in order. The stack is simply a global array of 32-bit integers or pointers. A global index keeps track of the location of the top of the stack. All of this is hidden from the programmer, but it needs to be noted because it is the foundation of the conceptual programming model for Stacker. When you write a definition, you are, essentially, saying how you want that definition to manipulate the global stack.

Manipulating the stack can be quite hazardous. No distinction is made, and no checking is done, between the various types of values that can be placed on the stack. Automatic coercion between types is performed. In many cases, this is useful. For example, a boolean value placed on the stack can be interpreted as an integer with good results. However, using a word that interprets that boolean value as a pointer to a string to print out will almost always yield a crash. Stacker simply leaves it to the programmer to get it right, without interfering in how the stack values are interpreted. You've been warned. :)
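A minimal C++ sketch of that model follows. It is an illustration, not Stacker's runtime: the names and the capacity of 1024 slots are assumptions, and it stores only 32-bit integers (a host pointer may not fit in 32 bits). Booleans use the TRUE (-1) and FALSE (0) values listed in the built-in table, which is why a boolean can be reused as an integer without any conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the Stacker runtime stack: a global array of
// 32-bit words plus a global index of the next free slot.
const std::size_t kStackSlots = 1024;
std::int32_t stack_words[kStackSlots];
std::size_t stack_index = 0;  // top of stack is stack_words[stack_index - 1]

void push(std::int32_t v) {
    assert(stack_index < kStackSlots && "stack overflow");
    stack_words[stack_index++] = v;
}

std::int32_t pop() {
    assert(stack_index > 0 && "stack underflow");
    return stack_words[--stack_index];
}

// No type tags are stored: a boolean is just the word -1 (TRUE) or 0 (FALSE).
void push_bool(bool b) { push(b ? -1 : 0); }
```

Nothing stops a caller from popping that boolean and treating it as something else; as in Stacker, getting the interpretation right is the programmer's job.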


Punctuation

Punctuation in Stacker is very simple. The colon and semicolon characters are used to introduce and terminate a definition (respectively). Except for FORWARD declarations, definitions are all you can specify in Stacker. Definitions are read left to right. Immediately after the colon comes the name of the word being defined. The remaining words in the definition specify what the word does. The definition is terminated by a semicolon.

So, your typical definition will have the form:

: name ... ;

The name is up to you, but it must start with a letter and contain only letters, numbers, and underscores. Names are case sensitive and must not be the same as the name of a built-in word. The ... is replaced by the stack-manipulating words that you wish to define name as.
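The naming rule and the : name ... ; shape can be captured in a few lines. This is a hypothetical C++ helper, not the Stacker lexer or parser: it splits on whitespace (so string literals containing spaces are not handled), and it does not reject built-in names, since that would need the complete built-in list.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// A name starts with a letter and continues with letters, digits or '_'.
bool valid_name(const std::string& name) {
    if (name.empty() || !std::isalpha(static_cast<unsigned char>(name[0])))
        return false;
    for (char c : name)
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
            return false;
    return true;
}

struct Definition {
    std::string name;
    std::vector<std::string> body;
};

// Parses ": name w1 w2 ... ;" from whitespace-separated text.
// Returns false if the colon, a valid name, or the closing semicolon is missing.
bool parse_definition(const std::string& text, Definition& out) {
    std::istringstream in(text);
    std::string tok;
    if (!(in >> tok) || tok != ":") return false;
    if (!(in >> out.name) || !valid_name(out.name)) return false;
    out.body.clear();
    while (in >> tok) {
        if (tok == ";") return true;
        out.body.push_back(tok);
    }
    return false;  // ran out of input before ';'
}
```

For instance, ": square DUP * ;" parses to the name square with the body words DUP and *.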


Comments

Stacker supports two types of comments. A hash mark (#) starts a comment that extends to the end of the line. It is identical to the kind of comments commonly used in shell scripts. Text enclosed in a pair of parentheses is also a comment. In both cases, the content of the comment is ignored by the Stacker compiler. The following does nothing in Stacker:


# This is a comment to end of line
( This is an enclosed comment )

See the example program for comments in use in a real program.
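Both comment forms can be removed before the words are tokenized. Here is a minimal C++ sketch, assuming comments do not nest; the function name is hypothetical and this is not how the Stacker lexer is written. Text inside "..." string literals is copied unchanged so that a # or ( within a string is not mistaken for a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes "# ... end of line" and "( ... )" comments from Stacker source.
// Assumes comments do not nest; string literal contents are left untouched.
std::string strip_comments(const std::string& src) {
    std::string out;
    bool in_string = false;
    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        if (in_string) {
            out += c;
            if (c == '"') in_string = false;      // closing quote
        } else if (c == '"') {
            in_string = true;                     // opening quote
            out += c;
        } else if (c == '#') {
            while (i < src.size() && src[i] != '\n') ++i;  // skip to end of line
            if (i < src.size()) out += '\n';               // keep the newline
        } else if (c == '(') {
            while (i < src.size() && src[i] != ')') ++i;   // skip through ')'
            out += ' ';                                    // comment acts as whitespace
        } else {
            out += c;
        }
    }
    return out;
}
```

Replacing a parenthesized comment with a single space keeps the words on either side of it separate.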


Literals

There are three kinds of literal values in Stacker: Integers, Strings, and Booleans. In each case, the stack operation is to simply push the value onto the stack. So, for example:

42 " is the answer." TRUE

will push three values onto the stack: the integer 42, the string " is the answer.", and the boolean TRUE.
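A lexer has to tell these three kinds apart before it can push them. Here is a hypothetical C++ sketch of that classification, not the Stacker lexer. It assumes string tokens still carry their surrounding quotes, that the booleans are spelled TRUE and FALSE, and (the sketch's own assumption) that an integer may have a leading minus sign. A lone - is left as a word, since it is the SUB built-in.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class LiteralKind { Integer, String, Boolean, NotALiteral };

LiteralKind classify(const std::string& tok) {
    if (tok == "TRUE" || tok == "FALSE") return LiteralKind::Boolean;
    if (tok.size() >= 2 && tok.front() == '"' && tok.back() == '"')
        return LiteralKind::String;
    // Optional leading '-' only when followed by more characters.
    std::size_t start = (tok.size() > 1 && tok[0] == '-') ? 1 : 0;
    if (start == tok.size()) return LiteralKind::NotALiteral;  // empty token
    for (std::size_t i = start; i < tok.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(tok[i])))
            return LiteralKind::NotALiteral;  // e.g. DUP, -, --
    return LiteralKind::Integer;
}
```

Anything classified as NotALiteral would then be looked up as a built-in or programmer-defined word.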


Words

Each definition in Stacker is composed of a set of words. Words are read and executed in order from left to right. There is very little checking in Stacker to make sure you're doing the right thing with the stack. It is assumed that the programmer knows how the stack transformation he applies will affect the program.

Words in a definition come in two flavors: built-in and programmer defined. Simply mentioning the name of a previously defined or declared programmer-defined word causes that word's stack actions to be invoked. It is somewhat like a function call in other languages. The built-in words have various effects, described below.

Sometimes you need to call a word before it is defined. For this, you can use the FORWARD declaration. It looks like this:

FORWARD name ;

This simply states to Stacker that "name" is the name of a definition that is defined elsewhere. Generally it means the definition can be found "forward" in the file. But it doesn't have to be in the current compilation unit. Anything declared with FORWARD is an external symbol for linking.


Standard Style

TODO


Built In Words

The built-in words of the Stacker language are put in several groups depending on what they do. The groups are as follows:

Logical: These words provide the logical operations for comparing stack operands.
The words are: < > <= >= = <> TRUE FALSE.
Bitwise: These words perform bitwise computations on their operands.
The words are: << >> OR XOR AND NOT
Arithmetic: These words perform arithmetic computations on their operands.
The words are: ABS NEG + - * / MOD */ ++ -- MIN MAX
Stack: These words manipulate the stack directly by moving its elements around.
The words are: DROP DROP2 NIP NIP2 DUP DUP2 SWAP SWAP2 OVER OVER2 ROT ROT2 RROT RROT2 TUCK TUCK2 PICK SELECT ROLL
Memory: These words allocate, free, and manipulate memory areas outside the stack.
The words are: MALLOC FREE GET PUT
Control: These words alter the normal left to right flow of execution.
The words are: IF ELSE ENDIF WHILE END RETURN EXIT RECURSE
I/O: These words perform output on the standard output and input on the standard input. No other I/O is possible in Stacker.
The words are: SPACE TAB CR >s >d >c <s <d <c.

While you may be familiar with many of these operations from other programming languages, a careful review of their semantics is important for correct programming in Stacker. Of most importance is the effect that each of these built-in words has on the global stack. The effect is not always intuitive. To better describe the effects, we'll borrow from Forth the idiom of describing the effect on the stack with:

BEFORE -- AFTER

That is, to the left of the -- is a representation of the stack before the operation. To the right of the -- is a representation of the stack after the operation. In both representations the rightmost item is the top of the stack. In the table below that describes the operation of each of the built-in words, we will denote the elements of the stack using the following construction:

b - a boolean truth value
w - a normal integer valued word.
s - a pointer to a string value
p - a pointer to a malloc'd memory block


Definition Of Operation Of Built In Words
LOGICAL OPERATIONS
Word	Name	Operation	Description
<	LT	w1 w2 -- b	Two values (w1 and w2) are popped off the stack and compared. If w1 is less than w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
>	GT	w1 w2 -- b	Two values (w1 and w2) are popped off the stack and compared. If w1 is greater than w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
>=	GE	w1 w2 -- b	Two values (w1 and w2) are popped off the stack and compared. If w1 is greater than or equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
<=	LE	w1 w2 -- b	Two values (w1 and w2) are popped off the stack and compared. If w1 is less than or equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
=	EQ	w1 w2 -- b	Two values (w1 and w2) are popped off the stack and compared. If w1 is equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
<>	NE	w1 w2 -- b	Two values (w1 and w2) are popped off the stack and compared. If w1 is not equal to w2, TRUE is pushed back on the stack, otherwise FALSE is pushed back on the stack.
FALSE	FALSE	-- b	The boolean value FALSE (0) is pushed onto the stack.
TRUE	TRUE	-- b	The boolean value TRUE (-1) is pushed onto the stack.
BITWISE OPERATORS
Word	Name	Operation	Description
<<	SHL	w1 w2 -- w1<<w2	Two values (w1 and w2) are popped off the stack. The w2 operand is shifted left by the number of bits given by the w1 operand. The result is pushed back to the stack.
>>	SHR	w1 w2 -- w1>>w2	Two values (w1 and w2) are popped off the stack. The w2 operand is shifted right by the number of bits given by the w1 operand. The result is pushed back to the stack.
OR	OR	w1 w2 -- w2|w1	Two values (w1 and w2) are popped off the stack. The values are bitwise OR'd together and pushed back on the stack. This is not a logical OR. The sequence 1 2 OR yields 3 not 1.
AND	AND	w1 w2 -- w2&w1	Two values (w1 and w2) are popped off the stack. The values are bitwise AND'd together and pushed back on the stack. This is not a logical AND. The sequence 1 2 AND yields 0 not 1.
XOR	XOR	w1 w2 -- w2^w1	Two values (w1 and w2) are popped off the stack. The values are bitwise exclusive OR'd together and pushed back on the stack. For example, the sequence 1 3 XOR yields 2.
ARITHMETIC OPERATORS
Word	Name	Operation	Description
ABS	ABS	w -- |w|	One value is popped off the stack; its absolute value is computed and then pushed onto the stack. For example, if w is -1 the result is 1, and if w is 1 the result is also 1.
NEG	NEG	w -- -w	One value is popped off the stack, negated, and then pushed back onto the stack. For example, if w is -1 the result is 1, and if w is 1 the result is -1.
+	ADD	w1 w2 -- w2+w1	Two values are popped off the stack. Their sum is pushed back onto the stack.
-	SUB	w1 w2 -- w2-w1	Two values are popped off the stack. Their difference is pushed back onto the stack.
*	MUL	w1 w2 -- w2*w1	Two values are popped off the stack. Their product is pushed back onto the stack.
/	DIV	w1 w2 -- w2/w1	Two values are popped off the stack. Their quotient is pushed back onto the stack.
MOD	MOD	w1 w2 -- w2%w1	Two values are popped off the stack. The remainder after division of w2 by w1 is pushed back onto the stack.
*/	STAR_SLASH	w1 w2 w3 -- (w3*w2)/w1	Three values are popped off the stack. The product of w2 and w3 is divided by w1. The result is pushed back onto the stack.
++	INCR	w -- w+1	One value is popped off the stack. It is incremented by one and then pushed back onto the stack.
--	DECR	w -- w-1	One value is popped off the stack. It is decremented by one and then pushed back onto the stack.
MIN	MIN	w1 w2 -- (w2<w1?w2:w1)	Two values are popped off the stack. The smaller one is pushed back onto the stack.
MAX	MAX	w1 w2 -- (w2>w1?w2:w1)	Two values are popped off the stack. The larger one is pushed back onto the stack.
STACK MANIPULATION OPERATORS
Word	Name	Operation	Description
DROP	DROP	w --	One value is popped off the stack.
DROP2	DROP2	w1 w2 --	Two values are popped off the stack.
NIP	NIP	w1 w2 -- w2	The second value on the stack is removed from the stack. That is, a value is popped off the stack and retained. Then a second value is popped and the retained value is pushed.
NIP2	NIP2	w1 w2 w3 w4 -- w3 w4	The third and fourth values on the stack are removed from it. That is, two values are popped and retained. Then two more values are popped and the two retained values are pushed back on.
DUP	DUP	w1 -- w1 w1	One value is popped off the stack. That value is then pushed onto the stack twice to duplicate the top stack value.
DUP2	DUP2	w1 w2 -- w1 w2 w1 w2	The top two values on the stack are duplicated. That is, two values are popped off the stack. They are then pushed back on the stack twice, keeping their original order each time.
SWAP	SWAP	w1 w2 -- w2 w1	The top two stack items are reversed in their order. That is, two values are popped off the stack and pushed back so that their positions are exchanged.
SWAP2	SWAP2	w1 w2 w3 w4 -- w3 w4 w1 w2	The top four stack items are swapped in pairs. That is, two values are popped and retained. Then, two more values are popped and retained. The two pairs are pushed back onto the stack in exchanged positions, each pair keeping its internal order.
OVER	OVER	w1 w2 -- w1 w2 w1	Two values are popped from the stack. They are pushed back onto the stack in the order w1 w2 w1. This causes the second stack element to be copied "over" the top value.
OVER2	OVER2	w1 w2 w3 w4 -- w1 w2 w3 w4 w1 w2	The third and fourth values on the stack are replicated onto the top of the stack.
ROT	ROT	w1 w2 w3 -- w2 w3 w1	The top three values are rotated. That is, three values are popped off the stack. They are pushed back onto the stack in the order w2 w3 w1, which moves the third element to the top of the stack.
ROT2	ROT2	w1 w2 w3 w4 w5 w6 -- w3 w4 w5 w6 w1 w2	Like ROT but the rotation is done using three pairs instead of three singles.
RROT	RROT	w1 w2 w3 -- w3 w1 w2	Reverse rotation. Like ROT, but it rotates the other way around. Essentially, the top element on the stack is moved down to the third position.
RROT2	RROT2	w1 w2 w3 w4 w5 w6 -- w5 w6 w1 w2 w3 w4	Double reverse rotation. Like RROT but the rotation is done using three pairs instead of three singles. The first and second stack elements are moved down to the fifth and sixth positions.
TUCK	TUCK	w1 w2 -- w2 w1 w2	Similar to OVER except that the second operand is being replicated. Essentially, the first operand is being "tucked" in between two instances of the second operand. Logically, two values are popped off the stack. They are placed back on the stack in the order w2 w1 w2.
TUCK2	TUCK2	w1 w2 w3 w4 -- w3 w4 w1 w2 w3 w4	Like TUCK but a pair of elements is tucked over two pairs. That is, the top two elements of the stack are duplicated and inserted into the stack at the fifth and sixth positions.
PICK	PICK	x0 ... xn n -- x0 ... xn x0	The top of the stack is used as an index into the remainder of the stack. The element at the nth position replaces the index (top of stack). This is useful for cycling through a set of values. Note that indexing is zero based. So, if n=0 then you get the second item on the stack. If n=1 you get the third, etc. Note also that the index is replaced by the n'th value.
SELECT	SELECT	m n X0..Xm Xm+1 .. Xn -- Xm	This is like PICK but the list is removed and you need to specify both the index and the size of the list. Careful with this one: the wrong value for n can blow away a huge amount of the stack.
ROLL	ROLL	x0 x1 .. xn n -- x1 .. xn x0	Not Implemented. This one has been left as an exercise to the student. See Exercise. ROLL requires a value, "n", to be on the top of the stack. This value specifies how far into the stack to "roll". The n'th value is moved (not copied) from its location and replaces the "n" value on the top of the stack. In this way, all the values between "n" and x0 roll up the stack. The operation of ROLL is a generalized ROT. The "n" value specifies how much to rotate. That is, ROLL with n=1 is the same as ROT and ROLL with n=2 is the same as ROT2.
MEMORY OPERATORS
Word	Name	Operation	Description
MALLOC	MALLOC	w1 -- p	One value is popped off the stack. The value is used as the size of a memory block to allocate. The size is in bytes, not words. The memory allocation is completed and the address of the memory block is pushed onto the stack.
FREE	FREE	p --	One pointer value is popped off the stack. The value should be the address of a memory block created by the MALLOC operation. The associated memory block is freed. Nothing is pushed back on the stack. Many bugs can be created by attempting to FREE something that isn't a pointer to a MALLOC-allocated memory block. Make sure you know what's on the stack. One way to do this is with the following idiom: `64 MALLOC DUP DUP (use ptr) DUP (use ptr) ... FREE` This ensures that an extra copy of the pointer is placed on the stack (for the FREE at the end) and that every use of the pointer is preceded by a DUP to retain the copy for FREE.
GET	GET	w1 p -- w2 p	An integer index and a pointer to a memory block are popped off the stack. The index is used to read one byte from the memory block. That byte value (w2) is pushed, and the pointer is pushed back on top of it, as shown in the Operation column. Note that the pointer value is essentially retained in its position, so this doesn't count as a "use ptr" in the FREE idiom.
PUT	PUT	w1 w2 p -- p	Three values are popped off the stack: a pointer to the memory block (p), the index of the byte in the memory block (w2), and the value to be stored (w1). The value w1 is converted to a byte and written to the element of the memory block at index w2. The pointer to the memory block is pushed back on the stack, so this doesn't count as a "use ptr" in the FREE idiom.
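The memory words can be modeled the same way. This hypothetical Python sketch rests on several assumptions:

* A pointer is an integer handle into a dictionary of bytearrays. The real runtime uses actual addresses.
* The stack effects follow the Operation column, with the rightmost item on top.
* PUT keeps only the low byte of w1.

The helper names are made up for illustration.

```python
import itertools

# Hypothetical model: a "pointer" is an integer handle into a dict of
# bytearrays. The real Stacker runtime works with actual memory addresses.
heap = {}
_handles = itertools.count(1)


def malloc(s):                 # w1 -- p
    size = s.pop()
    p = next(_handles)
    heap[p] = bytearray(size)  # size is in bytes, as documented
    s.append(p)


def free(s):                   # p --
    del heap[s.pop()]          # KeyError ~ freeing something MALLOC didn't return


def put(s):                    # w1 w2 p -- p
    p, index, value = s.pop(), s.pop(), s.pop()
    heap[p][index] = value & 0xFF   # assumption: keep only the low byte
    s.append(p)


def get(s):                    # w1 p -- w2 p
    p, index = s.pop(), s.pop()
    s.append(heap[p][index])
    s.append(p)


def swap(s):
    s[-1], s[-2] = s[-2], s[-1]


# Roughly the Stacker sequence: 65 0 64 MALLOC PUT 0 SWAP GET FREE
stack = [65, 0, 64]
malloc(stack)    # 65 0 p
put(stack)       # p
stack.append(0)  # p 0
swap(stack)      # 0 p
get(stack)       # 65 p
free(stack)      # 65
print(stack)     # [65]
```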
CONTROL FLOW OPERATORS
Word	Name	Operation	Description
RETURN	RETURN	--	The currently executing definition returns immediately to its caller. Note that there is an implicit `RETURN` at the end of each definition, logically located at the semicolon. The sequence `RETURN ;` is valid but redundant.
EXIT	EXIT	w1 --	A return value for the program is popped off the stack. The program is then immediately terminated. This is normally used for an abnormal exit from the program. For a normal exit (when `MAIN` finishes), the exit code will always be zero in accordance with UNIX conventions.
RECURSE	RECURSE	--	The currently executing definition is called again. This operation is needed since the definition of a word doesn't exist until the semicolon is reached. Attempting something like `: recurser recurser ;` will yield an error saying that "recurser" is not defined yet. To accomplish the same thing, change this to `: recurser RECURSE ;`.
IF (words...) ENDIF	IF (words...) ENDIF	b --	A boolean value is popped off the stack. If it is non-zero, then the "words..." are executed. Otherwise, execution continues immediately following the ENDIF.
IF (words...) ELSE (words...) ENDIF	IF (words...) ELSE (words...) ENDIF	b --	A boolean value is popped off the stack. If it is non-zero, then the "words..." between IF and ELSE are executed. Otherwise, the words between ELSE and ENDIF are executed. In either case, after the (words...) have executed, execution continues immediately following the ENDIF.
WHILE word END	WHILE word END	b -- b	The boolean value on the top of the stack is examined (not popped). If it is non-zero, then the "word" between WHILE and END is executed. Execution then begins again at the WHILE, where the boolean on the top of the stack is examined again. The stack is not modified by the WHILE...END loop, only examined. It is imperative that the "word" in the body of the loop ensures that the top of the stack contains the next boolean to examine when it completes. Note that since booleans and integers can be coerced, you can use the following "for loop" idiom: `(push count) WHILE word -- END` For example: `10 WHILE >d -- END` This will print the numbers from 10 down to 1. 10 is pushed on the stack. Since that is non-zero, the while loop is entered. The top of the stack (10) is printed out with >d. The top of the stack is decremented, yielding 9, and control is transferred back to the WHILE keyword. The process starts all over again and repeats until the top of the stack is decremented to 0, at which point the WHILE test fails and control is transferred to the word after the END.
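The countdown example can be traced with a small Python model. It follows the walkthrough in the WHILE entry:

* WHILE examines the top of the stack without popping it.
* >d prints the top value.
* -- decrements the top value.

The function countdown is hypothetical and only simulates the stack. It does not compile anything.

```python
def countdown(start: int) -> tuple[list[int], list[int]]:
    """Model of `start WHILE >d -- END`.

    Returns (values printed by >d, final stack)."""
    stack = [start]
    printed = []
    while stack[-1] != 0:          # WHILE examines the top; it does not pop it
        printed.append(stack[-1])  # >d: print the top value (left on the stack)
        stack[-1] -= 1             # --: decrement the top of the stack
    return printed, stack


printed, final = countdown(10)
print(printed)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(final)    # [0]
```

Note that the final 0 is still on the stack when the loop exits, since WHILE only examines it.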
INPUT & OUTPUT OPERATORS
Word	Name	Operation	Description
SPACE	SPACE	--	A space character is put out. There is no stack effect.
TAB	TAB	--	A tab character is put out. There is no stack effect.
CR	CR	--	A carriage return character is put out. There is no stack effect.
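>s	OUT_STR	--	The string on the top of the stack is put out. The string pointer is not popped (the stack effect is empty), so follow >s with DROP once the string is no longer needed.
>d	OUT_NUM	--	The value on the top of the stack is put out as a decimal integer. The value is not popped.
>c	OUT_CHR	--	The value on the top of the stack is put out as an ASCII character. The value is not popped.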
<s	IN_STR	-- s	A string is read from the input via the scanf(3) format string " %as". The resulting string is pushed onto the stack.
<d	IN_NUM	-- w	An integer is read from the input via the scanf(3) format string " %d". The resulting value is pushed onto the stack.
<c	IN_CHR	-- w	A single character is read from the input via the scanf(3) format string - " %c". The value is converted to an integer and pushed on to the stack.
DUMP	DUMP	--	The stack contents are dumped to standard output. This is useful for debugging your definitions. Put DUMP at the beginning and end of a definition to see instantly the net effect of the definition.


Prime: A Complete Example

The following fully documented program highlights many features of both the Stacker language and what is possible with LLVM. The program has two modes of operation. If you provide numeric arguments to the program, it checks to see if those arguments are prime numbers and prints out the results. Without any arguments, the program prints out any prime numbers it finds between 1 and one million (there are a lot of them!). The source code comments below tell the remainder of the story.


################################################################################
#
# Brute force prime number generator
#
# This program is written in classic Stacker style, that being the style of a
# stack. Start at the bottom and read your way up!
#
# Reid Spencer - Nov 2003
################################################################################
# Utility definitions
################################################################################
: print >d CR ;
: it_is_a_prime TRUE ;
: it_is_not_a_prime FALSE ;
: continue_loop TRUE ;
: exit_loop FALSE ;

################################################################################
# This definition tries an actual division of a candidate prime number. It
# determines whether the division loop on this candidate should continue or
# not.
# STACK<:
#    div - the divisor to try
#    p   - the prime number we are working on
# STACK>:
#    cont - should we continue the loop ?
#    div - the divisor just tried
#    p   - the prime number we are working on
################################################################################
: try_dividing
    DUP2			( save div and p )
    SWAP			( swap to put divisor second on stack )
    MOD 0 =			( get remainder after division and test for 0 )
    IF
        exit_loop		( remainder = 0, time to exit )
    ELSE
        continue_loop		( remainder != 0, keep going )
    ENDIF
;

################################################################################
# This definition tries one divisor by calling try_dividing. Before doing
# that, it checks to see if the value is 1. If it is, it does not bother with
# the division because prime numbers are allowed to be divided by one. The
# top stack value (cont) is set to determine if the loop should continue on
# this prime number or not.
# STACK<:
#    cont - should we continue the loop (ignored)?
#    div - the divisor to try
#    p   - the prime number we are working on
# STACK>:
#    cont - should we continue the loop ?
#    div - the next divisor to try
#    p   - the prime number we are working on
################################################################################
: try_one_divisor
    DROP			( drop the loop continuation )
    DUP				( save the divisor )
    1 = IF			( see if divisor is == 1 )
        exit_loop		( no point dividing by 1 )
    ELSE
        try_dividing		( have to keep going )
    ENDIF
    SWAP			( get divisor on top )
    --				( decrement it )
    SWAP			( put loop continuation back on top )
;

################################################################################
# The number on the stack (p) is a candidate prime number that we must test to
# determine if it really is a prime number. To do this, we try dividing it by
# each number from p-1 down to 1. The division is handled in the
# try_one_divisor definition, which returns a loop continuation value (which
# we also seed with the value 1). After the loop, we check the divisor. If it
# decremented all the way to zero then we found a prime, otherwise we did not.
# STACK<:
#   p - the prime number to check
# STACK>:
#   yn - boolean indicating if it's a prime or not
#   p - the prime number checked
################################################################################
: try_harder
    DUP				( duplicate to get divisor value )
    --				( first divisor is one less than p )
    1				( continue the loop )
    WHILE
       try_one_divisor		( see if it's prime )
    END
    DROP			( drop the continuation value )
    0 = IF			( test for divisor == 0 )
       it_is_a_prime		( we found one )
    ELSE
       it_is_not_a_prime	( nope, this one is not a prime )
    ENDIF
;

################################################################################
# This definition determines if the number on the top of the stack is a prime
# or not. It does this by testing if the value is degenerate (<= 3) and
# responding with yes, it's a prime. Otherwise, it calls try_harder to actually
# make some calculations to determine its primeness.
# STACK<:
#    p - the prime number to check
# STACK>:
#    yn - boolean indicating if it's a prime or not
#    p  - the prime number checked
################################################################################
: is_prime
    DUP				( save the prime number )
    3 >= IF			( see if it's <= 3 )
        it_is_a_prime		( it's <= 3, just indicate it's prime )
    ELSE
        try_harder		( have to do a little more work )
    ENDIF
;

################################################################################
# This definition is called when it is time to exit the program, after we have
# found a sufficiently large number of primes.
# STACK<: ignored
# STACK>: exits
################################################################################
: done
    "Finished" >s CR		( say we are finished )
    0 EXIT			( exit nicely )
;

################################################################################
# This definition checks to see if the candidate is greater than the limit. If
# it is, it terminates the program by calling done. Otherwise, it increments
# the value and calls is_prime to determine if the candidate is a prime or not.
# If it is a prime, it prints it. Note that the boolean result from is_prime is
# gobbled by the following IF, which returns the stack to just containing the
# prime number just considered.
# STACK<:
#    p - one less than the prime number to consider
# STACK>:
#    p+1 - the prime number considered
################################################################################
: consider_prime
    DUP				( save the prime number to consider )
    1000000 < IF		( check to see if we are done yet )
        done			( we are done, call "done" )
    ENDIF
    ++				( increment to next prime number )
    is_prime			( see if it is a prime )
    IF
       print			( it is, print it )
    ENDIF
;

################################################################################
# This definition starts at one, prints it out and continues into a loop calling
# consider_prime on each iteration. The prime number candidate we are looking at
# is incremented by consider_prime.
# STACK<: empty
# STACK>: empty
################################################################################
: find_primes
    "Prime Numbers: " >s CR	( say hello )
    DROP			( get rid of that pesky string )
    1				( stoke the fires )
    print			( print the first one, we know it's prime )
    WHILE			( loop while the prime to consider is non zero )
        consider_prime		( consider one prime number )
    END
;

################################################################################
# These definitions print the number on the top of the stack followed by a
# verdict. The number itself is left on the stack.
################################################################################
: say_yes
    >d				( Print the prime number )
    " is prime."		( push string to output )
    >s				( output it )
    CR				( print carriage return )
    DROP			( pop string )
;

: say_no
    >d				( Print the prime number )
    " is NOT prime."		( push string to put out )
    >s				( put out the string )
    CR				( print carriage return )
    DROP			( pop string )
;

################################################################################
# This definition processes a single command line argument and determines if it
# is a prime number or not.
# STACK<:
#    n - number of arguments
#    arg1 - the number to examine
# STACK>:
#    n-1 - one less than number of arguments
#    arg2 - the next argument (arg1 has been processed and dropped)
################################################################################
: do_one_argument
    --				( decrement loop counter )
    SWAP			( get the argument value )
    is_prime IF			( determine if it's prime )
        say_yes			( uhuh )
    ELSE
        say_no			( nope )
    ENDIF
    DROP			( done with that argument )
;

################################################################################
# This definition loops over the arguments, processing one argument per
# iteration until the argument count reaches zero.
# STACK<:
#    n - number of arguments
#    ... - the arguments
################################################################################
: process_arguments
    WHILE			( while there are more arguments )
       do_one_argument		( process one argument )
    END
;

################################################################################
# The MAIN program discards the program name and then either checks each
# argument for primality or, if there are no arguments, searches for primes.
# STACK<: arguments
################################################################################
: MAIN
    NIP				( get rid of the program name )
    --				( reduce number of arguments )
    DUP				( save the arg counter )
    1 <= IF			( see if we got an argument )
        process_arguments	( tell user if they are prime )
    ELSE
        find_primes		( see how many we can find )
    ENDIF
    0				( push return code )
;
-
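To check the program's output, or to see its algorithm without the stack juggling, here is a rough Python equivalent of is_prime and try_harder. It is a sketch that mirrors the program's conventions: every value up to 3 (1 included) is reported as prime, and divisors are tried from p-1 downward. stacker_is_prime is a hypothetical name.

```python
def stacker_is_prime(p: int) -> bool:
    """Follows is_prime/try_harder from the Stacker program."""
    if p <= 3:                # is_prime: values <= 3 are reported as prime
        return True
    divisor = p - 1           # try_harder: first divisor is one less than p
    while divisor > 1:        # try_one_divisor stops when the divisor reaches 1
        if p % divisor == 0:  # try_dividing: an exact division ends the loop
            return False
        divisor -= 1
    return True               # the divisor counted all the way down: prime


print([n for n in range(1, 30) if stacker_is_prime(n)])
# [1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Trying divisors only up to the square root of p would give the same answers for p > 3 with far less work. The sketch keeps the brute-force order so that it matches what the Stacker program computes.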


Internals

This section is under construction.

In the meantime, you can always read the code! It has comments!


Directory Structure


The source code, test programs, and sample programs can all be found in the LLVM repository named llvm-stacker. This should be checked out to the projects directory so that it will auto-configure. To do that, make sure you have the llvm sources in llvm (see Getting Started) and then use these commands:


% svn co http://llvm.org/svn/llvm-project/llvm-top/trunk llvm-top
% cd llvm-top
% make build MODULE=stacker


Under the projects/llvm-stacker directory you will find the implementation of the Stacker compiler, as follows:


lib - contains most of the source code
    lib/compiler - contains the compiler library
    lib/runtime - contains the runtime library
test - contains the test programs
tools - contains the Stacker compiler main program, stkrc
    tools/stkrc - contains the Stacker compiler main program
sample - contains the sample programs


The Lexer


See projects/llvm-stacker/lib/compiler/Lexer.l


The Parser

See projects/llvm-stacker/lib/compiler/StackerParser.y


The Compiler

See projects/llvm-stacker/lib/compiler/StackerCompiler.cpp


The Runtime

See projects/llvm-stacker/lib/runtime/stacker_rt.c


Compiler Driver

See projects/llvm-stacker/tools/stkrc/stkrc.cpp


Test Programs

See projects/llvm-stacker/test/*.st


Exercise

As you may have noted from a careful inspection of the Built-In word definitions, the ROLL word is not implemented. This word was left out of Stacker on purpose so that it can be an exercise for the student. The exercise is to implement the ROLL functionality (in your own workspace) and build a test program for it. If you can implement ROLL, you understand Stacker and probably a fair amount about LLVM, since this is one of the more complicated Stacker operations. The work will be limited almost entirely to the compiler.

The ROLL word is already recognized by both the lexer and parser but ignored -by the compiler. That means you don't have to futz around with figuring out how -to get the keyword recognized. It already is. The part of the compiler that -you need to implement is the ROLL case in the -StackerCompiler::handle_word(int) method.

See the implementations of PICK and SELECT in the same method to get some hints about how to complete this exercise.
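When writing the test program, a reference model of the expected stack contents can be handy. The following hypothetical Python function is derived only from the ROLL entry's Operation column (x0 x1 .. xn n -- x1 .. xn x0), using the same zero-based index as PICK. It is not LLVM code, and the real work still belongs in StackerCompiler::handle_word(int). Raising an error for an out-of-range index is an assumption, since the document does not specify that case.

```python
def roll(s: list) -> None:
    """x0 x1 .. xn n -- x1 .. xn x0

    Pops the index n, then moves (not copies) the item at zero-based depth n,
    counting from the item just below the index, to the top. Like PICK,
    n=0 refers to the item directly below the index, so 0 ROLL is a no-op.
    """
    n = s.pop()
    if not 0 <= n < len(s):
        # Assumption: the document does not say what ROLL does with a bad index.
        raise IndexError(f"ROLL index {n} exceeds stack depth {len(s)}")
    s.append(s.pop(-1 - n))


stack = [10, 20, 30, 2]
roll(stack)
print(stack)  # [20, 30, 10]
```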

Good luck!


Things Remaining To Be Done

The initial implementation of Stacker has several deficiencies. If you're interested, here are some things that could be implemented better:

Write an LLVM pass to compute the correct stack depth needed by the program. Currently the stack is set to a fixed size, which means programs with large numbers of definitions might fail.
Write an LLVM pass to optimize the use of the global stack. The code emitted currently is somewhat wasteful. It gets cleaned up a lot by existing passes, but more could be done.
Make the compiler driver use the LLVM linking facilities (with IPO) before depending on GCC to do the final link.
Clean up parsing. It doesn't handle errors very well.
Rearrange the StackerCompiler.cpp code to make better use of inserting instructions before a block's terminating instruction. I didn't figure this technique out until I was nearly done with LLVM. As it is, it's a bad example of how to insert instructions!
Provide for I/O to arbitrary files instead of just stdin/stdout.
Write additional built-in words, drawing inspiration from FORTH.
Write additional sample Stacker programs.
Add your own compiler writing experiences and tips in the Lessons I Learned About LLVM section.
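The stack-depth idea in the first item can be illustrated on straight-line code. The sketch below is not an LLVM pass. It is a Python illustration that walks a sequence of words, applies each word's (popped, pushed) counts, and records the peak depth. The EFFECTS table is hypothetical. Its counts are taken from the built-in descriptions and from how the Prime example uses = and MOD.

```python
# Hypothetical stack effects (items popped, items pushed) for a few built-ins.
EFFECTS = {
    "DUP": (1, 2), "DROP": (1, 0), "SWAP": (2, 2), "TUCK": (2, 3),
    "=": (2, 1), "MOD": (2, 1), "++": (1, 1), "--": (1, 1),
}


def max_depth(words: list[str], start_depth: int = 0) -> int:
    """Peak stack depth reached by a straight-line sequence of words."""
    depth = peak = start_depth
    for word in words:
        if word.lstrip("-").isdigit():
            pops, pushes = 0, 1            # integer literal
        else:
            pops, pushes = EFFECTS[word]   # KeyError for words not modeled here
        if depth < pops:
            raise ValueError(f"stack underflow at {word!r}")
        depth += pushes - pops
        peak = max(peak, depth)
    return peak


# The start of try_one_divisor, entered with cont, div and p on the stack.
print(max_depth("DROP DUP 1 =".split(), start_depth=3))  # 4
```

A real pass would also have to handle IF/ELSE branches, WHILE loops, and calls between definitions, where the depth can depend on run-time data.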


Reid Spencer
LLVM Compiler Infrastructure
Last modified: $Date$

- - - diff --git a/docs/index.html b/docs/index.html index f3dcb18500e..28a56eb1801 100644 --- a/docs/index.html +++ b/docs/index.html @@ -195,10 +195,6 @@ generator. on how to write a new alias analysis implementation or how to use existing analyses. -

The Stacker Chronicles - This document describes not only the Stacker language and LLVM frontend but also some details about LLVM useful for those writing front-ends.

Accurate Garbage Collection with LLVM - The interfaces source-language compilers should use for compiling GC'd programs.