From cde1d9db3f90841de0d0d80f91db2fe1963f572c Mon Sep 17 00:00:00 2001 From: Chris Lattner Date: Tue, 6 Nov 2007 07:16:22 +0000 Subject: [PATCH] chapter 2 edits git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43760 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/tutorial/LangImpl2.html | 76 +++++++++++++++++++----------------- 1 file changed, 40 insertions(+), 36 deletions(-) diff --git a/docs/tutorial/LangImpl2.html b/docs/tutorial/LangImpl2.html index b8b61128841..df54f5b84f9 100644 --- a/docs/tutorial/LangImpl2.html +++ b/docs/tutorial/LangImpl2.html @@ -45,7 +45,7 @@ with LLVM" tutorial. This chapter shows you how to use the Lexer built in Chapter 1 to build a full parser for -our Kaleidoscope language and build an Abstract Syntax Tree (AST).

@@ -53,7 +53,7 @@ Tree (AST).

href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent Parsing and Operator-Precedence -Parsing to parse the Kaleidoscope language (the later for binary expression +Parsing to parse the Kaleidoscope language (the latter for binary expression and the former for everything else). Before we get to parsing though, lets talk about the output of the parser: the Abstract Syntax Tree.

@@ -144,7 +144,8 @@ themselves:

 /// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its argument names as well as if it is an operator.
+/// which captures its name, and its argument names (thus implicitly the number
+/// of arguments the function takes).
 class PrototypeAST {
   std::string Name;
   std::vector<std::string> Args;
@@ -165,9 +166,9 @@ public:
 

In Kaleidoscope, functions are typed with just a count of their arguments. -Since all values are double precision floating point, this fact doesn't need to -be captured anywhere. In a more aggressive and realistic language, the -"ExprAST" class would probably have a type field.

+Since all values are double precision floating point, the type of each argument +doesn't need to be stored anywhere. In a more aggressive and realistic +language, the "ExprAST" class would probably have a type field.

With this scaffolding, we can now talk about parsing expressions and function bodies in Kaleidoscope.

@@ -213,10 +214,6 @@ us to look one token ahead at what the lexer is returning. Every function in our parser will assume that CurTok is the current token that needs to be parsed.

-

Again, we define these with global variables; it would be better design to -wrap the entire parser in a class and use instance variables for these. -

-
 
@@ -293,7 +290,7 @@ static ExprAST *ParseParenExpr() {
 

This function illustrates a number of interesting things about the parser: 1) it shows how we use the Error routines. When called, this function expects that the current token is a '(' token, but after parsing the subexpression, it -is possible that there is not a ')' waiting. For example, if the user types in +is possible that there is no ')' waiting. For example, if the user types in "(4 x" instead of "(4)", the parser should emit an error. Because errors can occur, the parser needs a way to indicate that they happened: in our parser, we return null on an error.

@@ -357,10 +354,11 @@ either a VariableExprAST or CallExprAST node as appropriate.

Now that we have all of our simple expression parsing logic in place, we can -define a helper function to wrap them up in a class. We call this class of -expressions "primary" expressions, for reasons that will become more clear -later. In order to parse a primary expression, we need to determine what sort -of expression it is:

+define a helper function to wrap it together into one entry-point. We call this +class of expressions "primary" expressions, for reasons that will become more +clear later in the tutorial. In order to +parse an arbitrary primary expression, we need to determine what sort of +specific expression it is:

@@ -438,12 +436,13 @@ int main() {
 

For the basic form of Kaleidoscope, we will only support 4 binary operators -(this can obviously be extended by you, the reader). The +(this can obviously be extended by you, our brave and intrepid reader). The GetTokPrecedence function returns the precedence for the current token, or -1 if the token is not a binary operator. Having a map makes it easy to add new operators and makes it clear that the algorithm doesn't depend on the specific operators involved, but it would be easy enough to eliminate the map -and do the comparisons in the GetTokPrecedence function.

+and do the comparisons in the GetTokPrecedence function (or just use +a fixed-size array).

With the helper above defined, we can now start parsing binary expressions. The basic idea of operator precedence parsing is to break down an expression @@ -578,8 +577,8 @@ context):

// the pending operator take RHS as its LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; + RHS = ParseBinOpRHS(TokPrec+1, RHS); + if (RHS == 0) return 0; } // Merge LHS/RHS. LHS = new BinaryExprAST(BinOp, LHS, RHS); @@ -600,6 +599,8 @@ of the '+' expression.

Finally, on the next iteration of the while loop, the "+g" piece is parsed. and added to the AST. With this little bit of code (14 non-trivial lines), we correctly handle fully general binary expression parsing in a very elegant way. +This was a whirlwind tour of this code, and it is somewhat subtle. I recommend +running through it with a few tough examples to see how it works.

This wraps up handling of expressions. At this point, we can point the @@ -616,7 +617,7 @@ handle function definitions etc.

-The first basic thing missing is that of function prototypes. In Kaleidoscope, +The next thing missing is handling of function prototypes. In Kaleidoscope, these are used both for 'extern' function declarations as well as function body definitions. The code to do this is straight-forward and not very interesting (once you've survived expressions): @@ -636,6 +637,7 @@ static PrototypeAST *ParsePrototype() { if (CurTok != '(') return ErrorP("Expected '(' in prototype"); + // Read the list of argument names. std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); @@ -750,25 +752,26 @@ type "4+5;" and the parser will know you are done.

-

With just under 400 lines of commented code, we fully defined our minimal -language, including a lexer, parser and AST builder. With this done, the -executable will validate code and tell us if it is gramatically invalid. For +

With just under 400 lines of commented code (240 lines of non-comment, +non-blank code), we fully defined our minimal language, including a lexer, +parser and AST builder. With this done, the executable will validate +Kaleidoscope code and tell us if it is gramatically invalid. For example, here is a sample interaction:

-$ ./a.out 
-ready> def foo(x y) x+foo(y, 4.0);
-ready> Parsed a function definition.
-ready> def foo(x y) x+y y;
-ready> Parsed a function definition.
-ready> Parsed a top-level expr
-ready> def foo(x y) x+y );
-ready> Parsed a function definition.
-ready> Error: unknown token when expecting an expression
-ready> extern sin(a);
+$ ./a.out
+ready> def foo(x y) x+foo(y, 4.0);
+Parsed a function definition.
+ready> def foo(x y) x+y y;
+Parsed a function definition.
+Parsed a top-level expr
+ready> def foo(x y) x+y );
+Parsed a function definition.
+Error: unknown token when expecting an expression
+ready> extern sin(a);
 ready> Parsed an extern
-ready> ^D
+ready> ^D
 $ 
 
@@ -794,7 +797,7 @@ course). To build this, just compile with:

    # Compile
-   g++ -g toy.cpp 
+   g++ -g -O3 toy.cpp 
    # Run
    ./a.out 
 
@@ -919,7 +922,8 @@ public: }; /// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its argument names as well as if it is an operator. +/// which captures its name, and its argument names (thus implicitly the number +/// of arguments the function takes). class PrototypeAST { std::string Name; std::vector< Args; -- 2.34.1