1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
14 <div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
17 <li><a href="index.html">Up to Tutorial Index</a></li>
20 <li><a href="#intro">Chapter 3 Introduction</a></li>
21 <li><a href="#basics">Code Generation Setup</a></li>
22 <li><a href="#exprs">Expression Code Generation</a></li>
23 <li><a href="#funcs">Function Code Generation</a></li>
24 <li><a href="#driver">Driver Changes and Closing Thoughts</a></li>
25 <li><a href="#code">Full Code Listing</a></li>
28 <li><a href="LangImpl4.html">Chapter 4</a>: Adding JIT and Optimizer
32 <div class="doc_author">
33 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
36 <!-- *********************************************************************** -->
37 <div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div>
38 <!-- *********************************************************************** -->
40 <div class="doc_text">
42 <p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language
43 with LLVM</a>" tutorial. This chapter shows you how to transform the <a
44 href="LangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into LLVM IR.
45 This will teach you a little bit about how LLVM does things, as well as
46 demonstrate how easy it is to use. It's much more work to build a lexer and
47 parser than it is to generate LLVM IR code. :)
50 <p><b>Please note</b>: the code in this chapter and later require LLVM 2.2 or
51 LLVM SVN to work. LLVM 2.1 and before will not work with it.</p>
55 <!-- *********************************************************************** -->
56 <div class="doc_section"><a name="basics">Code Generation Setup</a></div>
57 <!-- *********************************************************************** -->
59 <div class="doc_text">
62 In order to generate LLVM IR, we want some simple setup to get started. First
63 we define virtual code generation (codegen) methods in each AST class:</p>
65 <div class="doc_code">
67 /// ExprAST - Base class for all expression nodes.
71 <b>virtual Value *Codegen() = 0;</b>
74 /// NumberExprAST - Expression class for numeric literals like "1.0".
75 class NumberExprAST : public ExprAST {
78 explicit NumberExprAST(double val) : Val(val) {}
79 <b>virtual Value *Codegen();</b>
85 <p>The Codegen() method says to emit IR for that AST node along with all the things it
86 depends on, and they all return an LLVM Value object.
87 "Value" is the class used to represent a "<a
88 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
89 Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
90 of SSA values is that their value is computed as the related instruction
91 executes, and it does not get a new value until (and if) the instruction
92 re-executes. In other words, there is no way to "change" an SSA value. For
93 more information, please read up on <a
94 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
95 Assignment</a> - the concepts are really quite natural once you grok them.</p>
97 <p>Note that instead of adding virtual methods to the ExprAST class hierarchy,
98 it could also make sense to use a <a
99 href="http://en.wikipedia.org/wiki/Visitor_pattern">visitor pattern</a> or some
100 other way to model this. Again, this tutorial won't dwell on good software
101 engineering practices: for our purposes, adding a virtual method is
105 second thing we want is an "Error" method like we used for the parser, which will
106 be used to report errors found during code generation (for example, use of an
107 undeclared parameter):</p>
109 <div class="doc_code">
111 Value *ErrorV(const char *Str) { Error(Str); return 0; }
113 static Module *TheModule;
114 static LLVMBuilder Builder;
115 static std::map<std::string, Value*> NamedValues;
119 <p>The static variables will be used during code generation. <tt>TheModule</tt>
120 is the LLVM construct that contains all of the functions and global variables in
121 a chunk of code. In many ways, it is the top-level structure that the LLVM IR
122 uses to contain code.</p>
124 <p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
125 LLVM instructions. Instances of the <a
126 href="http://llvm.org/doxygen/LLVMBuilder_8h-source.html"><tt>LLVMBuilder</tt></a>
127 class keep track of the current place to insert instructions and has methods to
128 create new instructions.</p>
130 <p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
131 current scope and what their LLVM representation is. (In other words, it is a
132 symbol table for the code). In this form of Kaleidoscope, the only things that
133 can be referenced are function parameters. As such, function parameters will
134 be in this map when generating code for their function body.</p>
137 With these basics in place, we can start talking about how to generate code for
138 each expression. Note that this assumes that the <tt>Builder</tt> has been set
139 up to generate code <em>into</em> something. For now, we'll assume that this
140 has already been done, and we'll just use it to emit code.
145 <!-- *********************************************************************** -->
146 <div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
147 <!-- *********************************************************************** -->
149 <div class="doc_text">
151 <p>Generating LLVM code for expression nodes is very straightforward: less
152 than 45 lines of commented code for all four of our expression nodes. First
153 we'll do numeric literals:</p>
155 <div class="doc_code">
157 Value *NumberExprAST::Codegen() {
158 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
163 <p>In the LLVM IR, numeric constants are represented with the
164 <tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
165 internally (<tt>APFloat</tt> has the capability of holding floating point
166 constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
167 creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
168 that constants are all uniqued together and shared. For this reason, the API
169 uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..)".</p>
171 <div class="doc_code">
173 Value *VariableExprAST::Codegen() {
174 // Look this variable up in the function.
175 Value *V = NamedValues[Name];
176 return V ? V : ErrorV("Unknown variable name");
181 <p>References to variables are also quite simple using LLVM. In the simple version
182 of Kaleidoscope, we assume that the variable has already been emited somewhere
183 and its value is available. In practice, the only values that can be in the
184 <tt>NamedValues</tt> map are function arguments. This
185 code simply checks to see that the specified name is in the map (if not, an
186 unknown variable is being referenced) and returns the value for it. In future
187 chapters, we'll add support for <a href="LangImpl5.html#for">loop induction
188 variables</a> in the symbol table, and for <a
189 href="LangImpl7.html#localvars">local variables</a>.</p>
191 <div class="doc_code">
193 Value *BinaryExprAST::Codegen() {
194 Value *L = LHS->Codegen();
195 Value *R = RHS->Codegen();
196 if (L == 0 || R == 0) return 0;
199 case '+': return Builder.CreateAdd(L, R, "addtmp");
200 case '-': return Builder.CreateSub(L, R, "subtmp");
201 case '*': return Builder.CreateMul(L, R, "multmp");
203 L = Builder.CreateFCmpULT(L, R, "cmptmp");
204 // Convert bool 0/1 to double 0.0 or 1.0
205 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
206 default: return ErrorV("invalid binary operator");
212 <p>Binary operators start to get more interesting. The basic idea here is that
213 we recursively emit code for the left-hand side of the expression, then the
214 right-hand side, then we compute the result of the binary expression. In this
215 code, we do a simple switch on the opcode to create the right LLVM instruction.
218 <p>In the example above, the LLVM builder class is starting to show its value.
219 LLVMBuilder knows where to insert the newly created instruction, all you have to
220 do is specify what instruction to create (e.g. with <tt>CreateAdd</tt>), which
221 operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
222 for the generated instruction.</p>
224 <p>One nice thing about LLVM is that the name is just a hint. For instance, if
225 the code above emits multiple "addtmp" variables, LLVM will automatically
226 provide each one with an increasing, unique numeric suffix. Local value names
227 for instructions are purely optional, but it makes it much easier to read the
230 <p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by
231 strict rules: for example, the Left and Right operators of
232 an <a href="../LangRef.html#i_add">add instruction</a> must have the same
233 type, and the result type of the add must match the operand types. Because
234 all values in Kaleidoscope are doubles, this makes for very simple code for add,
237 <p>On the other hand, LLVM specifies that the <a
238 href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
239 (a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with
240 a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
241 converts its input integer into a floating point value by treating the input
242 as an unsigned value. In contrast, if we used the <a
243 href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
244 operator would return 0.0 and -1.0, depending on the input value.</p>
246 <div class="doc_code">
248 Value *CallExprAST::Codegen() {
249 // Look up the name in the global module table.
250 Function *CalleeF = TheModule->getFunction(Callee);
252 return ErrorV("Unknown function referenced");
254 // If argument mismatch error.
255 if (CalleeF->arg_size() != Args.size())
256 return ErrorV("Incorrect # arguments passed");
258 std::vector<Value*> ArgsV;
259 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
260 ArgsV.push_back(Args[i]->Codegen());
261 if (ArgsV.back() == 0) return 0;
264 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
269 <p>Code generation for function calls is quite straightforward with LLVM. The
270 code above initially does a function name lookup in the LLVM Module's symbol
271 table. Recall that the LLVM Module is the container that holds all of the
272 functions we are JIT'ing. By giving each function the same name as what the
273 user specifies, we can use the LLVM symbol table to resolve function names for
276 <p>Once we have the function to call, we recursively codegen each argument that
277 is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
278 instruction</a>. Note that LLVM uses the native C calling conventions by
279 default, allowing these calls to also call into standard library functions like
280 "sin" and "cos", with no additional effort.</p>
282 <p>This wraps up our handling of the four basic expressions that we have so far
283 in Kaleidoscope. Feel free to go in and add some more. For example, by
284 browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
285 several other interesting instructions that are really easy to plug into our
290 <!-- *********************************************************************** -->
291 <div class="doc_section"><a name="funcs">Function Code Generation</a></div>
292 <!-- *********************************************************************** -->
294 <div class="doc_text">
296 <p>Code generation for prototypes and functions must handle a number of
297 details, which make their code less beautiful than expression code
298 generation, but allows us to illustrate some important points. First, lets
299 talk about code generation for prototypes: they are used both for function
300 bodies and external function declarations. The code starts with:</p>
302 <div class="doc_code">
304 Function *PrototypeAST::Codegen() {
305 // Make the function type: double(double,double) etc.
306 std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
307 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
309 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
313 <p>This code packs a lot of power into a few lines. Note first that this
314 function returns a "Function*" instead of a "Value*". Because a "prototype"
315 really talks about the external interface for a function (not the value computed
316 by an expression), it makes sense for it to return the LLVM Function it
317 corresponds to when codegen'd.</p>
319 <p>The call to <tt>FunctionType::get</tt> creates
320 the <tt>FunctionType</tt> that should be used for a given Prototype. Since all
321 function arguments in Kaleidoscope are of type double, the first line creates
322 a vector of "N" LLVM double types. It then uses the <tt>FunctionType::get</tt>
323 method to create a function type that takes "N" doubles as arguments, returns
324 one double as a result, and that is not vararg (the false parameter indicates
325 this). Note that Types in LLVM are uniqued just like Constants are, so you
326 don't "new" a type, you "get" it.</p>
328 <p>The final line above actually creates the function that the prototype will
329 correspond to. This indicates the type, linkage and name to use, as well as which
330 module to insert into. "<a href="LangRef.html#linkage">external linkage</a>"
331 means that the function may be defined outside the current module and/or that it
332 is callable by functions outside the module. The Name passed in is the name the
333 user specified: since "<tt>TheModule</tt>" is specified, this name is registered
334 in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
337 <div class="doc_code">
339 // If F conflicted, there was already something named 'Name'. If it has a
340 // body, don't allow redefinition or reextern.
341 if (F->getName() != Name) {
342 // Delete the one we just made and get the existing one.
343 F->eraseFromParent();
344 F = TheModule->getFunction(Name);
348 <p>The Module symbol table works just like the Function symbol table when it
349 comes to name conflicts: if a new function is created with a name was previously
350 added to the symbol table, it will get implicitly renamed when added to the
351 Module. The code above exploits this fact to determine if there was a previous
352 definition of this function.</p>
354 <p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
355 first, we want to allow 'extern'ing a function more than once, as long as the
356 prototypes for the externs match (since all arguments have the same type, we
357 just have to check that the number of arguments match). Second, we want to
358 allow 'extern'ing a function and then definining a body for it. This is useful
359 when defining mutually recursive functions.</p>
361 <p>In order to implement this, the code above first checks to see if there is
362 a collision on the name of the function. If so, it deletes the function we just
363 created (by calling <tt>eraseFromParent</tt>) and then calling
364 <tt>getFunction</tt> to get the existing function with the specified name. Note
365 that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form
366 unlinks the object from its parent (e.g. a Function from a Module) and returns
367 it. The "erase" form unlinks the object and then deletes it.</p>
369 <div class="doc_code">
371 // If F already has a body, reject this.
372 if (!F->empty()) {
373 ErrorF("redefinition of function");
377 // If F took a different number of args, reject.
378 if (F->arg_size() != Args.size()) {
379 ErrorF("redefinition of function with different # args");
386 <p>In order to verify the logic above, we first check to see if the pre-existing
387 function is "empty". In this case, empty means that it has no basic blocks in
388 it, which means it has no body. If it has no body, it is a forward
389 declaration. Since we don't allow anything after a full definition of the
390 function, the code rejects this case. If the previous reference to a function
391 was an 'extern', we simply verify that the number of arguments for that
392 definition and this one match up. If not, we emit an error.</p>
394 <div class="doc_code">
396 // Set names for all arguments.
398 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
400 AI->setName(Args[Idx]);
402 // Add arguments to variable symbol table.
403 NamedValues[Args[Idx]] = AI;
410 <p>The last bit of code for prototypes loops over all of the arguments in the
411 function, setting the name of the LLVM Argument objects to match, and registering
412 the arguments in the <tt>NamedValues</tt> map for future use by the
413 <tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
414 object to the caller. Note that we don't check for conflicting
415 argument names here (e.g. "extern foo(a b a)"). Doing so would be very
416 straight-forward with the mechanics we have already used above.</p>
418 <div class="doc_code">
420 Function *FunctionAST::Codegen() {
423 Function *TheFunction = Proto->Codegen();
424 if (TheFunction == 0)
429 <p>Code generation for function definitions starts out simply enough: we just
430 codegen the prototype (Proto) and verify that it is ok. We then clear out the
431 <tt>NamedValues</tt> map to make sure that there isn't anything in it from the
432 last function we compiled. Code generation of the prototype ensures that there
433 is an LLVM Function object that is ready to go for us.</p>
435 <div class="doc_code">
437 // Create a new basic block to start insertion into.
438 BasicBlock *BB = new BasicBlock("entry", TheFunction);
439 Builder.SetInsertPoint(BB);
441 if (Value *RetVal = Body->Codegen()) {
445 <p>Now we get to the point where the <tt>Builder</tt> is set up. The first
446 line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
447 block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The
448 second line then tells the builder that new instructions should be inserted into
449 the end of the new basic block. Basic blocks in LLVM are an important part
450 of functions that define the <a
451 href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
452 Since we don't have any control flow, our functions will only contain one
453 block at this point. We'll fix this in <a href="LangImpl5.html">Chapter 5</a> :).</p>
455 <div class="doc_code">
457 if (Value *RetVal = Body->Codegen()) {
458 // Finish off the function.
459 Builder.CreateRet(RetVal);
461 // Validate the generated code, checking for consistency.
462 verifyFunction(*TheFunction);
468 <p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
469 the root expression of the function. If no error happens, this emits code to
470 compute the expression into the entry block and returns the value that was
471 computed. Assuming no error, we then create an LLVM <a
472 href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
473 Once the function is built, we call <tt>verifyFunction</tt>, which
474 is provided by LLVM. This function does a variety of consistency checks on the
475 generated code, to determine if our compiler is doing everything right. Using
476 this is important: it can catch a lot of bugs. Once the function is finished
477 and validated, we return it.</p>
479 <div class="doc_code">
481 // Error reading body, remove function.
482 TheFunction->eraseFromParent();
488 <p>The only piece left here is handling of the error case. For simplicity, we
489 handle this by merely deleting the function we produced with the
490 <tt>eraseFromParent</tt> method. This allows the user to redefine a function
491 that they incorrectly typed in before: if we didn't delete it, it would live in
492 the symbol table, with a body, preventing future redefinition.</p>
494 <p>This code does have a bug, though. Since the <tt>PrototypeAST::Codegen</tt>
495 can return a previously defined forward declaration, our code can actually delete
496 a forward declaration. There are a number of ways to fix this bug, see what you
497 can come up with! Here is a testcase:</p>
499 <div class="doc_code">
501 extern foo(a b); # ok, defines foo.
502 def foo(a b) c; # error, 'c' is invalid.
503 def bar() foo(1, 2); # error, unknown function "foo"
509 <!-- *********************************************************************** -->
510 <div class="doc_section"><a name="driver">Driver Changes and
511 Closing Thoughts</a></div>
512 <!-- *********************************************************************** -->
514 <div class="doc_text">
517 For now, code generation to LLVM doesn't really get us much, except that we can
518 look at the pretty IR calls. The sample code inserts calls to Codegen into the
519 "<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
520 dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple
521 functions. For example:
524 <div class="doc_code">
527 Read top-level expression:
528 define double @""() {
530 %addtmp = add double 4.000000e+00, 5.000000e+00
536 <p>Note how the parser turns the top-level expression into anonymous functions
537 for us. This will be handy when we add <a href="LangImpl4.html#jit">JIT
538 support</a> in the next chapter. Also note that the code is very literally
539 transcribed, no optimizations are being performed. We will
540 <a href="LangImpl4.html#trivialconstfold">add optimizations</a> explicitly in
541 the next chapter.</p>
543 <div class="doc_code">
545 ready> <b>def foo(a b) a*a + 2*a*b + b*b;</b>
546 Read function definition:
547 define double @foo(double %a, double %b) {
549 %multmp = mul double %a, %a
550 %multmp1 = mul double 2.000000e+00, %a
551 %multmp2 = mul double %multmp1, %b
552 %addtmp = add double %multmp, %multmp2
553 %multmp3 = mul double %b, %b
554 %addtmp4 = add double %addtmp, %multmp3
560 <p>This shows some simple arithmetic. Notice the striking similarity to the
561 LLVM builder calls that we use to create the instructions.</p>
563 <div class="doc_code">
565 ready> <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
566 Read function definition:
567 define double @bar(double %a) {
569 %calltmp = call double @foo( double %a, double 4.000000e+00 )
570 %calltmp1 = call double @bar( double 3.133700e+04 )
571 %addtmp = add double %calltmp, %calltmp1
577 <p>This shows some function calls. Note that this function will take a long
578 time to execute if you call it. In the future we'll add conditional control
579 flow to actually make recursion useful :).</p>
581 <div class="doc_code">
583 ready> <b>extern cos(x);</b>
585 declare double @cos(double)
587 ready> <b>cos(1.234);</b>
588 Read top-level expression:
589 define double @""() {
591 %calltmp = call double @cos( double 1.234000e+00 )
597 <p>This shows an extern for the libm "cos" function, and a call to it.</p>
600 <div class="doc_code">
603 ; ModuleID = 'my cool jit'
605 define double @""() {
607 %addtmp = add double 4.000000e+00, 5.000000e+00
611 define double @foo(double %a, double %b) {
613 %multmp = mul double %a, %a
614 %multmp1 = mul double 2.000000e+00, %a
615 %multmp2 = mul double %multmp1, %b
616 %addtmp = add double %multmp, %multmp2
617 %multmp3 = mul double %b, %b
618 %addtmp4 = add double %addtmp, %multmp3
622 define double @bar(double %a) {
624 %calltmp = call double @foo( double %a, double 4.000000e+00 )
625 %calltmp1 = call double @bar( double 3.133700e+04 )
626 %addtmp = add double %calltmp, %calltmp1
630 declare double @cos(double)
632 define double @""() {
634 %calltmp = call double @cos( double 1.234000e+00 )
640 <p>When you quit the current demo, it dumps out the IR for the entire module
641 generated. Here you can see the big picture with all the functions referencing
644 <p>This wraps up the third chapter of the Kaleidoscope tutorial. Up next, we'll
645 describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
646 support</a> to this so we can actually start running code!</p>
651 <!-- *********************************************************************** -->
652 <div class="doc_section"><a name="code">Full Code Listing</a></div>
653 <!-- *********************************************************************** -->
655 <div class="doc_text">
658 Here is the complete code listing for our running example, enhanced with the
659 LLVM code generator. Because this uses the LLVM libraries, we need to link
660 them in. To do this, we use the <a
661 href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
662 our makefile/command line about which options to use:</p>
664 <div class="doc_code">
667 g++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
673 <p>Here is the code:</p>
675 <div class="doc_code">
678 // See example below.
680 #include "llvm/DerivedTypes.h"
681 #include "llvm/Module.h"
682 #include "llvm/Analysis/Verifier.h"
683 #include "llvm/Support/LLVMBuilder.h"
684 #include <cstdio>
685 #include <string>
687 #include <vector>
688 using namespace llvm;
690 //===----------------------------------------------------------------------===//
692 //===----------------------------------------------------------------------===//
694 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
695 // of these for known things.
700 tok_def = -2, tok_extern = -3,
703 tok_identifier = -4, tok_number = -5,
706 static std::string IdentifierStr; // Filled in if tok_identifier
707 static double NumVal; // Filled in if tok_number
709 /// gettok - Return the next token from standard input.
710 static int gettok() {
711 static int LastChar = ' ';
713 // Skip any whitespace.
714 while (isspace(LastChar))
715 LastChar = getchar();
717 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
718 IdentifierStr = LastChar;
719 while (isalnum((LastChar = getchar())))
720 IdentifierStr += LastChar;
722 if (IdentifierStr == "def") return tok_def;
723 if (IdentifierStr == "extern") return tok_extern;
724 return tok_identifier;
727 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
731 LastChar = getchar();
732 } while (isdigit(LastChar) || LastChar == '.');
734 NumVal = strtod(NumStr.c_str(), 0);
738 if (LastChar == '#') {
739 // Comment until end of line.
740 do LastChar = getchar();
741 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
747 // Check for end of file. Don't eat the EOF.
751 // Otherwise, just return the character as its ascii value.
752 int ThisChar = LastChar;
753 LastChar = getchar();
757 //===----------------------------------------------------------------------===//
758 // Abstract Syntax Tree (aka Parse Tree)
759 //===----------------------------------------------------------------------===//
761 /// ExprAST - Base class for all expression nodes.
764 virtual ~ExprAST() {}
765 virtual Value *Codegen() = 0;
768 /// NumberExprAST - Expression class for numeric literals like "1.0".
769 class NumberExprAST : public ExprAST {
772 explicit NumberExprAST(double val) : Val(val) {}
773 virtual Value *Codegen();
776 /// VariableExprAST - Expression class for referencing a variable, like "a".
777 class VariableExprAST : public ExprAST {
780 explicit VariableExprAST(const std::string &name) : Name(name) {}
781 virtual Value *Codegen();
784 /// BinaryExprAST - Expression class for a binary operator.
785 class BinaryExprAST : public ExprAST {
789 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
790 : Op(op), LHS(lhs), RHS(rhs) {}
791 virtual Value *Codegen();
794 /// CallExprAST - Expression class for function calls.
795 class CallExprAST : public ExprAST {
797 std::vector<ExprAST*> Args;
799 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
800 : Callee(callee), Args(args) {}
801 virtual Value *Codegen();
804 /// PrototypeAST - This class represents the "prototype" for a function,
805 /// which captures its argument names as well as if it is an operator.
808 std::vector<std::string> Args;
810 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
811 : Name(name), Args(args) {}
816 /// FunctionAST - This class represents a function definition itself.
821 FunctionAST(PrototypeAST *proto, ExprAST *body)
822 : Proto(proto), Body(body) {}
827 //===----------------------------------------------------------------------===//
829 //===----------------------------------------------------------------------===//
831 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
832 /// token the parser it looking at. getNextToken reads another token from the
833 /// lexer and updates CurTok with its results.
835 static int getNextToken() {
836 return CurTok = gettok();
839 /// BinopPrecedence - This holds the precedence for each binary operator that is
841 static std::map<char, int> BinopPrecedence;
843 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
844 static int GetTokPrecedence() {
845 if (!isascii(CurTok))
848 // Make sure it's a declared binop.
849 int TokPrec = BinopPrecedence[CurTok];
850 if (TokPrec <= 0) return -1;
854 /// Error* - These are little helper functions for error handling.
855 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
856 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
857 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
859 static ExprAST *ParseExpression();
863 /// ::= identifier '(' expression* ')'
864 static ExprAST *ParseIdentifierExpr() {
865 std::string IdName = IdentifierStr;
867 getNextToken(); // eat identifier.
869 if (CurTok != '(') // Simple variable ref.
870 return new VariableExprAST(IdName);
873 getNextToken(); // eat (
874 std::vector<ExprAST*> Args;
877 ExprAST *Arg = ParseExpression();
881 if (CurTok == ')') break;
884 return Error("Expected ')'");
892 return new CallExprAST(IdName, Args);
895 /// numberexpr ::= number
896 static ExprAST *ParseNumberExpr() {
897 ExprAST *Result = new NumberExprAST(NumVal);
898 getNextToken(); // consume the number
902 /// parenexpr ::= '(' expression ')'
903 static ExprAST *ParseParenExpr() {
904 getNextToken(); // eat (.
905 ExprAST *V = ParseExpression();
909 return Error("expected ')'");
910 getNextToken(); // eat ).
915 /// ::= identifierexpr
918 static ExprAST *ParsePrimary() {
920 default: return Error("unknown token when expecting an expression");
921 case tok_identifier: return ParseIdentifierExpr();
922 case tok_number: return ParseNumberExpr();
923 case '(': return ParseParenExpr();
928 /// ::= ('+' primary)*
929 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
930 // If this is a binop, find its precedence.
932 int TokPrec = GetTokPrecedence();
934 // If this is a binop that binds at least as tightly as the current binop,
935 // consume it, otherwise we are done.
936 if (TokPrec < ExprPrec)
939 // Okay, we know this is a binop.
941 getNextToken(); // eat binop
943 // Parse the primary expression after the binary operator.
944 ExprAST *RHS = ParsePrimary();
947 // If BinOp binds less tightly with RHS than the operator after RHS, let
948 // the pending operator take RHS as its LHS.
949 int NextPrec = GetTokPrecedence();
950 if (TokPrec < NextPrec) {
951 RHS = ParseBinOpRHS(TokPrec+1, RHS);
952 if (RHS == 0) return 0;
956 LHS = new BinaryExprAST(BinOp, LHS, RHS);
961 /// ::= primary binoprhs
963 static ExprAST *ParseExpression() {
964 ExprAST *LHS = ParsePrimary();
967 return ParseBinOpRHS(0, LHS);
971 /// ::= id '(' id* ')'
972 static PrototypeAST *ParsePrototype() {
973 if (CurTok != tok_identifier)
974 return ErrorP("Expected function name in prototype");
976 std::string FnName = IdentifierStr;
980 return ErrorP("Expected '(' in prototype");
982 std::vector<std::string> ArgNames;
983 while (getNextToken() == tok_identifier)
984 ArgNames.push_back(IdentifierStr);
986 return ErrorP("Expected ')' in prototype");
989 getNextToken(); // eat ')'.
991 return new PrototypeAST(FnName, ArgNames);
994 /// definition ::= 'def' prototype expression
995 static FunctionAST *ParseDefinition() {
996 getNextToken(); // eat def.
997 PrototypeAST *Proto = ParsePrototype();
998 if (Proto == 0) return 0;
1000 if (ExprAST *E = ParseExpression())
1001 return new FunctionAST(Proto, E);
1005 /// toplevelexpr ::= expression
1006 static FunctionAST *ParseTopLevelExpr() {
1007 if (ExprAST *E = ParseExpression()) {
1008 // Make an anonymous proto.
1009 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1010 return new FunctionAST(Proto, E);
1015 /// external ::= 'extern' prototype
1016 static PrototypeAST *ParseExtern() {
1017 getNextToken(); // eat extern.
1018 return ParsePrototype();
1021 //===----------------------------------------------------------------------===//
1023 //===----------------------------------------------------------------------===//
1025 static Module *TheModule;
1026 static LLVMBuilder Builder;
1027 static std::map<std::string, Value*> NamedValues;
1029 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1031 Value *NumberExprAST::Codegen() {
1032 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1035 Value *VariableExprAST::Codegen() {
1036 // Look this variable up in the function.
1037 Value *V = NamedValues[Name];
1038 return V ? V : ErrorV("Unknown variable name");
1041 Value *BinaryExprAST::Codegen() {
1042 Value *L = LHS->Codegen();
1043 Value *R = RHS->Codegen();
1044 if (L == 0 || R == 0) return 0;
1047 case '+': return Builder.CreateAdd(L, R, "addtmp");
1048 case '-': return Builder.CreateSub(L, R, "subtmp");
1049 case '*': return Builder.CreateMul(L, R, "multmp");
1051 L = Builder.CreateFCmpULT(L, R, "cmptmp");
1052 // Convert bool 0/1 to double 0.0 or 1.0
1053 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1054 default: return ErrorV("invalid binary operator");
1058 Value *CallExprAST::Codegen() {
1059 // Look up the name in the global module table.
1060 Function *CalleeF = TheModule->getFunction(Callee);
1062 return ErrorV("Unknown function referenced");
1064 // If argument mismatch error.
1065 if (CalleeF->arg_size() != Args.size())
1066 return ErrorV("Incorrect # arguments passed");
1068 std::vector<Value*> ArgsV;
1069 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1070 ArgsV.push_back(Args[i]->Codegen());
1071 if (ArgsV.back() == 0) return 0;
1074 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1077 Function *PrototypeAST::Codegen() {
1078 // Make the function type: double(double,double) etc.
1079 std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
1080 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1082 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1084 // If F conflicted, there was already something named 'Name'. If it has a
1085 // body, don't allow redefinition or reextern.
1086 if (F->getName() != Name) {
1087 // Delete the one we just made and get the existing one.
1088 F->eraseFromParent();
1089 F = TheModule->getFunction(Name);
1091 // If F already has a body, reject this.
1092 if (!F->empty()) {
1093 ErrorF("redefinition of function");
1097 // If F took a different number of args, reject.
1098 if (F->arg_size() != Args.size()) {
1099 ErrorF("redefinition of function with different # args");
1104 // Set names for all arguments.
1106 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1108 AI->setName(Args[Idx]);
1110 // Add arguments to variable symbol table.
1111 NamedValues[Args[Idx]] = AI;
1117 Function *FunctionAST::Codegen() {
1118 NamedValues.clear();
1120 Function *TheFunction = Proto->Codegen();
1121 if (TheFunction == 0)
1124 // Create a new basic block to start insertion into.
1125 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1126 Builder.SetInsertPoint(BB);
1128 if (Value *RetVal = Body->Codegen()) {
1129 // Finish off the function.
1130 Builder.CreateRet(RetVal);
1132 // Validate the generated code, checking for consistency.
1133 verifyFunction(*TheFunction);
1137 // Error reading body, remove function.
1138 TheFunction->eraseFromParent();
1142 //===----------------------------------------------------------------------===//
1143 // Top-Level parsing and JIT Driver
1144 //===----------------------------------------------------------------------===//
1146 static void HandleDefinition() {
1147 if (FunctionAST *F = ParseDefinition()) {
1148 if (Function *LF = F->Codegen()) {
1149 fprintf(stderr, "Read function definition:");
1153 // Skip token for error recovery.
1158 static void HandleExtern() {
1159 if (PrototypeAST *P = ParseExtern()) {
1160 if (Function *F = P->Codegen()) {
1161 fprintf(stderr, "Read extern: ");
1165 // Skip token for error recovery.
1170 static void HandleTopLevelExpression() {
1171 // Evaluate a top level expression into an anonymous function.
1172 if (FunctionAST *F = ParseTopLevelExpr()) {
1173 if (Function *LF = F->Codegen()) {
1174 fprintf(stderr, "Read top-level expression:");
1178 // Skip token for error recovery.
1183 /// top ::= definition | external | expression | ';'
1184 static void MainLoop() {
1186 fprintf(stderr, "ready> ");
1188 case tok_eof: return;
1189 case ';': getNextToken(); break; // ignore top level semicolons.
1190 case tok_def: HandleDefinition(); break;
1191 case tok_extern: HandleExtern(); break;
1192 default: HandleTopLevelExpression(); break;
1199 //===----------------------------------------------------------------------===//
1200 // "Library" functions that can be "extern'd" from user code.
1201 //===----------------------------------------------------------------------===//
1203 /// putchard - putchar that takes a double and returns 0.
1205 double putchard(double X) {
1210 //===----------------------------------------------------------------------===//
1211 // Main driver code.
1212 //===----------------------------------------------------------------------===//
1215 TheModule = new Module("my cool jit");
1217 // Install standard binary operators.
1218 // 1 is lowest precedence.
1219 BinopPrecedence['<'] = 10;
1220 BinopPrecedence['+'] = 20;
1221 BinopPrecedence['-'] = 20;
1222 BinopPrecedence['*'] = 40; // highest.
1224 // Prime the first token.
1225 fprintf(stderr, "ready> ");
1229 TheModule->dump();
1234 <a href="LangImpl4.html">Next: Adding JIT and Optimizer Support</a>
1237 <!-- *********************************************************************** -->
1240 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1241 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1242 <a href="http://validator.w3.org/check/referer"><img
1243 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1245 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1246 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1247 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $