1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
<title>Kaleidoscope: Extending the Language: Mutable Variables / SSA</title>
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta name="author" content="Chris Lattner">
10 <link rel="stylesheet" href="../llvm.css" type="text/css">
15 <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
17 <div class="doc_author">
18 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
21 <!-- *********************************************************************** -->
22 <div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
23 <!-- *********************************************************************** -->
25 <div class="doc_text">
<p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
LLVM</a>" tutorial.  In parts 1 through 6, we've built a very respectable <a
href="http://en.wikipedia.org/wiki/Functional_programming">functional
programming language</a>.  In our journey, we learned some parsing techniques,
how to build and represent an AST, how to build LLVM IR, and how to optimize
the resultant code and JIT compile it.</p>
<p>While Kaleidoscope is interesting as a functional language, the fact that it
is functional makes it "too easy" to generate LLVM IR for it.  In particular, a
functional language makes it very easy to build LLVM IR directly in <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
Since LLVM requires that the input code be in SSA form, this is a very nice
property.  However, it is often unclear to newcomers how to generate code for an
imperative language with mutable variables.</p>
43 <p>The short (and happy) summary of this chapter is that there is no need for
44 your front-end to build SSA form: LLVM provides highly tuned and well tested
45 support for this, though the way it works is a bit unexpected for some.</p>
49 <!-- *********************************************************************** -->
50 <div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
51 <!-- *********************************************************************** -->
53 <div class="doc_text">
56 To understand why mutable variables cause complexities in SSA construction,
57 consider this extremely simple C example:
60 <div class="doc_code">
63 int test(_Bool Condition) {
74 <p>In this case, we have the variable "X", whose value depends on the path
75 executed in the program. Because there are two different possible values for X
76 before the return instruction, a PHI node is inserted to merge the two values.
77 The LLVM IR that we want for this example looks like this:</p>
79 <div class="doc_code">
81 @G = weak global i32 0 ; type of @G is i32*
82 @H = weak global i32 0 ; type of @H is i32*
84 define i32 @test(i1 %Condition) {
86 br i1 %Condition, label %cond_true, label %cond_false
97 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
<p>In this example, the loads from the G and H global variables are explicit in
the LLVM IR, and they live in the then/else branches of the if statement
(cond_true/cond_false).  In order to merge the incoming values, the X.2 phi node
in the cond_next block selects the right value to use based on where control
flow is coming from: if control flow comes from the cond_false block, X.2 gets
the value of X.1.  Alternatively, if control flow comes from cond_true, it gets
the value of X.0.  The intent of this chapter is not to explain the details of
SSA form.  For more information, see one of the many <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
references</a>.</p>
<p>The question for this article is "who places phi nodes when lowering
assignments to mutable variables?".  The issue here is that LLVM
<em>requires</em> that its IR be in SSA form: there is no "non-SSA" mode for it.
However, SSA construction requires non-trivial algorithms and data structures,
so it is inconvenient and wasteful for every front-end to have to reproduce this
logic.</p>
123 <!-- *********************************************************************** -->
124 <div class="doc_section"><a name="memory">Memory in LLVM</a></div>
125 <!-- *********************************************************************** -->
127 <div class="doc_text">
<p>The 'trick' here is that while LLVM does require all register values to be
in SSA form, it does not require (or permit) memory objects to be in SSA form.
In the example above, note that the loads from G and H are direct accesses to
G and H: they are not renamed or versioned.  This differs from some other
compiler systems, which do try to version memory objects.  In LLVM, instead of
encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
demand.</p>
139 With this in mind, the high-level idea is that we want to make a stack variable
140 (which lives in memory, because it is on the stack) for each mutable object in
141 a function. To take advantage of this trick, we need to talk about how LLVM
142 represents stack variables.
145 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
146 it is carefully designed to not have (or need) an "address-of" operator. Notice
147 how the type of the @G/@H global variables is actually "i32*" even though the
148 variable is defined as "i32". What this means is that @G defines <em>space</em>
149 for an i32 in the global data area, but its <em>name</em> actually refers to the
150 address for that space. Stack variables work the same way, but instead of being
151 declared with global variable definitions, they are declared with the
152 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
154 <div class="doc_code">
156 define i32 @test(i1 %Condition) {
158 %X = alloca i32 ; type of %X is i32*.
160 %tmp = load i32* %X ; load the stack value %X from the stack.
161 %tmp2 = add i32 %tmp, 1 ; increment it
162 store i32 %tmp2, i32* %X ; store it back
167 <p>This code shows an example of how you can declare and manipulate a stack
168 variable in the LLVM IR. Stack memory allocated with the alloca instruction is
169 fully general: you can pass the address of the stack slot to functions, you can
170 store it in other variables, etc. In our example above, we could rewrite the
171 example to use the alloca technique to avoid using a PHI node:</p>
173 <div class="doc_code">
175 @G = weak global i32 0 ; type of @G is i32*
176 @H = weak global i32 0 ; type of @H is i32*
178 define i32 @test(i1 %Condition) {
180 %X = alloca i32 ; type of %X is i32*.
181 br i1 %Condition, label %cond_true, label %cond_false
185 store i32 %X.0, i32* %X ; Update X
190 store i32 %X.1, i32* %X ; Update X
194 %X.2 = load i32* %X ; Read X
200 <p>With this, we have discovered a way to handle arbitrary mutable variables
201 without the need to create Phi nodes at all:</p>
204 <li>Each mutable variable becomes a stack allocation.</li>
205 <li>Each read of the variable becomes a load from the stack.</li>
206 <li>Each update of the variable becomes a store to the stack.</li>
207 <li>Taking the address of a variable just uses the stack address directly.</li>
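<p>As a rough source-level analogy (a sketch, not LLVM API code), the rules
above can be mimicked in C++ by giving X an explicit stack slot and routing every
read and write through its address.  The names <tt>G</tt>, <tt>H</tt> and
<tt>test</tt> follow the example above; the initial values of the globals are
assumptions chosen only so that the result is observable.</p>

<div class="doc_code">
<pre>
// Globals standing in for @G and @H (the values 10 and 20 are illustrative).
int G = 10;
int H = 20;

int test(bool Condition) {
  int X;              // the mutable variable becomes a stack allocation
  int *XAddr = &X;    // taking its address just uses the stack address
  if (Condition)
    *XAddr = G;       // an update of the variable is a store
  else
    *XAddr = H;       // an update of the variable is a store
  return *XAddr;      // a read of the variable is a load; no PHI is written
}
</pre>
</div>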
<p>While this solution has solved our immediate problem, it introduced another
one: we have now apparently introduced a lot of stack traffic for very simple
and common operations, a major performance problem.  Fortunately for us, the
LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
this case, promoting allocas like this into SSA registers, inserting Phi nodes
as appropriate.  If you run this example through the pass, you'll get:</p>
218 <div class="doc_code">
220 $ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b>
221 @G = weak global i32 0
222 @H = weak global i32 0
224 define i32 @test(i1 %Condition) {
226 br i1 %Condition, label %cond_true, label %cond_false
237 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
243 <p>The mem2reg pass implements the standard "iterated dominator frontier"
244 algorithm for constructing SSA form and has a number of optimizations that speed
245 up very common degenerate cases. mem2reg really is the answer for dealing with
246 mutable variables, and we highly recommend that you depend on it. Note that
247 mem2reg only works on variables in certain circumstances:</p>
250 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
251 promotes them. It does not apply to global variables or heap allocations.</li>
253 <li>mem2reg only looks for alloca instructions in the entry block of the
254 function. Being in the entry block guarantees that the alloca is only executed
255 once, which makes analysis simpler.</li>
257 <li>mem2reg only promotes allocas whose uses are direct loads and stores. If
258 the address of the stack object is passed to a function, or if any funny pointer
259 arithmetic is involved, the alloca will not be promoted.</li>
261 <li>mem2reg only works on allocas of <a
262 href="../LangRef.html#t_classifications">first class</a>
263 values (such as pointers, scalars and vectors), and only if the array size
264 of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
265 promoting structs or arrays to registers. Note that the "scalarrepl" pass is
266 more powerful and can promote structs, "unions", and arrays in many cases.</li>
<p>All of these properties are easy to satisfy for most imperative languages, and
we'll illustrate this below with Kaleidoscope.  The final question you may be
asking is: should I bother with this nonsense for my front-end?  Wouldn't it be
better if I just did SSA construction directly, avoiding use of the mem2reg
optimization pass?  In short, we strongly recommend that you use this technique
for building SSA form, unless there is an extremely good reason not to.  Using
this technique is:</p>
<li>Proven and well tested: llvm-gcc and clang both use this technique for local
mutable variables.  As such, the most common clients of LLVM are using this to
handle the bulk of their variables.  You can be sure that bugs are found fast and
fixed early.</li>
285 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
286 common cases as well as fully general. For example, it has fast-paths for
287 variables that are only used in a single block, variables that only have one
288 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
291 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
292 Debug information in LLVM</a> relies on having the address of the variable
293 exposed to attach debug info to it. This technique dovetails very naturally
294 with this style of debug info.</li>
<p>If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement.  Let's extend Kaleidoscope with mutable
variables now!</p>
304 <!-- *********************************************************************** -->
305 <div class="doc_section"><a name="kalvars">Mutable Variables in
306 Kaleidoscope</a></div>
307 <!-- *********************************************************************** -->
309 <div class="doc_text">
<p>Now that we know the sort of problem we want to tackle, let's see what this
looks like in the context of our little Kaleidoscope language.  We're going to
add two features:</p>
316 <li>The ability to mutate variables with the '=' operator.</li>
317 <li>The ability to define new variables.</li>
320 <p>While the first item is really what this is about, we only have variables
321 for incoming arguments and for induction variables, and redefining them only
322 goes so far :). Also, the ability to define new variables is a
323 useful thing regardless of whether you will be mutating them. Here's a
324 motivating example that shows how we could use these:</p>
326 <div class="doc_code">
328 # Define ':' for sequencing: as a low-precedence operator that ignores operands
329 # and just returns the RHS.
330 def binary : 1 (x y) y;
332 # Recursive fib, we could do this before.
341 <b>var a = 1, b = 1, c in</b>
(for i = 3, i < x in
354 In order to mutate variables, we have to change our existing variables to use
355 the "alloca trick". Once we have that, we'll add our new operator, then extend
356 Kaleidoscope to support new variable definitions.
361 <!-- *********************************************************************** -->
<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
Mutation</a></div>
364 <!-- *********************************************************************** -->
366 <div class="doc_text">
<p>The symbol table in Kaleidoscope is managed at code generation time by the
'<tt>NamedValues</tt>' map.  This map currently keeps track of the LLVM "Value*"
that holds the double value for the named variable.  In order to support
mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds
the <em>memory location</em> of the variable in question.  Note that this
change is a refactoring: it changes the structure of the code, but does not
(by itself) change the behavior of the compiler.  All of these changes are
isolated in the Kaleidoscope code generator.</p>
<p>At this point in Kaleidoscope's development, it only supports variables for two
things: incoming arguments to functions and the induction variable of 'for'
loops.  For consistency, we'll allow mutation of these variables in addition to
other user-defined variables.  This means that these will both need memory
locations.</p>
<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
map to map to AllocaInst* instead of Value*.  Once we do this, the C++ compiler
will tell us what parts of the code we need to update:</p>
390 <div class="doc_code">
392 static std::map<std::string, AllocaInst*> NamedValues;
<p>Also, since we will need to create these allocas, we'll use a helper
function that ensures that the allocas are created in the entry block of the
function:</p>
400 <div class="doc_code">
402 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
403 /// the function. This is used for mutable variables etc.
404 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
405 const std::string &VarName) {
406 LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
407 TheFunction->getEntryBlock().begin());
408 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
413 <p>This funny looking code creates an LLVMBuilder object that is pointing at
414 the first instruction (.begin()) of the entry block. It then creates an alloca
415 with the expected name and returns it. Because all values in Kaleidoscope are
416 doubles, there is no need to pass in a type to use.</p>
<p>With this in place, the first functionality change we want to make is to
variable references.  In our new scheme, variables live on the stack, so code
generating a reference to them actually needs to produce a load from the stack
slot:</p>
423 <div class="doc_code">
425 Value *VariableExprAST::Codegen() {
426 // Look this variable up in the function.
427 Value *V = NamedValues[Name];
428 if (V == 0) return ErrorV("Unknown variable name");
431 return Builder.CreateLoad(V, Name.c_str());
436 <p>As you can see, this is pretty straight-forward. Next we need to update the
437 things that define the variables to set up the alloca. We'll start with
438 <tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
439 the unabridged code):</p>
441 <div class="doc_code">
443 Function *TheFunction = Builder.GetInsertBlock()->getParent();
445 <b>// Create an alloca for the variable in the entry block.
446 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
448 // Emit the start code first, without 'variable' in scope.
449 Value *StartVal = Start->Codegen();
450 if (StartVal == 0) return 0;
452 <b>// Store the value into the alloca.
453 Builder.CreateStore(StartVal, Alloca);</b>
456 // Compute the end condition.
457 Value *EndCond = End->Codegen();
458 if (EndCond == 0) return EndCond;
460 <b>// Reload, increment, and restore the alloca. This handles the case where
461 // the body of the loop mutates the variable.
462 Value *CurVar = Builder.CreateLoad(Alloca);
463 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
464 Builder.CreateStore(NextVar, Alloca);</b>
469 <p>This code is virtually identical to the code <a
470 href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
471 big difference is that we no longer have to construct a PHI node, and we use
472 load/store to access the variable as needed.</p>
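<p>To see why the reload matters, here is a small standalone C++ model of the
loop shape (a sketch only; <tt>runLoop</tt> and its parameters are hypothetical
names and not part of the tutorial code).  The induction variable lives in a
slot, the end condition is evaluated before the step as in the snippet above,
and the step reloads the slot so that a store made by the body is respected.</p>

<div class="doc_code">
<pre>
#include <functional>

// Hypothetical model of the emitted loop: the induction variable is kept in a
// memory slot rather than threaded through a PHI node.
double runLoop(double Start, double Step,
               const std::function<void(double &)> &Body,
               const std::function<bool(double)> &EndCond) {
  double Slot = Start;            // alloca + store of the start value
  bool KeepGoing;
  do {
    Body(Slot);                   // the body may load from or store to the slot
    KeepGoing = EndCond(Slot);    // compute the end condition
    double CurVar = Slot;         // reload
    Slot = CurVar + Step;         // increment and store back
  } while (KeepGoing);
  return Slot;                    // final contents of the slot
}
</pre>
</div>

<p>Because the step reads the slot again instead of reusing a value captured
before the body ran, a body that assigns to the variable changes where the loop
continues from.</p>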
474 <p>To support mutable argument variables, we need to also make allocas for them.
475 The code for this is also pretty simple:</p>
477 <div class="doc_code">
479 /// CreateArgumentAllocas - Create an alloca for each argument and register the
480 /// argument in the symbol table so that references to it will succeed.
481 void PrototypeAST::CreateArgumentAllocas(Function *F) {
482 Function::arg_iterator AI = F->arg_begin();
483 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
484 // Create an alloca for this variable.
485 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
487 // Store the initial value into the alloca.
488 Builder.CreateStore(AI, Alloca);
490 // Add arguments to variable symbol table.
491 NamedValues[Args[Idx]] = Alloca;
497 <p>For each argument, we make an alloca, store the input value to the function
498 into the alloca, and register the alloca as the memory location for the
499 argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
500 it sets up the entry block for the function.</p>
502 <p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
503 good codegen once again:</p>
505 <div class="doc_code">
507 // Set up the optimizer pipeline. Start with registering info about how the
508 // target lays out data structures.
509 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
510 <b>// Promote allocas to registers.
511 OurFPM.add(createPromoteMemoryToRegisterPass());</b>
512 // Do simple "peephole" optimizations and bit-twiddling optzns.
513 OurFPM.add(createInstructionCombiningPass());
514 // Reassociate expressions.
515 OurFPM.add(createReassociatePass());
519 <p>It is interesting to see what the code looks like before and after the
520 mem2reg optimization runs. For example, this is the before/after code for our
521 recursive fib. Before the optimization:</p>
523 <div class="doc_code">
525 define double @fib(double %x) {
527 <b>%x1 = alloca double
528 store double %x, double* %x1
529 %x2 = load double* %x1</b>
530 %multmp = fcmp ult double %x2, 3.000000e+00
531 %booltmp = uitofp i1 %multmp to double
532 %ifcond = fcmp one double %booltmp, 0.000000e+00
533 br i1 %ifcond, label %then, label %else
535 then: ; preds = %entry
538 else: ; preds = %entry
539 <b>%x3 = load double* %x1</b>
540 %subtmp = sub double %x3, 1.000000e+00
541 %calltmp = call double @fib( double %subtmp )
542 <b>%x4 = load double* %x1</b>
543 %subtmp5 = sub double %x4, 2.000000e+00
544 %calltmp6 = call double @fib( double %subtmp5 )
545 %addtmp = add double %calltmp, %calltmp6
548 ifcont: ; preds = %else, %then
549 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
555 <p>Here there is only one variable (x, the input argument) but you can still
556 see the extremely simple-minded code generation strategy we are using. In the
557 entry block, an alloca is created, and the initial input value is stored into
558 it. Each reference to the variable does a reload from the stack. Also, note
559 that we didn't modify the if/then/else expression, so it still inserts a PHI
560 node. While we could make an alloca for it, it is actually easier to create a
561 PHI node for it, so we still just make the PHI.</p>
563 <p>Here is the code after the mem2reg pass runs:</p>
565 <div class="doc_code">
567 define double @fib(double %x) {
569 %multmp = fcmp ult double <b>%x</b>, 3.000000e+00
570 %booltmp = uitofp i1 %multmp to double
571 %ifcond = fcmp one double %booltmp, 0.000000e+00
572 br i1 %ifcond, label %then, label %else
578 %subtmp = sub double <b>%x</b>, 1.000000e+00
579 %calltmp = call double @fib( double %subtmp )
580 %subtmp5 = sub double <b>%x</b>, 2.000000e+00
581 %calltmp6 = call double @fib( double %subtmp5 )
582 %addtmp = add double %calltmp, %calltmp6
585 ifcont: ; preds = %else, %then
586 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
<p>This is a trivial case for mem2reg, since there are no redefinitions of the
variable.  The point of showing this is to calm your tension about inserting
such blatant inefficiencies :).</p>
596 <p>After the rest of the optimizers run, we get:</p>
598 <div class="doc_code">
600 define double @fib(double %x) {
602 %multmp = fcmp ult double %x, 3.000000e+00
603 %booltmp = uitofp i1 %multmp to double
604 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
605 br i1 %ifcond, label %else, label %ifcont
608 %subtmp = sub double %x, 1.000000e+00
609 %calltmp = call double @fib( double %subtmp )
610 %subtmp5 = sub double %x, 2.000000e+00
611 %calltmp6 = call double @fib( double %subtmp5 )
612 %addtmp = add double %calltmp, %calltmp6
616 ret double 1.000000e+00
621 <p>Here we see that the simplifycfg pass decided to clone the return instruction
622 into the end of the 'else' block. This allowed it to eliminate some branches
623 and the PHI node.</p>
625 <p>Now that all symbol table references are updated to use stack variables,
626 we'll add the assignment operator.</p>
630 <!-- *********************************************************************** -->
631 <div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
632 <!-- *********************************************************************** -->
634 <div class="doc_text">
636 <p>With our current framework, adding a new assignment operator is really
637 simple. We will parse it just like any other binary operator, but handle it
638 internally (instead of allowing the user to define it). The first step is to
639 set a precedence:</p>
641 <div class="doc_code">
644 // Install standard binary operators.
645 // 1 is lowest precedence.
646 <b>BinopPrecedence['='] = 2;</b>
647 BinopPrecedence['<'] = 10;
648 BinopPrecedence['+'] = 20;
649 BinopPrecedence['-'] = 20;
653 <p>Now that the parser knows the precedence of the binary operator, it takes
654 care of all the parsing and AST generation. We just need to implement codegen
655 for the assignment operator. This looks like:</p>
657 <div class="doc_code">
659 Value *BinaryExprAST::Codegen() {
660 // Special case '=' because we don't want to emit the LHS as an expression.
662 // Assignment requires the LHS to be an identifier.
663 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
665 return ErrorV("destination of '=' must be a variable");
669 <p>Unlike the rest of the binary operators, our assignment operator doesn't
670 follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
671 as a special case before the other binary operators are handled. The other
672 strange thing about it is that it requires the LHS to be a variable directly.
675 <div class="doc_code">
678 Value *Val = RHS->Codegen();
679 if (Val == 0) return 0;
682 Value *Variable = NamedValues[LHSE->getName()];
683 if (Variable == 0) return ErrorV("Unknown variable name");
685 Builder.CreateStore(Val, Variable);
692 <p>Once it has the variable, codegen'ing the assignment is straight-forward:
693 we emit the RHS of the assignment, create a store, and return the computed
694 value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
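<p>The same "store, then yield the stored value" idea can be modelled outside
LLVM.  The following C++ sketch uses a hypothetical <tt>Slots</tt> map in place
of stack slots and a hypothetical <tt>assign</tt> helper; unlike the codegen
above, it creates unknown names on demand instead of reporting an error.</p>

<div class="doc_code">
<pre>
#include <map>
#include <string>

// Hypothetical stand-in for the variables' stack slots, keyed by name.
typedef std::map<std::string, double> Slots;

// Stores Val into the named slot and returns Val, so that the result of one
// assignment can be used as the right-hand side of another.
double assign(Slots &S, const std::string &Name, double Val) {
  S[Name] = Val;   // the store
  return Val;      // the value of the assignment expression
}
</pre>
</div>

<p>With this shape, <tt>assign(S, "X", assign(S, "Y", 3.0))</tt> plays the role
of "X = (Y = 3)": the inner call runs first and its result feeds the outer
store.</p>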
696 <p>Now that we have an assignment operator, we can mutate loop variables and
697 arguments. For example, we can now run code like this:</p>
699 <div class="doc_code">
701 # Function to print a double.
704 # Define ':' for sequencing: as a low-precedence operator that ignores operands
705 # and just returns the RHS.
706 def binary : 1 (x y) y;
<p>When run, this example prints "123" and then "4", showing that we did
actually mutate the value!  Okay, we have now officially implemented our goal:
getting this to work requires SSA construction in the general case.  However,
to be really useful, we want the ability to define our own local variables.
Let's add this next!</p>
726 <!-- *********************************************************************** -->
<div class="doc_section"><a name="localvars">User-defined Local
Variables</a></div>
729 <!-- *********************************************************************** -->
731 <div class="doc_text">
<p>Adding var/in is just like any other extension we made to
Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
The first step for adding our new 'var/in' construct is to extend the lexer.
As before, this is pretty trivial; the code looks like this:</p>
738 <div class="doc_code">
747 static int gettok() {
749 if (IdentifierStr == "in") return tok_in;
750 if (IdentifierStr == "binary") return tok_binary;
751 if (IdentifierStr == "unary") return tok_unary;
752 <b>if (IdentifierStr == "var") return tok_var;</b>
753 return tok_identifier;
758 <p>The next step is to define the AST node that we will construct. For var/in,
759 it will look like this:</p>
761 <div class="doc_code">
763 /// VarExprAST - Expression class for var/in
764 class VarExprAST : public ExprAST {
765 std::vector<std::pair<std::string, ExprAST*> > VarNames;
768 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
770 : VarNames(varnames), Body(body) {}
772 virtual Value *Codegen();
<p>var/in allows a list of names to be defined all at once, and each name can
optionally have an initializer value.  As such, we capture this information in
the VarNames vector.  Also, var/in has a body, and this body is allowed to access
the variables defined by the var/in.</p>
782 <p>With this ready, we can define the parser pieces. First thing we do is add
783 it as a primary expression:</p>
785 <div class="doc_code">
788 /// ::= identifierexpr
793 <b>/// ::= varexpr</b>
794 static ExprAST *ParsePrimary() {
796 default: return Error("unknown token when expecting an expression");
797 case tok_identifier: return ParseIdentifierExpr();
798 case tok_number: return ParseNumberExpr();
799 case '(': return ParseParenExpr();
800 case tok_if: return ParseIfExpr();
801 case tok_for: return ParseForExpr();
802 <b>case tok_var: return ParseVarExpr();</b>
808 <p>Next we define ParseVarExpr:</p>
810 <div class="doc_code">
/// varexpr ::= 'var' identifier ('=' expression)?
//                    (',' identifier ('=' expression)?)* 'in' expression
814 static ExprAST *ParseVarExpr() {
815 getNextToken(); // eat the var.
817 std::vector<std::pair<std::string, ExprAST*> > VarNames;
819 // At least one variable name is required.
820 if (CurTok != tok_identifier)
821 return Error("expected identifier after var");
825 <p>The first part of this code parses the list of identifier/expr pairs into the
826 local <tt>VarNames</tt> vector.
828 <div class="doc_code">
831 std::string Name = IdentifierStr;
getNextToken();  // eat identifier.
834 // Read the optional initializer.
837 getNextToken(); // eat the '='.
839 Init = ParseExpression();
840 if (Init == 0) return 0;
843 VarNames.push_back(std::make_pair(Name, Init));
845 // End of var list, exit loop.
846 if (CurTok != ',') break;
847 getNextToken(); // eat the ','.
849 if (CurTok != tok_identifier)
850 return Error("expected identifier list after var");
<p>Once all the variables are parsed, we then parse the body and create the
AST node:</p>
858 <div class="doc_code">
860 // At this point, we have to have 'in'.
861 if (CurTok != tok_in)
862 return Error("expected 'in' keyword after 'var'");
863 getNextToken(); // eat 'in'.
865 ExprAST *Body = ParseExpression();
866 if (Body == 0) return 0;
868 return new VarExprAST(VarNames, Body);
873 <p>Now that we can parse and represent the code, we need to support emission of
874 LLVM IR for it. This code starts out with:</p>
876 <div class="doc_code">
878 Value *VarExprAST::Codegen() {
879 std::vector<AllocaInst *> OldBindings;
881 Function *TheFunction = Builder.GetInsertBlock()->getParent();
883 // Register all variables and emit their initializer.
884 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
885 const std::string &VarName = VarNames[i].first;
886 ExprAST *Init = VarNames[i].second;
890 <p>Basically it loops over all the variables, installing them one at a time.
891 For each variable we put into the symbol table, we remember the previous value
892 that we replace in OldBindings.</p>
894 <div class="doc_code">
896 // Emit the initializer before adding the variable to scope, this prevents
897 // the initializer from referencing the variable itself, and permits stuff
900 // var a = a in ... # refers to outer 'a'.
903 InitVal = Init->Codegen();
904 if (InitVal == 0) return 0;
905 } else { // If not specified, use 0.0.
906 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
909 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
910 Builder.CreateStore(InitVal, Alloca);
912 // Remember the old variable binding so that we can restore the binding when
914 OldBindings.push_back(NamedValues[VarName]);
916 // Remember this binding.
917 NamedValues[VarName] = Alloca;
922 <p>There are more comments here than code. The basic idea is that we emit the
923 initializer, create the alloca, then update the symbol table to point to it.
924 Once all the variables are installed in the symbol table, we evaluate the body
925 of the var/in expression:</p>
927 <div class="doc_code">
929 // Codegen the body, now that all vars are in scope.
930 Value *BodyVal = Body->Codegen();
931 if (BodyVal == 0) return 0;
935 <p>Finally, before returning, we restore the previous variable bindings:</p>
937 <div class="doc_code">
939 // Pop all our variables from scope.
940 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
941 NamedValues[VarNames[i].first] = OldBindings[i];
943 // Return the body computation.
949 <p>The end result of all of this is that we get properly scoped variable
950 definitions, and we even (trivially) allow mutation of them :).</p>
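<p>The save/shadow/restore pattern can also be tried in isolation.  The sketch
below is a standalone C++ model under stated assumptions: <tt>Scope</tt> is a
hypothetical stand-in for <tt>NamedValues</tt>, plain <tt>double</tt> slots
replace allocas, and <tt>withVars</tt> is an illustrative helper that does not
appear in the tutorial code.</p>

<div class="doc_code">
<pre>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NamedValues: maps a name to its current slot.
static std::map<std::string, double *> Scope;

// Binds each name to a fresh slot holding its initial value, runs Body, then
// restores whatever each name was bound to before.
template <typename Fn>
double withVars(std::vector<std::pair<std::string, double> > Vars, Fn Body) {
  std::vector<double *> OldBindings;
  for (std::size_t i = 0; i != Vars.size(); ++i) {
    OldBindings.push_back(Scope[Vars[i].first]); // null if previously unbound
    Scope[Vars[i].first] = &Vars[i].second;      // shadow any outer binding
  }

  double Result = Body();                        // body sees the new bindings

  for (std::size_t i = 0; i != Vars.size(); ++i)
    Scope[Vars[i].first] = OldBindings[i];       // pop our variables from scope
  return Result;
}
</pre>
</div>

<p>Mutating a shadowing variable inside the body writes to the inner slot only,
so the outer variable of the same name is untouched once the bindings are
restored.</p>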
952 <p>With this, we completed what we set out to do. Our nice iterative fib
953 example from the intro compiles and runs just fine. The mem2reg pass optimizes
954 all of our stack variables into SSA registers, inserting PHI nodes where needed,
955 and our front-end remains simple: no iterated dominator frontier computation
956 anywhere in sight.</p>
960 <!-- *********************************************************************** -->
961 <div class="doc_section"><a name="code">Full Code Listing</a></div>
962 <!-- *********************************************************************** -->
964 <div class="doc_text">
967 Here is the complete code listing for our running example, enhanced with mutable
968 variables and var/in support. To build this example, use:
971 <div class="doc_code">
974 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
980 <p>Here is the code:</p>
982 <div class="doc_code">
984 #include "llvm/DerivedTypes.h"
985 #include "llvm/ExecutionEngine/ExecutionEngine.h"
986 #include "llvm/Module.h"
987 #include "llvm/ModuleProvider.h"
988 #include "llvm/PassManager.h"
989 #include "llvm/Analysis/Verifier.h"
990 #include "llvm/Target/TargetData.h"
991 #include "llvm/Transforms/Scalar.h"
992 #include "llvm/Support/LLVMBuilder.h"
993 #include <cstdio>
994 #include <string>
996 #include <vector>
997 using namespace llvm;
999 //===----------------------------------------------------------------------===//
1001 //===----------------------------------------------------------------------===//
1003 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1004 // of these for known things.
1009 tok_def = -2, tok_extern = -3,
1012 tok_identifier = -4, tok_number = -5,
1015 tok_if = -6, tok_then = -7, tok_else = -8,
1016 tok_for = -9, tok_in = -10,
1019 tok_binary = -11, tok_unary = -12,
1025 static std::string IdentifierStr; // Filled in if tok_identifier
1026 static double NumVal; // Filled in if tok_number
1028 /// gettok - Return the next token from standard input.
1029 static int gettok() {
1030 static int LastChar = ' ';
1032 // Skip any whitespace.
1033 while (isspace(LastChar))
1034 LastChar = getchar();
1036 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1037 IdentifierStr = LastChar;
1038 while (isalnum((LastChar = getchar())))
1039 IdentifierStr += LastChar;
1041 if (IdentifierStr == "def") return tok_def;
1042 if (IdentifierStr == "extern") return tok_extern;
1043 if (IdentifierStr == "if") return tok_if;
1044 if (IdentifierStr == "then") return tok_then;
1045 if (IdentifierStr == "else") return tok_else;
1046 if (IdentifierStr == "for") return tok_for;
1047 if (IdentifierStr == "in") return tok_in;
1048 if (IdentifierStr == "binary") return tok_binary;
1049 if (IdentifierStr == "unary") return tok_unary;
1050 if (IdentifierStr == "var") return tok_var;
1051 return tok_identifier;
1054 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1058 LastChar = getchar();
1059 } while (isdigit(LastChar) || LastChar == '.');
1061 NumVal = strtod(NumStr.c_str(), 0);
1065 if (LastChar == '#') {
1066 // Comment until end of line.
1067 do LastChar = getchar();
    while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
1070 if (LastChar != EOF)
1074 // Check for end of file. Don't eat the EOF.
1075 if (LastChar == EOF)
1078 // Otherwise, just return the character as its ascii value.
1079 int ThisChar = LastChar;
1080 LastChar = getchar();
1084 //===----------------------------------------------------------------------===//
1085 // Abstract Syntax Tree (aka Parse Tree)
1086 //===----------------------------------------------------------------------===//
1088 /// ExprAST - Base class for all expression nodes.
1091 virtual ~ExprAST() {}
1092 virtual Value *Codegen() = 0;
1095 /// NumberExprAST - Expression class for numeric literals like "1.0".
1096 class NumberExprAST : public ExprAST {
1099 NumberExprAST(double val) : Val(val) {}
1100 virtual Value *Codegen();
1103 /// VariableExprAST - Expression class for referencing a variable, like "a".
1104 class VariableExprAST : public ExprAST {
1107 VariableExprAST(const std::string &name) : Name(name) {}
1108 const std::string &getName() const { return Name; }
1109 virtual Value *Codegen();
1112 /// UnaryExprAST - Expression class for a unary operator.
1113 class UnaryExprAST : public ExprAST {
1117 UnaryExprAST(char opcode, ExprAST *operand)
1118 : Opcode(opcode), Operand(operand) {}
1119 virtual Value *Codegen();
1122 /// BinaryExprAST - Expression class for a binary operator.
1123 class BinaryExprAST : public ExprAST {
1127 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1128 : Op(op), LHS(lhs), RHS(rhs) {}
1129 virtual Value *Codegen();
1132 /// CallExprAST - Expression class for function calls.
1133 class CallExprAST : public ExprAST {
1135 std::vector<ExprAST*> Args;
1137 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
1138 : Callee(callee), Args(args) {}
1139 virtual Value *Codegen();
1142 /// IfExprAST - Expression class for if/then/else.
1143 class IfExprAST : public ExprAST {
1144 ExprAST *Cond, *Then, *Else;
1146 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1147 : Cond(cond), Then(then), Else(_else) {}
1148 virtual Value *Codegen();
1151 /// ForExprAST - Expression class for for/in.
1152 class ForExprAST : public ExprAST {
1153 std::string VarName;
1154 ExprAST *Start, *End, *Step, *Body;
1156 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
1157 ExprAST *step, ExprAST *body)
1158 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1159 virtual Value *Codegen();
1162 /// VarExprAST - Expression class for var/in
1163 class VarExprAST : public ExprAST {
1164 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1167 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
1169 : VarNames(varnames), Body(body) {}
1171 virtual Value *Codegen();
1174 /// PrototypeAST - This class represents the "prototype" for a function,
1175 /// which captures its argument names as well as if it is an operator.
1176 class PrototypeAST {
1178 std::vector<std::string> Args;
1180 unsigned Precedence; // Precedence if a binary op.
1182 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
1183 bool isoperator = false, unsigned prec = 0)
1184 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1186 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
1187 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
1189 char getOperatorName() const {
1190 assert(isUnaryOp() || isBinaryOp());
1191 return Name[Name.size()-1];
1194 unsigned getBinaryPrecedence() const { return Precedence; }
1196 Function *Codegen();
1198 void CreateArgumentAllocas(Function *F);
1201 /// FunctionAST - This class represents a function definition itself.
1203 PrototypeAST *Proto;
1206 FunctionAST(PrototypeAST *proto, ExprAST *body)
1207 : Proto(proto), Body(body) {}
1209 Function *Codegen();
1212 //===----------------------------------------------------------------------===//
1214 //===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
/// token the parser is looking at.  getNextToken reads another token from the
/// lexer and updates CurTok with its results.
1220 static int getNextToken() {
1221 return CurTok = gettok();
1224 /// BinopPrecedence - This holds the precedence for each binary operator that is
1226 static std::map<char, int> BinopPrecedence;
1228 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1229 static int GetTokPrecedence() {
1230 if (!isascii(CurTok))
1233 // Make sure it's a declared binop.
1234 int TokPrec = BinopPrecedence[CurTok];
1235 if (TokPrec <= 0) return -1;
1239 /// Error* - These are little helper functions for error handling.
1240 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1241 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1242 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1244 static ExprAST *ParseExpression();
///   ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
  std::string IdName = IdentifierStr;
  getNextToken();  // eat identifier.
1254 if (CurTok != '(') // Simple variable ref.
1255 return new VariableExprAST(IdName);
1258 getNextToken(); // eat (
1259 std::vector<ExprAST*> Args;
1260 if (CurTok != ')') {
1262 ExprAST *Arg = ParseExpression();
1264 Args.push_back(Arg);
1266 if (CurTok == ')') break;
1269 return Error("Expected ')'");
1277 return new CallExprAST(IdName, Args);
1280 /// numberexpr ::= number
1281 static ExprAST *ParseNumberExpr() {
1282 ExprAST *Result = new NumberExprAST(NumVal);
1283 getNextToken(); // consume the number
1287 /// parenexpr ::= '(' expression ')'
1288 static ExprAST *ParseParenExpr() {
1289 getNextToken(); // eat (.
1290 ExprAST *V = ParseExpression();
1294 return Error("expected ')'");
1295 getNextToken(); // eat ).
1299 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1300 static ExprAST *ParseIfExpr() {
1301 getNextToken(); // eat the if.
1304 ExprAST *Cond = ParseExpression();
1305 if (!Cond) return 0;
1307 if (CurTok != tok_then)
1308 return Error("expected then");
1309 getNextToken(); // eat the then
1311 ExprAST *Then = ParseExpression();
1312 if (Then == 0) return 0;
1314 if (CurTok != tok_else)
1315 return Error("expected else");
1319 ExprAST *Else = ParseExpression();
1320 if (!Else) return 0;
1322 return new IfExprAST(Cond, Then, Else);
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1326 static ExprAST *ParseForExpr() {
1327 getNextToken(); // eat the for.
1329 if (CurTok != tok_identifier)
1330 return Error("expected identifier after for");
1332 std::string IdName = IdentifierStr;
  getNextToken();  // eat identifier.
1336 return Error("expected '=' after for");
1337 getNextToken(); // eat '='.
1340 ExprAST *Start = ParseExpression();
1341 if (Start == 0) return 0;
1343 return Error("expected ',' after for start value");
1346 ExprAST *End = ParseExpression();
1347 if (End == 0) return 0;
1349 // The step value is optional.
1351 if (CurTok == ',') {
1353 Step = ParseExpression();
1354 if (Step == 0) return 0;
1357 if (CurTok != tok_in)
1358 return Error("expected 'in' after for");
1359 getNextToken(); // eat 'in'.
1361 ExprAST *Body = ParseExpression();
1362 if (Body == 0) return 0;
1364 return new ForExprAST(IdName, Start, End, Step, Body);
/// varexpr ::= 'var' identifier ('=' expression)?
///                    (',' identifier ('=' expression)?)* 'in' expression
1369 static ExprAST *ParseVarExpr() {
1370 getNextToken(); // eat the var.
1372 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1374 // At least one variable name is required.
1375 if (CurTok != tok_identifier)
1376 return Error("expected identifier after var");
1379 std::string Name = IdentifierStr;
    getNextToken();  // eat identifier.
1382 // Read the optional initializer.
1384 if (CurTok == '=') {
1385 getNextToken(); // eat the '='.
1387 Init = ParseExpression();
1388 if (Init == 0) return 0;
1391 VarNames.push_back(std::make_pair(Name, Init));
1393 // End of var list, exit loop.
1394 if (CurTok != ',') break;
1395 getNextToken(); // eat the ','.
1397 if (CurTok != tok_identifier)
1398 return Error("expected identifier list after var");
1401 // At this point, we have to have 'in'.
1402 if (CurTok != tok_in)
1403 return Error("expected 'in' keyword after 'var'");
1404 getNextToken(); // eat 'in'.
1406 ExprAST *Body = ParseExpression();
1407 if (Body == 0) return 0;
1409 return new VarExprAST(VarNames, Body);
1414 /// ::= identifierexpr
1420 static ExprAST *ParsePrimary() {
1422 default: return Error("unknown token when expecting an expression");
1423 case tok_identifier: return ParseIdentifierExpr();
1424 case tok_number: return ParseNumberExpr();
1425 case '(': return ParseParenExpr();
1426 case tok_if: return ParseIfExpr();
1427 case tok_for: return ParseForExpr();
1428 case tok_var: return ParseVarExpr();
1435 static ExprAST *ParseUnary() {
1436 // If the current token is not an operator, it must be a primary expr.
1437 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1438 return ParsePrimary();
1440 // If this is a unary operator, read it.
1443 if (ExprAST *Operand = ParseUnary())
1444 return new UnaryExprAST(Opc, Operand);
1449 /// ::= ('+' unary)*
1450 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1451 // If this is a binop, find its precedence.
1453 int TokPrec = GetTokPrecedence();
1455 // If this is a binop that binds at least as tightly as the current binop,
1456 // consume it, otherwise we are done.
1457 if (TokPrec < ExprPrec)
1460 // Okay, we know this is a binop.
1462 getNextToken(); // eat binop
1464 // Parse the unary expression after the binary operator.
1465 ExprAST *RHS = ParseUnary();
1468 // If BinOp binds less tightly with RHS than the operator after RHS, let
1469 // the pending operator take RHS as its LHS.
1470 int NextPrec = GetTokPrecedence();
1471 if (TokPrec < NextPrec) {
1472 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1473 if (RHS == 0) return 0;
1477 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1482 /// ::= unary binoprhs
1484 static ExprAST *ParseExpression() {
1485 ExprAST *LHS = ParseUnary();
1488 return ParseBinOpRHS(0, LHS);
1492 /// ::= id '(' id* ')'
1493 /// ::= binary LETTER number? (id, id)
1494 /// ::= unary LETTER (id)
1495 static PrototypeAST *ParsePrototype() {
1498 int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1499 unsigned BinaryPrecedence = 30;
1503 return ErrorP("Expected function name in prototype");
1504 case tok_identifier:
1505 FnName = IdentifierStr;
1511 if (!isascii(CurTok))
1512 return ErrorP("Expected unary operator");
1514 FnName += (char)CurTok;
1520 if (!isascii(CurTok))
1521 return ErrorP("Expected binary operator");
1523 FnName += (char)CurTok;
1527 // Read the precedence if present.
1528 if (CurTok == tok_number) {
1529 if (NumVal < 1 || NumVal > 100)
        return ErrorP("Invalid precedence: must be 1..100");
1531 BinaryPrecedence = (unsigned)NumVal;
1538 return ErrorP("Expected '(' in prototype");
1540 std::vector<std::string> ArgNames;
1541 while (getNextToken() == tok_identifier)
1542 ArgNames.push_back(IdentifierStr);
1544 return ErrorP("Expected ')' in prototype");
1547 getNextToken(); // eat ')'.
1549 // Verify right number of names for operator.
1550 if (Kind && ArgNames.size() != Kind)
1551 return ErrorP("Invalid number of operands for operator");
1553 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1556 /// definition ::= 'def' prototype expression
1557 static FunctionAST *ParseDefinition() {
1558 getNextToken(); // eat def.
1559 PrototypeAST *Proto = ParsePrototype();
1560 if (Proto == 0) return 0;
1562 if (ExprAST *E = ParseExpression())
1563 return new FunctionAST(Proto, E);
1567 /// toplevelexpr ::= expression
1568 static FunctionAST *ParseTopLevelExpr() {
1569 if (ExprAST *E = ParseExpression()) {
1570 // Make an anonymous proto.
1571 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1572 return new FunctionAST(Proto, E);
1577 /// external ::= 'extern' prototype
1578 static PrototypeAST *ParseExtern() {
1579 getNextToken(); // eat extern.
1580 return ParsePrototype();
1583 //===----------------------------------------------------------------------===//
1585 //===----------------------------------------------------------------------===//
1587 static Module *TheModule;
1588 static LLVMFoldingBuilder Builder;
1589 static std::map<std::string, AllocaInst*> NamedValues;
1590 static FunctionPassManager *TheFPM;
1592 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1594 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1595 /// the function. This is used for mutable variables etc.
1596 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1597 const std::string &VarName) {
1598 LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
1599 TheFunction->getEntryBlock().begin());
1600 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
1604 Value *NumberExprAST::Codegen() {
1605 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1608 Value *VariableExprAST::Codegen() {
1609 // Look this variable up in the function.
1610 Value *V = NamedValues[Name];
1611 if (V == 0) return ErrorV("Unknown variable name");
1614 return Builder.CreateLoad(V, Name.c_str());
1617 Value *UnaryExprAST::Codegen() {
1618 Value *OperandV = Operand->Codegen();
1619 if (OperandV == 0) return 0;
1621 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
1623 return ErrorV("Unknown unary operator");
1625 return Builder.CreateCall(F, OperandV, "unop");
1629 Value *BinaryExprAST::Codegen() {
1630 // Special case '=' because we don't want to emit the LHS as an expression.
1632 // Assignment requires the LHS to be an identifier.
1633 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
1635 return ErrorV("destination of '=' must be a variable");
1637 Value *Val = RHS->Codegen();
1638 if (Val == 0) return 0;
1640 // Look up the name.
1641 Value *Variable = NamedValues[LHSE->getName()];
1642 if (Variable == 0) return ErrorV("Unknown variable name");
1644 Builder.CreateStore(Val, Variable);
1649 Value *L = LHS->Codegen();
1650 Value *R = RHS->Codegen();
1651 if (L == 0 || R == 0) return 0;
1654 case '+': return Builder.CreateAdd(L, R, "addtmp");
1655 case '-': return Builder.CreateSub(L, R, "subtmp");
1656 case '*': return Builder.CreateMul(L, R, "multmp");
    L = Builder.CreateFCmpULT(L, R, "cmptmp");
1659 // Convert bool 0/1 to double 0.0 or 1.0
1660 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1664 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1666 Function *F = TheModule->getFunction(std::string("binary")+Op);
1667 assert(F && "binary operator not found!");
1669 Value *Ops[] = { L, R };
1670 return Builder.CreateCall(F, Ops, Ops+2, "binop");
1673 Value *CallExprAST::Codegen() {
1674 // Look up the name in the global module table.
1675 Function *CalleeF = TheModule->getFunction(Callee);
1677 return ErrorV("Unknown function referenced");
  // Reject calls that pass the wrong number of arguments.
1680 if (CalleeF->arg_size() != Args.size())
1681 return ErrorV("Incorrect # arguments passed");
1683 std::vector<Value*> ArgsV;
1684 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1685 ArgsV.push_back(Args[i]->Codegen());
1686 if (ArgsV.back() == 0) return 0;
1689 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1692 Value *IfExprAST::Codegen() {
1693 Value *CondV = Cond->Codegen();
1694 if (CondV == 0) return 0;
  // Convert condition to a bool by comparing non-equal to 0.0.
1697 CondV = Builder.CreateFCmpONE(CondV,
1698 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1701 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1703 // Create blocks for the then and else cases. Insert the 'then' block at the
1704 // end of the function.
1705 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1706 BasicBlock *ElseBB = new BasicBlock("else");
1707 BasicBlock *MergeBB = new BasicBlock("ifcont");
1709 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1712 Builder.SetInsertPoint(ThenBB);
1714 Value *ThenV = Then->Codegen();
1715 if (ThenV == 0) return 0;
1717 Builder.CreateBr(MergeBB);
1718 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1719 ThenBB = Builder.GetInsertBlock();
1722 TheFunction->getBasicBlockList().push_back(ElseBB);
1723 Builder.SetInsertPoint(ElseBB);
1725 Value *ElseV = Else->Codegen();
1726 if (ElseV == 0) return 0;
1728 Builder.CreateBr(MergeBB);
1729 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1730 ElseBB = Builder.GetInsertBlock();
1732 // Emit merge block.
1733 TheFunction->getBasicBlockList().push_back(MergeBB);
1734 Builder.SetInsertPoint(MergeBB);
1735 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1737 PN->addIncoming(ThenV, ThenBB);
1738 PN->addIncoming(ElseV, ElseBB);
1742 Value *ForExprAST::Codegen() {
1744 // var = alloca double
1746 // start = startexpr
1747 // store start -> var
1755 // endcond = endexpr
1757 // curvar = load var
1758 // nextvar = curvar + step
1759 // store nextvar -> var
1760 // br endcond, loop, endloop
1763 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1765 // Create an alloca for the variable in the entry block.
1766 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1768 // Emit the start code first, without 'variable' in scope.
1769 Value *StartVal = Start->Codegen();
1770 if (StartVal == 0) return 0;
1772 // Store the value into the alloca.
1773 Builder.CreateStore(StartVal, Alloca);
1775 // Make the new basic block for the loop header, inserting after current
1777 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1778 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1780 // Insert an explicit fall through from the current block to the LoopBB.
1781 Builder.CreateBr(LoopBB);
1783 // Start insertion in LoopBB.
1784 Builder.SetInsertPoint(LoopBB);
  // Within the loop, the variable is bound to its alloca.  If it shadows an
  // existing variable, we have to restore it, so save it now.
1788 AllocaInst *OldVal = NamedValues[VarName];
1789 NamedValues[VarName] = Alloca;
1791 // Emit the body of the loop. This, like any other expr, can change the
1792 // current BB. Note that we ignore the value computed by the body, but don't
1794 if (Body->Codegen() == 0)
1797 // Emit the step value.
1800 StepVal = Step->Codegen();
1801 if (StepVal == 0) return 0;
1803 // If not specified, use 1.0.
1804 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1807 // Compute the end condition.
1808 Value *EndCond = End->Codegen();
  if (EndCond == 0) return 0;
1811 // Reload, increment, and restore the alloca. This handles the case where
1812 // the body of the loop mutates the variable.
1813 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1814 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1815 Builder.CreateStore(NextVar, Alloca);
  // Convert condition to a bool by comparing non-equal to 0.0.
1818 EndCond = Builder.CreateFCmpONE(EndCond,
1819 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1822 // Create the "after loop" block and insert it.
1823 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1824 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1826 // Insert the conditional branch into the end of LoopEndBB.
1827 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1829 // Any new code will be inserted in AfterBB.
1830 Builder.SetInsertPoint(AfterBB);
1832 // Restore the unshadowed variable.
1834 NamedValues[VarName] = OldVal;
1836 NamedValues.erase(VarName);
1839 // for expr always returns 0.0.
1840 return Constant::getNullValue(Type::DoubleTy);
1843 Value *VarExprAST::Codegen() {
1844 std::vector<AllocaInst *> OldBindings;
1846 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1848 // Register all variables and emit their initializer.
1849 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1850 const std::string &VarName = VarNames[i].first;
1851 ExprAST *Init = VarNames[i].second;
1853 // Emit the initializer before adding the variable to scope, this prevents
1854 // the initializer from referencing the variable itself, and permits stuff
1857 // var a = a in ... # refers to outer 'a'.
1860 InitVal = Init->Codegen();
1861 if (InitVal == 0) return 0;
1862 } else { // If not specified, use 0.0.
1863 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
1866 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1867 Builder.CreateStore(InitVal, Alloca);
1869 // Remember the old variable binding so that we can restore the binding when
1871 OldBindings.push_back(NamedValues[VarName]);
1873 // Remember this binding.
1874 NamedValues[VarName] = Alloca;
1877 // Codegen the body, now that all vars are in scope.
1878 Value *BodyVal = Body->Codegen();
1879 if (BodyVal == 0) return 0;
1881 // Pop all our variables from scope.
1882 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1883 NamedValues[VarNames[i].first] = OldBindings[i];
1885 // Return the body computation.
1890 Function *PrototypeAST::Codegen() {
1891 // Make the function type: double(double,double) etc.
1892 std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
1893 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1895 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1897 // If F conflicted, there was already something named 'Name'. If it has a
1898 // body, don't allow redefinition or reextern.
1899 if (F->getName() != Name) {
1900 // Delete the one we just made and get the existing one.
1901 F->eraseFromParent();
1902 F = TheModule->getFunction(Name);
1904 // If F already has a body, reject this.
1905 if (!F->empty()) {
1906 ErrorF("redefinition of function");
1910 // If F took a different number of args, reject.
1911 if (F->arg_size() != Args.size()) {
1912 ErrorF("redefinition of function with different # args");
1917 // Set names for all arguments.
1919 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1921 AI->setName(Args[Idx]);
1926 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1927 /// argument in the symbol table so that references to it will succeed.
1928 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1929 Function::arg_iterator AI = F->arg_begin();
1930 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1931 // Create an alloca for this variable.
1932 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1934 // Store the initial value into the alloca.
1935 Builder.CreateStore(AI, Alloca);
1937 // Add arguments to variable symbol table.
1938 NamedValues[Args[Idx]] = Alloca;
1943 Function *FunctionAST::Codegen() {
1944 NamedValues.clear();
1946 Function *TheFunction = Proto->Codegen();
1947 if (TheFunction == 0)
1950 // If this is an operator, install it.
1951 if (Proto->isBinaryOp())
1952 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
1954 // Create a new basic block to start insertion into.
1955 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1956 Builder.SetInsertPoint(BB);
1958 // Add all arguments to the symbol table and create their allocas.
1959 Proto->CreateArgumentAllocas(TheFunction);
1961 if (Value *RetVal = Body->Codegen()) {
1962 // Finish off the function.
1963 Builder.CreateRet(RetVal);
1965 // Validate the generated code, checking for consistency.
1966 verifyFunction(*TheFunction);
1968 // Optimize the function.
1969 TheFPM->run(*TheFunction);
1974 // Error reading body, remove function.
1975 TheFunction->eraseFromParent();
1977 if (Proto->isBinaryOp())
1978 BinopPrecedence.erase(Proto->getOperatorName());
1982 //===----------------------------------------------------------------------===//
1983 // Top-Level parsing and JIT Driver
1984 //===----------------------------------------------------------------------===//
1986 static ExecutionEngine *TheExecutionEngine;
1988 static void HandleDefinition() {
1989 if (FunctionAST *F = ParseDefinition()) {
1990 if (Function *LF = F->Codegen()) {
1991 fprintf(stderr, "Read function definition:");
1995 // Skip token for error recovery.
2000 static void HandleExtern() {
2001 if (PrototypeAST *P = ParseExtern()) {
2002 if (Function *F = P->Codegen()) {
2003 fprintf(stderr, "Read extern: ");
2007 // Skip token for error recovery.
2012 static void HandleTopLevelExpression() {
2013 // Evaluate a top level expression into an anonymous function.
2014 if (FunctionAST *F = ParseTopLevelExpr()) {
2015 if (Function *LF = F->Codegen()) {
2016 // JIT the function, returning a function pointer.
2017 void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
2019 // Cast it to the right type (takes no arguments, returns a double) so we
2020 // can call it as a native function.
2021 double (*FP)() = (double (*)())FPtr;
2022 fprintf(stderr, "Evaluated to %f\n", FP());
2025 // Skip token for error recovery.
2030 /// top ::= definition | external | expression | ';'
2031 static void MainLoop() {
2033 fprintf(stderr, "ready> ");
2035 case tok_eof: return;
2036 case ';': getNextToken(); break; // ignore top level semicolons.
2037 case tok_def: HandleDefinition(); break;
2038 case tok_extern: HandleExtern(); break;
2039 default: HandleTopLevelExpression(); break;
2046 //===----------------------------------------------------------------------===//
2047 // "Library" functions that can be "extern'd" from user code.
2048 //===----------------------------------------------------------------------===//
2050 /// putchard - putchar that takes a double and returns 0.
2052 double putchard(double X) {
/// printd - printf that takes a double, prints it as "%f\n", and returns 0.
2059 double printd(double X) {
2064 //===----------------------------------------------------------------------===//
2065 // Main driver code.
2066 //===----------------------------------------------------------------------===//
2069 // Install standard binary operators.
2070 // 1 is lowest precedence.
2071 BinopPrecedence['='] = 2;
2072 BinopPrecedence['<'] = 10;
2073 BinopPrecedence['+'] = 20;
2074 BinopPrecedence['-'] = 20;
2075 BinopPrecedence['*'] = 40; // highest.
2077 // Prime the first token.
2078 fprintf(stderr, "ready> ");
2081 // Make the module, which holds all the code.
2082 TheModule = new Module("my cool jit");
2085 TheExecutionEngine = ExecutionEngine::create(TheModule);
2088 ExistingModuleProvider OurModuleProvider(TheModule);
2089 FunctionPassManager OurFPM(&OurModuleProvider);
2091 // Set up the optimizer pipeline. Start with registering info about how the
2092 // target lays out data structures.
2093 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
2094 // Promote allocas to registers.
2095 OurFPM.add(createPromoteMemoryToRegisterPass());
2096 // Do simple "peephole" optimizations and bit-twiddling optzns.
2097 OurFPM.add(createInstructionCombiningPass());
2098 // Reassociate expressions.
2099 OurFPM.add(createReassociatePass());
2100 // Eliminate Common SubExpressions.
2101 OurFPM.add(createGVNPass());
2102 // Simplify the control flow graph (deleting unreachable blocks, etc).
2103 OurFPM.add(createCFGSimplificationPass());
2105 // Set the global so the code gen can use this.
2106 TheFPM = &OurFPM;
2108 // Run the main "interpreter loop" now.
2112 } // Free module provider and pass manager.
2115 // Print out all of the generated code.
2116 TheModule->dump();
2124 <!-- *********************************************************************** -->
2132 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2133 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2134 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $