docs/tutorial/LangImpl7.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
   7          construction</title>
   8   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   9   <meta name="author" content="Chris Lattner">
  10   <link rel="stylesheet" href="../llvm.css" type="text/css">
  11 </head>
  12
  13 <body>
  14
  15 <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
  16
  17 <div class="doc_author">
  18   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  19 </div>
  20
  21 <!-- *********************************************************************** -->
  22 <div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
  23 <!-- *********************************************************************** -->
  24
  25 <div class="doc_text">
  26
  27 <p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
  28 LLVM</a>" tutorial.  In parts 1 through 6, we've built a very respectable,
  29 albeit simple, <a
  30 href="http://en.wikipedia.org/wiki/Functional_programming">functional
  31 programming language</a>.  In our journey, we learned some parsing techniques,
  32 how to build and represent an AST, how to build LLVM IR, and how to optimize
  33 the resultant code and JIT compile it.</p>
  34
  35 <p>While Kaleidoscope is interesting as a functional language, this makes it
  36 "too easy" to generate LLVM IR for it.  In particular, a functional language
  37 makes it very easy to build LLVM IR directly in <a
  38 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
  39 Since LLVM requires that the input code be in SSA form, this is a very nice
  40 property and it is often unclear to newcomers how to generate code for an
  41 imperative language with mutable variables.</p>
  42
  43 <p>The short (and happy) summary of this chapter is that there is no need for
  44 your front-end to build SSA form: LLVM provides highly tuned and well tested
  45 support for this, though the way it works is a bit unexpected for some.</p>
  46
  47 </div>
  48
  49 <!-- *********************************************************************** -->
  50 <div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
  51 <!-- *********************************************************************** -->
  52
  53 <div class="doc_text">
  54
  55 <p>
  56 To understand why mutable variables cause complexities in SSA construction,
  57 consider this extremely simple C example:
  58 </p>
  59
  60 <div class="doc_code">
  61 <pre>
  62 int G, H;
  63 int test(_Bool Condition) {
  64   int X;
  65   if (Condition)
  66     X = G;
  67   else
  68     X = H;
  69   return X;
  70 }
  71 </pre>
  72 </div>
  73
  74 <p>In this case, we have the variable "X", whose value depends on the path
  75 executed in the program.  Because there are two different possible values for X
  76 before the return instruction, a PHI node is inserted to merge the two values.
  77 The LLVM IR that we want for this example looks like this:</p>
  78
  79 <div class="doc_code">
  80 <pre>
  81 @G = weak global i32 0   ; type of @G is i32*
  82 @H = weak global i32 0   ; type of @H is i32*
  83
  84 define i32 @test(i1 %Condition) {
  85 entry:
  86         br i1 %Condition, label %cond_true, label %cond_false
  87
  88 cond_true:
  89         %X.0 = load i32* @G
  90         br label %cond_next
  91
  92 cond_false:
  93         %X.1 = load i32* @H
  94         br label %cond_next
  95
  96 cond_next:
  97         %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  98         ret i32 %X.2
  99 }
 100 </pre>
 101 </div>
 102
 103 <p>In this example, the loads from the G and H global variables are explicit in
 104 the LLVM IR, and they live in the then/else branches of the if statement
 105 (cond_true/cond_false).  In order to merge the incoming values, the X.2 phi node
 106 in the cond_next block selects the right value to use based on where control
 107 flow is coming from: if control flow comes from the cond_false block, X.2 gets
 108 the value of X.1.  Alternatively, if control flow comes from cond_tree, it gets
 109 the value of X.0.  The intent of this chapter is not to explain the details of
 110 SSA form.  For more information, see one of the many <a
 111 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
 112 references</a>.</p>
 113
 114 <p>The question for this article is "who places phi nodes when lowering
 115 assignments to mutable variables?".  The issue here is that LLVM
 116 <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
 117 However, SSA construction requires non-trivial algorithms and data structures,
 118 so it is inconvenient and wasteful for every front-end to have to reproduce this
 119 logic.</p>
 120
 121 </div>
 122
 123 <!-- *********************************************************************** -->
 124 <div class="doc_section"><a name="memory">Memory in LLVM</a></div>
 125 <!-- *********************************************************************** -->
 126
 127 <div class="doc_text">
 128
 129 <p>The 'trick' here is that while LLVM does require all register values to be
 130 in SSA form, it does not require (or permit) memory objects to be in SSA form.
 131 In the example above, note that the loads from G and H are direct accesses to
 132 G and H: they are not renamed or versioned.  This differs from some other
 133 compiler systems, which do try to version memory objects.  In LLVM, instead of
 134 encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
 135 href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
 136 demand.</p>
 137
 138 <p>
 139 With this in mind, the high-level idea is that we want to make a stack variable
 140 (which lives in memory, because it is on the stack) for each mutable object in
 141 a function.  To take advantage of this trick, we need to talk about how LLVM
 142 represents stack variables.
 143 </p>
 144
 145 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
 146 it is carefully designed to not have (or need) an "address-of" operator.  Notice
 147 how the type of the @G/@H global variables is actually "i32*" even though the
 148 variable is defined as "i32".  What this means is that @G defines <em>space</em>
 149 for an i32 in the global data area, but its <em>name</em> actually refers to the
 150 address for that space.  Stack variables work the same way, but instead of being
 151 declared with global variable definitions, they are declared with the
 152 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
 153
 154 <div class="doc_code">
 155 <pre>
 156 define i32 @test(i1 %Condition) {
 157 entry:
 158         %X = alloca i32           ; type of %X is i32*.
 159         ...
 160         %tmp = load i32* %X       ; load the stack value %X from the stack.
 161         %tmp2 = add i32 %tmp, 1   ; increment it
 162         store i32 %tmp2, i32* %X  ; store it back
 163         ...
 164 </pre>
 165 </div>
 166
 167 <p>This code shows an example of how you can declare and manipulate a stack
 168 variable in the LLVM IR.  Stack memory allocated with the alloca instruction is
 169 fully general: you can pass the address of the stack slot to functions, you can
 170 store it in other variables, etc.  In our example above, we could rewrite the
 171 example to use the alloca technique to avoid using a PHI node:</p>
 172
 173 <div class="doc_code">
 174 <pre>
 175 @G = weak global i32 0   ; type of @G is i32*
 176 @H = weak global i32 0   ; type of @H is i32*
 177
 178 define i32 @test(i1 %Condition) {
 179 entry:
 180         %X = alloca i32           ; type of %X is i32*.
 181         br i1 %Condition, label %cond_true, label %cond_false
 182
 183 cond_true:
 184         %X.0 = load i32* @G
 185         store i32 %X.0, i32* %X   ; Update X
 186         br label %cond_next
 187
 188 cond_false:
 189         %X.1 = load i32* @H
 190         store i32 %X.1, i32* %X   ; Update X
 191         br label %cond_next
 192
 193 cond_next:
 194         %X.2 = load i32* %X       ; Read X
 195         ret i32 %X.2
 196 }
 197 </pre>
 198 </div>
 199
 200 <p>With this, we have discovered a way to handle arbitrary mutable variables
 201 without the need to create Phi nodes at all:</p>
 202
 203 <ol>
 204 <li>Each mutable variable becomes a stack allocation.</li>
 205 <li>Each read of the variable becomes a load from the stack.</li>
 206 <li>Each update of the variable becomes a store to the stack.</li>
 207 <li>Taking the address of a variable just uses the stack address directly.</li>
 208 </ol>
 209
 210 <p>While this solution has solved our immediate problem, it introduced another
 211 one: we have now apparently introduced a lot of stack traffic for very simple
 212 and common operations, a major performance problem.  Fortunately for us, the
 213 LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
 214 this case, promoting allocas like this into SSA registers, inserting Phi nodes
 215 as appropriate.  If you run this example through the pass, for example, you'll
 216 get:</p>
 217
 218 <div class="doc_code">
 219 <pre>
 220 $ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
 221 @G = weak global i32 0
 222 @H = weak global i32 0
 223
 224 define i32 @test(i1 %Condition) {
 225 entry:
 226         br i1 %Condition, label %cond_true, label %cond_false
 227
 228 cond_true:
 229         %X.0 = load i32* @G
 230         br label %cond_next
 231
 232 cond_false:
 233         %X.1 = load i32* @H
 234         br label %cond_next
 235
 236 cond_next:
 237         %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 238         ret i32 %X.01
 239 }
 240 </pre>
 241 </div>
 242
 243 <p>The mem2reg pass implements the standard "iterated dominator frontier"
 244 algorithm for constructing SSA form and has a number of optimizations that speed
 245 up very common degenerate cases.  mem2reg really is the answer for dealing with
 246 mutable variables, and we highly recommend that you depend on it.  Note that
 247 mem2reg only works on variables in certain circumstances:</p>
 248
 249 <ol>
 250 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
 251 promotes them.  It does not apply to global variables or heap allocations.</li>
 252
 253 <li>mem2reg only looks for alloca instructions in the entry block of the
 254 function.  Being in the entry block guarantees that the alloca is only executed
 255 once, which makes analysis simpler.</li>
 256
 257 <li>mem2reg only promotes allocas whose uses are direct loads and stores.  If
 258 the address of the stack object is passed to a function, or if any funny pointer
 259 arithmetic is involved, the alloca will not be promoted.</li>
 260
 261 <li>mem2reg only works on allocas of <a
 262 href="../LangRef.html#t_classifications">first class</a>
 263 values (such as pointers, scalars and vectors), and only if the array size
 264 of the allocation is 1 (or missing in the .ll file).  mem2reg is not capable of
 265 promoting structs or arrays to registers.  Note that the "scalarrepl" pass is
 266 more powerful and can promote structs, "unions", and arrays in many cases.</li>
 267
 268 </ol>
 269
 270 <p>
 271 All of these properties are easy to satisfy for most imperative languages, and
 272 we'll illustrate this below with Kaleidoscope.  The final question you may be
 273 asking is: should I bother with this nonsense for my front-end?  Wouldn't it be
 274 better if I just did SSA construction directly, avoiding use of the mem2reg
 275 optimization pass?  In short, we strongly recommend that use you this technique
 276 for building SSA form, unless there is an extremely good reason not to.  Using
 277 this technique is:</p>
 278
 279 <ul>
 280 <li>Proven and well tested: llvm-gcc and clang both use this technique for local
 281 mutable variables.  As such, the most common clients of LLVM are using this to
 282 handle a bulk of their variables.  You can be sure that bugs are found fast and
 283 fixed early.</li>
 284
 285 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
 286 common cases as well as fully general.  For example, it has fast-paths for
 287 variables that are only used in a single block, variables that only have one
 288 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
 289 </li>
 290
 291 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
 292 Debug information in LLVM</a> relies on having the address of the variable
 293 exposed to attach debug info to it.  This technique dovetails very naturally
 294 with this style of debug info.</li>
 295 </ul>
 296
 297 <p>If nothing else, this makes it much easier to get your front-end up and
 298 running, and is very simple to implement.  Lets extend Kaleidoscope with mutable
 299 variables now!
 300 </p>
 301
 302 </div>
 303
 304 <!-- *********************************************************************** -->
 305 <div class="doc_section"><a name="kalvars">Mutable Variables in
 306 Kaleidoscope</a></div>
 307 <!-- *********************************************************************** -->
 308
 309 <div class="doc_text">
 310
 311 <p>Now that we know the sort of problem we want to tackle, lets see what this
 312 looks like in the context of our little Kaleidoscope language.  We're going to
 313 add two features:</p>
 314
 315 <ol>
 316 <li>The ability to mutate variables with the '=' operator.</li>
 317 <li>The ability to define new variables.</li>
 318 </ol>
 319
 320 <p>While the first item is really what this is about, we only have variables
 321 for incoming arguments and for induction variables, and redefining them only
 322 goes so far :).  Also, the ability to define new variables is a
 323 useful thing regardless of whether you will be mutating them.  Here's a
 324 motivating example that shows how we could use these:</p>
 325
 326 <div class="doc_code">
 327 <pre>
 328 # Define ':' for sequencing: as a low-precedence operator that ignores operands
 329 # and just returns the RHS.
 330 def binary : 1 (x y) y;
 331
 332 # Recursive fib, we could do this before.
 333 def fib(x)
 334   if (x &lt; 3) then
 335     1
 336   else
 337     fib(x-1)+fib(x-2);
 338
 339 # Iterative fib.
 340 def fibi(x)
 341   <b>var a = 1, b = 1, c in</b>
 342   (for i = 3, i &;t; x in
 343      <b>c = a + b</b> :
 344      <b>a = b</b> :
 345      <b>b = c</b>) :
 346   b;
 347
 348 # Call it.
 349 fibi(10);
 350 </pre>
 351 </div>
 352
 353 <p>
 354 In order to mutate variables, we have to change our existing variables to use
 355 the "alloca trick".  Once we have that, we'll add our new operator, then extend
 356 Kaleidoscope to support new variable definitions.
 357 </p>
 358
 359 </div>
 360
 361 <!-- *********************************************************************** -->
 362 <div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
 363 Mutation</a></div>
 364 <!-- *********************************************************************** -->
 365
 366 <div class="doc_text">
 367
 368 <p>
 369 The symbol table in Kaleidoscope is managed at code generation time by the
 370 '<tt>NamedValues</tt>' map.  This map currently keeps track of the LLVM "Value*"
 371 that holds the double value for the named variable.  In order to support
 372 mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
 373 the <em>memory location</em> of the variable in question.  Note that this
 374 change is a refactoring: it changes the structure of the code, but does not
 375 (by itself) change the behavior of the compiler.  All of these changes are
 376 isolated in the Kaleidoscope code generator.</p>
 377
 378 <p>
 379 At this point in Kaleidoscope's development, it only supports variables for two
 380 things: incoming arguments to functions and the induction variable of 'for'
 381 loops.  For consistency, we'll allow mutation of these variables in addition to
 382 other user-defined variables.  This means that these will both need memory
 383 locations.
 384 </p>
 385
 386 <p>To start our transformation of Kaleidoscope, we'll change the NamedValues
 387 map to map to AllocaInst* instead of Value*.  Once we do this, the C++ compiler
 388 will tell use what parts of the code we need to update:</p>
 389
 390 <div class="doc_code">
 391 <pre>
 392 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
 393 </pre>
 394 </div>
 395
 396 <p>Also, since we will need to create these alloca's, we'll use a helper
 397 function that ensures that the allocas are created in the entry block of the
 398 function:</p>
 399
 400 <div class="doc_code">
 401 <pre>
 402 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
 403 /// the function.  This is used for mutable variables etc.
 404 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
 405                                           const std::string &amp;VarName) {
 406   LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
 407                    TheFunction-&gt;getEntryBlock().begin());
 408   return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
 409 }
 410 </pre>
 411 </div>
 412
 413 <p>This funny looking code creates an LLVMBuilder object that is pointing at
 414 the first instruction (.begin()) of the entry block.  It then creates an alloca
 415 with the expected name and returns it.  Because all values in Kaleidoscope are
 416 doubles, there is no need to pass in a type to use.</p>
 417
 418 <p>With this in place, the first functionality change we want to make is to
 419 variable references.  In our new scheme, variables live on the stack, so code
 420 generating a reference to them actually needs to produce a load from the stack
 421 slot:</p>
 422
 423 <div class="doc_code">
 424 <pre>
 425 Value *VariableExprAST::Codegen() {
 426   // Look this variable up in the function.
 427   Value *V = NamedValues[Name];
 428   if (V == 0) return ErrorV("Unknown variable name");
 429
 430   // Load the value.
 431   return Builder.CreateLoad(V, Name.c_str());
 432 }
 433 </pre>
 434 </div>
 435
 436 <p>As you can see, this is pretty straight-forward.  Next we need to update the
 437 things that define the variables to set up the alloca.  We'll start with
 438 <tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
 439 the unabridged code):</p>
 440
 441 <div class="doc_code">
 442 <pre>
 443   Function *TheFunction = Builder.GetInsertBlock()->getParent();
 444
 445   <b>// Create an alloca for the variable in the entry block.
 446   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
 447
 448     // Emit the start code first, without 'variable' in scope.
 449   Value *StartVal = Start-&gt;Codegen();
 450   if (StartVal == 0) return 0;
 451
 452   <b>// Store the value into the alloca.
 453   Builder.CreateStore(StartVal, Alloca);</b>
 454   ...
 455
 456   // Compute the end condition.
 457   Value *EndCond = End-&gt;Codegen();
 458   if (EndCond == 0) return EndCond;
 459
 460   <b>// Reload, increment, and restore the alloca.  This handles the case where
 461   // the body of the loop mutates the variable.
 462   Value *CurVar = Builder.CreateLoad(Alloca);
 463   Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
 464   Builder.CreateStore(NextVar, Alloca);</b>
 465   ...
 466 </pre>
 467 </div>
 468
 469 <p>This code is virtually identical to the code <a
 470 href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>.  The
 471 big difference is that we no longer have to construct a PHI node, and we use
 472 load/store to access the variable as needed.</p>
 473
 474 <p>To support mutable argument variables, we need to also make allocas for them.
 475 The code for this is also pretty simple:</p>
 476
 477 <div class="doc_code">
 478 <pre>
 479 /// CreateArgumentAllocas - Create an alloca for each argument and register the
 480 /// argument in the symbol table so that references to it will succeed.
 481 void PrototypeAST::CreateArgumentAllocas(Function *F) {
 482   Function::arg_iterator AI = F-&gt;arg_begin();
 483   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
 484     // Create an alloca for this variable.
 485     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
 486
 487     // Store the initial value into the alloca.
 488     Builder.CreateStore(AI, Alloca);
 489
 490     // Add arguments to variable symbol table.
 491     NamedValues[Args[Idx]] = Alloca;
 492   }
 493 }
 494 </pre>
 495 </div>
 496
 497 <p>For each argument, we make an alloca, store the input value to the function
 498 into the alloca, and register the alloca as the memory location for the
 499 argument.  This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
 500 it sets up the entry block for the function.</p>
 501
 502 <p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
 503 good codegen once again:</p>
 504
 505 <div class="doc_code">
 506 <pre>
 507     // Set up the optimizer pipeline.  Start with registering info about how the
 508     // target lays out data structures.
 509     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
 510     <b>// Promote allocas to registers.
 511     OurFPM.add(createPromoteMemoryToRegisterPass());</b>
 512     // Do simple "peephole" optimizations and bit-twiddling optzns.
 513     OurFPM.add(createInstructionCombiningPass());
 514     // Reassociate expressions.
 515     OurFPM.add(createReassociatePass());
 516 </pre>
 517 </div>
 518
 519 <p>It is interesting to see what the code looks like before and after the
 520 mem2reg optimization runs.  For example, this is the before/after code for our
 521 recursive fib.  Before the optimization:</p>
 522
 523 <div class="doc_code">
 524 <pre>
 525 define double @fib(double %x) {
 526 entry:
 527         <b>%x1 = alloca double
 528         store double %x, double* %x1
 529         %x2 = load double* %x1</b>
 530         %multmp = fcmp ult double %x2, 3.000000e+00
 531         %booltmp = uitofp i1 %multmp to double
 532         %ifcond = fcmp one double %booltmp, 0.000000e+00
 533         br i1 %ifcond, label %then, label %else
 534
 535 then:           ; preds = %entry
 536         br label %ifcont
 537
 538 else:           ; preds = %entry
 539         <b>%x3 = load double* %x1</b>
 540         %subtmp = sub double %x3, 1.000000e+00
 541         %calltmp = call double @fib( double %subtmp )
 542         <b>%x4 = load double* %x1</b>
 543         %subtmp5 = sub double %x4, 2.000000e+00
 544         %calltmp6 = call double @fib( double %subtmp5 )
 545         %addtmp = add double %calltmp, %calltmp6
 546         br label %ifcont
 547
 548 ifcont:         ; preds = %else, %then
 549         %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 550         ret double %iftmp
 551 }
 552 </pre>
 553 </div>
 554
 555 <p>Here there is only one variable (x, the input argument) but you can still
 556 see the extremely simple-minded code generation strategy we are using.  In the
 557 entry block, an alloca is created, and the initial input value is stored into
 558 it.  Each reference to the variable does a reload from the stack.  Also, note
 559 that we didn't modify the if/then/else expression, so it still inserts a PHI
 560 node.  While we could make an alloca for it, it is actually easier to create a
 561 PHI node for it, so we still just make the PHI.</p>
 562
 563 <p>Here is the code after the mem2reg pass runs:</p>
 564
 565 <div class="doc_code">
 566 <pre>
 567 define double @fib(double %x) {
 568 entry:
 569         %multmp = fcmp ult double <b>%x</b>, 3.000000e+00
 570         %booltmp = uitofp i1 %multmp to double
 571         %ifcond = fcmp one double %booltmp, 0.000000e+00
 572         br i1 %ifcond, label %then, label %else
 573
 574 then:
 575         br label %ifcont
 576
 577 else:
 578         %subtmp = sub double <b>%x</b>, 1.000000e+00
 579         %calltmp = call double @fib( double %subtmp )
 580         %subtmp5 = sub double <b>%x</b>, 2.000000e+00
 581         %calltmp6 = call double @fib( double %subtmp5 )
 582         %addtmp = add double %calltmp, %calltmp6
 583         br label %ifcont
 584
 585 ifcont:         ; preds = %else, %then
 586         %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 587         ret double %iftmp
 588 }
 589 </pre>
 590 </div>
 591
 592 <p>This is a trivial case for mem2reg, since there are no redefinitions of the
 593 variable.  The point of showing this is to calm your tension about inserting
 594 such blatent inefficiencies :).</p>
 595
 596 <p>After the rest of the optimizers run, we get:</p>
 597
 598 <div class="doc_code">
 599 <pre>
 600 define double @fib(double %x) {
 601 entry:
 602         %multmp = fcmp ult double %x, 3.000000e+00
 603         %booltmp = uitofp i1 %multmp to double
 604         %ifcond = fcmp ueq double %booltmp, 0.000000e+00
 605         br i1 %ifcond, label %else, label %ifcont
 606
 607 else:
 608         %subtmp = sub double %x, 1.000000e+00
 609         %calltmp = call double @fib( double %subtmp )
 610         %subtmp5 = sub double %x, 2.000000e+00
 611         %calltmp6 = call double @fib( double %subtmp5 )
 612         %addtmp = add double %calltmp, %calltmp6
 613         ret double %addtmp
 614
 615 ifcont:
 616         ret double 1.000000e+00
 617 }
 618 </pre>
 619 </div>
 620
 621 <p>Here we see that the simplifycfg pass decided to clone the return instruction
 622 into the end of the 'else' block.  This allowed it to eliminate some branches
 623 and the PHI node.</p>
 624
 625 <p>Now that all symbol table references are updated to use stack variables,
 626 we'll add the assignment operator.</p>
 627
 628 </div>
 629
 630 <!-- *********************************************************************** -->
 631 <div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
 632 <!-- *********************************************************************** -->
 633
 634 <div class="doc_text">
 635
 636 <p>With our current framework, adding a new assignment operator is really
 637 simple.  We will parse it just like any other binary operator, but handle it
 638 internally (instead of allowing the user to define it).  The first step is to
 639 set a precedence:</p>
 640
 641 <div class="doc_code">
 642 <pre>
 643  int main() {
 644    // Install standard binary operators.
 645    // 1 is lowest precedence.
 646    <b>BinopPrecedence['='] = 2;</b>
 647    BinopPrecedence['&lt;'] = 10;
 648    BinopPrecedence['+'] = 20;
 649    BinopPrecedence['-'] = 20;
 650 </pre>
 651 </div>
 652
 653 <p>Now that the parser knows the precedence of the binary operator, it takes
 654 care of all the parsing and AST generation.  We just need to implement codegen
 655 for the assignment operator.  This looks like:</p>
 656
 657 <div class="doc_code">
 658 <pre>
 659 Value *BinaryExprAST::Codegen() {
 660   // Special case '=' because we don't want to emit the LHS as an expression.
 661   if (Op == '=') {
 662     // Assignment requires the LHS to be an identifier.
 663     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
 664     if (!LHSE)
 665       return ErrorV("destination of '=' must be a variable");
 666 </pre>
 667 </div>
 668
 669 <p>Unlike the rest of the binary operators, our assignment operator doesn't
 670 follow the "emit LHS, emit RHS, do computation" model.  As such, it is handled
 671 as a special case before the other binary operators are handled.  The other
 672 strange thing about it is that it requires the LHS to be a variable directly.
 673 </p>
 674
 675 <div class="doc_code">
 676 <pre>
 677     // Codegen the RHS.
 678     Value *Val = RHS-&gt;Codegen();
 679     if (Val == 0) return 0;
 680
 681     // Look up the name.
 682     Value *Variable = NamedValues[LHSE-&gt;getName()];
 683     if (Variable == 0) return ErrorV("Unknown variable name");
 684
 685     Builder.CreateStore(Val, Variable);
 686     return Val;
 687   }
 688   ...
 689 </pre>
 690 </div>
 691
 692 <p>Once it has the variable, codegen'ing the assignment is straight-forward:
 693 we emit the RHS of the assignment, create a store, and return the computed
 694 value.  Returning a value allows for chained assignments like "X = (Y = Z)".</p>
 695
 696 <p>Now that we have an assignment operator, we can mutate loop variables and
 697 arguments.  For example, we can now run code like this:</p>
 698
 699 <div class="doc_code">
 700 <pre>
 701 # Function to print a double.
 702 extern printd(x);
 703
 704 # Define ':' for sequencing: as a low-precedence operator that ignores operands
 705 # and just returns the RHS.
 706 def binary : 1 (x y) y;
 707
 708 def test(x)
 709   printd(x) :
 710   x = 4 :
 711   printd(x);
 712
 713 test(123);
 714 </pre>
 715 </div>
 716
 717 <p>When run, this example prints "123" and then "4", showing that we did
 718 actually mutate the value!  Okay, we have now officially implemented our goal:
 719 getting this to work requires SSA construction in the general case.  However,
 720 to be really useful, we want the ability to define our own local variables, lets
 721 add this next!
 722 </p>
 723
 724 </div>
 725
 726 <!-- *********************************************************************** -->
 727 <div class="doc_section"><a name="localvars">User-defined Local
 728 Variables</a></div>
 729 <!-- *********************************************************************** -->
 730
 731 <div class="doc_text">
 732
 733 <p>Adding var/in is just like any other other extensions we made to
 734 Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
 735 The first step for adding our new 'var/in' construct is to extend the lexer.
 736 As before, this is pretty trivial, the code looks like this:</p>
 737
 738 <div class="doc_code">
 739 <pre>
 740 enum Token {
 741   ...
 742   <b>// var definition
 743   tok_var = -13</b>
 744 ...
 745 }
 746 ...
 747 static int gettok() {
 748 ...
 749     if (IdentifierStr == "in") return tok_in;
 750     if (IdentifierStr == "binary") return tok_binary;
 751     if (IdentifierStr == "unary") return tok_unary;
 752     <b>if (IdentifierStr == "var") return tok_var;</b>
 753     return tok_identifier;
 754 ...
 755 </pre>
 756 </div>
 757
 758 <p>The next step is to define the AST node that we will construct.  For var/in,
 759 it will look like this:</p>
 760
 761 <div class="doc_code">
 762 <pre>
 763 /// VarExprAST - Expression class for var/in
 764 class VarExprAST : public ExprAST {
 765   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
 766   ExprAST *Body;
 767 public:
 768   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
 769              ExprAST *body)
 770   : VarNames(varnames), Body(body) {}
 771
 772   virtual Value *Codegen();
 773 };
 774 </pre>
 775 </div>
 776
 777 <p>var/in allows a list of names to be defined all at once, and each name can
 778 optionally have an initializer value.  As such, we capture this information in
 779 the VarNames vector.  Also, var/in has a body, this body is allowed to access
 780 the variables defined by the let/in.</p>
 781
 782 <p>With this ready, we can define the parser pieces.  First thing we do is add
 783 it as a primary expression:</p>
 784
 785 <div class="doc_code">
 786 <pre>
 787 /// primary
 788 ///   ::= identifierexpr
 789 ///   ::= numberexpr
 790 ///   ::= parenexpr
 791 ///   ::= ifexpr
 792 ///   ::= forexpr
 793 <b>///   ::= varexpr</b>
 794 static ExprAST *ParsePrimary() {
 795   switch (CurTok) {
 796   default: return Error("unknown token when expecting an expression");
 797   case tok_identifier: return ParseIdentifierExpr();
 798   case tok_number:     return ParseNumberExpr();
 799   case '(':            return ParseParenExpr();
 800   case tok_if:         return ParseIfExpr();
 801   case tok_for:        return ParseForExpr();
 802   <b>case tok_var:        return ParseVarExpr();</b>
 803   }
 804 }
 805 </pre>
 806 </div>
 807
 808 <p>Next we define ParseVarExpr:</p>
 809
 810 <div class="doc_code">
 811 <pre>
 812 /// varexpr ::= 'var' identifer ('=' expression)?
 813 //                    (',' identifer ('=' expression)?)* 'in' expression
 814 static ExprAST *ParseVarExpr() {
 815   getNextToken();  // eat the var.
 816
 817   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
 818
 819   // At least one variable name is required.
 820   if (CurTok != tok_identifier)
 821     return Error("expected identifier after var");
 822 </pre>
 823 </div>
 824
 825 <p>The first part of this code parses the list of identifier/expr pairs into the
 826 local <tt>VarNames</tt> vector.
 827
 828 <div class="doc_code">
 829 <pre>
 830   while (1) {
 831     std::string Name = IdentifierStr;
 832     getNextToken();  // eat identifer.
 833
 834     // Read the optional initializer.
 835     ExprAST *Init = 0;
 836     if (CurTok == '=') {
 837       getNextToken(); // eat the '='.
 838
 839       Init = ParseExpression();
 840       if (Init == 0) return 0;
 841     }
 842
 843     VarNames.push_back(std::make_pair(Name, Init));
 844
 845     // End of var list, exit loop.
 846     if (CurTok != ',') break;
 847     getNextToken(); // eat the ','.
 848
 849     if (CurTok != tok_identifier)
 850       return Error("expected identifier list after var");
 851   }
 852 </pre>
 853 </div>
 854
 855 <p>Once all the variables are parsed, we then parse the body and create the
 856 AST node:</p>
 857
 858 <div class="doc_code">
 859 <pre>
 860   // At this point, we have to have 'in'.
 861   if (CurTok != tok_in)
 862     return Error("expected 'in' keyword after 'var'");
 863   getNextToken();  // eat 'in'.
 864
 865   ExprAST *Body = ParseExpression();
 866   if (Body == 0) return 0;
 867
 868   return new VarExprAST(VarNames, Body);
 869 }
 870 </pre>
 871 </div>
 872
 873 <p>Now that we can parse and represent the code, we need to support emission of
 874 LLVM IR for it.  This code starts out with:</p>
 875
 876 <div class="doc_code">
 877 <pre>
 878 Value *VarExprAST::Codegen() {
 879   std::vector&lt;AllocaInst *&gt; OldBindings;
 880
 881   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 882
 883   // Register all variables and emit their initializer.
 884   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
 885     const std::string &amp;VarName = VarNames[i].first;
 886     ExprAST *Init = VarNames[i].second;
 887 </pre>
 888 </div>
 889
 890 <p>Basically it loops over all the variables, installing them one at a time.
 891 For each variable we put into the symbol table, we remember the previous value
 892 that we replace in OldBindings.</p>
 893
 894 <div class="doc_code">
 895 <pre>
 896     // Emit the initializer before adding the variable to scope, this prevents
 897     // the initializer from referencing the variable itself, and permits stuff
 898     // like this:
 899     //  var a = 1 in
 900     //    var a = a in ...   # refers to outer 'a'.
 901     Value *InitVal;
 902     if (Init) {
 903       InitVal = Init-&gt;Codegen();
 904       if (InitVal == 0) return 0;
 905     } else { // If not specified, use 0.0.
 906       InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
 907     }
 908
 909     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 910     Builder.CreateStore(InitVal, Alloca);
 911
 912     // Remember the old variable binding so that we can restore the binding when
 913     // we unrecurse.
 914     OldBindings.push_back(NamedValues[VarName]);
 915
 916     // Remember this binding.
 917     NamedValues[VarName] = Alloca;
 918   }
 919 </pre>
 920 </div>
 921
 922 <p>There are more comments here than code.  The basic idea is that we emit the
 923 initializer, create the alloca, then update the symbol table to point to it.
 924 Once all the variables are installed in the symbol table, we evaluate the body
 925 of the var/in expression:</p>
 926
 927 <div class="doc_code">
 928 <pre>
 929   // Codegen the body, now that all vars are in scope.
 930   Value *BodyVal = Body-&gt;Codegen();
 931   if (BodyVal == 0) return 0;
 932 </pre>
 933 </div>
 934
 935 <p>Finally, before returning, we restore the previous variable bindings:</p>
 936
 937 <div class="doc_code">
 938 <pre>
 939   // Pop all our variables from scope.
 940   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
 941     NamedValues[VarNames[i].first] = OldBindings[i];
 942
 943   // Return the body computation.
 944   return BodyVal;
 945 }
 946 </pre>
 947 </div>
 948
 949 <p>The end result of all of this is that we get properly scoped variable
 950 definitions, and we even (trivially) allow mutation of them :).</p>
 951
 952 <p>With this, we completed what we set out to do.  Our nice iterative fib
 953 example from the intro compiles and runs just fine.  The mem2reg pass optimizes
 954 all of our stack variables into SSA registers, inserting PHI nodes where needed,
 955 and our front-end remains simple: no iterated dominator frontier computation
 956 anywhere in sight.</p>
 957
 958 </div>
 959
 960 <!-- *********************************************************************** -->
 961 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 962 <!-- *********************************************************************** -->
 963
 964 <div class="doc_text">
 965
 966 <p>
 967 Here is the complete code listing for our running example, enhanced with mutable
 968 variables and var/in support.  To build this example, use:
 969 </p>
 970
 971 <div class="doc_code">
 972 <pre>
 973    # Compile
 974    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 975    # Run
 976    ./toy
 977 </pre>
 978 </div>
 979
 980 <p>Here is the code:</p>
 981
 982 <div class="doc_code">
 983 <pre>
 984 #include "llvm/DerivedTypes.h"
 985 #include "llvm/ExecutionEngine/ExecutionEngine.h"
 986 #include "llvm/Module.h"
 987 #include "llvm/ModuleProvider.h"
 988 #include "llvm/PassManager.h"
 989 #include "llvm/Analysis/Verifier.h"
 990 #include "llvm/Target/TargetData.h"
 991 #include "llvm/Transforms/Scalar.h"
 992 #include "llvm/Support/LLVMBuilder.h"
 993 #include &lt;cstdio&gt;
 994 #include &lt;string&gt;
 995 #include &lt;map&gt;
 996 #include &lt;vector&gt;
 997 using namespace llvm;
 998
 999 //===----------------------------------------------------------------------===//
1000 // Lexer
1001 //===----------------------------------------------------------------------===//
1002
1003 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1004 // of these for known things.
1005 enum Token {
1006   tok_eof = -1,
1007
1008   // commands
1009   tok_def = -2, tok_extern = -3,
1010
1011   // primary
1012   tok_identifier = -4, tok_number = -5,
1013
1014   // control
1015   tok_if = -6, tok_then = -7, tok_else = -8,
1016   tok_for = -9, tok_in = -10,
1017
1018   // operators
1019   tok_binary = -11, tok_unary = -12,
1020
1021   // var definition
1022   tok_var = -13
1023 };
1024
1025 static std::string IdentifierStr;  // Filled in if tok_identifier
1026 static double NumVal;              // Filled in if tok_number
1027
1028 /// gettok - Return the next token from standard input.
1029 static int gettok() {
1030   static int LastChar = ' ';
1031
1032   // Skip any whitespace.
1033   while (isspace(LastChar))
1034     LastChar = getchar();
1035
1036   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1037     IdentifierStr = LastChar;
1038     while (isalnum((LastChar = getchar())))
1039       IdentifierStr += LastChar;
1040
1041     if (IdentifierStr == "def") return tok_def;
1042     if (IdentifierStr == "extern") return tok_extern;
1043     if (IdentifierStr == "if") return tok_if;
1044     if (IdentifierStr == "then") return tok_then;
1045     if (IdentifierStr == "else") return tok_else;
1046     if (IdentifierStr == "for") return tok_for;
1047     if (IdentifierStr == "in") return tok_in;
1048     if (IdentifierStr == "binary") return tok_binary;
1049     if (IdentifierStr == "unary") return tok_unary;
1050     if (IdentifierStr == "var") return tok_var;
1051     return tok_identifier;
1052   }
1053
1054   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
1055     std::string NumStr;
1056     do {
1057       NumStr += LastChar;
1058       LastChar = getchar();
1059     } while (isdigit(LastChar) || LastChar == '.');
1060
1061     NumVal = strtod(NumStr.c_str(), 0);
1062     return tok_number;
1063   }
1064
1065   if (LastChar == '#') {
1066     // Comment until end of line.
1067     do LastChar = getchar();
1068     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
1069
1070     if (LastChar != EOF)
1071       return gettok();
1072   }
1073
1074   // Check for end of file.  Don't eat the EOF.
1075   if (LastChar == EOF)
1076     return tok_eof;
1077
1078   // Otherwise, just return the character as its ascii value.
1079   int ThisChar = LastChar;
1080   LastChar = getchar();
1081   return ThisChar;
1082 }
1083
1084 //===----------------------------------------------------------------------===//
1085 // Abstract Syntax Tree (aka Parse Tree)
1086 //===----------------------------------------------------------------------===//
1087
1088 /// ExprAST - Base class for all expression nodes.
1089 class ExprAST {
1090 public:
1091   virtual ~ExprAST() {}
1092   virtual Value *Codegen() = 0;
1093 };
1094
1095 /// NumberExprAST - Expression class for numeric literals like "1.0".
1096 class NumberExprAST : public ExprAST {
1097   double Val;
1098 public:
1099   NumberExprAST(double val) : Val(val) {}
1100   virtual Value *Codegen();
1101 };
1102
1103 /// VariableExprAST - Expression class for referencing a variable, like "a".
1104 class VariableExprAST : public ExprAST {
1105   std::string Name;
1106 public:
1107   VariableExprAST(const std::string &amp;name) : Name(name) {}
1108   const std::string &amp;getName() const { return Name; }
1109   virtual Value *Codegen();
1110 };
1111
1112 /// UnaryExprAST - Expression class for a unary operator.
1113 class UnaryExprAST : public ExprAST {
1114   char Opcode;
1115   ExprAST *Operand;
1116 public:
1117   UnaryExprAST(char opcode, ExprAST *operand)
1118     : Opcode(opcode), Operand(operand) {}
1119   virtual Value *Codegen();
1120 };
1121
1122 /// BinaryExprAST - Expression class for a binary operator.
1123 class BinaryExprAST : public ExprAST {
1124   char Op;
1125   ExprAST *LHS, *RHS;
1126 public:
1127   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1128     : Op(op), LHS(lhs), RHS(rhs) {}
1129   virtual Value *Codegen();
1130 };
1131
1132 /// CallExprAST - Expression class for function calls.
1133 class CallExprAST : public ExprAST {
1134   std::string Callee;
1135   std::vector&lt;ExprAST*&gt; Args;
1136 public:
1137   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1138     : Callee(callee), Args(args) {}
1139   virtual Value *Codegen();
1140 };
1141
1142 /// IfExprAST - Expression class for if/then/else.
1143 class IfExprAST : public ExprAST {
1144   ExprAST *Cond, *Then, *Else;
1145 public:
1146   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1147   : Cond(cond), Then(then), Else(_else) {}
1148   virtual Value *Codegen();
1149 };
1150
1151 /// ForExprAST - Expression class for for/in.
1152 class ForExprAST : public ExprAST {
1153   std::string VarName;
1154   ExprAST *Start, *End, *Step, *Body;
1155 public:
1156   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1157              ExprAST *step, ExprAST *body)
1158     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1159   virtual Value *Codegen();
1160 };
1161
1162 /// VarExprAST - Expression class for var/in
1163 class VarExprAST : public ExprAST {
1164   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1165   ExprAST *Body;
1166 public:
1167   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1168              ExprAST *body)
1169   : VarNames(varnames), Body(body) {}
1170
1171   virtual Value *Codegen();
1172 };
1173
1174 /// PrototypeAST - This class represents the "prototype" for a function,
1175 /// which captures its argument names as well as if it is an operator.
1176 class PrototypeAST {
1177   std::string Name;
1178   std::vector&lt;std::string&gt; Args;
1179   bool isOperator;
1180   unsigned Precedence;  // Precedence if a binary op.
1181 public:
1182   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1183                bool isoperator = false, unsigned prec = 0)
1184   : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1185
1186   bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1187   bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1188
1189   char getOperatorName() const {
1190     assert(isUnaryOp() || isBinaryOp());
1191     return Name[Name.size()-1];
1192   }
1193
1194   unsigned getBinaryPrecedence() const { return Precedence; }
1195
1196   Function *Codegen();
1197
1198   void CreateArgumentAllocas(Function *F);
1199 };
1200
1201 /// FunctionAST - This class represents a function definition itself.
1202 class FunctionAST {
1203   PrototypeAST *Proto;
1204   ExprAST *Body;
1205 public:
1206   FunctionAST(PrototypeAST *proto, ExprAST *body)
1207     : Proto(proto), Body(body) {}
1208
1209   Function *Codegen();
1210 };
1211
1212 //===----------------------------------------------------------------------===//
1213 // Parser
1214 //===----------------------------------------------------------------------===//
1215
1216 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
1217 /// token the parser it looking at.  getNextToken reads another token from the
1218 /// lexer and updates CurTok with its results.
1219 static int CurTok;
1220 static int getNextToken() {
1221   return CurTok = gettok();
1222 }
1223
1224 /// BinopPrecedence - This holds the precedence for each binary operator that is
1225 /// defined.
1226 static std::map&lt;char, int&gt; BinopPrecedence;
1227
1228 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1229 static int GetTokPrecedence() {
1230   if (!isascii(CurTok))
1231     return -1;
1232
1233   // Make sure it's a declared binop.
1234   int TokPrec = BinopPrecedence[CurTok];
1235   if (TokPrec &lt;= 0) return -1;
1236   return TokPrec;
1237 }
1238
1239 /// Error* - These are little helper functions for error handling.
1240 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1241 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1242 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1243
1244 static ExprAST *ParseExpression();
1245
1246 /// identifierexpr
1247 ///   ::= identifer
1248 ///   ::= identifer '(' expression* ')'
1249 static ExprAST *ParseIdentifierExpr() {
1250   std::string IdName = IdentifierStr;
1251
1252   getNextToken();  // eat identifer.
1253
1254   if (CurTok != '(') // Simple variable ref.
1255     return new VariableExprAST(IdName);
1256
1257   // Call.
1258   getNextToken();  // eat (
1259   std::vector&lt;ExprAST*&gt; Args;
1260   if (CurTok != ')') {
1261     while (1) {
1262       ExprAST *Arg = ParseExpression();
1263       if (!Arg) return 0;
1264       Args.push_back(Arg);
1265
1266       if (CurTok == ')') break;
1267
1268       if (CurTok != ',')
1269         return Error("Expected ')'");
1270       getNextToken();
1271     }
1272   }
1273
1274   // Eat the ')'.
1275   getNextToken();
1276
1277   return new CallExprAST(IdName, Args);
1278 }
1279
1280 /// numberexpr ::= number
1281 static ExprAST *ParseNumberExpr() {
1282   ExprAST *Result = new NumberExprAST(NumVal);
1283   getNextToken(); // consume the number
1284   return Result;
1285 }
1286
1287 /// parenexpr ::= '(' expression ')'
1288 static ExprAST *ParseParenExpr() {
1289   getNextToken();  // eat (.
1290   ExprAST *V = ParseExpression();
1291   if (!V) return 0;
1292
1293   if (CurTok != ')')
1294     return Error("expected ')'");
1295   getNextToken();  // eat ).
1296   return V;
1297 }
1298
1299 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1300 static ExprAST *ParseIfExpr() {
1301   getNextToken();  // eat the if.
1302
1303   // condition.
1304   ExprAST *Cond = ParseExpression();
1305   if (!Cond) return 0;
1306
1307   if (CurTok != tok_then)
1308     return Error("expected then");
1309   getNextToken();  // eat the then
1310
1311   ExprAST *Then = ParseExpression();
1312   if (Then == 0) return 0;
1313
1314   if (CurTok != tok_else)
1315     return Error("expected else");
1316
1317   getNextToken();
1318
1319   ExprAST *Else = ParseExpression();
1320   if (!Else) return 0;
1321
1322   return new IfExprAST(Cond, Then, Else);
1323 }
1324
1325 /// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
1326 static ExprAST *ParseForExpr() {
1327   getNextToken();  // eat the for.
1328
1329   if (CurTok != tok_identifier)
1330     return Error("expected identifier after for");
1331
1332   std::string IdName = IdentifierStr;
1333   getNextToken();  // eat identifer.
1334
1335   if (CurTok != '=')
1336     return Error("expected '=' after for");
1337   getNextToken();  // eat '='.
1338
1339
1340   ExprAST *Start = ParseExpression();
1341   if (Start == 0) return 0;
1342   if (CurTok != ',')
1343     return Error("expected ',' after for start value");
1344   getNextToken();
1345
1346   ExprAST *End = ParseExpression();
1347   if (End == 0) return 0;
1348
1349   // The step value is optional.
1350   ExprAST *Step = 0;
1351   if (CurTok == ',') {
1352     getNextToken();
1353     Step = ParseExpression();
1354     if (Step == 0) return 0;
1355   }
1356
1357   if (CurTok != tok_in)
1358     return Error("expected 'in' after for");
1359   getNextToken();  // eat 'in'.
1360
1361   ExprAST *Body = ParseExpression();
1362   if (Body == 0) return 0;
1363
1364   return new ForExprAST(IdName, Start, End, Step, Body);
1365 }
1366
1367 /// varexpr ::= 'var' identifer ('=' expression)?
1368 //                    (',' identifer ('=' expression)?)* 'in' expression
1369 static ExprAST *ParseVarExpr() {
1370   getNextToken();  // eat the var.
1371
1372   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1373
1374   // At least one variable name is required.
1375   if (CurTok != tok_identifier)
1376     return Error("expected identifier after var");
1377
1378   while (1) {
1379     std::string Name = IdentifierStr;
1380     getNextToken();  // eat identifer.
1381
1382     // Read the optional initializer.
1383     ExprAST *Init = 0;
1384     if (CurTok == '=') {
1385       getNextToken(); // eat the '='.
1386
1387       Init = ParseExpression();
1388       if (Init == 0) return 0;
1389     }
1390
1391     VarNames.push_back(std::make_pair(Name, Init));
1392
1393     // End of var list, exit loop.
1394     if (CurTok != ',') break;
1395     getNextToken(); // eat the ','.
1396
1397     if (CurTok != tok_identifier)
1398       return Error("expected identifier list after var");
1399   }
1400
1401   // At this point, we have to have 'in'.
1402   if (CurTok != tok_in)
1403     return Error("expected 'in' keyword after 'var'");
1404   getNextToken();  // eat 'in'.
1405
1406   ExprAST *Body = ParseExpression();
1407   if (Body == 0) return 0;
1408
1409   return new VarExprAST(VarNames, Body);
1410 }
1411
1412
1413 /// primary
1414 ///   ::= identifierexpr
1415 ///   ::= numberexpr
1416 ///   ::= parenexpr
1417 ///   ::= ifexpr
1418 ///   ::= forexpr
1419 ///   ::= varexpr
1420 static ExprAST *ParsePrimary() {
1421   switch (CurTok) {
1422   default: return Error("unknown token when expecting an expression");
1423   case tok_identifier: return ParseIdentifierExpr();
1424   case tok_number:     return ParseNumberExpr();
1425   case '(':            return ParseParenExpr();
1426   case tok_if:         return ParseIfExpr();
1427   case tok_for:        return ParseForExpr();
1428   case tok_var:        return ParseVarExpr();
1429   }
1430 }
1431
1432 /// unary
1433 ///   ::= primary
1434 ///   ::= '!' unary
1435 static ExprAST *ParseUnary() {
1436   // If the current token is not an operator, it must be a primary expr.
1437   if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1438     return ParsePrimary();
1439
1440   // If this is a unary operator, read it.
1441   int Opc = CurTok;
1442   getNextToken();
1443   if (ExprAST *Operand = ParseUnary())
1444     return new UnaryExprAST(Opc, Operand);
1445   return 0;
1446 }
1447
1448 /// binoprhs
1449 ///   ::= ('+' unary)*
1450 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1451   // If this is a binop, find its precedence.
1452   while (1) {
1453     int TokPrec = GetTokPrecedence();
1454
1455     // If this is a binop that binds at least as tightly as the current binop,
1456     // consume it, otherwise we are done.
1457     if (TokPrec &lt; ExprPrec)
1458       return LHS;
1459
1460     // Okay, we know this is a binop.
1461     int BinOp = CurTok;
1462     getNextToken();  // eat binop
1463
1464     // Parse the unary expression after the binary operator.
1465     ExprAST *RHS = ParseUnary();
1466     if (!RHS) return 0;
1467
1468     // If BinOp binds less tightly with RHS than the operator after RHS, let
1469     // the pending operator take RHS as its LHS.
1470     int NextPrec = GetTokPrecedence();
1471     if (TokPrec &lt; NextPrec) {
1472       RHS = ParseBinOpRHS(TokPrec+1, RHS);
1473       if (RHS == 0) return 0;
1474     }
1475
1476     // Merge LHS/RHS.
1477     LHS = new BinaryExprAST(BinOp, LHS, RHS);
1478   }
1479 }
1480
1481 /// expression
1482 ///   ::= unary binoprhs
1483 ///
1484 static ExprAST *ParseExpression() {
1485   ExprAST *LHS = ParseUnary();
1486   if (!LHS) return 0;
1487
1488   return ParseBinOpRHS(0, LHS);
1489 }
1490
1491 /// prototype
1492 ///   ::= id '(' id* ')'
1493 ///   ::= binary LETTER number? (id, id)
1494 ///   ::= unary LETTER (id)
1495 static PrototypeAST *ParsePrototype() {
1496   std::string FnName;
1497
1498   int Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.
1499   unsigned BinaryPrecedence = 30;
1500
1501   switch (CurTok) {
1502   default:
1503     return ErrorP("Expected function name in prototype");
1504   case tok_identifier:
1505     FnName = IdentifierStr;
1506     Kind = 0;
1507     getNextToken();
1508     break;
1509   case tok_unary:
1510     getNextToken();
1511     if (!isascii(CurTok))
1512       return ErrorP("Expected unary operator");
1513     FnName = "unary";
1514     FnName += (char)CurTok;
1515     Kind = 1;
1516     getNextToken();
1517     break;
1518   case tok_binary:
1519     getNextToken();
1520     if (!isascii(CurTok))
1521       return ErrorP("Expected binary operator");
1522     FnName = "binary";
1523     FnName += (char)CurTok;
1524     Kind = 2;
1525     getNextToken();
1526
1527     // Read the precedence if present.
1528     if (CurTok == tok_number) {
1529       if (NumVal &lt; 1 || NumVal &gt; 100)
1530         return ErrorP("Invalid precedecnce: must be 1..100");
1531       BinaryPrecedence = (unsigned)NumVal;
1532       getNextToken();
1533     }
1534     break;
1535   }
1536
1537   if (CurTok != '(')
1538     return ErrorP("Expected '(' in prototype");
1539
1540   std::vector&lt;std::string&gt; ArgNames;
1541   while (getNextToken() == tok_identifier)
1542     ArgNames.push_back(IdentifierStr);
1543   if (CurTok != ')')
1544     return ErrorP("Expected ')' in prototype");
1545
1546   // success.
1547   getNextToken();  // eat ')'.
1548
1549   // Verify right number of names for operator.
1550   if (Kind &amp;&amp; ArgNames.size() != Kind)
1551     return ErrorP("Invalid number of operands for operator");
1552
1553   return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1554 }
1555
1556 /// definition ::= 'def' prototype expression
1557 static FunctionAST *ParseDefinition() {
1558   getNextToken();  // eat def.
1559   PrototypeAST *Proto = ParsePrototype();
1560   if (Proto == 0) return 0;
1561
1562   if (ExprAST *E = ParseExpression())
1563     return new FunctionAST(Proto, E);
1564   return 0;
1565 }
1566
1567 /// toplevelexpr ::= expression
1568 static FunctionAST *ParseTopLevelExpr() {
1569   if (ExprAST *E = ParseExpression()) {
1570     // Make an anonymous proto.
1571     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1572     return new FunctionAST(Proto, E);
1573   }
1574   return 0;
1575 }
1576
1577 /// external ::= 'extern' prototype
1578 static PrototypeAST *ParseExtern() {
1579   getNextToken();  // eat extern.
1580   return ParsePrototype();
1581 }
1582
1583 //===----------------------------------------------------------------------===//
1584 // Code Generation
1585 //===----------------------------------------------------------------------===//
1586
1587 static Module *TheModule;
1588 static LLVMFoldingBuilder Builder;
1589 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1590 static FunctionPassManager *TheFPM;
1591
1592 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1593
1594 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1595 /// the function.  This is used for mutable variables etc.
1596 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1597                                           const std::string &amp;VarName) {
1598   LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
1599                    TheFunction-&gt;getEntryBlock().begin());
1600   return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
1601 }
1602
1603
1604 Value *NumberExprAST::Codegen() {
1605   return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1606 }
1607
1608 Value *VariableExprAST::Codegen() {
1609   // Look this variable up in the function.
1610   Value *V = NamedValues[Name];
1611   if (V == 0) return ErrorV("Unknown variable name");
1612
1613   // Load the value.
1614   return Builder.CreateLoad(V, Name.c_str());
1615 }
1616
1617 Value *UnaryExprAST::Codegen() {
1618   Value *OperandV = Operand-&gt;Codegen();
1619   if (OperandV == 0) return 0;
1620
1621   Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1622   if (F == 0)
1623     return ErrorV("Unknown unary operator");
1624
1625   return Builder.CreateCall(F, OperandV, "unop");
1626 }
1627
1628
1629 Value *BinaryExprAST::Codegen() {
1630   // Special case '=' because we don't want to emit the LHS as an expression.
1631   if (Op == '=') {
1632     // Assignment requires the LHS to be an identifier.
1633     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
1634     if (!LHSE)
1635       return ErrorV("destination of '=' must be a variable");
1636     // Codegen the RHS.
1637     Value *Val = RHS-&gt;Codegen();
1638     if (Val == 0) return 0;
1639
1640     // Look up the name.
1641     Value *Variable = NamedValues[LHSE-&gt;getName()];
1642     if (Variable == 0) return ErrorV("Unknown variable name");
1643
1644     Builder.CreateStore(Val, Variable);
1645     return Val;
1646   }
1647
1648
1649   Value *L = LHS-&gt;Codegen();
1650   Value *R = RHS-&gt;Codegen();
1651   if (L == 0 || R == 0) return 0;
1652
1653   switch (Op) {
1654   case '+': return Builder.CreateAdd(L, R, "addtmp");
1655   case '-': return Builder.CreateSub(L, R, "subtmp");
1656   case '*': return Builder.CreateMul(L, R, "multmp");
1657   case '&lt;':
1658     L = Builder.CreateFCmpULT(L, R, "multmp");
1659     // Convert bool 0/1 to double 0.0 or 1.0
1660     return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1661   default: break;
1662   }
1663
1664   // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1665   // a call to it.
1666   Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
1667   assert(F &amp;&amp; "binary operator not found!");
1668
1669   Value *Ops[] = { L, R };
1670   return Builder.CreateCall(F, Ops, Ops+2, "binop");
1671 }
1672
1673 Value *CallExprAST::Codegen() {
1674   // Look up the name in the global module table.
1675   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1676   if (CalleeF == 0)
1677     return ErrorV("Unknown function referenced");
1678
1679   // If argument mismatch error.
1680   if (CalleeF-&gt;arg_size() != Args.size())
1681     return ErrorV("Incorrect # arguments passed");
1682
1683   std::vector&lt;Value*&gt; ArgsV;
1684   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1685     ArgsV.push_back(Args[i]-&gt;Codegen());
1686     if (ArgsV.back() == 0) return 0;
1687   }
1688
1689   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1690 }
1691
1692 Value *IfExprAST::Codegen() {
1693   Value *CondV = Cond-&gt;Codegen();
1694   if (CondV == 0) return 0;
1695
1696   // Convert condition to a bool by comparing equal to 0.0.
1697   CondV = Builder.CreateFCmpONE(CondV,
1698                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1699                                 "ifcond");
1700
1701   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1702
1703   // Create blocks for the then and else cases.  Insert the 'then' block at the
1704   // end of the function.
1705   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1706   BasicBlock *ElseBB = new BasicBlock("else");
1707   BasicBlock *MergeBB = new BasicBlock("ifcont");
1708
1709   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1710
1711   // Emit then value.
1712   Builder.SetInsertPoint(ThenBB);
1713
1714   Value *ThenV = Then-&gt;Codegen();
1715   if (ThenV == 0) return 0;
1716
1717   Builder.CreateBr(MergeBB);
1718   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1719   ThenBB = Builder.GetInsertBlock();
1720
1721   // Emit else block.
1722   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1723   Builder.SetInsertPoint(ElseBB);
1724
1725   Value *ElseV = Else-&gt;Codegen();
1726   if (ElseV == 0) return 0;
1727
1728   Builder.CreateBr(MergeBB);
1729   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1730   ElseBB = Builder.GetInsertBlock();
1731
1732   // Emit merge block.
1733   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1734   Builder.SetInsertPoint(MergeBB);
1735   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1736
1737   PN-&gt;addIncoming(ThenV, ThenBB);
1738   PN-&gt;addIncoming(ElseV, ElseBB);
1739   return PN;
1740 }
1741
1742 Value *ForExprAST::Codegen() {
1743   // Output this as:
1744   //   var = alloca double
1745   //   ...
1746   //   start = startexpr
1747   //   store start -&gt; var
1748   //   goto loop
1749   // loop:
1750   //   ...
1751   //   bodyexpr
1752   //   ...
1753   // loopend:
1754   //   step = stepexpr
1755   //   endcond = endexpr
1756   //
1757   //   curvar = load var
1758   //   nextvar = curvar + step
1759   //   store nextvar -&gt; var
1760   //   br endcond, loop, endloop
1761   // outloop:
1762
1763   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1764
1765   // Create an alloca for the variable in the entry block.
1766   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1767
1768   // Emit the start code first, without 'variable' in scope.
1769   Value *StartVal = Start-&gt;Codegen();
1770   if (StartVal == 0) return 0;
1771
1772   // Store the value into the alloca.
1773   Builder.CreateStore(StartVal, Alloca);
1774
1775   // Make the new basic block for the loop header, inserting after current
1776   // block.
1777   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1778   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1779
1780   // Insert an explicit fall through from the current block to the LoopBB.
1781   Builder.CreateBr(LoopBB);
1782
1783   // Start insertion in LoopBB.
1784   Builder.SetInsertPoint(LoopBB);
1785
1786   // Within the loop, the variable is defined equal to the PHI node.  If it
1787   // shadows an existing variable, we have to restore it, so save it now.
1788   AllocaInst *OldVal = NamedValues[VarName];
1789   NamedValues[VarName] = Alloca;
1790
1791   // Emit the body of the loop.  This, like any other expr, can change the
1792   // current BB.  Note that we ignore the value computed by the body, but don't
1793   // allow an error.
1794   if (Body-&gt;Codegen() == 0)
1795     return 0;
1796
1797   // Emit the step value.
1798   Value *StepVal;
1799   if (Step) {
1800     StepVal = Step-&gt;Codegen();
1801     if (StepVal == 0) return 0;
1802   } else {
1803     // If not specified, use 1.0.
1804     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1805   }
1806
1807   // Compute the end condition.
1808   Value *EndCond = End-&gt;Codegen();
1809   if (EndCond == 0) return EndCond;
1810
1811   // Reload, increment, and restore the alloca.  This handles the case where
1812   // the body of the loop mutates the variable.
1813   Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1814   Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1815   Builder.CreateStore(NextVar, Alloca);
1816
1817   // Convert condition to a bool by comparing equal to 0.0.
1818   EndCond = Builder.CreateFCmpONE(EndCond,
1819                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1820                                   "loopcond");
1821
1822   // Create the "after loop" block and insert it.
1823   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1824   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1825
1826   // Insert the conditional branch into the end of LoopEndBB.
1827   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1828
1829   // Any new code will be inserted in AfterBB.
1830   Builder.SetInsertPoint(AfterBB);
1831
1832   // Restore the unshadowed variable.
1833   if (OldVal)
1834     NamedValues[VarName] = OldVal;
1835   else
1836     NamedValues.erase(VarName);
1837
1838
1839   // for expr always returns 0.0.
1840   return Constant::getNullValue(Type::DoubleTy);
1841 }
1842
1843 Value *VarExprAST::Codegen() {
1844   std::vector&lt;AllocaInst *&gt; OldBindings;
1845
1846   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1847
1848   // Register all variables and emit their initializer.
1849   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1850     const std::string &amp;VarName = VarNames[i].first;
1851     ExprAST *Init = VarNames[i].second;
1852
1853     // Emit the initializer before adding the variable to scope, this prevents
1854     // the initializer from referencing the variable itself, and permits stuff
1855     // like this:
1856     //  var a = 1 in
1857     //    var a = a in ...   # refers to outer 'a'.
1858     Value *InitVal;
1859     if (Init) {
1860       InitVal = Init-&gt;Codegen();
1861       if (InitVal == 0) return 0;
1862     } else { // If not specified, use 0.0.
1863       InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
1864     }
1865
1866     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1867     Builder.CreateStore(InitVal, Alloca);
1868
1869     // Remember the old variable binding so that we can restore the binding when
1870     // we unrecurse.
1871     OldBindings.push_back(NamedValues[VarName]);
1872
1873     // Remember this binding.
1874     NamedValues[VarName] = Alloca;
1875   }
1876
1877   // Codegen the body, now that all vars are in scope.
1878   Value *BodyVal = Body-&gt;Codegen();
1879   if (BodyVal == 0) return 0;
1880
1881   // Pop all our variables from scope.
1882   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1883     NamedValues[VarNames[i].first] = OldBindings[i];
1884
1885   // Return the body computation.
1886   return BodyVal;
1887 }
1888
1889
1890 Function *PrototypeAST::Codegen() {
1891   // Make the function type:  double(double,double) etc.
1892   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1893   FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1894
1895   Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1896
1897   // If F conflicted, there was already something named 'Name'.  If it has a
1898   // body, don't allow redefinition or reextern.
1899   if (F-&gt;getName() != Name) {
1900     // Delete the one we just made and get the existing one.
1901     F-&gt;eraseFromParent();
1902     F = TheModule-&gt;getFunction(Name);
1903
1904     // If F already has a body, reject this.
1905     if (!F-&gt;empty()) {
1906       ErrorF("redefinition of function");
1907       return 0;
1908     }
1909
1910     // If F took a different number of args, reject.
1911     if (F-&gt;arg_size() != Args.size()) {
1912       ErrorF("redefinition of function with different # args");
1913       return 0;
1914     }
1915   }
1916
1917   // Set names for all arguments.
1918   unsigned Idx = 0;
1919   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1920        ++AI, ++Idx)
1921     AI-&gt;setName(Args[Idx]);
1922
1923   return F;
1924 }
1925
1926 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1927 /// argument in the symbol table so that references to it will succeed.
1928 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1929   Function::arg_iterator AI = F-&gt;arg_begin();
1930   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1931     // Create an alloca for this variable.
1932     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1933
1934     // Store the initial value into the alloca.
1935     Builder.CreateStore(AI, Alloca);
1936
1937     // Add arguments to variable symbol table.
1938     NamedValues[Args[Idx]] = Alloca;
1939   }
1940 }
1941
1942
1943 Function *FunctionAST::Codegen() {
1944   NamedValues.clear();
1945
1946   Function *TheFunction = Proto-&gt;Codegen();
1947   if (TheFunction == 0)
1948     return 0;
1949
1950   // If this is an operator, install it.
1951   if (Proto-&gt;isBinaryOp())
1952     BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
1953
1954   // Create a new basic block to start insertion into.
1955   BasicBlock *BB = new BasicBlock("entry", TheFunction);
1956   Builder.SetInsertPoint(BB);
1957
1958   // Add all arguments to the symbol table and create their allocas.
1959   Proto-&gt;CreateArgumentAllocas(TheFunction);
1960
1961   if (Value *RetVal = Body-&gt;Codegen()) {
1962     // Finish off the function.
1963     Builder.CreateRet(RetVal);
1964
1965     // Validate the generated code, checking for consistency.
1966     verifyFunction(*TheFunction);
1967
1968     // Optimize the function.
1969     TheFPM-&gt;run(*TheFunction);
1970
1971     return TheFunction;
1972   }
1973
1974   // Error reading body, remove function.
1975   TheFunction-&gt;eraseFromParent();
1976
1977   if (Proto-&gt;isBinaryOp())
1978     BinopPrecedence.erase(Proto-&gt;getOperatorName());
1979   return 0;
1980 }
1981
1982 //===----------------------------------------------------------------------===//
1983 // Top-Level parsing and JIT Driver
1984 //===----------------------------------------------------------------------===//
1985
1986 static ExecutionEngine *TheExecutionEngine;
1987
1988 static void HandleDefinition() {
1989   if (FunctionAST *F = ParseDefinition()) {
1990     if (Function *LF = F-&gt;Codegen()) {
1991       fprintf(stderr, "Read function definition:");
1992       LF-&gt;dump();
1993     }
1994   } else {
1995     // Skip token for error recovery.
1996     getNextToken();
1997   }
1998 }
1999
2000 static void HandleExtern() {
2001   if (PrototypeAST *P = ParseExtern()) {
2002     if (Function *F = P-&gt;Codegen()) {
2003       fprintf(stderr, "Read extern: ");
2004       F-&gt;dump();
2005     }
2006   } else {
2007     // Skip token for error recovery.
2008     getNextToken();
2009   }
2010 }
2011
2012 static void HandleTopLevelExpression() {
2013   // Evaluate a top level expression into an anonymous function.
2014   if (FunctionAST *F = ParseTopLevelExpr()) {
2015     if (Function *LF = F-&gt;Codegen()) {
2016       // JIT the function, returning a function pointer.
2017       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
2018
2019       // Cast it to the right type (takes no arguments, returns a double) so we
2020       // can call it as a native function.
2021       double (*FP)() = (double (*)())FPtr;
2022       fprintf(stderr, "Evaluated to %f\n", FP());
2023     }
2024   } else {
2025     // Skip token for error recovery.
2026     getNextToken();
2027   }
2028 }
2029
2030 /// top ::= definition | external | expression | ';'
2031 static void MainLoop() {
2032   while (1) {
2033     fprintf(stderr, "ready&gt; ");
2034     switch (CurTok) {
2035     case tok_eof:    return;
2036     case ';':        getNextToken(); break;  // ignore top level semicolons.
2037     case tok_def:    HandleDefinition(); break;
2038     case tok_extern: HandleExtern(); break;
2039     default:         HandleTopLevelExpression(); break;
2040     }
2041   }
2042 }
2043
2044
2045
2046 //===----------------------------------------------------------------------===//
2047 // "Library" functions that can be "extern'd" from user code.
2048 //===----------------------------------------------------------------------===//
2049
2050 /// putchard - putchar that takes a double and returns 0.
2051 extern "C"
2052 double putchard(double X) {
2053   putchar((char)X);
2054   return 0;
2055 }
2056
2057 /// printd - printf that takes a double prints it as "%f\n", returning 0.
2058 extern "C"
2059 double printd(double X) {
2060   printf("%f\n", X);
2061   return 0;
2062 }
2063
2064 //===----------------------------------------------------------------------===//
2065 // Main driver code.
2066 //===----------------------------------------------------------------------===//
2067
2068 int main() {
2069   // Install standard binary operators.
2070   // 1 is lowest precedence.
2071   BinopPrecedence['='] = 2;
2072   BinopPrecedence['&lt;'] = 10;
2073   BinopPrecedence['+'] = 20;
2074   BinopPrecedence['-'] = 20;
2075   BinopPrecedence['*'] = 40;  // highest.
2076
2077   // Prime the first token.
2078   fprintf(stderr, "ready&gt; ");
2079   getNextToken();
2080
2081   // Make the module, which holds all the code.
2082   TheModule = new Module("my cool jit");
2083
2084   // Create the JIT.
2085   TheExecutionEngine = ExecutionEngine::create(TheModule);
2086
2087   {
2088     ExistingModuleProvider OurModuleProvider(TheModule);
2089     FunctionPassManager OurFPM(&amp;OurModuleProvider);
2090
2091     // Set up the optimizer pipeline.  Start with registering info about how the
2092     // target lays out data structures.
2093     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
2094     // Promote allocas to registers.
2095     OurFPM.add(createPromoteMemoryToRegisterPass());
2096     // Do simple "peephole" optimizations and bit-twiddling optzns.
2097     OurFPM.add(createInstructionCombiningPass());
2098     // Reassociate expressions.
2099     OurFPM.add(createReassociatePass());
2100     // Eliminate Common SubExpressions.
2101     OurFPM.add(createGVNPass());
2102     // Simplify the control flow graph (deleting unreachable blocks, etc).
2103     OurFPM.add(createCFGSimplificationPass());
2104
2105     // Set the global so the code gen can use this.
2106     TheFPM = &amp;OurFPM;
2107
2108     // Run the main "interpreter loop" now.
2109     MainLoop();
2110
2111     TheFPM = 0;
2112   }  // Free module provider and pass manager.
2113
2114
2115   // Print out all of the generated code.
2116   TheModule-&gt;dump();
2117   return 0;
2118 }
2119 </pre>
2120 </div>
2121
2122 </div>
2123
2124 <!-- *********************************************************************** -->
2125 <hr>
2126 <address>
2127   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2128   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
2129   <a href="http://validator.w3.org/check/referer"><img
2130   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
2131
2132   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2133   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2134   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
2135 </address>
2136 </body>
2137 </html>