1 ========================================
2 Kaleidoscope: Code generation to LLVM IR
3 ========================================
8 Written by `Chris Lattner <mailto:sabre@nondot.org>`_
10 Chapter 3 Introduction
11 ======================
13 Welcome to Chapter 3 of the "`Implementing a language with
14 LLVM <index.html>`_" tutorial. This chapter shows you how to transform
15 the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into
16 LLVM IR. This will teach you a little bit about how LLVM does things, as
17 well as demonstrate how easy it is to use. It's much more work to build
18 a lexer and parser than it is to generate LLVM IR code. :)
20 **Please note**: the code in this chapter and later require LLVM 2.2 or
21 later. LLVM 2.1 and before will not work with it. Also note that you
22 need to use a version of this tutorial that matches your LLVM release:
23 If you are using an official LLVM release, use the version of the
24 documentation included with your release or on the `llvm.org releases
25 page <http://llvm.org/releases/>`_.
30 In order to generate LLVM IR, we want some simple setup to get started.
31 First we define virtual code generation (codegen) methods in each AST
36 /// ExprAST - Base class for all expression nodes.
40 virtual Value *Codegen() = 0;
43 /// NumberExprAST - Expression class for numeric literals like "1.0".
44 class NumberExprAST : public ExprAST {
47 NumberExprAST(double val) : Val(val) {}
48 virtual Value *Codegen();
52 The Codegen() method says to emit IR for that AST node along with all
53 the things it depends on, and they all return an LLVM Value object.
54 "Value" is the class used to represent a "`Static Single Assignment
55 (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
56 register" or "SSA value" in LLVM. The most distinct aspect of SSA values
57 is that their value is computed as the related instruction executes, and
58 it does not get a new value until (and if) the instruction re-executes.
59 In other words, there is no way to "change" an SSA value. For more
60 information, please read up on `Static Single
61 Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
62 - the concepts are really quite natural once you grok them.
64 Note that instead of adding virtual methods to the ExprAST class
65 hierarchy, it could also make sense to use a `visitor
66 pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other
67 way to model this. Again, this tutorial won't dwell on good software
68 engineering practices: for our purposes, adding a virtual method is
71 The second thing we want is an "Error" method like we used for the
72 parser, which will be used to report errors found during code generation
73 (for example, use of an undeclared parameter):
77 Value *ErrorV(const char *Str) { Error(Str); return 0; }
79 static Module *TheModule;
80 static IRBuilder<> Builder(getGlobalContext());
81 static std::map<std::string, Value*> NamedValues;
83 The static variables will be used during code generation. ``TheModule``
84 is the LLVM construct that contains all of the functions and global
85 variables in a chunk of code. In many ways, it is the top-level
86 structure that the LLVM IR uses to contain code.
88 The ``Builder`` object is a helper object that makes it easy to generate
89 LLVM instructions. Instances of the
90 ```IRBuilder`` <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
91 class template keep track of the current place to insert instructions
92 and has methods to create new instructions.
94 The ``NamedValues`` map keeps track of which values are defined in the
95 current scope and what their LLVM representation is. (In other words, it
96 is a symbol table for the code). In this form of Kaleidoscope, the only
97 things that can be referenced are function parameters. As such, function
98 parameters will be in this map when generating code for their function
101 With these basics in place, we can start talking about how to generate
102 code for each expression. Note that this assumes that the ``Builder``
103 has been set up to generate code *into* something. For now, we'll assume
104 that this has already been done, and we'll just use it to emit code.
106 Expression Code Generation
107 ==========================
109 Generating LLVM code for expression nodes is very straightforward: less
110 than 45 lines of commented code for all four of our expression nodes.
111 First we'll do numeric literals:
115 Value *NumberExprAST::Codegen() {
116 return ConstantFP::get(getGlobalContext(), APFloat(Val));
119 In the LLVM IR, numeric constants are represented with the
120 ``ConstantFP`` class, which holds the numeric value in an ``APFloat``
121 internally (``APFloat`` has the capability of holding floating point
122 constants of Arbitrary Precision). This code basically just creates
123 and returns a ``ConstantFP``. Note that in the LLVM IR that constants
124 are all uniqued together and shared. For this reason, the API uses the
125 "foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".
129 Value *VariableExprAST::Codegen() {
130 // Look this variable up in the function.
131 Value *V = NamedValues[Name];
132 return V ? V : ErrorV("Unknown variable name");
135 References to variables are also quite simple using LLVM. In the simple
136 version of Kaleidoscope, we assume that the variable has already been
137 emitted somewhere and its value is available. In practice, the only
138 values that can be in the ``NamedValues`` map are function arguments.
139 This code simply checks to see that the specified name is in the map (if
140 not, an unknown variable is being referenced) and returns the value for
141 it. In future chapters, we'll add support for `loop induction
142 variables <LangImpl5.html#for>`_ in the symbol table, and for `local
143 variables <LangImpl7.html#localvars>`_.
147 Value *BinaryExprAST::Codegen() {
148 Value *L = LHS->Codegen();
149 Value *R = RHS->Codegen();
150 if (L == 0 || R == 0) return 0;
153 case '+': return Builder.CreateFAdd(L, R, "addtmp");
154 case '-': return Builder.CreateFSub(L, R, "subtmp");
155 case '*': return Builder.CreateFMul(L, R, "multmp");
157 L = Builder.CreateFCmpULT(L, R, "cmptmp");
158 // Convert bool 0/1 to double 0.0 or 1.0
159 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
161 default: return ErrorV("invalid binary operator");
165 Binary operators start to get more interesting. The basic idea here is
166 that we recursively emit code for the left-hand side of the expression,
167 then the right-hand side, then we compute the result of the binary
168 expression. In this code, we do a simple switch on the opcode to create
169 the right LLVM instruction.
171 In the example above, the LLVM builder class is starting to show its
172 value. IRBuilder knows where to insert the newly created instruction,
173 all you have to do is specify what instruction to create (e.g. with
174 ``CreateFAdd``), which operands to use (``L`` and ``R`` here) and
175 optionally provide a name for the generated instruction.
177 One nice thing about LLVM is that the name is just a hint. For instance,
178 if the code above emits multiple "addtmp" variables, LLVM will
179 automatically provide each one with an increasing, unique numeric
180 suffix. Local value names for instructions are purely optional, but it
181 makes it much easier to read the IR dumps.
183 `LLVM instructions <../LangRef.html#instref>`_ are constrained by strict
184 rules: for example, the Left and Right operators of an `add
185 instruction <../LangRef.html#i_add>`_ must have the same type, and the
186 result type of the add must match the operand types. Because all values
187 in Kaleidoscope are doubles, this makes for very simple code for add,
190 On the other hand, LLVM specifies that the `fcmp
191 instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a
192 one bit integer). The problem with this is that Kaleidoscope wants the
193 value to be a 0.0 or 1.0 value. In order to get these semantics, we
194 combine the fcmp instruction with a `uitofp
195 instruction <../LangRef.html#i_uitofp>`_. This instruction converts its
196 input integer into a floating point value by treating the input as an
197 unsigned value. In contrast, if we used the `sitofp
198 instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator
199 would return 0.0 and -1.0, depending on the input value.
203 Value *CallExprAST::Codegen() {
204 // Look up the name in the global module table.
205 Function *CalleeF = TheModule->getFunction(Callee);
207 return ErrorV("Unknown function referenced");
209 // If argument mismatch error.
210 if (CalleeF->arg_size() != Args.size())
211 return ErrorV("Incorrect # arguments passed");
213 std::vector<Value*> ArgsV;
214 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
215 ArgsV.push_back(Args[i]->Codegen());
216 if (ArgsV.back() == 0) return 0;
219 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
222 Code generation for function calls is quite straightforward with LLVM.
223 The code above initially does a function name lookup in the LLVM
224 Module's symbol table. Recall that the LLVM Module is the container that
225 holds all of the functions we are JIT'ing. By giving each function the
226 same name as what the user specifies, we can use the LLVM symbol table
227 to resolve function names for us.
229 Once we have the function to call, we recursively codegen each argument
230 that is to be passed in, and create an LLVM `call
231 instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C
232 calling conventions by default, allowing these calls to also call into
233 standard library functions like "sin" and "cos", with no additional
236 This wraps up our handling of the four basic expressions that we have so
237 far in Kaleidoscope. Feel free to go in and add some more. For example,
238 by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
239 several other interesting instructions that are really easy to plug into
242 Function Code Generation
243 ========================
245 Code generation for prototypes and functions must handle a number of
246 details, which make their code less beautiful than expression code
247 generation, but allows us to illustrate some important points. First,
248 lets talk about code generation for prototypes: they are used both for
249 function bodies and external function declarations. The code starts
254 Function *PrototypeAST::Codegen() {
255 // Make the function type: double(double,double) etc.
256 std::vector<Type*> Doubles(Args.size(),
257 Type::getDoubleTy(getGlobalContext()));
258 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
261 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
263 This code packs a lot of power into a few lines. Note first that this
264 function returns a "Function\*" instead of a "Value\*". Because a
265 "prototype" really talks about the external interface for a function
266 (not the value computed by an expression), it makes sense for it to
267 return the LLVM Function it corresponds to when codegen'd.
269 The call to ``FunctionType::get`` creates the ``FunctionType`` that
270 should be used for a given Prototype. Since all function arguments in
271 Kaleidoscope are of type double, the first line creates a vector of "N"
272 LLVM double types. It then uses the ``Functiontype::get`` method to
273 create a function type that takes "N" doubles as arguments, returns one
274 double as a result, and that is not vararg (the false parameter
275 indicates this). Note that Types in LLVM are uniqued just like Constants
276 are, so you don't "new" a type, you "get" it.
278 The final line above actually creates the function that the prototype
279 will correspond to. This indicates the type, linkage and name to use, as
280 well as which module to insert into. "`external
281 linkage <../LangRef.html#linkage>`_" means that the function may be
282 defined outside the current module and/or that it is callable by
283 functions outside the module. The Name passed in is the name the user
284 specified: since "``TheModule``" is specified, this name is registered
285 in "``TheModule``"s symbol table, which is used by the function call
290 // If F conflicted, there was already something named 'Name'. If it has a
291 // body, don't allow redefinition or reextern.
292 if (F->getName() != Name) {
293 // Delete the one we just made and get the existing one.
294 F->eraseFromParent();
295 F = TheModule->getFunction(Name);
297 The Module symbol table works just like the Function symbol table when
298 it comes to name conflicts: if a new function is created with a name
299 that was previously added to the symbol table, the new function will get
300 implicitly renamed when added to the Module. The code above exploits
301 this fact to determine if there was a previous definition of this
304 In Kaleidoscope, I choose to allow redefinitions of functions in two
305 cases: first, we want to allow 'extern'ing a function more than once, as
306 long as the prototypes for the externs match (since all arguments have
307 the same type, we just have to check that the number of arguments
308 match). Second, we want to allow 'extern'ing a function and then
309 defining a body for it. This is useful when defining mutually recursive
312 In order to implement this, the code above first checks to see if there
313 is a collision on the name of the function. If so, it deletes the
314 function we just created (by calling ``eraseFromParent``) and then
315 calling ``getFunction`` to get the existing function with the specified
316 name. Note that many APIs in LLVM have "erase" forms and "remove" forms.
317 The "remove" form unlinks the object from its parent (e.g. a Function
318 from a Module) and returns it. The "erase" form unlinks the object and
323 // If F already has a body, reject this.
325 ErrorF("redefinition of function");
329 // If F took a different number of args, reject.
330 if (F->arg_size() != Args.size()) {
331 ErrorF("redefinition of function with different # args");
336 In order to verify the logic above, we first check to see if the
337 pre-existing function is "empty". In this case, empty means that it has
338 no basic blocks in it, which means it has no body. If it has no body, it
339 is a forward declaration. Since we don't allow anything after a full
340 definition of the function, the code rejects this case. If the previous
341 reference to a function was an 'extern', we simply verify that the
342 number of arguments for that definition and this one match up. If not,
347 // Set names for all arguments.
349 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
351 AI->setName(Args[Idx]);
353 // Add arguments to variable symbol table.
354 NamedValues[Args[Idx]] = AI;
359 The last bit of code for prototypes loops over all of the arguments in
360 the function, setting the name of the LLVM Argument objects to match,
361 and registering the arguments in the ``NamedValues`` map for future use
362 by the ``VariableExprAST`` AST node. Once this is set up, it returns the
363 Function object to the caller. Note that we don't check for conflicting
364 argument names here (e.g. "extern foo(a b a)"). Doing so would be very
365 straight-forward with the mechanics we have already used above.
369 Function *FunctionAST::Codegen() {
372 Function *TheFunction = Proto->Codegen();
373 if (TheFunction == 0)
376 Code generation for function definitions starts out simply enough: we
377 just codegen the prototype (Proto) and verify that it is ok. We then
378 clear out the ``NamedValues`` map to make sure that there isn't anything
379 in it from the last function we compiled. Code generation of the
380 prototype ensures that there is an LLVM Function object that is ready to
385 // Create a new basic block to start insertion into.
386 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
387 Builder.SetInsertPoint(BB);
389 if (Value *RetVal = Body->Codegen()) {
391 Now we get to the point where the ``Builder`` is set up. The first line
392 creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_
393 (named "entry"), which is inserted into ``TheFunction``. The second line
394 then tells the builder that new instructions should be inserted into the
395 end of the new basic block. Basic blocks in LLVM are an important part
396 of functions that define the `Control Flow
397 Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
398 don't have any control flow, our functions will only contain one block
399 at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :).
403 if (Value *RetVal = Body->Codegen()) {
404 // Finish off the function.
405 Builder.CreateRet(RetVal);
407 // Validate the generated code, checking for consistency.
408 verifyFunction(*TheFunction);
413 Once the insertion point is set up, we call the ``CodeGen()`` method for
414 the root expression of the function. If no error happens, this emits
415 code to compute the expression into the entry block and returns the
416 value that was computed. Assuming no error, we then create an LLVM `ret
417 instruction <../LangRef.html#i_ret>`_, which completes the function.
418 Once the function is built, we call ``verifyFunction``, which is
419 provided by LLVM. This function does a variety of consistency checks on
420 the generated code, to determine if our compiler is doing everything
421 right. Using this is important: it can catch a lot of bugs. Once the
422 function is finished and validated, we return it.
426 // Error reading body, remove function.
427 TheFunction->eraseFromParent();
431 The only piece left here is handling of the error case. For simplicity,
432 we handle this by merely deleting the function we produced with the
433 ``eraseFromParent`` method. This allows the user to redefine a function
434 that they incorrectly typed in before: if we didn't delete it, it would
435 live in the symbol table, with a body, preventing future redefinition.
437 This code does have a bug, though. Since the ``PrototypeAST::Codegen``
438 can return a previously defined forward declaration, our code can
439 actually delete a forward declaration. There are a number of ways to fix
440 this bug, see what you can come up with! Here is a testcase:
444 extern foo(a b); # ok, defines foo.
445 def foo(a b) c; # error, 'c' is invalid.
446 def bar() foo(1, 2); # error, unknown function "foo"
448 Driver Changes and Closing Thoughts
449 ===================================
451 For now, code generation to LLVM doesn't really get us much, except that
452 we can look at the pretty IR calls. The sample code inserts calls to
453 Codegen into the "``HandleDefinition``", "``HandleExtern``" etc
454 functions, and then dumps out the LLVM IR. This gives a nice way to look
455 at the LLVM IR for simple functions. For example:
460 Read top-level expression:
463 ret double 9.000000e+00
466 Note how the parser turns the top-level expression into anonymous
467 functions for us. This will be handy when we add `JIT
468 support <LangImpl4.html#jit>`_ in the next chapter. Also note that the
469 code is very literally transcribed, no optimizations are being performed
470 except simple constant folding done by IRBuilder. We will `add
471 optimizations <LangImpl4.html#trivialconstfold>`_ explicitly in the next
476 ready> def foo(a b) a*a + 2*a*b + b*b;
477 Read function definition:
478 define double @foo(double %a, double %b) {
480 %multmp = fmul double %a, %a
481 %multmp1 = fmul double 2.000000e+00, %a
482 %multmp2 = fmul double %multmp1, %b
483 %addtmp = fadd double %multmp, %multmp2
484 %multmp3 = fmul double %b, %b
485 %addtmp4 = fadd double %addtmp, %multmp3
489 This shows some simple arithmetic. Notice the striking similarity to the
490 LLVM builder calls that we use to create the instructions.
494 ready> def bar(a) foo(a, 4.0) + bar(31337);
495 Read function definition:
496 define double @bar(double %a) {
498 %calltmp = call double @foo(double %a, double 4.000000e+00)
499 %calltmp1 = call double @bar(double 3.133700e+04)
500 %addtmp = fadd double %calltmp, %calltmp1
504 This shows some function calls. Note that this function will take a long
505 time to execute if you call it. In the future we'll add conditional
506 control flow to actually make recursion useful :).
510 ready> extern cos(x);
512 declare double @cos(double)
515 Read top-level expression:
518 %calltmp = call double @cos(double 1.234000e+00)
522 This shows an extern for the libm "cos" function, and a call to it.
524 .. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up
525 on highlighting this due to the first line.
530 ; ModuleID = 'my cool jit'
534 %addtmp = fadd double 4.000000e+00, 5.000000e+00
538 define double @foo(double %a, double %b) {
540 %multmp = fmul double %a, %a
541 %multmp1 = fmul double 2.000000e+00, %a
542 %multmp2 = fmul double %multmp1, %b
543 %addtmp = fadd double %multmp, %multmp2
544 %multmp3 = fmul double %b, %b
545 %addtmp4 = fadd double %addtmp, %multmp3
549 define double @bar(double %a) {
551 %calltmp = call double @foo(double %a, double 4.000000e+00)
552 %calltmp1 = call double @bar(double 3.133700e+04)
553 %addtmp = fadd double %calltmp, %calltmp1
557 declare double @cos(double)
561 %calltmp = call double @cos(double 1.234000e+00)
565 When you quit the current demo, it dumps out the IR for the entire
566 module generated. Here you can see the big picture with all the
567 functions referencing each other.
569 This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
570 we'll describe how to `add JIT codegen and optimizer
571 support <LangImpl4.html>`_ to this so we can actually start running
577 Here is the complete code listing for our running example, enhanced with
578 the LLVM code generator. Because this uses the LLVM libraries, we need
579 to link them in. To do this, we use the
580 `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
581 our makefile/command line about which options to use:
586 clang++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
595 // See example below.
597 #include "llvm/DerivedTypes.h"
598 #include "llvm/IRBuilder.h"
599 #include "llvm/LLVMContext.h"
600 #include "llvm/Module.h"
601 #include "llvm/Analysis/Verifier.h"
606 using namespace llvm;
608 //===----------------------------------------------------------------------===//
610 //===----------------------------------------------------------------------===//
612 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
613 // of these for known things.
618 tok_def = -2, tok_extern = -3,
621 tok_identifier = -4, tok_number = -5
624 static std::string IdentifierStr; // Filled in if tok_identifier
625 static double NumVal; // Filled in if tok_number
627 /// gettok - Return the next token from standard input.
628 static int gettok() {
629 static int LastChar = ' ';
631 // Skip any whitespace.
632 while (isspace(LastChar))
633 LastChar = getchar();
635 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
636 IdentifierStr = LastChar;
637 while (isalnum((LastChar = getchar())))
638 IdentifierStr += LastChar;
640 if (IdentifierStr == "def") return tok_def;
641 if (IdentifierStr == "extern") return tok_extern;
642 return tok_identifier;
645 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
649 LastChar = getchar();
650 } while (isdigit(LastChar) || LastChar == '.');
652 NumVal = strtod(NumStr.c_str(), 0);
656 if (LastChar == '#') {
657 // Comment until end of line.
658 do LastChar = getchar();
659 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
665 // Check for end of file. Don't eat the EOF.
669 // Otherwise, just return the character as its ascii value.
670 int ThisChar = LastChar;
671 LastChar = getchar();
675 //===----------------------------------------------------------------------===//
676 // Abstract Syntax Tree (aka Parse Tree)
677 //===----------------------------------------------------------------------===//
679 /// ExprAST - Base class for all expression nodes.
682 virtual ~ExprAST() {}
683 virtual Value *Codegen() = 0;
686 /// NumberExprAST - Expression class for numeric literals like "1.0".
687 class NumberExprAST : public ExprAST {
690 NumberExprAST(double val) : Val(val) {}
691 virtual Value *Codegen();
694 /// VariableExprAST - Expression class for referencing a variable, like "a".
695 class VariableExprAST : public ExprAST {
698 VariableExprAST(const std::string &name) : Name(name) {}
699 virtual Value *Codegen();
702 /// BinaryExprAST - Expression class for a binary operator.
703 class BinaryExprAST : public ExprAST {
707 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
708 : Op(op), LHS(lhs), RHS(rhs) {}
709 virtual Value *Codegen();
712 /// CallExprAST - Expression class for function calls.
713 class CallExprAST : public ExprAST {
715 std::vector<ExprAST*> Args;
717 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
718 : Callee(callee), Args(args) {}
719 virtual Value *Codegen();
722 /// PrototypeAST - This class represents the "prototype" for a function,
723 /// which captures its name, and its argument names (thus implicitly the number
724 /// of arguments the function takes).
727 std::vector<std::string> Args;
729 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
730 : Name(name), Args(args) {}
735 /// FunctionAST - This class represents a function definition itself.
740 FunctionAST(PrototypeAST *proto, ExprAST *body)
741 : Proto(proto), Body(body) {}
746 //===----------------------------------------------------------------------===//
748 //===----------------------------------------------------------------------===//
750 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
751 /// token the parser is looking at. getNextToken reads another token from the
752 /// lexer and updates CurTok with its results.
754 static int getNextToken() {
755 return CurTok = gettok();
758 /// BinopPrecedence - This holds the precedence for each binary operator that is
760 static std::map<char, int> BinopPrecedence;
762 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
763 static int GetTokPrecedence() {
764 if (!isascii(CurTok))
767 // Make sure it's a declared binop.
768 int TokPrec = BinopPrecedence[CurTok];
769 if (TokPrec <= 0) return -1;
773 /// Error* - These are little helper functions for error handling.
774 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
775 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
776 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
778 static ExprAST *ParseExpression();
782 /// ::= identifier '(' expression* ')'
783 static ExprAST *ParseIdentifierExpr() {
784 std::string IdName = IdentifierStr;
786 getNextToken(); // eat identifier.
788 if (CurTok != '(') // Simple variable ref.
789 return new VariableExprAST(IdName);
792 getNextToken(); // eat (
793 std::vector<ExprAST*> Args;
796 ExprAST *Arg = ParseExpression();
800 if (CurTok == ')') break;
803 return Error("Expected ')' or ',' in argument list");
811 return new CallExprAST(IdName, Args);
814 /// numberexpr ::= number
815 static ExprAST *ParseNumberExpr() {
816 ExprAST *Result = new NumberExprAST(NumVal);
817 getNextToken(); // consume the number
821 /// parenexpr ::= '(' expression ')'
822 static ExprAST *ParseParenExpr() {
823 getNextToken(); // eat (.
824 ExprAST *V = ParseExpression();
828 return Error("expected ')'");
829 getNextToken(); // eat ).
834 /// ::= identifierexpr
837 static ExprAST *ParsePrimary() {
839 default: return Error("unknown token when expecting an expression");
840 case tok_identifier: return ParseIdentifierExpr();
841 case tok_number: return ParseNumberExpr();
842 case '(': return ParseParenExpr();
847 /// ::= ('+' primary)*
848 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
849 // If this is a binop, find its precedence.
851 int TokPrec = GetTokPrecedence();
853 // If this is a binop that binds at least as tightly as the current binop,
854 // consume it, otherwise we are done.
855 if (TokPrec < ExprPrec)
858 // Okay, we know this is a binop.
860 getNextToken(); // eat binop
862 // Parse the primary expression after the binary operator.
863 ExprAST *RHS = ParsePrimary();
866 // If BinOp binds less tightly with RHS than the operator after RHS, let
867 // the pending operator take RHS as its LHS.
868 int NextPrec = GetTokPrecedence();
869 if (TokPrec < NextPrec) {
870 RHS = ParseBinOpRHS(TokPrec+1, RHS);
871 if (RHS == 0) return 0;
875 LHS = new BinaryExprAST(BinOp, LHS, RHS);
880 /// ::= primary binoprhs
882 static ExprAST *ParseExpression() {
883 ExprAST *LHS = ParsePrimary();
886 return ParseBinOpRHS(0, LHS);
890 /// ::= id '(' id* ')'
891 static PrototypeAST *ParsePrototype() {
892 if (CurTok != tok_identifier)
893 return ErrorP("Expected function name in prototype");
895 std::string FnName = IdentifierStr;
899 return ErrorP("Expected '(' in prototype");
901 std::vector<std::string> ArgNames;
902 while (getNextToken() == tok_identifier)
903 ArgNames.push_back(IdentifierStr);
905 return ErrorP("Expected ')' in prototype");
908 getNextToken(); // eat ')'.
910 return new PrototypeAST(FnName, ArgNames);
913 /// definition ::= 'def' prototype expression
914 static FunctionAST *ParseDefinition() {
915 getNextToken(); // eat def.
916 PrototypeAST *Proto = ParsePrototype();
917 if (Proto == 0) return 0;
919 if (ExprAST *E = ParseExpression())
920 return new FunctionAST(Proto, E);
924 /// toplevelexpr ::= expression
925 static FunctionAST *ParseTopLevelExpr() {
926 if (ExprAST *E = ParseExpression()) {
927 // Make an anonymous proto.
928 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
929 return new FunctionAST(Proto, E);
934 /// external ::= 'extern' prototype
935 static PrototypeAST *ParseExtern() {
936 getNextToken(); // eat extern.
937 return ParsePrototype();
940 //===----------------------------------------------------------------------===//
942 //===----------------------------------------------------------------------===//
944 static Module *TheModule;
945 static IRBuilder<> Builder(getGlobalContext());
946 static std::map<std::string, Value*> NamedValues;
948 Value *ErrorV(const char *Str) { Error(Str); return 0; }
950 Value *NumberExprAST::Codegen() {
951 return ConstantFP::get(getGlobalContext(), APFloat(Val));
954 Value *VariableExprAST::Codegen() {
955 // Look this variable up in the function.
956 Value *V = NamedValues[Name];
957 return V ? V : ErrorV("Unknown variable name");
960 Value *BinaryExprAST::Codegen() {
961 Value *L = LHS->Codegen();
962 Value *R = RHS->Codegen();
963 if (L == 0 || R == 0) return 0;
966 case '+': return Builder.CreateFAdd(L, R, "addtmp");
967 case '-': return Builder.CreateFSub(L, R, "subtmp");
968 case '*': return Builder.CreateFMul(L, R, "multmp");
970 L = Builder.CreateFCmpULT(L, R, "cmptmp");
971 // Convert bool 0/1 to double 0.0 or 1.0
972 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
974 default: return ErrorV("invalid binary operator");
978 Value *CallExprAST::Codegen() {
979 // Look up the name in the global module table.
980 Function *CalleeF = TheModule->getFunction(Callee);
982 return ErrorV("Unknown function referenced");
984 // If argument mismatch error.
985 if (CalleeF->arg_size() != Args.size())
986 return ErrorV("Incorrect # arguments passed");
988 std::vector<Value*> ArgsV;
989 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
990 ArgsV.push_back(Args[i]->Codegen());
991 if (ArgsV.back() == 0) return 0;
994 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
997 Function *PrototypeAST::Codegen() {
998 // Make the function type: double(double,double) etc.
999 std::vector<Type*> Doubles(Args.size(),
1000 Type::getDoubleTy(getGlobalContext()));
1001 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
1004 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1006 // If F conflicted, there was already something named 'Name'. If it has a
1007 // body, don't allow redefinition or reextern.
1008 if (F->getName() != Name) {
1009 // Delete the one we just made and get the existing one.
1010 F->eraseFromParent();
1011 F = TheModule->getFunction(Name);
1013 // If F already has a body, reject this.
1015 ErrorF("redefinition of function");
1019 // If F took a different number of args, reject.
1020 if (F->arg_size() != Args.size()) {
1021 ErrorF("redefinition of function with different # args");
1026 // Set names for all arguments.
1028 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1030 AI->setName(Args[Idx]);
1032 // Add arguments to variable symbol table.
1033 NamedValues[Args[Idx]] = AI;
1039 Function *FunctionAST::Codegen() {
1040 NamedValues.clear();
1042 Function *TheFunction = Proto->Codegen();
1043 if (TheFunction == 0)
1046 // Create a new basic block to start insertion into.
1047 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1048 Builder.SetInsertPoint(BB);
1050 if (Value *RetVal = Body->Codegen()) {
1051 // Finish off the function.
1052 Builder.CreateRet(RetVal);
1054 // Validate the generated code, checking for consistency.
1055 verifyFunction(*TheFunction);
1060 // Error reading body, remove function.
1061 TheFunction->eraseFromParent();
1065 //===----------------------------------------------------------------------===//
1066 // Top-Level parsing and JIT Driver
1067 //===----------------------------------------------------------------------===//
1069 static void HandleDefinition() {
1070 if (FunctionAST *F = ParseDefinition()) {
1071 if (Function *LF = F->Codegen()) {
1072 fprintf(stderr, "Read function definition:");
1076 // Skip token for error recovery.
1081 static void HandleExtern() {
1082 if (PrototypeAST *P = ParseExtern()) {
1083 if (Function *F = P->Codegen()) {
1084 fprintf(stderr, "Read extern: ");
1088 // Skip token for error recovery.
1093 static void HandleTopLevelExpression() {
1094 // Evaluate a top-level expression into an anonymous function.
1095 if (FunctionAST *F = ParseTopLevelExpr()) {
1096 if (Function *LF = F->Codegen()) {
1097 fprintf(stderr, "Read top-level expression:");
1101 // Skip token for error recovery.
1106 /// top ::= definition | external | expression | ';'
1107 static void MainLoop() {
1109 fprintf(stderr, "ready> ");
1111 case tok_eof: return;
1112 case ';': getNextToken(); break; // ignore top-level semicolons.
1113 case tok_def: HandleDefinition(); break;
1114 case tok_extern: HandleExtern(); break;
1115 default: HandleTopLevelExpression(); break;
1120 //===----------------------------------------------------------------------===//
1121 // "Library" functions that can be "extern'd" from user code.
1122 //===----------------------------------------------------------------------===//
1124 /// putchard - putchar that takes a double and returns 0.
1126 double putchard(double X) {
1131 //===----------------------------------------------------------------------===//
1132 // Main driver code.
1133 //===----------------------------------------------------------------------===//
1136 LLVMContext &Context = getGlobalContext();
1138 // Install standard binary operators.
1139 // 1 is lowest precedence.
1140 BinopPrecedence['<'] = 10;
1141 BinopPrecedence['+'] = 20;
1142 BinopPrecedence['-'] = 20;
1143 BinopPrecedence['*'] = 40; // highest.
1145 // Prime the first token.
1146 fprintf(stderr, "ready> ");
1149 // Make the module, which holds all the code.
1150 TheModule = new Module("my cool jit", Context);
1152 // Run the main "interpreter loop" now.
1155 // Print out all of the generated code.
1161 `Next: Adding JIT and Optimizer Support <LangImpl4.html>`_