1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
14 <div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
16 <div class="doc_author">
17 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
20 <!-- *********************************************************************** -->
21 <div class="doc_section"><a name="intro">Part 3 Introduction</a></div>
22 <!-- *********************************************************************** -->
24 <div class="doc_text">
26 <p>Welcome to part 3 of the "<a href="index.html">Implementing a language with
27 LLVM</a>" tutorial. This chapter shows you how to transform the <a
28 href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
29 This will teach you a little bit about how LLVM does things, as well as
30 demonstrate how easy it is to use. It's much more work to build a lexer and
31 parser than it is to generate LLVM IR code.
36 <!-- *********************************************************************** -->
37 <div class="doc_section"><a name="basics">Code Generation setup</a></div>
38 <!-- *********************************************************************** -->
40 <div class="doc_text">
43 In order to generate LLVM IR, we want some simple setup to get started. First,
44 we define virtual codegen methods in each AST class:</p>
46 <div class="doc_code">
48 /// ExprAST - Base class for all expression nodes.
52 virtual Value *Codegen() = 0;
55 /// NumberExprAST - Expression class for numeric literals like "1.0".
56 class NumberExprAST : public ExprAST {
59 explicit NumberExprAST(double val) : Val(val) {}
60 virtual Value *Codegen();
66 <p>The Codegen() method says to emit IR for that AST node and all things it
67 depends on, and they all return an LLVM Value object.
68 "Value" is the class used to represent a "<a
69 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
70 Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
71 of SSA values is that their value is computed as the related instruction
72 executes, and it does not get a new value until (and if) the instruction
73 re-executes. In order words, there is no way to "change" an SSA value. For
74 more information, please read up on <a
75 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
76 Assignment</a> - the concepts are really quite natural once you grok them.</p>
79 second thing we want is an "Error" method like we used for parser, which will
80 be used to report errors found during code generation (for example, use of an
81 undeclared parameter):</p>
83 <div class="doc_code">
85 Value *ErrorV(const char *Str) { Error(Str); return 0; }
87 static Module *TheModule;
88 static LLVMBuilder Builder;
89 static std::map<std::string, Value*> NamedValues;
93 <p>The static variables will be used during code generation. <tt>TheModule</tt>
94 is the LLVM construct that contains all of the functions and global variables in
95 a chunk of code. In many ways, it is the top-level structure that the LLVM IR
96 uses to contain code.</p>
98 <p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
99 LLVM instructions. The <tt>Builder</tt> keeps track of the current place to
100 insert instructions and has methods to create new instructions.</p>
102 <p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
103 current scope and what their LLVM representation is. In this form of
104 Kaleidoscope, the only things that can be referenced are function parameters.
105 As such, function parameters will be in this map when generating code for their
109 With these basics in place, we can start talking about how to generate code for
110 each expression. Note that this assumes that the <tt>Builder</tt> has been set
111 up to generate code <em>into</em> something. For now, we'll assume that this
112 has already been done, and we'll just use it to emit code.
117 <!-- *********************************************************************** -->
118 <div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
119 <!-- *********************************************************************** -->
121 <div class="doc_text">
123 <p>Generating LLVM code for expression nodes is very straight-forward: less
124 than 45 lines of commented code for all four of our expression nodes. First,
125 we'll do numeric literals:</p>
127 <div class="doc_code">
129 Value *NumberExprAST::Codegen() {
130 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
135 <p>In the LLVM IR, numeric constants are represented with the ConstantFP class,
136 which holds the numeric value in an APFloat internally (APFloat has the
137 capability of holding floating point constants of arbitrary precision). This
138 code basically just creates and returns a ConstantFP. Note that in the LLVM IR
139 that constants are all uniqued together and shared. For this reason, the API
140 uses "the foo::get(...)" idiom instead of a "create" method or "new foo".</p>
142 <div class="doc_code">
144 Value *VariableExprAST::Codegen() {
145 // Look this variable up in the function.
146 Value *V = NamedValues[Name];
147 return V ? V : ErrorV("Unknown variable name");
152 <p>References to variables is also quite simple here. In our system, we assume
153 that the variable has already been emited somewhere and its value is available.
154 In practice, the only values in the NamedValues map will be arguments. This
155 code simply checks to see that the specified name is in the map (if not, an
156 unknown variable is being referenced) and returns the value for it.</p>
158 <div class="doc_code">
160 Value *BinaryExprAST::Codegen() {
161 Value *L = LHS->Codegen();
162 Value *R = RHS->Codegen();
163 if (L == 0 || R == 0) return 0;
166 case '+': return Builder.CreateAdd(L, R, "addtmp");
167 case '-': return Builder.CreateSub(L, R, "subtmp");
168 case '*': return Builder.CreateMul(L, R, "multmp");
170 L = Builder.CreateFCmpULT(L, R, "multmp");
171 // Convert bool 0/1 to double 0.0 or 1.0
172 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
173 default: return ErrorV("invalid binary operator");
181 <div class="doc_code">
183 Value *CallExprAST::Codegen() {
184 // Look up the name in the global module table.
185 Function *CalleeF = TheModule->getFunction(Callee);
187 return ErrorV("Unknown function referenced");
189 // If argument mismatch error.
190 if (CalleeF->arg_size() != Args.size())
191 return ErrorV("Incorrect # arguments passed");
193 std::vector<Value*> ArgsV;
194 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
195 ArgsV.push_back(Args[i]->Codegen());
196 if (ArgsV.back() == 0) return 0;
199 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
208 <!-- *********************************************************************** -->
209 <div class="doc_section"><a name="code">Conclusions and the Full Code</a></div>
210 <!-- *********************************************************************** -->
212 <div class="doc_text">
214 <div class="doc_code">
217 // g++ -g toy.cpp `llvm-config --cppflags` `llvm-config --ldflags` \
218 // `llvm-config --libs core` -I ~/llvm/include/
220 // See example below.
222 #include "llvm/DerivedTypes.h"
223 #include "llvm/Module.h"
224 #include "llvm/Support/LLVMBuilder.h"
225 #include <cstdio>
226 #include <string>
228 #include <vector>
229 using namespace llvm;
231 //===----------------------------------------------------------------------===//
233 //===----------------------------------------------------------------------===//
235 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
236 // of these for known things.
241 tok_def = -2, tok_extern = -3,
244 tok_identifier = -4, tok_number = -5,
247 static std::string IdentifierStr; // Filled in if tok_identifier
248 static double NumVal; // Filled in if tok_number
250 /// gettok - Return the next token from standard input.
251 static int gettok() {
252 static int LastChar = ' ';
254 // Skip any whitespace.
255 while (isspace(LastChar))
256 LastChar = getchar();
258 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
259 IdentifierStr = LastChar;
260 while (isalnum((LastChar = getchar())))
261 IdentifierStr += LastChar;
263 if (IdentifierStr == "def") return tok_def;
264 if (IdentifierStr == "extern") return tok_extern;
265 return tok_identifier;
268 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
272 LastChar = getchar();
273 } while (isdigit(LastChar) || LastChar == '.');
275 NumVal = strtod(NumStr.c_str(), 0);
279 if (LastChar == '#') {
280 // Comment until end of line.
281 do LastChar = getchar();
282 while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
288 // Check for end of file. Don't eat the EOF.
292 // Otherwise, just return the character as its ascii value.
293 int ThisChar = LastChar;
294 LastChar = getchar();
298 //===----------------------------------------------------------------------===//
299 // Abstract Syntax Tree (aka Parse Tree)
300 //===----------------------------------------------------------------------===//
302 /// ExprAST - Base class for all expression nodes.
305 virtual ~ExprAST() {}
306 virtual Value *Codegen() = 0;
309 /// NumberExprAST - Expression class for numeric literals like "1.0".
310 class NumberExprAST : public ExprAST {
313 explicit NumberExprAST(double val) : Val(val) {}
314 virtual Value *Codegen();
317 /// VariableExprAST - Expression class for referencing a variable, like "a".
318 class VariableExprAST : public ExprAST {
321 explicit VariableExprAST(const std::string &name) : Name(name) {}
322 virtual Value *Codegen();
325 /// BinaryExprAST - Expression class for a binary operator.
326 class BinaryExprAST : public ExprAST {
330 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
331 : Op(op), LHS(lhs), RHS(rhs) {}
332 virtual Value *Codegen();
335 /// CallExprAST - Expression class for function calls.
336 class CallExprAST : public ExprAST {
338 std::vector<ExprAST*> Args;
340 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
341 : Callee(callee), Args(args) {}
342 virtual Value *Codegen();
345 /// PrototypeAST - This class represents the "prototype" for a function,
346 /// which captures its argument names as well as if it is an operator.
349 std::vector<std::string> Args;
351 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
352 : Name(name), Args(args) {}
357 /// FunctionAST - This class represents a function definition itself.
362 FunctionAST(PrototypeAST *proto, ExprAST *body)
363 : Proto(proto), Body(body) {}
368 //===----------------------------------------------------------------------===//
370 //===----------------------------------------------------------------------===//
372 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
373 /// token the parser it looking at. getNextToken reads another token from the
374 /// lexer and updates CurTok with its results.
376 static int getNextToken() {
377 return CurTok = gettok();
380 /// BinopPrecedence - This holds the precedence for each binary operator that is
382 static std::map<char, int> BinopPrecedence;
384 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
385 static int GetTokPrecedence() {
386 if (!isascii(CurTok))
389 // Make sure it's a declared binop.
390 int TokPrec = BinopPrecedence[CurTok];
391 if (TokPrec <= 0) return -1;
395 /// Error* - These are little helper functions for error handling.
396 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
397 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
398 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
400 static ExprAST *ParseExpression();
404 /// ::= identifer '(' expression* ')'
405 static ExprAST *ParseIdentifierExpr() {
406 std::string IdName = IdentifierStr;
408 getNextToken(); // eat identifer.
410 if (CurTok != '(') // Simple variable ref.
411 return new VariableExprAST(IdName);
414 getNextToken(); // eat (
415 std::vector<ExprAST*> Args;
417 ExprAST *Arg = ParseExpression();
421 if (CurTok == ')') break;
424 return Error("Expected ')'");
431 return new CallExprAST(IdName, Args);
434 /// numberexpr ::= number
435 static ExprAST *ParseNumberExpr() {
436 ExprAST *Result = new NumberExprAST(NumVal);
437 getNextToken(); // consume the number
441 /// parenexpr ::= '(' expression ')'
442 static ExprAST *ParseParenExpr() {
443 getNextToken(); // eat (.
444 ExprAST *V = ParseExpression();
448 return Error("expected ')'");
449 getNextToken(); // eat ).
454 /// ::= identifierexpr
457 static ExprAST *ParsePrimary() {
459 default: return Error("unknown token when expecting an expression");
460 case tok_identifier: return ParseIdentifierExpr();
461 case tok_number: return ParseNumberExpr();
462 case '(': return ParseParenExpr();
467 /// ::= ('+' primary)*
468 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
469 // If this is a binop, find its precedence.
471 int TokPrec = GetTokPrecedence();
473 // If this is a binop that binds at least as tightly as the current binop,
474 // consume it, otherwise we are done.
475 if (TokPrec < ExprPrec)
478 // Okay, we know this is a binop.
480 getNextToken(); // eat binop
482 // Parse the primary expression after the binary operator.
483 ExprAST *RHS = ParsePrimary();
486 // If BinOp binds less tightly with RHS than the operator after RHS, let
487 // the pending operator take RHS as its LHS.
488 int NextPrec = GetTokPrecedence();
489 if (TokPrec < NextPrec) {
490 RHS = ParseBinOpRHS(TokPrec+1, RHS);
491 if (RHS == 0) return 0;
495 LHS = new BinaryExprAST(BinOp, LHS, RHS);
500 /// ::= primary binoprhs
502 static ExprAST *ParseExpression() {
503 ExprAST *LHS = ParsePrimary();
506 return ParseBinOpRHS(0, LHS);
510 /// ::= id '(' id* ')'
511 static PrototypeAST *ParsePrototype() {
512 if (CurTok != tok_identifier)
513 return ErrorP("Expected function name in prototype");
515 std::string FnName = IdentifierStr;
519 return ErrorP("Expected '(' in prototype");
521 std::vector<std::string> ArgNames;
522 while (getNextToken() == tok_identifier)
523 ArgNames.push_back(IdentifierStr);
525 return ErrorP("Expected ')' in prototype");
528 getNextToken(); // eat ')'.
530 return new PrototypeAST(FnName, ArgNames);
533 /// definition ::= 'def' prototype expression
534 static FunctionAST *ParseDefinition() {
535 getNextToken(); // eat def.
536 PrototypeAST *Proto = ParsePrototype();
537 if (Proto == 0) return 0;
539 if (ExprAST *E = ParseExpression())
540 return new FunctionAST(Proto, E);
544 /// toplevelexpr ::= expression
545 static FunctionAST *ParseTopLevelExpr() {
546 if (ExprAST *E = ParseExpression()) {
547 // Make an anonymous proto.
548 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
549 return new FunctionAST(Proto, E);
554 /// external ::= 'extern' prototype
555 static PrototypeAST *ParseExtern() {
556 getNextToken(); // eat extern.
557 return ParsePrototype();
560 //===----------------------------------------------------------------------===//
562 //===----------------------------------------------------------------------===//
564 static Module *TheModule;
565 static LLVMBuilder Builder;
566 static std::map<std::string, Value*> NamedValues;
568 Value *ErrorV(const char *Str) { Error(Str); return 0; }
570 Value *NumberExprAST::Codegen() {
571 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
574 Value *VariableExprAST::Codegen() {
575 // Look this variable up in the function.
576 Value *V = NamedValues[Name];
577 return V ? V : ErrorV("Unknown variable name");
580 Value *BinaryExprAST::Codegen() {
581 Value *L = LHS->Codegen();
582 Value *R = RHS->Codegen();
583 if (L == 0 || R == 0) return 0;
586 case '+': return Builder.CreateAdd(L, R, "addtmp");
587 case '-': return Builder.CreateSub(L, R, "subtmp");
588 case '*': return Builder.CreateMul(L, R, "multmp");
590 L = Builder.CreateFCmpULT(L, R, "multmp");
591 // Convert bool 0/1 to double 0.0 or 1.0
592 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
593 default: return ErrorV("invalid binary operator");
597 Value *CallExprAST::Codegen() {
598 // Look up the name in the global module table.
599 Function *CalleeF = TheModule->getFunction(Callee);
601 return ErrorV("Unknown function referenced");
603 // If argument mismatch error.
604 if (CalleeF->arg_size() != Args.size())
605 return ErrorV("Incorrect # arguments passed");
607 std::vector<Value*> ArgsV;
608 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
609 ArgsV.push_back(Args[i]->Codegen());
610 if (ArgsV.back() == 0) return 0;
613 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
616 Function *PrototypeAST::Codegen() {
617 // Make the function type: double(double,double) etc.
619 FunctionType::get(Type::DoubleTy, std::vector<const Type*>(Args.size(),
623 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
625 // If F conflicted, there was already something named 'Name'. If it has a
626 // body, don't allow redefinition or reextern.
627 if (F->getName() != Name) {
628 // Delete the one we just made and get the existing one.
629 F->eraseFromParent();
630 F = TheModule->getFunction(Name);
632 // If F already has a body, reject this.
633 if (!F->empty()) {
634 ErrorF("redefinition of function");
638 // If F took a different number of args, reject.
639 if (F->arg_size() != Args.size()) {
640 ErrorF("redefinition of function with different # args");
645 // Set names for all arguments.
647 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
649 AI->setName(Args[Idx]);
651 // Add arguments to variable symbol table.
652 NamedValues[Args[Idx]] = AI;
658 Function *FunctionAST::Codegen() {
661 Function *TheFunction = Proto->Codegen();
662 if (TheFunction == 0)
665 // Create a new basic block to start insertion into.
666 Builder.SetInsertPoint(new BasicBlock("entry", TheFunction));
668 if (Value *RetVal = Body->Codegen()) {
669 // Finish off the function.
670 Builder.CreateRet(RetVal);
674 // Error reading body, remove function.
675 TheFunction->eraseFromParent();
679 //===----------------------------------------------------------------------===//
680 // Top-Level parsing and JIT Driver
681 //===----------------------------------------------------------------------===//
683 static void HandleDefinition() {
684 if (FunctionAST *F = ParseDefinition()) {
685 if (Function *LF = F->Codegen()) {
686 fprintf(stderr, "Read function definition:");
690 // Skip token for error recovery.
695 static void HandleExtern() {
696 if (PrototypeAST *P = ParseExtern()) {
697 if (Function *F = P->Codegen()) {
698 fprintf(stderr, "Read extern: ");
702 // Skip token for error recovery.
707 static void HandleTopLevelExpression() {
708 // Evaluate a top level expression into an anonymous function.
709 if (FunctionAST *F = ParseTopLevelExpr()) {
710 if (Function *LF = F->Codegen()) {
711 fprintf(stderr, "Read top-level expression:");
715 // Skip token for error recovery.
720 /// top ::= definition | external | expression | ';'
721 static void MainLoop() {
723 fprintf(stderr, "ready> ");
725 case tok_eof: return;
726 case ';': getNextToken(); break; // ignore top level semicolons.
727 case tok_def: HandleDefinition(); break;
728 case tok_extern: HandleExtern(); break;
729 default: HandleTopLevelExpression(); break;
736 //===----------------------------------------------------------------------===//
737 // "Library" functions that can be "extern'd" from user code.
738 //===----------------------------------------------------------------------===//
740 /// putchard - putchar that takes a double and returns 0.
742 double putchard(double X) {
747 //===----------------------------------------------------------------------===//
749 //===----------------------------------------------------------------------===//
752 TheModule = new Module("my cool jit");
754 // Install standard binary operators.
755 // 1 is lowest precedence.
756 BinopPrecedence['<'] = 10;
757 BinopPrecedence['+'] = 20;
758 BinopPrecedence['-'] = 20;
759 BinopPrecedence['*'] = 40; // highest.
761 // Prime the first token.
762 fprintf(stderr, "ready> ");
766 TheModule->dump();
785 <!-- *********************************************************************** -->
788 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
789 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
790 <a href="http://validator.w3.org/check/referer"><img
791 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
793 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
794 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
795 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $