1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
14 <div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
16 <div class="doc_author">
17 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
20 <!-- *********************************************************************** -->
21 <div class="doc_section"><a name="intro">Part 3 Introduction</a></div>
22 <!-- *********************************************************************** -->
24 <div class="doc_text">
26 <p>Welcome to part 3 of the "<a href="index.html">Implementing a language with
27 LLVM</a>" tutorial. This chapter shows you how to transform the <a
28 href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
29 This will teach you a little bit about how LLVM does things, as well as
30 demonstrate how easy it is to use. It's much more work to build a lexer and
31 parser than it is to generate LLVM IR code.
36 <!-- *********************************************************************** -->
37 <div class="doc_section"><a name="basics">Code Generation setup</a></div>
38 <!-- *********************************************************************** -->
40 <div class="doc_text">
43 In order to generate LLVM IR, we want some simple setup to get started. First,
44 we define virtual codegen methods in each AST class:</p>
46 <div class="doc_code">
48 /// ExprAST - Base class for all expression nodes.
52 virtual Value *Codegen() = 0;
55 /// NumberExprAST - Expression class for numeric literals like "1.0".
56 class NumberExprAST : public ExprAST {
59 NumberExprAST(double val) : Val(val) {}
60 virtual Value *Codegen();
66 <p>"Value" is the class used to represent a "register" in LLVM. The Codegen()
67 method says to emit IR for that AST node and all things it depends on. The
68 second thing we want is an "Error" method like we used for parser, which will
69 be used to report errors found during code generation (for example, use of an
70 undeclared parameter):</p>
72 <div class="doc_code">
74 Value *ErrorV(const char *Str) { Error(Str); return 0; }
76 static Module *TheModule;
77 static LLVMBuilder Builder;
78 static std::map<std::string, Value*> NamedValues;
82 <p>The static variables will be used during code generation. <tt>TheModule</tt>
83 is the LLVM construct that contains all of the functions and global variables in
84 a chunk of code. In many ways, it is the top-level structure that the LLVM IR
85 uses to contain code.</p>
87 <p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
88 LLVM instructions. The <tt>Builder</tt> keeps track of the current place to
89 insert instructions and has methods to create new instructions.</p>
91 <p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
92 current scope and what their LLVM representation is. In this form of
93 Kaleidoscope, the only things that can be referenced are function parameters.
94 As such, function parameters will be in this map when generating code for their
98 With these basics in place, we can start talking about how to generate code for
99 each expression. Note that this assumes that the <tt>Builder</tt> has been set
100 up to generate code <em>into</em> something. For now, we'll assume that this
101 has already been done, and we'll just use it to emit code.
106 <!-- *********************************************************************** -->
107 <div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
108 <!-- *********************************************************************** -->
110 <div class="doc_text">
112 <p>Generating LLVM code for expression nodes is very straight-forward: less
113 than 45 lines of commented code for all four of our expression nodes. First,
114 we'll do numeric literals:</p>
116 <div class="doc_code">
118 Value *NumberExprAST::Codegen() {
119 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
124 <p>In the LLVM IR, numeric constants are represented with the ConstantFP class,
125 which holds the numeric value in an APFloat internally (APFloat has the
126 capability of holding floating point constants of arbitrary precision). This
127 code basically just creates and returns a ConstantFP. Note that in the LLVM IR
128 that constants are all uniqued together and shared. For this reason, the API
129 uses "the foo::get(...)" idiom instead of a "create" method or "new foo".</p>
131 <div class="doc_code">
133 Value *VariableExprAST::Codegen() {
134 // Look this variable up in the function.
135 Value *V = NamedValues[Name];
136 return V ? V : ErrorV("Unknown variable name");
141 <p>References to variables is also quite simple here. In our system, we assume
142 that the variable has already been emited somewhere and its value is available.
143 In practice, the only values in the NamedValues map will be arguments. This
144 code simply checks to see that the specified name is in the map (if not, an
145 unknown variable is being referenced) and returns the value for it.</p>
147 <div class="doc_code">
149 Value *BinaryExprAST::Codegen() {
150 Value *L = LHS->Codegen();
151 Value *R = RHS->Codegen();
152 if (L == 0 || R == 0) return 0;
155 case '+': return Builder.CreateAdd(L, R, "addtmp");
156 case '-': return Builder.CreateSub(L, R, "subtmp");
157 case '*': return Builder.CreateMul(L, R, "multmp");
159 L = Builder.CreateFCmpULT(L, R, "multmp");
160 // Convert bool 0/1 to double 0.0 or 1.0
161 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
162 default: return ErrorV("invalid binary operator");
170 <div class="doc_code">
172 Value *CallExprAST::Codegen() {
173 // Look up the name in the global module table.
174 Function *CalleeF = TheModule->getFunction(Callee);
176 return ErrorV("Unknown function referenced");
178 // If argument mismatch error.
179 if (CalleeF->arg_size() != Args.size())
180 return ErrorV("Incorrect # arguments passed");
182 std::vector<Value*> ArgsV;
183 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
184 ArgsV.push_back(Args[i]->Codegen());
185 if (ArgsV.back() == 0) return 0;
188 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
197 <!-- *********************************************************************** -->
198 <div class="doc_section"><a name="code">Conclusions and the Full Code</a></div>
199 <!-- *********************************************************************** -->
201 <div class="doc_text">
203 <div class="doc_code">
206 // g++ -g toy.cpp `llvm-config --cppflags` `llvm-config --ldflags` \
207 // `llvm-config --libs core` -I ~/llvm/include/
209 // See example below.
211 #include "llvm/DerivedTypes.h"
212 #include "llvm/Module.h"
213 #include "llvm/Support/LLVMBuilder.h"
214 #include <cstdio>
215 #include <string>
217 #include <vector>
218 using namespace llvm;
220 //===----------------------------------------------------------------------===//
222 //===----------------------------------------------------------------------===//
224 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
225 // of these for known things.
230 tok_def = -2, tok_extern = -3,
233 tok_identifier = -4, tok_number = -5,
236 static std::string IdentifierStr; // Filled in if tok_identifier
237 static double NumVal; // Filled in if tok_number
239 /// gettok - Return the next token from standard input.
240 static int gettok() {
241 static int LastChar = ' ';
243 // Skip any whitespace.
244 while (isspace(LastChar))
245 LastChar = getchar();
247 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
248 IdentifierStr = LastChar;
249 while (isalnum((LastChar = getchar())))
250 IdentifierStr += LastChar;
252 if (IdentifierStr == "def") return tok_def;
253 if (IdentifierStr == "extern") return tok_extern;
254 return tok_identifier;
257 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
261 LastChar = getchar();
262 } while (isdigit(LastChar) || LastChar == '.');
264 NumVal = strtod(NumStr.c_str(), 0);
268 if (LastChar == '#') {
269 // Comment until end of line.
270 do LastChar = getchar();
271 while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
277 // Check for end of file. Don't eat the EOF.
281 // Otherwise, just return the character as its ascii value.
282 int ThisChar = LastChar;
283 LastChar = getchar();
287 //===----------------------------------------------------------------------===//
288 // Abstract Syntax Tree (aka Parse Tree)
289 //===----------------------------------------------------------------------===//
291 /// ExprAST - Base class for all expression nodes.
294 virtual ~ExprAST() {}
295 virtual Value *Codegen() = 0;
298 /// NumberExprAST - Expression class for numeric literals like "1.0".
299 class NumberExprAST : public ExprAST {
302 NumberExprAST(double val) : Val(val) {}
303 virtual Value *Codegen();
306 /// VariableExprAST - Expression class for referencing a variable, like "a".
307 class VariableExprAST : public ExprAST {
310 VariableExprAST(const std::string &name) : Name(name) {}
311 virtual Value *Codegen();
314 /// BinaryExprAST - Expression class for a binary operator.
315 class BinaryExprAST : public ExprAST {
319 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
320 : Op(op), LHS(lhs), RHS(rhs) {}
321 virtual Value *Codegen();
324 /// CallExprAST - Expression class for function calls.
325 class CallExprAST : public ExprAST {
327 std::vector<ExprAST*> Args;
329 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
330 : Callee(callee), Args(args) {}
331 virtual Value *Codegen();
334 /// PrototypeAST - This class represents the "prototype" for a function,
335 /// which captures its argument names as well as if it is an operator.
338 std::vector<std::string> Args;
340 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
341 : Name(name), Args(args) {}
346 /// FunctionAST - This class represents a function definition itself.
351 FunctionAST(PrototypeAST *proto, ExprAST *body)
352 : Proto(proto), Body(body) {}
357 //===----------------------------------------------------------------------===//
359 //===----------------------------------------------------------------------===//
361 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
362 /// token the parser it looking at. getNextToken reads another token from the
363 /// lexer and updates CurTok with its results.
365 static int getNextToken() {
366 return CurTok = gettok();
369 /// BinopPrecedence - This holds the precedence for each binary operator that is
371 static std::map<char, int> BinopPrecedence;
373 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
374 static int GetTokPrecedence() {
375 if (!isascii(CurTok))
378 // Make sure it's a declared binop.
379 int TokPrec = BinopPrecedence[CurTok];
380 if (TokPrec <= 0) return -1;
384 /// Error* - These are little helper functions for error handling.
385 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
386 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
387 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
389 static ExprAST *ParseExpression();
393 /// ::= identifer '(' expression* ')'
394 static ExprAST *ParseIdentifierExpr() {
395 std::string IdName = IdentifierStr;
397 getNextToken(); // eat identifer.
399 if (CurTok != '(') // Simple variable ref.
400 return new VariableExprAST(IdName);
403 getNextToken(); // eat (
404 std::vector<ExprAST*> Args;
406 ExprAST *Arg = ParseExpression();
410 if (CurTok == ')') break;
413 return Error("Expected ')'");
420 return new CallExprAST(IdName, Args);
423 /// numberexpr ::= number
424 static ExprAST *ParseNumberExpr() {
425 ExprAST *Result = new NumberExprAST(NumVal);
426 getNextToken(); // consume the number
430 /// parenexpr ::= '(' expression ')'
431 static ExprAST *ParseParenExpr() {
432 getNextToken(); // eat (.
433 ExprAST *V = ParseExpression();
437 return Error("expected ')'");
438 getNextToken(); // eat ).
443 /// ::= identifierexpr
446 static ExprAST *ParsePrimary() {
448 default: return Error("unknown token when expecting an expression");
449 case tok_identifier: return ParseIdentifierExpr();
450 case tok_number: return ParseNumberExpr();
451 case '(': return ParseParenExpr();
456 /// ::= ('+' primary)*
457 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
458 // If this is a binop, find its precedence.
460 int TokPrec = GetTokPrecedence();
462 // If this is a binop that binds at least as tightly as the current binop,
463 // consume it, otherwise we are done.
464 if (TokPrec < ExprPrec)
467 // Okay, we know this is a binop.
469 getNextToken(); // eat binop
471 // Parse the primary expression after the binary operator.
472 ExprAST *RHS = ParsePrimary();
475 // If BinOp binds less tightly with RHS than the operator after RHS, let
476 // the pending operator take RHS as its LHS.
477 int NextPrec = GetTokPrecedence();
478 if (TokPrec < NextPrec) {
479 RHS = ParseBinOpRHS(TokPrec+1, RHS);
480 if (RHS == 0) return 0;
484 LHS = new BinaryExprAST(BinOp, LHS, RHS);
489 /// ::= primary binoprhs
491 static ExprAST *ParseExpression() {
492 ExprAST *LHS = ParsePrimary();
495 return ParseBinOpRHS(0, LHS);
499 /// ::= id '(' id* ')'
500 static PrototypeAST *ParsePrototype() {
501 if (CurTok != tok_identifier)
502 return ErrorP("Expected function name in prototype");
504 std::string FnName = IdentifierStr;
508 return ErrorP("Expected '(' in prototype");
510 std::vector<std::string> ArgNames;
511 while (getNextToken() == tok_identifier)
512 ArgNames.push_back(IdentifierStr);
514 return ErrorP("Expected ')' in prototype");
517 getNextToken(); // eat ')'.
519 return new PrototypeAST(FnName, ArgNames);
522 /// definition ::= 'def' prototype expression
523 static FunctionAST *ParseDefinition() {
524 getNextToken(); // eat def.
525 PrototypeAST *Proto = ParsePrototype();
526 if (Proto == 0) return 0;
528 if (ExprAST *E = ParseExpression())
529 return new FunctionAST(Proto, E);
533 /// toplevelexpr ::= expression
534 static FunctionAST *ParseTopLevelExpr() {
535 if (ExprAST *E = ParseExpression()) {
536 // Make an anonymous proto.
537 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
538 return new FunctionAST(Proto, E);
543 /// external ::= 'extern' prototype
544 static PrototypeAST *ParseExtern() {
545 getNextToken(); // eat extern.
546 return ParsePrototype();
549 //===----------------------------------------------------------------------===//
551 //===----------------------------------------------------------------------===//
553 static Module *TheModule;
554 static LLVMBuilder Builder;
555 static std::map<std::string, Value*> NamedValues;
557 Value *ErrorV(const char *Str) { Error(Str); return 0; }
559 Value *NumberExprAST::Codegen() {
560 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
563 Value *VariableExprAST::Codegen() {
564 // Look this variable up in the function.
565 Value *V = NamedValues[Name];
566 return V ? V : ErrorV("Unknown variable name");
569 Value *BinaryExprAST::Codegen() {
570 Value *L = LHS->Codegen();
571 Value *R = RHS->Codegen();
572 if (L == 0 || R == 0) return 0;
575 case '+': return Builder.CreateAdd(L, R, "addtmp");
576 case '-': return Builder.CreateSub(L, R, "subtmp");
577 case '*': return Builder.CreateMul(L, R, "multmp");
579 L = Builder.CreateFCmpULT(L, R, "multmp");
580 // Convert bool 0/1 to double 0.0 or 1.0
581 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
582 default: return ErrorV("invalid binary operator");
586 Value *CallExprAST::Codegen() {
587 // Look up the name in the global module table.
588 Function *CalleeF = TheModule->getFunction(Callee);
590 return ErrorV("Unknown function referenced");
592 // If argument mismatch error.
593 if (CalleeF->arg_size() != Args.size())
594 return ErrorV("Incorrect # arguments passed");
596 std::vector<Value*> ArgsV;
597 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
598 ArgsV.push_back(Args[i]->Codegen());
599 if (ArgsV.back() == 0) return 0;
602 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
605 Function *PrototypeAST::Codegen() {
606 // Make the function type: double(double,double) etc.
608 FunctionType::get(Type::DoubleTy, std::vector<const Type*>(Args.size(),
612 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
614 // If F conflicted, there was already something named 'Name'. If it has a
615 // body, don't allow redefinition or reextern.
616 if (F->getName() != Name) {
617 // Delete the one we just made and get the existing one.
618 F->eraseFromParent();
619 F = TheModule->getFunction(Name);
621 // If F already has a body, reject this.
622 if (!F->empty()) {
623 ErrorF("redefinition of function");
627 // If F took a different number of args, reject.
628 if (F->arg_size() != Args.size()) {
629 ErrorF("redefinition of function with different # args");
634 // Set names for all arguments.
636 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
638 AI->setName(Args[Idx]);
640 // Add arguments to variable symbol table.
641 NamedValues[Args[Idx]] = AI;
647 Function *FunctionAST::Codegen() {
650 Function *TheFunction = Proto->Codegen();
651 if (TheFunction == 0)
654 // Create a new basic block to start insertion into.
655 Builder.SetInsertPoint(new BasicBlock("entry", TheFunction));
657 if (Value *RetVal = Body->Codegen()) {
658 // Finish off the function.
659 Builder.CreateRet(RetVal);
663 // Error reading body, remove function.
664 TheFunction->eraseFromParent();
668 //===----------------------------------------------------------------------===//
669 // Top-Level parsing and JIT Driver
670 //===----------------------------------------------------------------------===//
672 static void HandleDefinition() {
673 if (FunctionAST *F = ParseDefinition()) {
674 if (Function *LF = F->Codegen()) {
675 fprintf(stderr, "Read function definition:");
679 // Skip token for error recovery.
684 static void HandleExtern() {
685 if (PrototypeAST *P = ParseExtern()) {
686 if (Function *F = P->Codegen()) {
687 fprintf(stderr, "Read extern: ");
691 // Skip token for error recovery.
696 static void HandleTopLevelExpression() {
697 // Evaluate a top level expression into an anonymous function.
698 if (FunctionAST *F = ParseTopLevelExpr()) {
699 if (Function *LF = F->Codegen()) {
700 fprintf(stderr, "Read top-level expression:");
704 // Skip token for error recovery.
709 /// top ::= definition | external | expression | ';'
710 static void MainLoop() {
712 fprintf(stderr, "ready> ");
714 case tok_eof: return;
715 case ';': getNextToken(); break; // ignore top level semicolons.
716 case tok_def: HandleDefinition(); break;
717 case tok_extern: HandleExtern(); break;
718 default: HandleTopLevelExpression(); break;
725 //===----------------------------------------------------------------------===//
726 // "Library" functions that can be "extern'd" from user code.
727 //===----------------------------------------------------------------------===//
729 /// putchard - putchar that takes a double and returns 0.
731 double putchard(double X) {
736 //===----------------------------------------------------------------------===//
738 //===----------------------------------------------------------------------===//
741 TheModule = new Module("my cool jit");
743 // Install standard binary operators.
744 // 1 is lowest precedence.
745 BinopPrecedence['<'] = 10;
746 BinopPrecedence['+'] = 20;
747 BinopPrecedence['-'] = 20;
748 BinopPrecedence['*'] = 40; // highest.
750 // Prime the first token.
751 fprintf(stderr, "ready> ");
755 TheModule->dump();
774 <!-- *********************************************************************** -->
777 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
778 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
779 <a href="http://validator.w3.org/check/referer"><img
780 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
782 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
783 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
784 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $