1 ============================================================
2 Kaleidoscope: Extending the Language: User-defined Operators
3 ============================================================
11 Welcome to Chapter 6 of the "`Implementing a language with
12 LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
13 have a fully functional language that is fairly minimal, but also
14 useful. There is still one big problem with it, however. Our language
15 doesn't have many useful operators (like division, logical negation, or
16 even any comparisons besides less-than).
18 This chapter of the tutorial takes a wild digression into adding
19 user-defined operators to the simple and beautiful Kaleidoscope
20 language. This digression now gives us a simple and ugly language in
21 some ways, but also a powerful one at the same time. One of the great
22 things about creating your own language is that you get to decide what
23 is good or bad. In this tutorial we'll assume that it is okay to use
24 this as a way to show some interesting parsing techniques.
26 At the end of this tutorial, we'll run through an example Kaleidoscope
27 application that `renders the Mandelbrot set <#example>`_. This gives an
28 example of what you can build with Kaleidoscope and its feature set.
30 User-defined Operators: the Idea
31 ================================
33 The "operator overloading" that we will add to Kaleidoscope is more
34 general than languages like C++. In C++, you are only allowed to
35 redefine existing operators: you can't programatically change the
36 grammar, introduce new operators, change precedence levels, etc. In this
37 chapter, we will add this capability to Kaleidoscope, which will let the
38 user round out the set of operators that are supported.
40 The point of going into user-defined operators in a tutorial like this
41 is to show the power and flexibility of using a hand-written parser.
42 Thus far, the parser we have been implementing uses recursive descent
43 for most parts of the grammar and operator precedence parsing for the
44 expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
45 using operator precedence parsing, it would be very difficult to allow
46 the programmer to introduce new operators into the grammar: the grammar
47 is dynamically extensible as the JIT runs.
49 The two specific features we'll add are programmable unary operators
50 (right now, Kaleidoscope has no unary operators at all) as well as
51 binary operators. An example of this is:
62 # Define > with the same precedence as <.
63 def binary> 10 (LHS RHS)
66 # Binary "logical or", (note that it does not "short circuit")
67 def binary| 5 (LHS RHS)
75 # Define = with slightly lower precedence than relationals.
76 def binary= 9 (LHS RHS)
77 !(LHS < RHS | LHS > RHS);
79 Many languages aspire to being able to implement their standard runtime
80 library in the language itself. In Kaleidoscope, we can implement
81 significant parts of the language in the library!
83 We will break down implementation of these features into two parts:
84 implementing support for user-defined binary operators and adding unary
87 User-defined Binary Operators
88 =============================
90 Adding support for user-defined binary operators is pretty simple with
91 our current framework. We'll first add support for the unary/binary
99 tok_binary = -11, tok_unary = -12
102 static int gettok() {
104 if (IdentifierStr == "for") return tok_for;
105 if (IdentifierStr == "in") return tok_in;
106 if (IdentifierStr == "binary") return tok_binary;
107 if (IdentifierStr == "unary") return tok_unary;
108 return tok_identifier;
110 This just adds lexer support for the unary and binary keywords, like we
111 did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing
112 about our current AST, is that we represent binary operators with full
113 generalisation by using their ASCII code as the opcode. For our extended
114 operators, we'll use this same representation, so we don't need any new
115 AST or parser support.
117 On the other hand, we have to be able to represent the definitions of
118 these new operators, in the "def binary\| 5" part of the function
119 definition. In our grammar so far, the "name" for the function
120 definition is parsed as the "prototype" production and into the
121 ``PrototypeAST`` AST node. To represent our new user-defined operators
122 as prototypes, we have to extend the ``PrototypeAST`` AST node like
127 /// PrototypeAST - This class represents the "prototype" for a function,
128 /// which captures its argument names as well as if it is an operator.
131 std::vector<std::string> Args;
133 unsigned Precedence; // Precedence if a binary op.
135 PrototypeAST(const std::string &name, std::vector<std::string> Args,
136 bool IsOperator = false, unsigned Prec = 0)
137 : Name(name), Args(std::move(Args)), IsOperator(IsOperator),
140 bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
141 bool isBinaryOp() const { return IsOperator && Args.size() == 2; }
143 char getOperatorName() const {
144 assert(isUnaryOp() || isBinaryOp());
145 return Name[Name.size()-1];
148 unsigned getBinaryPrecedence() const { return Precedence; }
153 Basically, in addition to knowing a name for the prototype, we now keep
154 track of whether it was an operator, and if it was, what precedence
155 level the operator is at. The precedence is only used for binary
156 operators (as you'll see below, it just doesn't apply for unary
157 operators). Now that we have a way to represent the prototype for a
158 user-defined operator, we need to parse it:
163 /// ::= id '(' id* ')'
164 /// ::= binary LETTER number? (id, id)
165 static std::unique_ptr<PrototypeAST> ParsePrototype() {
168 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
169 unsigned BinaryPrecedence = 30;
173 return ErrorP("Expected function name in prototype");
175 FnName = IdentifierStr;
181 if (!isascii(CurTok))
182 return ErrorP("Expected binary operator");
184 FnName += (char)CurTok;
188 // Read the precedence if present.
189 if (CurTok == tok_number) {
190 if (NumVal < 1 || NumVal > 100)
191 return ErrorP("Invalid precedecnce: must be 1..100");
192 BinaryPrecedence = (unsigned)NumVal;
199 return ErrorP("Expected '(' in prototype");
201 std::vector<std::string> ArgNames;
202 while (getNextToken() == tok_identifier)
203 ArgNames.push_back(IdentifierStr);
205 return ErrorP("Expected ')' in prototype");
208 getNextToken(); // eat ')'.
210 // Verify right number of names for operator.
211 if (Kind && ArgNames.size() != Kind)
212 return ErrorP("Invalid number of operands for operator");
214 return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames),
215 Kind != 0, BinaryPrecedence);
218 This is all fairly straightforward parsing code, and we have already
219 seen a lot of similar code in the past. One interesting part about the
220 code above is the couple lines that set up ``FnName`` for binary
221 operators. This builds names like "binary@" for a newly defined "@"
222 operator. This then takes advantage of the fact that symbol names in the
223 LLVM symbol table are allowed to have any character in them, including
224 embedded nul characters.
226 The next interesting thing to add, is codegen support for these binary
227 operators. Given our current structure, this is a simple addition of a
228 default case for our existing binary operator node:
232 Value *BinaryExprAST::Codegen() {
233 Value *L = LHS->Codegen();
234 Value *R = RHS->Codegen();
235 if (L == 0 || R == 0) return 0;
238 case '+': return Builder.CreateFAdd(L, R, "addtmp");
239 case '-': return Builder.CreateFSub(L, R, "subtmp");
240 case '*': return Builder.CreateFMul(L, R, "multmp");
242 L = Builder.CreateFCmpULT(L, R, "cmptmp");
243 // Convert bool 0/1 to double 0.0 or 1.0
244 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
249 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
251 Function *F = TheModule->getFunction(std::string("binary")+Op);
252 assert(F && "binary operator not found!");
254 Value *Ops[2] = { L, R };
255 return Builder.CreateCall(F, Ops, "binop");
258 As you can see above, the new code is actually really simple. It just
259 does a lookup for the appropriate operator in the symbol table and
260 generates a function call to it. Since user-defined operators are just
261 built as normal functions (because the "prototype" boils down to a
262 function with the right name) everything falls into place.
264 The final piece of code we are missing, is a bit of top-level magic:
268 Function *FunctionAST::Codegen() {
271 Function *TheFunction = Proto->Codegen();
272 if (TheFunction == 0)
275 // If this is an operator, install it.
276 if (Proto->isBinaryOp())
277 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
279 // Create a new basic block to start insertion into.
280 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
281 Builder.SetInsertPoint(BB);
283 if (Value *RetVal = Body->Codegen()) {
286 Basically, before codegening a function, if it is a user-defined
287 operator, we register it in the precedence table. This allows the binary
288 operator parsing logic we already have in place to handle it. Since we
289 are working on a fully-general operator precedence parser, this is all
290 we need to do to "extend the grammar".
292 Now we have useful user-defined binary operators. This builds a lot on
293 the previous framework we built for other operators. Adding unary
294 operators is a bit more challenging, because we don't have any framework
295 for it yet - lets see what it takes.
297 User-defined Unary Operators
298 ============================
300 Since we don't currently support unary operators in the Kaleidoscope
301 language, we'll need to add everything to support them. Above, we added
302 simple support for the 'unary' keyword to the lexer. In addition to
303 that, we need an AST node:
307 /// UnaryExprAST - Expression class for a unary operator.
308 class UnaryExprAST : public ExprAST {
310 std::unique_ptr<ExprAST> Operand;
312 UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
313 : Opcode(Opcode), Operand(std::move(Operand)) {}
314 virtual Value *Codegen();
317 This AST node is very simple and obvious by now. It directly mirrors the
318 binary operator AST node, except that it only has one child. With this,
319 we need to add the parsing logic. Parsing a unary operator is pretty
320 simple: we'll add a new function to do it:
327 static std::unique_ptr<ExprAST> ParseUnary() {
328 // If the current token is not an operator, it must be a primary expr.
329 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
330 return ParsePrimary();
332 // If this is a unary operator, read it.
335 if (auto Operand = ParseUnary())
336 return llvm::unique_ptr<UnaryExprAST>(Opc, std::move(Operand));
340 The grammar we add is pretty straightforward here. If we see a unary
341 operator when parsing a primary operator, we eat the operator as a
342 prefix and parse the remaining piece as another unary operator. This
343 allows us to handle multiple unary operators (e.g. "!!x"). Note that
344 unary operators can't have ambiguous parses like binary operators can,
345 so there is no need for precedence information.
347 The problem with this function, is that we need to call ParseUnary from
348 somewhere. To do this, we change previous callers of ParsePrimary to
349 call ParseUnary instead:
355 static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
356 std::unique_ptr<ExprAST> LHS) {
358 // Parse the unary expression after the binary operator.
359 auto RHS = ParseUnary();
360 if (!RHS) return nullptr;
364 /// ::= unary binoprhs
366 static std::unique_ptr<ExprAST> ParseExpression() {
367 auto LHS = ParseUnary();
368 if (!LHS) return nullptr;
370 return ParseBinOpRHS(0, std::move(LHS));
373 With these two simple changes, we are now able to parse unary operators
374 and build the AST for them. Next up, we need to add parser support for
375 prototypes, to parse the unary operator prototype. We extend the binary
376 operator code above with:
381 /// ::= id '(' id* ')'
382 /// ::= binary LETTER number? (id, id)
383 /// ::= unary LETTER (id)
384 static std::unique_ptr<PrototypeAST> ParsePrototype() {
387 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
388 unsigned BinaryPrecedence = 30;
392 return ErrorP("Expected function name in prototype");
394 FnName = IdentifierStr;
400 if (!isascii(CurTok))
401 return ErrorP("Expected unary operator");
403 FnName += (char)CurTok;
410 As with binary operators, we name unary operators with a name that
411 includes the operator character. This assists us at code generation
412 time. Speaking of, the final piece we need to add is codegen support for
413 unary operators. It looks like this:
417 Value *UnaryExprAST::Codegen() {
418 Value *OperandV = Operand->Codegen();
419 if (OperandV == 0) return 0;
421 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
423 return ErrorV("Unknown unary operator");
425 return Builder.CreateCall(F, OperandV, "unop");
428 This code is similar to, but simpler than, the code for binary
429 operators. It is simpler primarily because it doesn't need to handle any
430 predefined operators.
435 It is somewhat hard to believe, but with a few simple extensions we've
436 covered in the last chapters, we have grown a real-ish language. With
437 this, we can do a lot of interesting things, including I/O, math, and a
438 bunch of other things. For example, we can now add a nice sequencing
439 operator (printd is defined to print out the specified value and a
444 ready> extern printd(x);
446 declare double @printd(double)
448 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
450 ready> printd(123) : printd(456) : printd(789);
454 Evaluated to 0.000000
456 We can also define a bunch of other "primitive" operations, such as:
471 # Define > with the same precedence as <.
472 def binary> 10 (LHS RHS)
475 # Binary logical or, which does not short circuit.
476 def binary| 5 (LHS RHS)
484 # Binary logical and, which does not short circuit.
485 def binary& 6 (LHS RHS)
491 # Define = with slightly lower precedence than relationals.
492 def binary = 9 (LHS RHS)
493 !(LHS < RHS | LHS > RHS);
495 # Define ':' for sequencing: as a low-precedence operator that ignores operands
496 # and just returns the RHS.
497 def binary : 1 (x y) y;
499 Given the previous if/then/else support, we can also define interesting
500 functions for I/O. For example, the following prints out a character
501 whose "density" reflects the value passed in: the lower the value, the
502 denser the character:
508 extern putchard(char)
519 ready> printdensity(1): printdensity(2): printdensity(3):
520 printdensity(4): printdensity(5): printdensity(9):
523 Evaluated to 0.000000
525 Based on these simple primitive operations, we can start to define more
526 interesting things. For example, here's a little function that solves
527 for the number of iterations it takes a function in the complex plane to
532 # Determine whether the specific location diverges.
533 # Solve for z = z^2 + c in the complex plane.
534 def mandleconverger(real imag iters creal cimag)
535 if iters > 255 | (real*real + imag*imag > 4) then
538 mandleconverger(real*real - imag*imag + creal,
540 iters+1, creal, cimag);
542 # Return the number of iterations required for the iteration to escape
543 def mandleconverge(real imag)
544 mandleconverger(real, imag, 0, real, imag);
546 This "``z = z2 + c``" function is a beautiful little creature that is
547 the basis for computation of the `Mandelbrot
548 Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
549 ``mandelconverge`` function returns the number of iterations that it
550 takes for a complex orbit to escape, saturating to 255. This is not a
551 very useful function by itself, but if you plot its value over a
552 two-dimensional plane, you can see the Mandelbrot set. Given that we are
553 limited to using putchard here, our amazing graphical output is limited,
554 but we can whip together something using the density plotter above:
558 # Compute and plot the mandlebrot set with the specified 2 dimensional range
560 def mandelhelp(xmin xmax xstep ymin ymax ystep)
561 for y = ymin, y < ymax, ystep in (
562 (for x = xmin, x < xmax, xstep in
563 printdensity(mandleconverge(x,y)))
567 # mandel - This is a convenient helper function for plotting the mandelbrot set
568 # from the specified position with the specified Magnification.
569 def mandel(realstart imagstart realmag imagmag)
570 mandelhelp(realstart, realstart+realmag*78, realmag,
571 imagstart, imagstart+imagmag*40, imagmag);
573 Given this, we can try plotting out the mandlebrot set! Lets try it out:
577 ready> mandel(-2.3, -1.3, 0.05, 0.07);
578 *******************************+++++++++++*************************************
579 *************************+++++++++++++++++++++++*******************************
580 **********************+++++++++++++++++++++++++++++****************************
581 *******************+++++++++++++++++++++.. ...++++++++*************************
582 *****************++++++++++++++++++++++.... ...+++++++++***********************
583 ***************+++++++++++++++++++++++..... ...+++++++++*********************
584 **************+++++++++++++++++++++++.... ....+++++++++********************
585 *************++++++++++++++++++++++...... .....++++++++*******************
586 ************+++++++++++++++++++++....... .......+++++++******************
587 ***********+++++++++++++++++++.... ... .+++++++*****************
588 **********+++++++++++++++++....... .+++++++****************
589 *********++++++++++++++........... ...+++++++***************
590 ********++++++++++++............ ...++++++++**************
591 ********++++++++++... .......... .++++++++**************
592 *******+++++++++..... .+++++++++*************
593 *******++++++++...... ..+++++++++*************
594 *******++++++....... ..+++++++++*************
595 *******+++++...... ..+++++++++*************
596 *******.... .... ...+++++++++*************
597 *******.... . ...+++++++++*************
598 *******+++++...... ...+++++++++*************
599 *******++++++....... ..+++++++++*************
600 *******++++++++...... .+++++++++*************
601 *******+++++++++..... ..+++++++++*************
602 ********++++++++++... .......... .++++++++**************
603 ********++++++++++++............ ...++++++++**************
604 *********++++++++++++++.......... ...+++++++***************
605 **********++++++++++++++++........ .+++++++****************
606 **********++++++++++++++++++++.... ... ..+++++++****************
607 ***********++++++++++++++++++++++....... .......++++++++*****************
608 ************+++++++++++++++++++++++...... ......++++++++******************
609 **************+++++++++++++++++++++++.... ....++++++++********************
610 ***************+++++++++++++++++++++++..... ...+++++++++*********************
611 *****************++++++++++++++++++++++.... ...++++++++***********************
612 *******************+++++++++++++++++++++......++++++++*************************
613 *********************++++++++++++++++++++++.++++++++***************************
614 *************************+++++++++++++++++++++++*******************************
615 ******************************+++++++++++++************************************
616 *******************************************************************************
617 *******************************************************************************
618 *******************************************************************************
619 Evaluated to 0.000000
620 ready> mandel(-2, -1, 0.02, 0.04);
621 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
622 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
623 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
624 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
625 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
626 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
627 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
628 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
629 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
630 **********++++++++++++++++++++++++++++++++++++++++++++++.............
631 ********+++++++++++++++++++++++++++++++++++++++++++..................
632 *******+++++++++++++++++++++++++++++++++++++++.......................
633 ******+++++++++++++++++++++++++++++++++++...........................
634 *****++++++++++++++++++++++++++++++++............................
635 *****++++++++++++++++++++++++++++...............................
636 ****++++++++++++++++++++++++++...... .........................
637 ***++++++++++++++++++++++++......... ...... ...........
638 ***++++++++++++++++++++++............
639 **+++++++++++++++++++++..............
640 **+++++++++++++++++++................
641 *++++++++++++++++++.................
642 *++++++++++++++++............ ...
643 *++++++++++++++..............
644 *+++....++++................
645 *.......... ...........
647 *.......... ...........
648 *+++....++++................
649 *++++++++++++++..............
650 *++++++++++++++++............ ...
651 *++++++++++++++++++.................
652 **+++++++++++++++++++................
653 **+++++++++++++++++++++..............
654 ***++++++++++++++++++++++............
655 ***++++++++++++++++++++++++......... ...... ...........
656 ****++++++++++++++++++++++++++...... .........................
657 *****++++++++++++++++++++++++++++...............................
658 *****++++++++++++++++++++++++++++++++............................
659 ******+++++++++++++++++++++++++++++++++++...........................
660 *******+++++++++++++++++++++++++++++++++++++++.......................
661 ********+++++++++++++++++++++++++++++++++++++++++++..................
662 Evaluated to 0.000000
663 ready> mandel(-0.9, -1.4, 0.02, 0.03);
664 *******************************************************************************
665 *******************************************************************************
666 *******************************************************************************
667 **********+++++++++++++++++++++************************************************
668 *+++++++++++++++++++++++++++++++++++++++***************************************
669 +++++++++++++++++++++++++++++++++++++++++++++**********************************
670 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
671 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
672 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
673 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
674 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
675 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
676 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
677 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
678 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
679 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
680 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
681 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
682 ++++++++++++++++++++........... .........++++++++++++++++++++++*
683 ++++++++++++++++++............ ...........++++++++++++++++++++
684 ++++++++++++++++............... .............++++++++++++++++++
685 ++++++++++++++................. ...............++++++++++++++++
686 ++++++++++++.................. .................++++++++++++++
687 +++++++++.................. .................+++++++++++++
688 ++++++........ . ......... ..++++++++++++
689 ++............ ...... ....++++++++++
690 .............. ...++++++++++
691 .............. ....+++++++++
692 .............. .....++++++++
693 ............. ......++++++++
694 ........... .......++++++++
695 ......... ........+++++++
696 ......... ........+++++++
697 ......... ....+++++++
705 Evaluated to 0.000000
708 At this point, you may be starting to realize that Kaleidoscope is a
709 real and powerful language. It may not be self-similar :), but it can be
710 used to plot things that are!
712 With this, we conclude the "adding user-defined operators" chapter of
713 the tutorial. We have successfully augmented our language, adding the
714 ability to extend the language in the library, and we have shown how
715 this can be used to build a simple but interesting end-user application
716 in Kaleidoscope. At this point, Kaleidoscope can build a variety of
717 applications that are functional and can call functions with
718 side-effects, but it can't actually define and mutate a variable itself.
720 Strikingly, variable mutation is an important feature of some languages,
721 and it is not at all obvious how to `add support for mutable
722 variables <LangImpl7.html>`_ without having to add an "SSA construction"
723 phase to your front-end. In the next chapter, we will describe how you
724 can add variable mutation without building SSA in your front-end.
729 Here is the complete code listing for our running example, enhanced with
730 the if/then/else and for expressions.. To build this example, use:
735 clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
739 On some platforms, you will need to specify -rdynamic or
740 -Wl,--export-dynamic when linking. This ensures that symbols defined in
741 the main executable are exported to the dynamic linker and so are
742 available for symbol resolution at run time. This is not needed if you
743 compile your support code into a shared library, although doing that
744 will cause problems on Windows.
748 .. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp
751 `Next: Extending the language: mutable variables / SSA
752 construction <LangImpl7.html>`_