============================================================
Kaleidoscope: Extending the Language: User-defined Operators
============================================================
11 Welcome to Chapter 6 of the "`Implementing a language with
12 LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
13 have a fully functional language that is fairly minimal, but also
14 useful. There is still one big problem with it, however. Our language
15 doesn't have many useful operators (like division, logical negation, or
16 even any comparisons besides less-than).
18 This chapter of the tutorial takes a wild digression into adding
19 user-defined operators to the simple and beautiful Kaleidoscope
20 language. This digression now gives us a simple and ugly language in
21 some ways, but also a powerful one at the same time. One of the great
22 things about creating your own language is that you get to decide what
23 is good or bad. In this tutorial we'll assume that it is okay to use
24 this as a way to show some interesting parsing techniques.
26 At the end of this tutorial, we'll run through an example Kaleidoscope
27 application that `renders the Mandelbrot set <#example>`_. This gives an
28 example of what you can build with Kaleidoscope and its feature set.
User-defined Operators: the Idea
================================
The "operator overloading" that we will add to Kaleidoscope is more
general than in languages like C++. In C++, you are only allowed to
redefine existing operators: you can't programmatically change the
grammar, introduce new operators, change precedence levels, etc. In this
chapter, we will add this capability to Kaleidoscope, which will let the
user round out the set of operators that are supported.
40 The point of going into user-defined operators in a tutorial like this
41 is to show the power and flexibility of using a hand-written parser.
42 Thus far, the parser we have been implementing uses recursive descent
43 for most parts of the grammar and operator precedence parsing for the
44 expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
45 using operator precedence parsing, it would be very difficult to allow
46 the programmer to introduce new operators into the grammar: the grammar
47 is dynamically extensible as the JIT runs.
49 The two specific features we'll add are programmable unary operators
50 (right now, Kaleidoscope has no unary operators at all) as well as
51 binary operators. An example of this is:
62 # Define > with the same precedence as <.
63 def binary> 10 (LHS RHS)
66 # Binary "logical or", (note that it does not "short circuit")
67 def binary| 5 (LHS RHS)
75 # Define = with slightly lower precedence than relationals.
76 def binary= 9 (LHS RHS)
77 !(LHS < RHS | LHS > RHS);
79 Many languages aspire to being able to implement their standard runtime
80 library in the language itself. In Kaleidoscope, we can implement
81 significant parts of the language in the library!
We will break down implementation of these features into two parts:
implementing support for user-defined binary operators and adding unary
operators.
User-defined Binary Operators
=============================
Adding support for user-defined binary operators is pretty simple with
our current framework. We'll first add support for the unary/binary
keywords:
103 static int gettok() {
105 if (IdentifierStr == "for")
107 if (IdentifierStr == "in")
109 if (IdentifierStr == "binary")
111 if (IdentifierStr == "unary")
113 return tok_identifier;
This just adds lexer support for the unary and binary keywords, like we
did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing
about our current AST is that we represent binary operators with full
generalisation by using their ASCII code as the opcode. For our extended
operators, we'll use this same representation, so we don't need any new
AST or parser support.
On the other hand, we have to be able to represent the definitions of
these new operators, in the "def binary\| 5" part of the function
definition. In our grammar so far, the "name" for the function
definition is parsed as the "prototype" production and into the
``PrototypeAST`` AST node. To represent our new user-defined operators
as prototypes, we have to extend the ``PrototypeAST`` AST node like
this:
132 /// PrototypeAST - This class represents the "prototype" for a function,
133 /// which captures its argument names as well as if it is an operator.
136 std::vector<std::string> Args;
138 unsigned Precedence; // Precedence if a binary op.
141 PrototypeAST(const std::string &name, std::vector<std::string> Args,
142 bool IsOperator = false, unsigned Prec = 0)
143 : Name(name), Args(std::move(Args)), IsOperator(IsOperator),
146 bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
147 bool isBinaryOp() const { return IsOperator && Args.size() == 2; }
149 char getOperatorName() const {
150 assert(isUnaryOp() || isBinaryOp());
151 return Name[Name.size()-1];
154 unsigned getBinaryPrecedence() const { return Precedence; }
159 Basically, in addition to knowing a name for the prototype, we now keep
160 track of whether it was an operator, and if it was, what precedence
161 level the operator is at. The precedence is only used for binary
162 operators (as you'll see below, it just doesn't apply for unary
163 operators). Now that we have a way to represent the prototype for a
164 user-defined operator, we need to parse it:
169 /// ::= id '(' id* ')'
170 /// ::= binary LETTER number? (id, id)
171 static std::unique_ptr<PrototypeAST> ParsePrototype() {
174 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
175 unsigned BinaryPrecedence = 30;
179 return ErrorP("Expected function name in prototype");
181 FnName = IdentifierStr;
187 if (!isascii(CurTok))
188 return ErrorP("Expected binary operator");
190 FnName += (char)CurTok;
194 // Read the precedence if present.
195 if (CurTok == tok_number) {
196 if (NumVal < 1 || NumVal > 100)
return ErrorP("Invalid precedence: must be 1..100");
198 BinaryPrecedence = (unsigned)NumVal;
205 return ErrorP("Expected '(' in prototype");
207 std::vector<std::string> ArgNames;
208 while (getNextToken() == tok_identifier)
209 ArgNames.push_back(IdentifierStr);
211 return ErrorP("Expected ')' in prototype");
214 getNextToken(); // eat ')'.
216 // Verify right number of names for operator.
217 if (Kind && ArgNames.size() != Kind)
218 return ErrorP("Invalid number of operands for operator");
220 return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0,
224 This is all fairly straightforward parsing code, and we have already
225 seen a lot of similar code in the past. One interesting part about the
code above is the couple of lines that set up ``FnName`` for binary
227 operators. This builds names like "binary@" for a newly defined "@"
228 operator. This then takes advantage of the fact that symbol names in the
229 LLVM symbol table are allowed to have any character in them, including
230 embedded nul characters.
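
The naming scheme can be sketched in isolation. The helper names below
(``makeOperatorName`` and ``operatorChar``) are hypothetical and not part of
the tutorial's code; they only mirror what ``ParsePrototype`` and
``getOperatorName`` do with the name string:

.. code-block:: c++

    #include <cassert>
    #include <string>

    // Hypothetical helper: build the function name used for an operator
    // definition, e.g. ("binary", '@') -> "binary@".
    std::string makeOperatorName(const std::string &Keyword, char Op) {
      std::string FnName = Keyword; // "unary" or "binary"
      FnName += Op;                 // append the operator character
      return FnName;
    }

    // Hypothetical helper: recover the operator character, which is always
    // the last character of the name (as in PrototypeAST::getOperatorName).
    char operatorChar(const std::string &FnName) {
      assert(!FnName.empty() && "operator name must not be empty");
      return FnName.back();
    }

Because the keyword is part of the name, a unary and a binary operator that
share a character (say ``unary-`` and ``binary-``) end up as two distinct
symbols.
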
The next interesting thing to add is codegen support for these binary
operators. Given our current structure, this is a simple addition of a
default case for our existing binary operator node:
238 Value *BinaryExprAST::codegen() {
239 Value *L = LHS->codegen();
240 Value *R = RHS->codegen();
246 return Builder.CreateFAdd(L, R, "addtmp");
248 return Builder.CreateFSub(L, R, "subtmp");
250 return Builder.CreateFMul(L, R, "multmp");
252 L = Builder.CreateFCmpULT(L, R, "cmptmp");
253 // Convert bool 0/1 to double 0.0 or 1.0
254 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
260 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
262 Function *F = TheModule->getFunction(std::string("binary") + Op);
263 assert(F && "binary operator not found!");
265 Value *Ops[2] = { L, R };
266 return Builder.CreateCall(F, Ops, "binop");
269 As you can see above, the new code is actually really simple. It just
270 does a lookup for the appropriate operator in the symbol table and
271 generates a function call to it. Since user-defined operators are just
272 built as normal functions (because the "prototype" boils down to a
273 function with the right name) everything falls into place.
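
The lookup-and-call idea can be tried without LLVM. In the following sketch,
which is only an illustration, a ``std::map`` named ``SymbolTable`` stands in
for the module's symbol table and the operator is evaluated directly instead
of emitting IR; both the map and ``evalBinary`` are assumptions made for this
example:

.. code-block:: c++

    #include <cassert>
    #include <functional>
    #include <map>
    #include <stdexcept>
    #include <string>

    // Stand-in for the module's symbol table (an assumption of this sketch):
    // maps a function name such as "binary|" to a callable.
    using BinaryFn = std::function<double(double, double)>;
    std::map<std::string, BinaryFn> SymbolTable;

    // Evaluate "L Op R". Builtin operators are handled directly; anything
    // else is looked up by name and called like an ordinary function.
    double evalBinary(char Op, double L, double R) {
      switch (Op) {
      case '+': return L + R;
      case '-': return L - R;
      case '*': return L * R;
      case '<': return L < R ? 1.0 : 0.0; // NaN handling differs from FCmpULT
      default:  break;
      }
      auto It = SymbolTable.find(std::string("binary") + Op);
      if (It == SymbolTable.end())
        throw std::runtime_error("binary operator not found!");
      assert(It->second && "registered operator must be callable");
      return It->second(L, R);
    }

Registering ``SymbolTable["binary>"]`` with a function that returns
``R < L ? 1.0 : 0.0`` is then enough for ``evalBinary('>', 3.0, 2.0)`` to
work, with no change to ``evalBinary`` itself.
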
The final piece of code we are missing is a bit of top-level magic:
279 Function *FunctionAST::codegen() {
282 Function *TheFunction = Proto->codegen();
286 // If this is an operator, install it.
287 if (Proto->isBinaryOp())
288 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
290 // Create a new basic block to start insertion into.
291 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
292 Builder.SetInsertPoint(BB);
294 if (Value *RetVal = Body->codegen()) {
Basically, before generating code for a function, if it is a user-defined
298 operator, we register it in the precedence table. This allows the binary
299 operator parsing logic we already have in place to handle it. Since we
300 are working on a fully-general operator precedence parser, this is all
301 we need to do to "extend the grammar".
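
To see why a table entry is all it takes, here is a minimal sketch of an
operator precedence parser whose table can be changed at run time. It is not
the tutorial's parser: ``MiniParser`` and ``group`` are hypothetical names,
operands are single characters, and instead of an AST it returns a fully
parenthesized string so the grouping is visible:

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <map>
    #include <string>
    #include <utility>

    // Plays the role of BinopPrecedence in the tutorial.
    std::map<char, int> Precedence = {{'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

    struct MiniParser {
      std::string Src;
      std::size_t Pos = 0;
      explicit MiniParser(std::string S) : Src(std::move(S)) {}

      // Precedence of the pending token, or -1 if it is not a known operator.
      int tokPrecedence() const {
        if (Pos >= Src.size()) return -1;
        auto It = Precedence.find(Src[Pos]);
        return It == Precedence.end() ? -1 : It->second;
      }

      // A primary is a single character in this sketch.
      std::string parsePrimary() {
        assert(Pos < Src.size() && "expected an operand");
        return std::string(1, Src[Pos++]);
      }

      std::string parseBinOpRHS(int ExprPrec, std::string LHS) {
        while (true) {
          int TokPrec = tokPrecedence();
          if (TokPrec < ExprPrec) return LHS; // unknown or weaker operator
          char Op = Src[Pos++];
          std::string RHS = parsePrimary();
          // If the next operator binds tighter, let it take RHS first.
          if (TokPrec < tokPrecedence())
            RHS = parseBinOpRHS(TokPrec + 1, std::move(RHS));
          LHS = "(" + LHS + Op + RHS + ")";
        }
      }
    };

    std::string group(const std::string &Src) {
      MiniParser P(Src);
      return P.parseBinOpRHS(0, P.parsePrimary());
    }

With the initial table, ``group("1<2|3<4")`` stops at the unknown ``|`` and
yields ``(1<2)``. After ``Precedence['|'] = 5;`` the same input yields
``((1<2)|(3<4))``, with no change to the parsing code.
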
Now we have useful user-defined binary operators. This builds a lot on
the previous framework we built for other operators. Adding unary
operators is a bit more challenging, because we don't have any framework
for it yet. Let's see what it takes.
User-defined Unary Operators
============================
311 Since we don't currently support unary operators in the Kaleidoscope
312 language, we'll need to add everything to support them. Above, we added
313 simple support for the 'unary' keyword to the lexer. In addition to
314 that, we need an AST node:
318 /// UnaryExprAST - Expression class for a unary operator.
319 class UnaryExprAST : public ExprAST {
321 std::unique_ptr<ExprAST> Operand;
324 UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
325 : Opcode(Opcode), Operand(std::move(Operand)) {}
326 virtual Value *codegen();
This AST node is very simple and obvious by now. It directly mirrors the
binary operator AST node, except that it only has one child. With this,
we need to add the parsing logic. Parsing a unary operator is pretty
simple, so we'll add a new function to do it:
339 static std::unique_ptr<ExprAST> ParseUnary() {
340 // If the current token is not an operator, it must be a primary expr.
341 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
342 return ParsePrimary();
344 // If this is a unary operator, read it.
347 if (auto Operand = ParseUnary())
return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand));
The grammar we add is pretty straightforward here. If we see a unary
operator when parsing a primary expression, we eat the operator as a
prefix and parse the remaining piece as another unary expression. This
355 allows us to handle multiple unary operators (e.g. "!!x"). Note that
356 unary operators can't have ambiguous parses like binary operators can,
357 so there is no need for precedence information.
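
The recursive shape of this rule can be sketched as a tiny evaluator. This is
an illustration only: ``UnaryOps`` and ``evalUnary`` are hypothetical names,
the map stands in for functions named ``unary!`` and so on, and a primary is
restricted to a single digit:

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <functional>
    #include <map>
    #include <stdexcept>
    #include <string>

    // Stand-in for the user-defined unary operator functions (an assumption).
    std::map<char, std::function<double(double)>> UnaryOps;

    // Evaluate a string of prefix operators followed by one digit, e.g. "!!0".
    // Like ParseUnary, a unary expression is either a primary or an operator
    // character followed by another unary expression.
    double evalUnary(const std::string &Src, std::size_t Pos = 0) {
      if (Pos >= Src.size())
        throw std::runtime_error("unexpected end of input");
      char Tok = Src[Pos];
      if (Tok >= '0' && Tok <= '9') // primary expression
        return Tok - '0';
      auto It = UnaryOps.find(Tok); // prefix operator
      if (It == UnaryOps.end())
        throw std::runtime_error("Unknown unary operator");
      assert(It->second && "registered operator must be callable");
      // The operand is evaluated first, so the innermost operator applies first.
      return It->second(evalUnary(Src, Pos + 1));
    }

Since each operator simply applies to whatever follows it, stacked operators
such as ``!!5`` or ``-!0`` have exactly one reading, which is why no
precedence table is involved.
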
The problem with this function is that we need to call ParseUnary from
360 somewhere. To do this, we change previous callers of ParsePrimary to
361 call ParseUnary instead:
367 static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
368 std::unique_ptr<ExprAST> LHS) {
370 // Parse the unary expression after the binary operator.
371 auto RHS = ParseUnary();
377 /// ::= unary binoprhs
379 static std::unique_ptr<ExprAST> ParseExpression() {
380 auto LHS = ParseUnary();
384 return ParseBinOpRHS(0, std::move(LHS));
387 With these two simple changes, we are now able to parse unary operators
388 and build the AST for them. Next up, we need to add parser support for
389 prototypes, to parse the unary operator prototype. We extend the binary
390 operator code above with:
395 /// ::= id '(' id* ')'
396 /// ::= binary LETTER number? (id, id)
397 /// ::= unary LETTER (id)
398 static std::unique_ptr<PrototypeAST> ParsePrototype() {
401 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
402 unsigned BinaryPrecedence = 30;
406 return ErrorP("Expected function name in prototype");
408 FnName = IdentifierStr;
414 if (!isascii(CurTok))
415 return ErrorP("Expected unary operator");
417 FnName += (char)CurTok;
424 As with binary operators, we name unary operators with a name that
425 includes the operator character. This assists us at code generation
426 time. Speaking of, the final piece we need to add is codegen support for
427 unary operators. It looks like this:
431 Value *UnaryExprAST::codegen() {
432 Value *OperandV = Operand->codegen();
436 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
438 return ErrorV("Unknown unary operator");
440 return Builder.CreateCall(F, OperandV, "unop");
443 This code is similar to, but simpler than, the code for binary
444 operators. It is simpler primarily because it doesn't need to handle any
445 predefined operators.
It is somewhat hard to believe, but with a few simple extensions we've
covered in the last chapters, we have grown a real-ish language. With
this, we can do a lot of interesting things, including I/O, math, and a
bunch of other things. For example, we can now add a nice sequencing
operator (printd is defined to print out the specified value and a
newline):
459 ready> extern printd(x);
461 declare double @printd(double)
463 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
465 ready> printd(123) : printd(456) : printd(789);
469 Evaluated to 0.000000
471 We can also define a bunch of other "primitive" operations, such as:
486 # Define > with the same precedence as <.
487 def binary> 10 (LHS RHS)
490 # Binary logical or, which does not short circuit.
491 def binary| 5 (LHS RHS)
499 # Binary logical and, which does not short circuit.
500 def binary& 6 (LHS RHS)
506 # Define = with slightly lower precedence than relationals.
507 def binary = 9 (LHS RHS)
508 !(LHS < RHS | LHS > RHS);
510 # Define ':' for sequencing: as a low-precedence operator that ignores operands
511 # and just returns the RHS.
512 def binary : 1 (x y) y;
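
For readers who want to check the logic of these definitions, here is a C++
rendering. It is a sketch under assumptions: the bodies of ``lnot``, ``gt``
and ``lor`` are guesses at the obvious definitions of ``!``, ``>`` and ``|``,
and only ``eq`` and ``seq`` follow bodies that appear in the listing:

.. code-block:: c++

    #include <cassert>

    // Kaleidoscope only has doubles, so 0.0 is false and anything else true.
    double lnot(double V) { return V != 0.0 ? 0.0 : 1.0; }      // unary! (assumed)
    double lt(double L, double R) { return L < R ? 1.0 : 0.0; } // builtin <
    double gt(double L, double R) { return R < L ? 1.0 : 0.0; } // binary> (assumed)
    double lor(double L, double R) {                            // binary| (assumed)
      return (L != 0.0 || R != 0.0) ? 1.0 : 0.0;
    }

    // binary= as in the listing: !(LHS < RHS | LHS > RHS). Since < and > have
    // precedence 10 and | has 5, the comparisons group before the "or".
    double eq(double L, double R) { return lnot(lor(lt(L, R), gt(L, R))); }

    // binary: as in the listing: ignore the first operand, return the second.
    double seq(double, double Y) { return Y; }

As with the Kaleidoscope operators, both arguments are evaluated before the
function body runs, so nothing here short-circuits.
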
514 Given the previous if/then/else support, we can also define interesting
515 functions for I/O. For example, the following prints out a character
516 whose "density" reflects the value passed in: the lower the value, the
517 denser the character:
523 extern putchard(char)
534 ready> printdensity(1): printdensity(2): printdensity(3):
535 printdensity(4): printdensity(5): printdensity(9):
538 Evaluated to 0.000000
Based on these simple primitive operations, we can start to define more
interesting things. For example, here's a little function that solves
for the number of iterations it takes a function in the complex plane to
diverge:
547 # Determine whether the specific location diverges.
548 # Solve for z = z^2 + c in the complex plane.
549 def mandleconverger(real imag iters creal cimag)
550 if iters > 255 | (real*real + imag*imag > 4) then
553 mandleconverger(real*real - imag*imag + creal,
555 iters+1, creal, cimag);
557 # Return the number of iterations required for the iteration to escape
558 def mandleconverge(real imag)
559 mandleconverger(real, imag, 0, real, imag);
This "``z = z^2 + c``" function is a beautiful little creature that is
the basis for computation of the `Mandelbrot
Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
``mandleconverge`` function returns the number of iterations that it
takes for a complex orbit to escape, saturating to 255. This is not a
very useful function by itself, but if you plot its value over a
two-dimensional plane, you can see the Mandelbrot set. Given that we are
limited to using putchard here, our amazing graphical output is limited,
but we can whip together something using the density plotter above:
# Compute and plot the Mandelbrot set with the specified 2 dimensional range
575 def mandelhelp(xmin xmax xstep ymin ymax ystep)
576 for y = ymin, y < ymax, ystep in (
577 (for x = xmin, x < xmax, xstep in
578 printdensity(mandleconverge(x,y)))
582 # mandel - This is a convenient helper function for plotting the mandelbrot set
583 # from the specified position with the specified Magnification.
584 def mandel(realstart imagstart realmag imagmag)
585 mandelhelp(realstart, realstart+realmag*78, realmag,
586 imagstart, imagstart+imagmag*40, imagmag);
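
A C++ version of the per-point computation can serve as a cross-check. This
is a sketch rather than a translation of the tutorial's code: the recursion
is written as a loop, the names ``mandleConverge`` and ``densityChar`` are
hypothetical, and the density thresholds (8, 4 and 2) are assumptions:

.. code-block:: c++

    #include <cassert>

    // Iterate z = z^2 + c starting from z = c, counting steps until either
    // |z|^2 > 4 or the counter exceeds 255.
    int mandleConverge(double CReal, double CImag) {
      double Real = CReal, Imag = CImag;
      int Iters = 0;
      while (!(Iters > 255 || Real * Real + Imag * Imag > 4)) {
        // (a + bi)^2 + c = (a^2 - b^2 + creal) + (2ab + cimag)i
        double NextReal = Real * Real - Imag * Imag + CReal;
        double NextImag = 2 * Real * Imag + CImag;
        Real = NextReal;
        Imag = NextImag;
        ++Iters;
      }
      return Iters;
    }

    // Map an iteration count to a character: the lower the value, the denser
    // the character. The thresholds are assumed for this sketch.
    char densityChar(double D) {
      if (D > 8) return ' ';
      if (D > 4) return '.';
      if (D > 2) return '+';
      return '*';
    }

In this sketch a point that never escapes, such as ``c = 0``, returns 256
because the counter is tested with ``> 255``; a point that starts outside the
radius-2 circle, such as ``c = 3``, returns 0.
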
Given this, we can try plotting out the Mandelbrot set! Let's try it out:
592 ready> mandel(-2.3, -1.3, 0.05, 0.07);
593 *******************************+++++++++++*************************************
594 *************************+++++++++++++++++++++++*******************************
595 **********************+++++++++++++++++++++++++++++****************************
596 *******************+++++++++++++++++++++.. ...++++++++*************************
597 *****************++++++++++++++++++++++.... ...+++++++++***********************
598 ***************+++++++++++++++++++++++..... ...+++++++++*********************
599 **************+++++++++++++++++++++++.... ....+++++++++********************
600 *************++++++++++++++++++++++...... .....++++++++*******************
601 ************+++++++++++++++++++++....... .......+++++++******************
602 ***********+++++++++++++++++++.... ... .+++++++*****************
603 **********+++++++++++++++++....... .+++++++****************
604 *********++++++++++++++........... ...+++++++***************
605 ********++++++++++++............ ...++++++++**************
606 ********++++++++++... .......... .++++++++**************
607 *******+++++++++..... .+++++++++*************
608 *******++++++++...... ..+++++++++*************
609 *******++++++....... ..+++++++++*************
610 *******+++++...... ..+++++++++*************
611 *******.... .... ...+++++++++*************
612 *******.... . ...+++++++++*************
613 *******+++++...... ...+++++++++*************
614 *******++++++....... ..+++++++++*************
615 *******++++++++...... .+++++++++*************
616 *******+++++++++..... ..+++++++++*************
617 ********++++++++++... .......... .++++++++**************
618 ********++++++++++++............ ...++++++++**************
619 *********++++++++++++++.......... ...+++++++***************
620 **********++++++++++++++++........ .+++++++****************
621 **********++++++++++++++++++++.... ... ..+++++++****************
622 ***********++++++++++++++++++++++....... .......++++++++*****************
623 ************+++++++++++++++++++++++...... ......++++++++******************
624 **************+++++++++++++++++++++++.... ....++++++++********************
625 ***************+++++++++++++++++++++++..... ...+++++++++*********************
626 *****************++++++++++++++++++++++.... ...++++++++***********************
627 *******************+++++++++++++++++++++......++++++++*************************
628 *********************++++++++++++++++++++++.++++++++***************************
629 *************************+++++++++++++++++++++++*******************************
630 ******************************+++++++++++++************************************
631 *******************************************************************************
632 *******************************************************************************
633 *******************************************************************************
634 Evaluated to 0.000000
635 ready> mandel(-2, -1, 0.02, 0.04);
636 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
637 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
638 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
639 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
640 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
641 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
642 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
643 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
644 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
645 **********++++++++++++++++++++++++++++++++++++++++++++++.............
646 ********+++++++++++++++++++++++++++++++++++++++++++..................
647 *******+++++++++++++++++++++++++++++++++++++++.......................
648 ******+++++++++++++++++++++++++++++++++++...........................
649 *****++++++++++++++++++++++++++++++++............................
650 *****++++++++++++++++++++++++++++...............................
651 ****++++++++++++++++++++++++++...... .........................
652 ***++++++++++++++++++++++++......... ...... ...........
653 ***++++++++++++++++++++++............
654 **+++++++++++++++++++++..............
655 **+++++++++++++++++++................
656 *++++++++++++++++++.................
657 *++++++++++++++++............ ...
658 *++++++++++++++..............
659 *+++....++++................
660 *.......... ...........
662 *.......... ...........
663 *+++....++++................
664 *++++++++++++++..............
665 *++++++++++++++++............ ...
666 *++++++++++++++++++.................
667 **+++++++++++++++++++................
668 **+++++++++++++++++++++..............
669 ***++++++++++++++++++++++............
670 ***++++++++++++++++++++++++......... ...... ...........
671 ****++++++++++++++++++++++++++...... .........................
672 *****++++++++++++++++++++++++++++...............................
673 *****++++++++++++++++++++++++++++++++............................
674 ******+++++++++++++++++++++++++++++++++++...........................
675 *******+++++++++++++++++++++++++++++++++++++++.......................
676 ********+++++++++++++++++++++++++++++++++++++++++++..................
677 Evaluated to 0.000000
678 ready> mandel(-0.9, -1.4, 0.02, 0.03);
679 *******************************************************************************
680 *******************************************************************************
681 *******************************************************************************
682 **********+++++++++++++++++++++************************************************
683 *+++++++++++++++++++++++++++++++++++++++***************************************
684 +++++++++++++++++++++++++++++++++++++++++++++**********************************
685 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
686 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
687 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
688 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
689 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
690 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
691 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
692 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
693 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
694 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
695 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
696 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
697 ++++++++++++++++++++........... .........++++++++++++++++++++++*
698 ++++++++++++++++++............ ...........++++++++++++++++++++
699 ++++++++++++++++............... .............++++++++++++++++++
700 ++++++++++++++................. ...............++++++++++++++++
701 ++++++++++++.................. .................++++++++++++++
702 +++++++++.................. .................+++++++++++++
703 ++++++........ . ......... ..++++++++++++
704 ++............ ...... ....++++++++++
705 .............. ...++++++++++
706 .............. ....+++++++++
707 .............. .....++++++++
708 ............. ......++++++++
709 ........... .......++++++++
710 ......... ........+++++++
711 ......... ........+++++++
712 ......... ....+++++++
720 Evaluated to 0.000000
723 At this point, you may be starting to realize that Kaleidoscope is a
724 real and powerful language. It may not be self-similar :), but it can be
725 used to plot things that are!
727 With this, we conclude the "adding user-defined operators" chapter of
728 the tutorial. We have successfully augmented our language, adding the
729 ability to extend the language in the library, and we have shown how
730 this can be used to build a simple but interesting end-user application
731 in Kaleidoscope. At this point, Kaleidoscope can build a variety of
732 applications that are functional and can call functions with
733 side-effects, but it can't actually define and mutate a variable itself.
735 Strikingly, variable mutation is an important feature of some languages,
736 and it is not at all obvious how to `add support for mutable
737 variables <LangImpl7.html>`_ without having to add an "SSA construction"
738 phase to your front-end. In the next chapter, we will describe how you
739 can add variable mutation without building SSA in your front-end.
Here is the complete code listing for our running example, enhanced with
support for user-defined operators. To build this example, use:
750 clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
754 On some platforms, you will need to specify -rdynamic or
755 -Wl,--export-dynamic when linking. This ensures that symbols defined in
756 the main executable are exported to the dynamic linker and so are
757 available for symbol resolution at run time. This is not needed if you
758 compile your support code into a shared library, although doing that
759 will cause problems on Windows.
763 .. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp
766 `Next: Extending the language: mutable variables / SSA
767 construction <LangImpl7.html>`_