1 ============================================================
2 Kaleidoscope: Extending the Language: User-defined Operators
3 ============================================================
11 Welcome to Chapter 6 of the "`Implementing a language with
12 LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
13 have a fully functional language that is fairly minimal, but also
14 useful. There is still one big problem with it, however. Our language
15 doesn't have many useful operators (like division, logical negation, or
16 even any comparisons besides less-than).
18 This chapter of the tutorial takes a wild digression into adding
19 user-defined operators to the simple and beautiful Kaleidoscope
20 language. This digression now gives us a simple and ugly language in
21 some ways, but also a powerful one at the same time. One of the great
22 things about creating your own language is that you get to decide what
23 is good or bad. In this tutorial we'll assume that it is okay to use
24 this as a way to show some interesting parsing techniques.
26 At the end of this tutorial, we'll run through an example Kaleidoscope
27 application that `renders the Mandelbrot set <#example>`_. This gives an
28 example of what you can build with Kaleidoscope and its feature set.
30 User-defined Operators: the Idea
31 ================================
33 The "operator overloading" that we will add to Kaleidoscope is more
34 general than languages like C++. In C++, you are only allowed to
35 redefine existing operators: you can't programatically change the
36 grammar, introduce new operators, change precedence levels, etc. In this
37 chapter, we will add this capability to Kaleidoscope, which will let the
38 user round out the set of operators that are supported.
40 The point of going into user-defined operators in a tutorial like this
41 is to show the power and flexibility of using a hand-written parser.
42 Thus far, the parser we have been implementing uses recursive descent
43 for most parts of the grammar and operator precedence parsing for the
44 expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
45 using operator precedence parsing, it would be very difficult to allow
46 the programmer to introduce new operators into the grammar: the grammar
47 is dynamically extensible as the JIT runs.
49 The two specific features we'll add are programmable unary operators
50 (right now, Kaleidoscope has no unary operators at all) as well as
51 binary operators. An example of this is:
62 # Define > with the same precedence as <.
63 def binary> 10 (LHS RHS)
66 # Binary "logical or", (note that it does not "short circuit")
67 def binary| 5 (LHS RHS)
75 # Define = with slightly lower precedence than relationals.
76 def binary= 9 (LHS RHS)
77 !(LHS < RHS | LHS > RHS);
79 Many languages aspire to being able to implement their standard runtime
80 library in the language itself. In Kaleidoscope, we can implement
81 significant parts of the language in the library!
83 We will break down implementation of these features into two parts:
84 implementing support for user-defined binary operators and adding unary
87 User-defined Binary Operators
88 =============================
90 Adding support for user-defined binary operators is pretty simple with
91 our current framework. We'll first add support for the unary/binary
99 tok_binary = -11, tok_unary = -12
102 static int gettok() {
104 if (IdentifierStr == "for") return tok_for;
105 if (IdentifierStr == "in") return tok_in;
106 if (IdentifierStr == "binary") return tok_binary;
107 if (IdentifierStr == "unary") return tok_unary;
108 return tok_identifier;
110 This just adds lexer support for the unary and binary keywords, like we
111 did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing
112 about our current AST, is that we represent binary operators with full
113 generalisation by using their ASCII code as the opcode. For our extended
114 operators, we'll use this same representation, so we don't need any new
115 AST or parser support.
117 On the other hand, we have to be able to represent the definitions of
118 these new operators, in the "def binary\| 5" part of the function
119 definition. In our grammar so far, the "name" for the function
120 definition is parsed as the "prototype" production and into the
121 ``PrototypeAST`` AST node. To represent our new user-defined operators
122 as prototypes, we have to extend the ``PrototypeAST`` AST node like
127 /// PrototypeAST - This class represents the "prototype" for a function,
128 /// which captures its argument names as well as if it is an operator.
131 std::vector<std::string> Args;
133 unsigned Precedence; // Precedence if a binary op.
135 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
136 bool isoperator = false, unsigned prec = 0)
137 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
139 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
140 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
142 char getOperatorName() const {
143 assert(isUnaryOp() || isBinaryOp());
144 return Name[Name.size()-1];
147 unsigned getBinaryPrecedence() const { return Precedence; }
152 Basically, in addition to knowing a name for the prototype, we now keep
153 track of whether it was an operator, and if it was, what precedence
154 level the operator is at. The precedence is only used for binary
155 operators (as you'll see below, it just doesn't apply for unary
156 operators). Now that we have a way to represent the prototype for a
157 user-defined operator, we need to parse it:
162 /// ::= id '(' id* ')'
163 /// ::= binary LETTER number? (id, id)
164 static PrototypeAST *ParsePrototype() {
167 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
168 unsigned BinaryPrecedence = 30;
172 return ErrorP("Expected function name in prototype");
174 FnName = IdentifierStr;
180 if (!isascii(CurTok))
181 return ErrorP("Expected binary operator");
183 FnName += (char)CurTok;
187 // Read the precedence if present.
188 if (CurTok == tok_number) {
189 if (NumVal < 1 || NumVal > 100)
190 return ErrorP("Invalid precedecnce: must be 1..100");
191 BinaryPrecedence = (unsigned)NumVal;
198 return ErrorP("Expected '(' in prototype");
200 std::vector<std::string> ArgNames;
201 while (getNextToken() == tok_identifier)
202 ArgNames.push_back(IdentifierStr);
204 return ErrorP("Expected ')' in prototype");
207 getNextToken(); // eat ')'.
209 // Verify right number of names for operator.
210 if (Kind && ArgNames.size() != Kind)
211 return ErrorP("Invalid number of operands for operator");
213 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
216 This is all fairly straightforward parsing code, and we have already
217 seen a lot of similar code in the past. One interesting part about the
218 code above is the couple lines that set up ``FnName`` for binary
219 operators. This builds names like "binary@" for a newly defined "@"
220 operator. This then takes advantage of the fact that symbol names in the
221 LLVM symbol table are allowed to have any character in them, including
222 embedded nul characters.
224 The next interesting thing to add, is codegen support for these binary
225 operators. Given our current structure, this is a simple addition of a
226 default case for our existing binary operator node:
230 Value *BinaryExprAST::Codegen() {
231 Value *L = LHS->Codegen();
232 Value *R = RHS->Codegen();
233 if (L == 0 || R == 0) return 0;
236 case '+': return Builder.CreateFAdd(L, R, "addtmp");
237 case '-': return Builder.CreateFSub(L, R, "subtmp");
238 case '*': return Builder.CreateFMul(L, R, "multmp");
240 L = Builder.CreateFCmpULT(L, R, "cmptmp");
241 // Convert bool 0/1 to double 0.0 or 1.0
242 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
247 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
249 Function *F = TheModule->getFunction(std::string("binary")+Op);
250 assert(F && "binary operator not found!");
252 Value *Ops[2] = { L, R };
253 return Builder.CreateCall(F, Ops, "binop");
256 As you can see above, the new code is actually really simple. It just
257 does a lookup for the appropriate operator in the symbol table and
258 generates a function call to it. Since user-defined operators are just
259 built as normal functions (because the "prototype" boils down to a
260 function with the right name) everything falls into place.
262 The final piece of code we are missing, is a bit of top-level magic:
266 Function *FunctionAST::Codegen() {
269 Function *TheFunction = Proto->Codegen();
270 if (TheFunction == 0)
273 // If this is an operator, install it.
274 if (Proto->isBinaryOp())
275 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
277 // Create a new basic block to start insertion into.
278 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
279 Builder.SetInsertPoint(BB);
281 if (Value *RetVal = Body->Codegen()) {
284 Basically, before codegening a function, if it is a user-defined
285 operator, we register it in the precedence table. This allows the binary
286 operator parsing logic we already have in place to handle it. Since we
287 are working on a fully-general operator precedence parser, this is all
288 we need to do to "extend the grammar".
290 Now we have useful user-defined binary operators. This builds a lot on
291 the previous framework we built for other operators. Adding unary
292 operators is a bit more challenging, because we don't have any framework
293 for it yet - lets see what it takes.
295 User-defined Unary Operators
296 ============================
298 Since we don't currently support unary operators in the Kaleidoscope
299 language, we'll need to add everything to support them. Above, we added
300 simple support for the 'unary' keyword to the lexer. In addition to
301 that, we need an AST node:
305 /// UnaryExprAST - Expression class for a unary operator.
306 class UnaryExprAST : public ExprAST {
310 UnaryExprAST(char opcode, ExprAST *operand)
311 : Opcode(opcode), Operand(operand) {}
312 virtual Value *Codegen();
315 This AST node is very simple and obvious by now. It directly mirrors the
316 binary operator AST node, except that it only has one child. With this,
317 we need to add the parsing logic. Parsing a unary operator is pretty
318 simple: we'll add a new function to do it:
325 static ExprAST *ParseUnary() {
326 // If the current token is not an operator, it must be a primary expr.
327 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
328 return ParsePrimary();
330 // If this is a unary operator, read it.
333 if (ExprAST *Operand = ParseUnary())
334 return new UnaryExprAST(Opc, Operand);
338 The grammar we add is pretty straightforward here. If we see a unary
339 operator when parsing a primary operator, we eat the operator as a
340 prefix and parse the remaining piece as another unary operator. This
341 allows us to handle multiple unary operators (e.g. "!!x"). Note that
342 unary operators can't have ambiguous parses like binary operators can,
343 so there is no need for precedence information.
345 The problem with this function, is that we need to call ParseUnary from
346 somewhere. To do this, we change previous callers of ParsePrimary to
347 call ParseUnary instead:
353 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
355 // Parse the unary expression after the binary operator.
356 ExprAST *RHS = ParseUnary();
361 /// ::= unary binoprhs
363 static ExprAST *ParseExpression() {
364 ExprAST *LHS = ParseUnary();
367 return ParseBinOpRHS(0, LHS);
370 With these two simple changes, we are now able to parse unary operators
371 and build the AST for them. Next up, we need to add parser support for
372 prototypes, to parse the unary operator prototype. We extend the binary
373 operator code above with:
378 /// ::= id '(' id* ')'
379 /// ::= binary LETTER number? (id, id)
380 /// ::= unary LETTER (id)
381 static PrototypeAST *ParsePrototype() {
384 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
385 unsigned BinaryPrecedence = 30;
389 return ErrorP("Expected function name in prototype");
391 FnName = IdentifierStr;
397 if (!isascii(CurTok))
398 return ErrorP("Expected unary operator");
400 FnName += (char)CurTok;
407 As with binary operators, we name unary operators with a name that
408 includes the operator character. This assists us at code generation
409 time. Speaking of, the final piece we need to add is codegen support for
410 unary operators. It looks like this:
414 Value *UnaryExprAST::Codegen() {
415 Value *OperandV = Operand->Codegen();
416 if (OperandV == 0) return 0;
418 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
420 return ErrorV("Unknown unary operator");
422 return Builder.CreateCall(F, OperandV, "unop");
425 This code is similar to, but simpler than, the code for binary
426 operators. It is simpler primarily because it doesn't need to handle any
427 predefined operators.
432 It is somewhat hard to believe, but with a few simple extensions we've
433 covered in the last chapters, we have grown a real-ish language. With
434 this, we can do a lot of interesting things, including I/O, math, and a
435 bunch of other things. For example, we can now add a nice sequencing
436 operator (printd is defined to print out the specified value and a
441 ready> extern printd(x);
443 declare double @printd(double)
445 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
447 ready> printd(123) : printd(456) : printd(789);
451 Evaluated to 0.000000
453 We can also define a bunch of other "primitive" operations, such as:
468 # Define > with the same precedence as <.
469 def binary> 10 (LHS RHS)
472 # Binary logical or, which does not short circuit.
473 def binary| 5 (LHS RHS)
481 # Binary logical and, which does not short circuit.
482 def binary& 6 (LHS RHS)
488 # Define = with slightly lower precedence than relationals.
489 def binary = 9 (LHS RHS)
490 !(LHS < RHS | LHS > RHS);
492 # Define ':' for sequencing: as a low-precedence operator that ignores operands
493 # and just returns the RHS.
494 def binary : 1 (x y) y;
496 Given the previous if/then/else support, we can also define interesting
497 functions for I/O. For example, the following prints out a character
498 whose "density" reflects the value passed in: the lower the value, the
499 denser the character:
505 extern putchard(char)
516 ready> printdensity(1): printdensity(2): printdensity(3):
517 printdensity(4): printdensity(5): printdensity(9):
520 Evaluated to 0.000000
522 Based on these simple primitive operations, we can start to define more
523 interesting things. For example, here's a little function that solves
524 for the number of iterations it takes a function in the complex plane to
529 # Determine whether the specific location diverges.
530 # Solve for z = z^2 + c in the complex plane.
531 def mandleconverger(real imag iters creal cimag)
532 if iters > 255 | (real*real + imag*imag > 4) then
535 mandleconverger(real*real - imag*imag + creal,
537 iters+1, creal, cimag);
539 # Return the number of iterations required for the iteration to escape
540 def mandleconverge(real imag)
541 mandleconverger(real, imag, 0, real, imag);
543 This "``z = z2 + c``" function is a beautiful little creature that is
544 the basis for computation of the `Mandelbrot
545 Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
546 ``mandelconverge`` function returns the number of iterations that it
547 takes for a complex orbit to escape, saturating to 255. This is not a
548 very useful function by itself, but if you plot its value over a
549 two-dimensional plane, you can see the Mandelbrot set. Given that we are
550 limited to using putchard here, our amazing graphical output is limited,
551 but we can whip together something using the density plotter above:
555 # Compute and plot the mandlebrot set with the specified 2 dimensional range
557 def mandelhelp(xmin xmax xstep ymin ymax ystep)
558 for y = ymin, y < ymax, ystep in (
559 (for x = xmin, x < xmax, xstep in
560 printdensity(mandleconverge(x,y)))
564 # mandel - This is a convenient helper function for plotting the mandelbrot set
565 # from the specified position with the specified Magnification.
566 def mandel(realstart imagstart realmag imagmag)
567 mandelhelp(realstart, realstart+realmag*78, realmag,
568 imagstart, imagstart+imagmag*40, imagmag);
570 Given this, we can try plotting out the mandlebrot set! Lets try it out:
574 ready> mandel(-2.3, -1.3, 0.05, 0.07);
575 *******************************+++++++++++*************************************
576 *************************+++++++++++++++++++++++*******************************
577 **********************+++++++++++++++++++++++++++++****************************
578 *******************+++++++++++++++++++++.. ...++++++++*************************
579 *****************++++++++++++++++++++++.... ...+++++++++***********************
580 ***************+++++++++++++++++++++++..... ...+++++++++*********************
581 **************+++++++++++++++++++++++.... ....+++++++++********************
582 *************++++++++++++++++++++++...... .....++++++++*******************
583 ************+++++++++++++++++++++....... .......+++++++******************
584 ***********+++++++++++++++++++.... ... .+++++++*****************
585 **********+++++++++++++++++....... .+++++++****************
586 *********++++++++++++++........... ...+++++++***************
587 ********++++++++++++............ ...++++++++**************
588 ********++++++++++... .......... .++++++++**************
589 *******+++++++++..... .+++++++++*************
590 *******++++++++...... ..+++++++++*************
591 *******++++++....... ..+++++++++*************
592 *******+++++...... ..+++++++++*************
593 *******.... .... ...+++++++++*************
594 *******.... . ...+++++++++*************
595 *******+++++...... ...+++++++++*************
596 *******++++++....... ..+++++++++*************
597 *******++++++++...... .+++++++++*************
598 *******+++++++++..... ..+++++++++*************
599 ********++++++++++... .......... .++++++++**************
600 ********++++++++++++............ ...++++++++**************
601 *********++++++++++++++.......... ...+++++++***************
602 **********++++++++++++++++........ .+++++++****************
603 **********++++++++++++++++++++.... ... ..+++++++****************
604 ***********++++++++++++++++++++++....... .......++++++++*****************
605 ************+++++++++++++++++++++++...... ......++++++++******************
606 **************+++++++++++++++++++++++.... ....++++++++********************
607 ***************+++++++++++++++++++++++..... ...+++++++++*********************
608 *****************++++++++++++++++++++++.... ...++++++++***********************
609 *******************+++++++++++++++++++++......++++++++*************************
610 *********************++++++++++++++++++++++.++++++++***************************
611 *************************+++++++++++++++++++++++*******************************
612 ******************************+++++++++++++************************************
613 *******************************************************************************
614 *******************************************************************************
615 *******************************************************************************
616 Evaluated to 0.000000
617 ready> mandel(-2, -1, 0.02, 0.04);
618 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
619 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
620 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
621 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
622 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
623 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
624 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
625 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
626 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
627 **********++++++++++++++++++++++++++++++++++++++++++++++.............
628 ********+++++++++++++++++++++++++++++++++++++++++++..................
629 *******+++++++++++++++++++++++++++++++++++++++.......................
630 ******+++++++++++++++++++++++++++++++++++...........................
631 *****++++++++++++++++++++++++++++++++............................
632 *****++++++++++++++++++++++++++++...............................
633 ****++++++++++++++++++++++++++...... .........................
634 ***++++++++++++++++++++++++......... ...... ...........
635 ***++++++++++++++++++++++............
636 **+++++++++++++++++++++..............
637 **+++++++++++++++++++................
638 *++++++++++++++++++.................
639 *++++++++++++++++............ ...
640 *++++++++++++++..............
641 *+++....++++................
642 *.......... ...........
644 *.......... ...........
645 *+++....++++................
646 *++++++++++++++..............
647 *++++++++++++++++............ ...
648 *++++++++++++++++++.................
649 **+++++++++++++++++++................
650 **+++++++++++++++++++++..............
651 ***++++++++++++++++++++++............
652 ***++++++++++++++++++++++++......... ...... ...........
653 ****++++++++++++++++++++++++++...... .........................
654 *****++++++++++++++++++++++++++++...............................
655 *****++++++++++++++++++++++++++++++++............................
656 ******+++++++++++++++++++++++++++++++++++...........................
657 *******+++++++++++++++++++++++++++++++++++++++.......................
658 ********+++++++++++++++++++++++++++++++++++++++++++..................
659 Evaluated to 0.000000
660 ready> mandel(-0.9, -1.4, 0.02, 0.03);
661 *******************************************************************************
662 *******************************************************************************
663 *******************************************************************************
664 **********+++++++++++++++++++++************************************************
665 *+++++++++++++++++++++++++++++++++++++++***************************************
666 +++++++++++++++++++++++++++++++++++++++++++++**********************************
667 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
668 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
669 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
670 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
671 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
672 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
673 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
674 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
675 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
676 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
677 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
678 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
679 ++++++++++++++++++++........... .........++++++++++++++++++++++*
680 ++++++++++++++++++............ ...........++++++++++++++++++++
681 ++++++++++++++++............... .............++++++++++++++++++
682 ++++++++++++++................. ...............++++++++++++++++
683 ++++++++++++.................. .................++++++++++++++
684 +++++++++.................. .................+++++++++++++
685 ++++++........ . ......... ..++++++++++++
686 ++............ ...... ....++++++++++
687 .............. ...++++++++++
688 .............. ....+++++++++
689 .............. .....++++++++
690 ............. ......++++++++
691 ........... .......++++++++
692 ......... ........+++++++
693 ......... ........+++++++
694 ......... ....+++++++
702 Evaluated to 0.000000
705 At this point, you may be starting to realize that Kaleidoscope is a
706 real and powerful language. It may not be self-similar :), but it can be
707 used to plot things that are!
709 With this, we conclude the "adding user-defined operators" chapter of
710 the tutorial. We have successfully augmented our language, adding the
711 ability to extend the language in the library, and we have shown how
712 this can be used to build a simple but interesting end-user application
713 in Kaleidoscope. At this point, Kaleidoscope can build a variety of
714 applications that are functional and can call functions with
715 side-effects, but it can't actually define and mutate a variable itself.
717 Strikingly, variable mutation is an important feature of some languages,
718 and it is not at all obvious how to `add support for mutable
719 variables <LangImpl7.html>`_ without having to add an "SSA construction"
720 phase to your front-end. In the next chapter, we will describe how you
721 can add variable mutation without building SSA in your front-end.
726 Here is the complete code listing for our running example, enhanced with
727 the if/then/else and for expressions.. To build this example, use:
732 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
736 On some platforms, you will need to specify -rdynamic or
737 -Wl,--export-dynamic when linking. This ensures that symbols defined in
738 the main executable are exported to the dynamic linker and so are
739 available for symbol resolution at run time. This is not needed if you
740 compile your support code into a shared library, although doing that
741 will cause problems on Windows.
745 .. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp
748 `Next: Extending the language: mutable variables / SSA
749 construction <LangImpl7.html>`_