</li>
<li><a href="#targetimpls">Target-specific Implementation Notes</a>
<ul>
+ <li><a href="#tailcallopt">Tail call optimization</a></li>
<li><a href="#x86">The X86 backend</a></li>
<li><a href="#ppc">The PowerPC backend</a>
<ul>
<div class="doc_code">
<pre>
-int %test(int %X, int %Y) {
- %Z = div int %X, %Y
- ret int %Z
+define i32 @test(i32 %X, i32 %Y) {
+ %Z = udiv i32 %X, %Y
+ ret i32 %Z
}
</pre>
</div>
edges are represented by instances of the <tt>SDOperand</tt> class, which is
a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node and result
value being used, respectively. Each value produced by an <tt>SDNode</tt> has
-an associated <tt>MVT::ValueType</tt> indicating what type the value is.</p>
+an associated <tt>MVT</tt> (Machine Value Type) indicating what the type of the
+value is.</p>
<p>SelectionDAGs contain two different kinds of values: those that represent
data flow and those that represent control flow dependencies. Data values are
<div class="doc_code">
<pre>
%a = MOVE %b
-%a = ADD %a %b
+%a = ADD %a %c
</pre>
</div>
<p>Notice that, internally, the second instruction is represented as
-<tt>ADD %a[def/use] %b</tt>. I.e., the register operand <tt>%a</tt> is
+<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is
both used and defined by the instruction.</p>
</div>
</div>
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="tailcallopt">Tail call optimization</a>
+</div>
+
+<div class="doc_text">
+ <p>Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if all of the following hold:
+ <ul>
+ <li>Caller and callee have the calling convention <tt>fastcc</tt>.</li>
+ <li>The call is a tail call, i.e. it is in tail position: the <tt>ret</tt> immediately follows the call, and the <tt>ret</tt> either returns the value of the call or is void.</li>
+ <li>Option <tt>-tailcallopt</tt> is enabled.</li>
+ <li>Platform specific constraints are met.</li>
+ </ul>
+ </p>
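+ <p>As a rough sketch of the tail-position condition above (the function names <tt>@callee</tt>, <tt>@in_tail_position</tt> and <tt>@not_in_tail_position</tt> are hypothetical and chosen only for illustration), the first caller below satisfies it, while the second does not because an instruction sits between the call and the <tt>ret</tt>:</p>
+ <div class="doc_code">
+ <pre>
+; Hypothetical callee, declared only for this illustration.
+declare fastcc i32 @callee(i32 %x)
+
+; In tail position: the ret immediately follows the call and returns its value.
+define fastcc i32 @in_tail_position(i32 %x) {
+ %r = tail call fastcc i32 @callee(i32 %x)
+ ret i32 %r
+}
+
+; Not in tail position: the add sits between the call and the ret.
+define fastcc i32 @not_in_tail_position(i32 %x) {
+ %r = call fastcc i32 @callee(i32 %x)
+ %s = add i32 %r, 1
+ ret i32 %s
+}</pre>
+ </div>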
+ <p>x86/x86-64 constraints:
+ <ul>
+ <li>No variable argument lists are used.</li>
+ <li>On x86-64, when generating GOT/PIC code, only module-local calls (visibility = hidden or protected) are supported.</li>
+ </ul>
+ </p>
+ <p>PowerPC constraints:
+ <ul>
+ <li>No variable argument lists are used.</li>
+ <li>No byval parameters are used.</li>
+ <li>On ppc32/64, when generating GOT/PIC code, only module-local calls (visibility = hidden or protected) are supported.</li>
+ </ul>
+ </p>
+ <p>Example:</p>
+ <p>Call as <tt>llc -tailcallopt test.ll</tt>.
+ <div class="doc_code">
+ <pre>
+declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
+
+define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
+ %l1 = add i32 %in1, %in2
+ %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1)
+ ret i32 %tmp
+}</pre>
+ </div>
+ </p>
+ <p>Implications of <tt>-tailcallopt</tt>:</p>
+ <p>To support tail call optimization in situations where the callee has more arguments than the caller, a 'callee pops arguments' convention is used. Currently, this causes each <tt>fastcc</tt> call that is not tail call optimized (because one or more of the above constraints are not met) to be followed by a readjustment of the stack, so performance might be worse in such cases.</p>
+ <p>On x86 and x86-64, one register is reserved for indirect tail calls (e.g. via a function pointer), so there is one fewer register for integer argument passing. On x86 this means 2 registers are used (if the <tt>inreg</tt> parameter attribute is used), and on x86-64 this means 5 registers are used.</p>
+</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="x86">The X86 backend</a>