</head>
<body>
-<div class="doc_title">
+<h1>
The LLVM Target-Independent Code Generator
-</div>
+</h1>
<ol>
<li><a href="#introduction">Introduction</a>
<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
</ul></li>
+ <li><a href="#ptx">The PTX backend</a></li>
</ul></li>
</ol>
</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="introduction">Introduction</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>The LLVM target-independent code generator is a framework that provides a
suite of reusable components for translating the LLVM internal representation
depend on the target-description and machine code representation classes,
ensuring that it is portable.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="required">Required components in the code generator</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The two pieces of the LLVM code generator are the high-level interface to the
code generator and the set of reusable components that can be used to build
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="high-level-design">The high-level design of the code generator</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The LLVM target-independent code generator is designed to support efficient
and quality code generation for standard register-based microprocessors.
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="tablegen">Using TableGen for target description</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The target description classes require a detailed description of the target
architecture. These target descriptions often have a large amount of common
</div>
+</div>
+
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="targetdesc">Target description classes</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>The LLVM target description classes (located in the
<tt>include/llvm/Target</tt> directory) provide an abstract description of
<tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
that should be implemented by the target.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
access the target-specific implementations of the various target description
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetdata">The <tt>TargetData</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetData</tt> class is the only required target description class,
and it is the only class that is not extensible (you cannot derived a new
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
selectors primarily to describe how LLVM code should be lowered to
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
of the target and any interactions between the registers.</p>
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
instructions supported by the target. It is essentially an array of
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
stack frame layout of the target. It holds the direction of stack growth, the
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
specific chip set being targeted. A sub-target informs code generation of
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
Just-In-Time code generator to perform target-specific activities, such as
</div>
+</div>
+
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="codegendesc">Machine code description classes</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>At the high-level, LLVM code is translated to a machine specific
representation formed out of
SSA representation for machine code, as well as a register allocated, non-SSA
form.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Target machine instructions are represented as instances of the
<tt>MachineInstr</tt> class. This class is an extremely abstract way of
<p>Also if the first operand is a def, it is easier to <a href="#buildmi">create
instructions</a> whose only def is the first operand.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="fixedregs">Fixed (preassigned) registers</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>One important issue that the code generator needs to be aware of is the
presence of fixed registers. In particular, there are often places in the
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ssa">Machine code in SSA form</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and are
maintained in SSA-form until register allocation happens. For the most part,
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
</div>
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="mc">The "MC" Layer</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>
The MC Layer is used to represent and process code at the raw machine code
like label names, machine instructions, and sections in the object file. The
code in this layer is used for a number of important purposes: the tail end of
the code generator uses it to write a .s or .o file, and it is also used by the
-llvm-mc tool to implement standalone machine codeassemblers and disassemblers.
+llvm-mc tool to implement standalone machine code assemblers and disassemblers.
</p>
<p>
in this manual.
</p>
-</div>
-
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
MCStreamer is best thought of as an assembler API. It is an abstract API which
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mccontext">The <tt>MCContext</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCContext class is the owner of a variety of uniqued data structures at the
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCSymbol class represents a symbol (aka label) in the assembly file. There
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mcsection">The <tt>MCSection</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCSection class represents an object-file specific section. It is subclassed
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mcinst">The <tt>MCInst</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCInst class is a target-independent representation of an instruction. It
</div>
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="codegenalgs">Target-independent code generation algorithms</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>This section documents the phases described in the
<a href="#high-level-design">high-level design of the code generator</a>.
It explains how they work and some of the rationale behind their design.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="instselect">Instruction Selection</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Instruction Selection is the process of translating LLVM code presented to
the code generator into target-specific machine instructions. There are
selector to be generated from these <tt>.td</tt> files, though currently
there are still things that require custom C++ code.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The SelectionDAG provides an abstraction for code representation in a way
that is amenable to instruction selection using automatic techniques
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>SelectionDAG-based instruction selection consists of the following steps:</p>
SelectionDAG optimizer is run to clean up redundancies exposed by type
legalization.</li>
- <li><a href="#selectiondag_legalize">Legalize SelectionDAG Types</a> —
- This stage transforms SelectionDAG nodes to eliminate any types that are
- unsupported on the target.</li>
+ <li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> —
+ This stage transforms SelectionDAG nodes to eliminate any operations
+ that are unsupported on the target.</li>
<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
SelectionDAG optimizer is run to eliminate inefficiencies introduced by
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_build">Initial SelectionDAG Construction</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
input by the <tt>SelectionDAGLowering</tt> class in the
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The Legalize phase is in charge of converting a DAG to only use the types
that are natively supported by the target.</p>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The Legalize phase is in charge of converting a DAG to only use the
operations that are natively supported by the target.</p>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
- <a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
- Combiner</a>
-</div>
+<h4>
+ <a name="selectiondag_optimize">
+ SelectionDAG Optimization Phase: the DAG Combiner
+ </a>
+</h4>
-<div class="doc_text">
+<div>
<p>The SelectionDAG optimization phase is run multiple times for code
generation, immediately after the DAG is built and once after each
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_select">SelectionDAG Select Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The Select phase is the bulk of the target-specific code for instruction
selection. This phase takes a legal SelectionDAG as input, pattern matches
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The scheduling phase takes the DAG of target instructions from the selection
phase and assigns an order. The scheduler can pick an order depending on
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_future">Future directions for the SelectionDAG</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<ol>
<li>Optional function-at-a-time selection.</li>
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="ssamco">SSA-based Machine Code Optimizations</a>
-</div>
-<div class="doc_text"><p>To Be Written</p></div>
+</h3>
+<div><p>To Be Written</p></div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="liveintervals">Live Intervals</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
They are used by some <a href="#regalloc">register allocator</a> passes to
register are live at the same point in the program (i.e., they conflict).
When this situation occurs, one virtual register must be <i>spilled</i>.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="livevariable_analysis">Live Variable Analysis</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The first step in determining the live intervals of variables is to calculate
the set of registers that are immediately dead after the instruction (i.e.,
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="liveintervals_analysis">Live Intervals Analysis</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>We now have the information available to perform the live intervals analysis
and build the live intervals themselves. We start off by numbering the basic
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="regalloc">Register Allocation</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <i>Register Allocation problem</i> consists in mapping a program
<i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
accommodate all the virtual registers, some of them will have to be mapped
into memory. These virtuals are called <i>spilled virtuals</i>.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_represent">How registers are represented in LLVM</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>In LLVM, physical registers are denoted by integer numbers that normally
range from 1 to 1023. To see how this numbering is defined for a particular
</p>
<p>Virtual registers are also denoted by integer numbers. Contrary to physical
- registers, different virtual registers never share the same number. The
- smallest virtual register is normally assigned the number 1024. This may
- change, so, in order to know which is the first virtual register, you should
- access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
- number is greater than or equal
- to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
- register. Whereas physical registers are statically defined in
- a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
- application developer, that is not the case with virtual registers. In order
- to create new virtual registers, use the
+ registers, different virtual registers never share the same number. Whereas
+ physical registers are statically defined in a <tt>TargetRegisterInfo.td</tt>
+ file and cannot be created by the application developer, that is not the case
+ with virtual registers. In order to create new virtual registers, use the
method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
- will return a virtual register with the highest code.</p>
+ will return a new virtual register. Use an <tt>IndexedMap<Foo,
+ VirtReg2IndexFunctor></tt> to hold information per virtual register. If you
+ need to enumerate all virtual registers, use the function
+ <tt>TargetRegisterInfo::index2VirtReg()</tt> to find the virtual register
+ numbers:</p>
+
+<div class="doc_code">
+<pre>
+ for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
+ unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
+ stuff(VirtReg);
+ }
+</pre>
+</div>
<p>Before register allocation, the operands of an instruction are mostly virtual
registers, although physical registers may also be used. In order to check if
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>There are two ways to map virtual registers to physical registers (or to
memory slots). The first way, that we will call <i>direct mapping</i>, is
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_twoAddr">Handling two address instructions</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>With very rare exceptions (e.g., function calls), the LLVM machine code
instructions are three address instructions. That is, each instruction is
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>An important transformation that happens during register allocation is called
the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_fold">Instruction folding</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p><i>Instruction folding</i> is an optimization performed during register
allocation that removes unnecessary copy instructions. For instance, a
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_builtIn">Built in register allocators</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The LLVM infrastructure provides the application developer with three
different register allocators:</p>
<ul>
- <li><i>Linear Scan</i> — <i>The default allocator</i>. This is the
- well-know linear scan register allocator. Whereas the
- <i>Simple</i> and <i>Local</i> algorithms use a direct mapping
- implementation technique, the <i>Linear Scan</i> implementation
- uses a spiller in order to place load and stores.</li>
-
<li><i>Fast</i> — This register allocator is the default for debug
builds. It allocates registers on a basic block level, attempting to keep
values in registers and reusing registers as appropriate.</li>
+ <li><i>Basic</i> — This is an incremental approach to register
+ allocation. Live ranges are assigned to registers one at a time in
+ an order that is driven by heuristics. Since code can be rewritten
+ on-the-fly during allocation, this framework allows interesting
+ allocators to be developed as extensions. It is not itself a
+ production register allocator but is a potentially useful
+ stand-alone mode for triaging bugs and as a performance baseline.
+
+ <li><i>Greedy</i> — <i>The default allocator</i>. This is a
+ highly tuned implementation of the <i>Basic</i> allocator that
+ incorporates global live range splitting. This allocator works hard
+ to minimize the cost of spill code.
+
<li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP)
based register allocator. This allocator works by constructing a PBQP
problem representing the register allocation problem under consideration,
solving this using a PBQP solver, and mapping the solution back to a
register assignment.</li>
-
</ul>
<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="proepicode">Prolog/Epilog Code Insertion</a>
+</h3>
+
+<!-- _______________________________________________________________________ -->
+<h4>
+ <a name="compact_unwind">Compact Unwind</a>
+</h4>
+
+<div>
+
+<p>Throwing an exception requires <em>unwinding</em> out of a function. The
+ information on how to unwind a given function is traditionally expressed in
+ DWARF unwind (a.k.a. frame) info. But that format was originally developed
+ for debuggers to backtrace, and each Frame Description Entry (FDE) requires
+ ~20-30 bytes per function. There is also the cost of mapping from an address
+ in a function to the corresponding FDE at runtime. An alternative unwind
+ encoding is called <em>compact unwind</em> and requires just 4-bytes per
+ function.</p>
+
+<p>The compact unwind encoding is a 32-bit value, which is encoded in an
+ architecture-specific way. It specifies which registers to restore and from
+ where, and how to unwind out of the function. When the linker creates a final
+ linked image, it will create a <code>__TEXT,__unwind_info</code>
+ section. This section is a small and fast way for the runtime to access
+ unwind info for any given function. If we emit compact unwind info for the
+ function, that compact unwind info will be encoded in
+ the <code>__TEXT,__unwind_info</code> section. If we emit DWARF unwind info,
+ the <code>__TEXT,__unwind_info</code> section will contain the offset of the
+ FDE in the <code>__TEXT,__eh_frame</code> section in the final linked
+ image.</p>
+
+<p>For X86, there are three modes for the compact unwind encoding:</p>
+
+<dl>
+ <dt><i>Function with a Frame Pointer (<code>EBP</code> or <code>RBP</code>)</i></dt>
+ <dd><p><code>EBP/RBP</code>-based frame, where <code>EBP/RBP</code> is pushed
+ onto the stack immediately after the return address,
+ then <code>ESP/RSP</code> is moved to <code>EBP/RBP</code>. Thus to
+ unwind, <code>ESP/RSP</code> is restored with the
+ current <code>EBP/RBP</code> value, then <code>EBP/RBP</code> is restored
+ by popping the stack, and the return is done by popping the stack once
+ more into the PC. All non-volatile registers that need to be restored must
+ have been saved in a small range on the stack that
+ starts <code>EBP-4</code> to <code>EBP-1020</code> (<code>RBP-8</code>
+ to <code>RBP-1020</code>). The offset (divided by 4 in 32-bit mode and 8
+ in 64-bit mode) is encoded in bits 16-23 (mask: <code>0x00FF0000</code>).
+ The registers saved are encoded in bits 0-14
+ (mask: <code>0x00007FFF</code>) as five 3-bit entries from the following
+ table:</p>
+<table border="1" cellspacing="0">
+ <tr>
+ <th>Compact Number</th>
+ <th>i386 Register</th>
+ <th>x86-64 Regiser</th>
+ </tr>
+ <tr>
+ <td>1</td>
+ <td><code>EBX</code></td>
+ <td><code>RBX</code></td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <td><code>ECX</code></td>
+ <td><code>R12</code></td>
+ </tr>
+ <tr>
+ <td>3</td>
+ <td><code>EDX</code></td>
+ <td><code>R13</code></td>
+ </tr>
+ <tr>
+ <td>4</td>
+ <td><code>EDI</code></td>
+ <td><code>R14</code></td>
+ </tr>
+ <tr>
+ <td>5</td>
+ <td><code>ESI</code></td>
+ <td><code>R15</code></td>
+ </tr>
+ <tr>
+ <td>6</td>
+ <td><code>EBP</code></td>
+ <td><code>RBP</code></td>
+ </tr>
+</table>
+
+</dd>
+
+ <dt><i>Frameless with a Small Constant Stack Size (<code>EBP</code>
+ or <code>RBP</code> is not used as a frame pointer)</i></dt>
+ <dd><p>To return, a constant (encoded in the compact unwind encoding) is added
+ to the <code>ESP/RSP</code>. Then the return is done by popping the stack
+ into the PC. All non-volatile registers that need to be restored must have
+ been saved on the stack immediately after the return address. The stack
+ size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits
+ 16-23 (mask: <code>0x00FF0000</code>). There is a maximum stack size of
+ 1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers
+ saved is encoded in bits 9-12 (mask: <code>0x00001C00</code>). Bits 0-9
+ (mask: <code>0x000003FF</code>) contain which registers were saved and
+ their order. (See
+ the <code>encodeCompactUnwindRegistersWithoutFrame()</code> function
+ in <code>lib/Target/X86FrameLowering.cpp</code> for the encoding
+ algorithm.)</p></dd>
+
+ <dt><i>Frameless with a Large Constant Stack Size (<code>EBP</code>
+ or <code>RBP</code> is not used as a frame pointer)</i></dt>
+ <dd><p>This case is like the "Frameless with a Small Constant Stack Size"
+ case, but the stack size is too large to encode in the compact unwind
+ encoding. Instead it requires that the function contains "<code>subl
+ $nnnnnn, %esp</code>" in its prolog. The compact encoding contains the
+ offset to the <code>$nnnnnn</code> value in the function in bits 9-12
+ (mask: <code>0x00001C00</code>).</p></dd>
+</dl>
+
</div>
-<div class="doc_text"><p>To Be Written</p></div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="latemco">Late Machine Code Optimizations</a>
-</div>
-<div class="doc_text"><p>To Be Written</p></div>
+</h3>
+<div><p>To Be Written</p></div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="codeemit">Code Emission</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The code emission step of code generation is responsible for lowering from
the code generator abstractions (like <a
</div>
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="nativeassembler">Implementing a Native Assembler</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>Though you're probably reading this because you want to write or maintain a
compiler backend, LLVM also fully supports building a native assemblers too.
part of the manual and repetitive data entry can be factored and shared with the
compiler.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection" id="na_instparsing">Instruction Parsing</div>
+<h3 id="na_instparsing">Instruction Parsing</h3>
-<div class="doc_text"><p>To Be Written</p></div>
+<div><p>To Be Written</p></div>
<!-- ======================================================================= -->
-<div class="doc_subsection" id="na_instaliases">
+<h3 id="na_instaliases">
Instruction Alias Processing
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Once the instruction is parsed, it enters the MatchInstructionImpl function.
The MatchInstructionImpl function performs alias processing and then does
actual matching.</p>
meets the needs of your instruction, because it will allow a more concise
description.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">Mnemonic Aliases</div>
+<h4>Mnemonic Aliases</h4>
-<div class="doc_text">
+<div>
<p>The first phase of alias processing is simple instruction mnemonic
remapping for classes of instructions which are allowed with two different
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">Instruction Aliases</div>
+<h4>Instruction Aliases</h4>
-<div class="doc_text">
+<div>
<p>The most general phase of alias processing occurs while matching is
happening: it provides new forms for the matcher to match along with a specific
<p>Instruction aliases can also have a Requires clause to make them
subtarget specific.</p>
-</div>
+<p>If the back-end supports it, the instruction printer can automatically emit
+ the alias rather than what's being aliased. It typically leads to better,
+ more readable code. If it's better to print out what's being aliased, then
+ pass a '0' as the third parameter to the InstAlias definition.</p>
+</div>
+</div>
<!-- ======================================================================= -->
-<div class="doc_subsection" id="na_matching">Instruction Matching</div>
-
-<div class="doc_text"><p>To Be Written</p></div>
-
+<h3 id="na_matching">Instruction Matching</h3>
+<div><p>To Be Written</p></div>
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="targetimpls">Target-specific Implementation Notes</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>This section of the document explains features or design decisions that are
specific to the code generator for a particular target. First we start
with a table that summarizes what features are supported by each target.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetfeatures">Target Feature Matrix</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Note that this table does not include the C backend or Cpp backends, since
they do not use the target independent code generator infrastructure. It also
<th>PTX</th>
<th>PowerPC</th>
<th>Sparc</th>
- <th>SystemZ</th>
<th>X86</th>
<th>XCore</th>
</tr>
<td class="no"></td> <!-- CellSPU -->
<td class="no"></td> <!-- MBlaze -->
<td class="unknown"></td> <!-- MSP430 -->
- <td class="unknown"></td> <!-- Mips -->
+ <td class="no"></td> <!-- Mips -->
<td class="no"></td> <!-- PTX -->
<td class="yes"></td> <!-- PowerPC -->
<td class="yes"></td> <!-- Sparc -->
- <td class="unknown"></td> <!-- SystemZ -->
<td class="yes"></td> <!-- X86 -->
<td class="unknown"></td> <!-- XCore -->
</tr>
<td class="no"></td> <!-- Alpha -->
<td class="no"></td> <!-- Blackfin -->
<td class="no"></td> <!-- CellSPU -->
- <td class="no"></td> <!-- MBlaze -->
+ <td class="yes"></td> <!-- MBlaze -->
<td class="no"></td> <!-- MSP430 -->
<td class="no"></td> <!-- Mips -->
<td class="no"></td> <!-- PTX -->
<td class="no"></td> <!-- PowerPC -->
<td class="no"></td> <!-- Sparc -->
- <td class="no"></td> <!-- SystemZ -->
<td class="yes"></td> <!-- X86 -->
<td class="no"></td> <!-- XCore -->
</tr>
<td class="no"></td> <!-- Alpha -->
<td class="no"></td> <!-- Blackfin -->
<td class="no"></td> <!-- CellSPU -->
- <td class="no"></td> <!-- MBlaze -->
+ <td class="yes"></td> <!-- MBlaze -->
<td class="no"></td> <!-- MSP430 -->
<td class="no"></td> <!-- Mips -->
<td class="no"></td> <!-- PTX -->
<td class="no"></td> <!-- PowerPC -->
<td class="no"></td> <!-- Sparc -->
- <td class="no"></td> <!-- SystemZ -->
<td class="yes"></td> <!-- X86 -->
<td class="no"></td> <!-- XCore -->
</tr>
<td class="unknown"></td> <!-- Alpha -->
<td class="yes"></td> <!-- Blackfin -->
<td class="no"></td> <!-- CellSPU -->
- <td class="no"></td> <!-- MBlaze -->
+ <td class="yes"></td> <!-- MBlaze -->
<td class="unknown"></td> <!-- MSP430 -->
- <td class="unknown"></td> <!-- Mips -->
+ <td class="no"></td> <!-- Mips -->
<td class="unknown"></td> <!-- PTX -->
<td class="yes"></td> <!-- PowerPC -->
<td class="unknown"></td> <!-- Sparc -->
- <td class="unknown"></td> <!-- SystemZ -->
- <td class="yes"><a href="#feat_inlineasm_x86">*</a></td> <!-- X86 -->
+ <td class="yes"></td> <!-- X86 -->
<td class="unknown"></td> <!-- XCore -->
</tr>
<tr>
<td><a href="#feat_jit">jit</a></td>
<td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->
- <td class="unknown"></td> <!-- Alpha -->
+ <td class="no"></td> <!-- Alpha -->
<td class="no"></td> <!-- Blackfin -->
<td class="no"></td> <!-- CellSPU -->
<td class="no"></td> <!-- MBlaze -->
<td class="unknown"></td> <!-- MSP430 -->
- <td class="unknown"></td> <!-- Mips -->
+ <td class="no"></td> <!-- Mips -->
<td class="unknown"></td> <!-- PTX -->
<td class="yes"></td> <!-- PowerPC -->
<td class="unknown"></td> <!-- Sparc -->
- <td class="unknown"></td> <!-- SystemZ -->
<td class="yes"></td> <!-- X86 -->
<td class="unknown"></td> <!-- XCore -->
</tr>
<td class="no"></td> <!-- Alpha -->
<td class="no"></td> <!-- Blackfin -->
<td class="no"></td> <!-- CellSPU -->
- <td class="no"></td> <!-- MBlaze -->
+ <td class="yes"></td> <!-- MBlaze -->
<td class="no"></td> <!-- MSP430 -->
<td class="no"></td> <!-- Mips -->
<td class="no"></td> <!-- PTX -->
<td class="no"></td> <!-- PowerPC -->
<td class="no"></td> <!-- Sparc -->
- <td class="no"></td> <!-- SystemZ -->
<td class="yes"></td> <!-- X86 -->
<td class="no"></td> <!-- XCore -->
</tr>
<td class="no"></td> <!-- CellSPU -->
<td class="no"></td> <!-- MBlaze -->
<td class="unknown"></td> <!-- MSP430 -->
- <td class="unknown"></td> <!-- Mips -->
+ <td class="no"></td> <!-- Mips -->
<td class="unknown"></td> <!-- PTX -->
<td class="yes"></td> <!-- PowerPC -->
<td class="unknown"></td> <!-- Sparc -->
- <td class="unknown"></td> <!-- SystemZ -->
<td class="yes"></td> <!-- X86 -->
<td class="unknown"></td> <!-- XCore -->
</tr>
</table>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection" id="feat_reliable">Is Generally Reliable</div>
+<h4 id="feat_reliable">Is Generally Reliable</h4>
-<div class="doc_text">
+<div>
<p>This box indicates whether the target is considered to be production quality.
This indicates that the target has been used as a static compiler to
compile large amounts of code by a variety of different people and is in
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection" id="feat_asmparser">Assembly Parser</div>
+<h4 id="feat_asmparser">Assembly Parser</h4>
-<div class="doc_text">
+<div>
<p>This box indicates whether the target supports parsing target specific .s
files by implementing the MCAsmParser interface. This is required for llvm-mc
to be able to act as a native assembler and is required for inline assembly
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection" id="feat_disassembler">Disassembler</div>
+<h4 id="feat_disassembler">Disassembler</h4>
-<div class="doc_text">
+<div>
<p>This box indicates whether the target supports the MCDisassembler API for
disassembling machine opcode bytes into MCInst's.</p>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection" id="feat_inlineasm">Inline Asm</div>
+<h4 id="feat_inlineasm">Inline Asm</h4>
-<div class="doc_text">
+<div>
<p>This box indicates whether the target supports most popular inline assembly
constraints and modifiers.</p>
-<p id="feat_inlineasm_x86">X86 lacks reliable support for inline assembly
-constraints relating to the X86 floating point stack.</p>
-
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection" id="feat_jit">JIT Support</div>
+<h4 id="feat_jit">JIT Support</h4>
-<div class="doc_text">
+<div>
<p>This box indicates whether the target supports the JIT compiler through
the ExecutionEngine interface.</p>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection" id="feat_objectwrite">.o File Writing</div>
+<h4 id="feat_objectwrite">.o File Writing</h4>
-<div class="doc_text">
+<div>
<p>This box indicates whether the target supports writing .o files (e.g. MachO,
ELF, and/or COFF) files directly from the target. Note that the target also
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection" id="feat_tailcall">Tail Calls</div>
+<h4 id="feat_tailcall">Tail Calls</h4>
-<div class="doc_text">
+<div>
<p>This box indicates whether the target supports guaranteed tail calls. These
are calls marked "<a href="LangRef.html#i_call">tail</a>" and use the fastcc
</div>
-
-
+</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="tailcallopt">Tail call optimization</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Tail call optimization, callee reusing the stack of the caller, is currently
supported on x86/x86-64 and PowerPC. It is performed if:</p>
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="sibcallopt">Sibling call optimization</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Sibling call optimization is a restricted form of tail call optimization.
Unlike tail call optimization described in the previous section, it can be
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="x86">The X86 backend</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
code generator is capable of targeting a variety of x86-32 and x86-64
processors, and includes support for ISA extensions such as MMX and SSE.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_tt">X86 Target Triples supported</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The following are the known target triples that are supported by the X86
backend. This is not an exhaustive list, and it would be useful to add those
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_cc">X86 Calling Conventions supported</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The following target-specific calling conventions are known to backend:</p>
<ul>
- <li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
- Windows platform (CC ID = 64).</li>
-
- <li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
- Windows platform (CC ID = 65).</li>
+<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
+ Windows platform (CC ID = 64).</li>
+<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
+ Windows platform (CC ID = 65).</li>
+<li><b>x86_ThisCall</b> — Similar to X86_StdCall. Passes first argument
+ in ECX, others via stack. Callee is responsible for stack cleaning. This
+ convention is used by MSVC by default for methods in its ABI
+ (CC ID = 70).</li>
</ul>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The x86 has a very flexible way of accessing memory. It is capable of
forming memory addresses of the following expression directly in integer
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_memory">X86 address spaces supported</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
-<p>x86 has an experimental feature which provides
+<p>x86 has a feature which provides
the ability to perform loads and stores to different address spaces
via the x86 segment registers. A segment override prefix byte on an
instruction causes the instruction's memory access to go to the specified
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_names">Instruction naming</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>An instruction name consists of the base name, a default operand size, and a
a character per operand with an optional special size. For example:</p>
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="ppc">The PowerPC backend</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
code generation is retargetable to several variations or <i>subtargets</i> of
the PowerPC ISA; including ppc32, ppc64 and altivec.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_abi">LLVM PowerPC ABI</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC
relative (PIC) or static addressing for accessing global values, so no TOC
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_frame">Frame Layout</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The size of a PowerPC frame is usually fixed for the duration of a
function's invocation. Since the frame is fixed size, all references
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_prolog">Prolog/Epilog</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
the following exceptions. Callee saved registers are spilled after the frame
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_dynamic">Dynamic Allocation</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p><i>TODO - More to come.</i></p>
</div>
+</div>
+
+<!-- ======================================================================= -->
+<h3>
+ <a name="ptx">The PTX backend</a>
+</h3>
+
+<div>
+
+<p>The PTX code generator lives in the lib/Target/PTX directory. It is
+ currently a work-in-progress, but already supports most of the code
+ generation functionality needed to generate correct PTX kernels for
+ CUDA devices.</p>
+
+<p>The code generator can target PTX 2.0+, and shader model 1.0+. The
+ PTX ISA Reference Manual is used as the primary source of ISA
+ information, though an effort is made to make the output of the code
+ generator match the output of the NVidia nvcc compiler, whenever
+ possible.</p>
+
+<p>Code Generator Options:</p>
+<table border="1" cellspacing="0">
+ <tr>
+ <th>Option</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td><code>double</code></td>
+ <td align="left">If enabled, the map_f64_to_f32 directive is
+ disabled in the PTX output, allowing native double-precision
+ arithmetic</td>
+ </tr>
+ <tr>
+ <td><code>no-fma</code></td>
+ <td align="left">Disable generation of Fused-Multiply Add
+ instructions, which may be beneficial for some devices</td>
+ </tr>
+ <tr>
+ <td><code>smxy / computexy</code></td>
+ <td align="left">Set shader model/compute capability to x.y,
+ e.g. sm20 or compute13</td>
+ </tr>
+</table>
+
+<p>Working:</p>
+<ul>
+ <li>Arithmetic instruction selection (including combo FMA)</li>
+ <li>Bitwise instruction selection</li>
+ <li>Control-flow instruction selection</li>
+ <li>Function calls (only on SM 2.0+ and no return arguments)</li>
+ <li>Addresses spaces (0 = global, 1 = constant, 2 = local, 4 =
+ shared)</li>
+ <li>Thread synchronization (bar.sync)</li>
+ <li>Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)</li>
+</ul>
+
+<p>In Progress:</p>
+<ul>
+ <li>Robust call instruction selection</li>
+ <li>Stack frame allocation</li>
+ <li>Device-specific instruction scheduling optimizations</li>
+</ul>
+
+
+</div>
+
+</div>
<!-- *********************************************************************** -->
<hr>
src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
+ <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
Last modified: $Date$
</address>