Packed types, brought to you by Brad Jones

[oota-llvm.git] / docs / CodeGenerator.html
diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html

index 4729cc089f6cee113d48868a0cc34a7b9dfd46c3..dfdc0528455a29710973b1b65d8dadf899c47c8c 100644 (file)
--- a/docs/CodeGenerator.html
+++ b/docs/CodeGenerator.html
@@ -30,13 +30,16 @@
      </ul>
    </li>
    <li><a href="#codegendesc">Machine code description classes</a>
+    <ul>
+      <li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
+    </ul>
    </li>
    <li><a href="#codegenalgs">Target-independent code generation algorithms</a>
    </li>
    <li><a href="#targetimpls">Target description implementations</a>
      <ul>
        <li><a href="#x86">The X86 backend</a></li>
-    </li>
+    </ul>
    </li>
  
  </ol>
@@ -45,6 +48,10 @@
    <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  </div>
  
+<div class="doc_warning">
+  <p>Warning: This is a work in progress.</p>
+</div>
+
  <!-- *********************************************************************** -->
  <div class="doc_section">
    <a name="introduction">Introduction</a>
@@ -57,16 +64,16 @@
  suite of reusable components for translating the LLVM internal representation to
  the machine code for a specified target -- either in assembly form (suitable for
  a static compiler) or in binary machine code format (usable for a JIT compiler).
-The LLVM target-independent code generator consists of four main components:</p>
+The LLVM target-independent code generator consists of five main components:</p>
  
  <ol>
  <li><a href="#targetdesc">Abstract target description</a> interfaces which
-capture improtant properties about various aspects of the machine independently
+capture important properties about various aspects of the machine, independently
  of how they will be used.  These interfaces are defined in
  <tt>include/llvm/Target/</tt>.</li>
  
  <li>Classes used to represent the <a href="#codegendesc">machine code</a> being
-generator for a target.  These classes are intended to be abstract enough to
+generated for a target.  These classes are intended to be abstract enough to
  represent the machine code for <i>any</i> target machine.  These classes are
  defined in <tt>include/llvm/CodeGen/</tt>.</li>
  
@@ -80,6 +87,11 @@ the components provided by LLVM, and can optionally provide custom
  target-specific passes, to build complete code generators for a specific target.
  Target descriptions live in <tt>lib/Target/</tt>.</li>
  
+<li><a href="#jit">The target-independent JIT components</a>.  The LLVM JIT is
+completely target independent (it uses the <tt>TargetJITInfo</tt> structure to
+interface for target-specific issues.  The code for the target-independent
+JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
+
  </ol>
  
  <p>
@@ -87,8 +99,8 @@ Depending on which part of the code generator you are interested in working on,
  different pieces of this will be useful to you.  In any case, you should be
  familiar with the <a href="#targetdesc">target description</a> and <a
  href="#codegendesc">machine code representation</a> classes.  If you want to add
-a backend for a new target, you will need <a href="#targetimpls">implement the
-targe description</a> classes for your new target and understand the <a
+a backend for a new target, you will need to <a href="#targetimpls">implement the
+target description</a> classes for your new target and understand the <a
  href="LangRef.html">LLVM code representation</a>.  If you are interested in
  implementing a new <a href="#codegenalgs">code generation algorithm</a>, it
  should only depend on the target-description and machine code representation
@@ -121,17 +133,28 @@ implements these two interfaces, and does its own thing.  Another example of a
  code generator like this is a (purely hypothetical) backend that converts LLVM
  to the GCC RTL form and uses GCC to emit machine code for a target.</p>
  
-<p>The other implication of this design is that it is possible to design and
+<p>This design also implies that it is possible to design and
  implement radically different code generators in the LLVM system that do not
  make use of any of the built-in components.  Doing so is not recommended at all,
  but could be required for radically different targets that do not fit into the
  LLVM machine description model: programmable FPGAs for example.</p>
-</p>
+
+<p><b>Important Note:</b> For historical reasons, the LLVM SparcV9 code
+generator uses almost entirely different code paths than described in this
+document.  For this reason, there are some deprecated interfaces (such as
+<tt>TargetRegInfo</tt> and <tt>TargetSchedInfo</tt>), which are only used by the
+V9 backend and should not be used by any other targets.  Also, all code in the
+<tt>lib/Target/SparcV9</tt> directory and subdirectories should be considered
+deprecated, and should not be used as the basis for future code generator work.
+The SparcV9 backend is slowly being merged into the rest of the
+target-independent code generators, but this is a low-priority process with no
+predictable completion date.</p>
+
  </div>
  
  <!-- ======================================================================= -->
  <div class="doc_subsection">
- <a name="high-level-design">The high-level design of the code generator</a></li>
+ <a name="high-level-design">The high-level design of the code generator</a>
  </div>
  
  <div class="doc_text">
@@ -141,9 +164,9 @@ quality code generation for standard register-based microprocessors.  Code
  generation in this model is divided into the following stages:</p>
  
  <ol>
-<li><b>Instruction Selection</b> - Determining a efficient implementation of the
+<li><b>Instruction Selection</b> - Determining an efficient implementation of the
  input LLVM code in the target instruction set.  This stage produces the initial
-code for the program in the target instruction set the makes use of virtual
+code for the program in the target instruction set, then makes use of virtual
  registers in SSA form and physical registers that represent any required
  register assignments due to target constraints or calling conventions.</li>
  
@@ -168,7 +191,7 @@ elimination and stack packing.</li>
  "final" machine code can go here, such as spill code scheduling and peephole
  optimizations.</li>
  
-<li><b>Code Emission</b> - The final stage actually outputs the machine code for
+<li><b>Code Emission</b> - The final stage actually outputs the code for
  the current function, either in the target assembler format or in machine
  code.</li>
  
@@ -177,11 +200,13 @@ code.</li>
  <p>
  The code generator is based on the assumption that the instruction selector will
  use an optimal pattern matching selector to create high-quality sequences of
-native code.  Alternative code generator designs based on pattern expansion and
-aggressive iterative peephole optimization are much slower.  This design is
-designed to permit efficient compilation (important for JIT environments) and
-aggressive optimization (used when generate code offline) by allowing components
-of varying levels of sophisication to be used for any step of compilation.</p>
+native instructions.  Alternative code generator designs based on pattern 
+expansion and
+aggressive iterative peephole optimization are much slower.  This design 
+permits efficient compilation (important for JIT environments) and
+aggressive optimization (used when generating code offline) by allowing 
+components of varying levels of sophisication to be used for any step of 
+compilation.</p>
  
  <p>
  In addition to these stages, target implementations can insert arbitrary
@@ -195,17 +220,18 @@ targets with unusual requirements can be supported with custom passes as needed.
  
  <!-- ======================================================================= -->
  <div class="doc_subsection">
- <a name="tablegen">Using TableGen for target description</a></li>
+ <a name="tablegen">Using TableGen for target description</a>
  </div>
  
  <div class="doc_text">
  
-<p>The target description classes require a detailed descriptions of the target
+<p>The target description classes require a detailed description of the target
  architecture.  These target descriptions often have a large amount of common
  information (e.g., an add instruction is almost identical to a sub instruction).
  In order to allow the maximum amount of commonality to be factored out, the LLVM
  code generator uses the <a href="TableGenFundamentals.html">TableGen</a> tool to
-allow 
+describe big chunks of the target machine, which allows the use of domain- and 
+target-specific abstractions to reduce the amount of repetition.
  </p>
  
  </div>
@@ -229,7 +255,7 @@ as inputs or other algorithm-specific data structures).</p>
  <p>All of the target description classes (except the <tt><a
  href="#targetdata">TargetData</a></tt> class) are designed to be subclassed by
  the concrete target implementation, and have virtual methods implemented.  To
-get to these implementations, <tt><a
+get to these implementations, the <tt><a
  href="#targetmachine">TargetMachine</a></tt> class provides accessors that
  should be implemented by the target.</p>
  
@@ -245,7 +271,7 @@ should be implemented by the target.</p>
  <p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
  access the target-specific implementations of the various target description
  classes (with the <tt>getInstrInfo</tt>, <tt>getRegisterInfo</tt>,
-<tt>getFrameInfo</tt>, ... methods).  This class is designed to be subclassed by
+<tt>getFrameInfo</tt>, ... methods).  This class is designed to be specialized by
  a concrete target implementation (e.g., <tt>X86TargetMachine</tt>) which
  implements the various virtual methods.  The only required target description
  class is the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the
@@ -273,7 +299,7 @@ target, and whether the target is little- or big-endian.</p>
  
  <!-- ======================================================================= -->
  <div class="doc_subsection">
-  <a name="mregisterinfo">The <tt>MRegisterInfo</tt> class</a></li>
+  <a name="mregisterinfo">The <tt>MRegisterInfo</tt> class</a>
  </div>
  
  <div class="doc_text">
@@ -310,17 +336,17 @@ href="TableGenFundamentals.html">TableGen</a> description of the register file.
  
  <!-- ======================================================================= -->
  <div class="doc_subsection">
-  <a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
+  <a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
  </div>
  
  <!-- ======================================================================= -->
  <div class="doc_subsection">
-  <a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
+  <a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
  </div>
  
  <!-- ======================================================================= -->
  <div class="doc_subsection">
-  <a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
+  <a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
  </div>
  
  <!-- *********************************************************************** -->
@@ -329,7 +355,277 @@ href="TableGenFundamentals.html">TableGen</a> description of the register file.
  </div>
  <!-- *********************************************************************** -->
  
+<div class="doc_text">
+
+<p>
+At the high-level, LLVM code is translated to a machine specific representation
+formed out of MachineFunction, MachineBasicBlock, and <a 
+href="#machineinstr"><tt>MachineInstr</tt></a> instances
+(defined in include/llvm/CodeGen).  This representation is completely target
+agnostic, representing instructions in their most abstract form: an opcode and a
+series of operands.  This representation is designed to support both SSA
+representation for machine code, as well as a register allocated, non-SSA form.
+</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+  <a name="machineinstr">The <tt>MachineInstr</tt> class</a>
+</div>
+
+<div class="doc_text">
+
+<p>Target machine instructions are represented as instances of the
+<tt>MachineInstr</tt> class.  This class is an extremely abstract way of
+representing machine instructions.  In particular, all it keeps track of is 
+an opcode number and some number of operands.</p>
+
+<p>The opcode number is an simple unsigned number that only has meaning to a 
+specific backend.  All of the instructions for a target should be defined in 
+the <tt>*InstrInfo.td</tt> file for the target, and the opcode enum values
+are autogenerated from this description.  The <tt>MachineInstr</tt> class does
+not have any information about how to intepret the instruction (i.e., what the 
+semantics of the instruction are): for that you must refer to the 
+<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p> 
+
+<p>The operands of a machine instruction can be of several different types:
+they can be a register reference, constant integer, basic block reference, etc.
+In addition, a machine operand should be marked as a def or a use of the value
+(though only registers are allowed to be defs).</p>
+
+<p>By convention, the LLVM code generator orders instruction operands so that
+all register definitions come before the register uses, even on architectures
+that are normally printed in other orders.  For example, the sparc add 
+instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1", and "%i2" registers
+and stores the result into the "%i3" register.  In the LLVM code generator,
+the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the destination
+first.</p>
+
+<p>Keeping destination operands at the beginning of the operand list has several
+advantages.  In particular, the debugging printer will print the instruction 
+like this:</p>
+
+<pre>
+  %r3 = add %i1, %i2
+</pre>
+
+<p>If the first operand is a def, and it is also easier to <a 
+href="#buildmi">create instructions</a> whose only def is the first 
+operand.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+  <a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
+</div>
+
+<div class="doc_text">
+
+<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
+located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file.  The
+<tt>BuildMI</tt> functions make it easy to build arbitrary machine 
+instructions.  Usage of the <tt>BuildMI</tt> functions look like this: 
+</p>
+
+<pre>
+  // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
+  // instruction.  The '1' specifies how many operands will be added.
+  MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
+
+  // Create the same instr, but insert it at the end of a basic block.
+  MachineBasicBlock &amp;MBB = ...
+  BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
+
+  // Create the same instr, but insert it before a specified iterator point.
+  MachineBasicBlock::iterator MBBI = ...
+  BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
+
+  // Create a 'cmp Reg, 0' instruction, no destination reg.
+  MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
+  // Create an 'sahf' instruction which takes no operands and stores nothing.
+  MI = BuildMI(X86::SAHF, 0);
+
+  // Create a self looping branch instruction.
+  BuildMI(MBB, X86::JNE, 1).addMBB(&amp;MBB);
+</pre>
+
+<p>
+The key thing to remember with the <tt>BuildMI</tt> functions is that you have
+to specify the number of operands that the machine instruction will take 
+(allowing efficient memory allocation).  Also, if operands default to be uses
+of values, not definitions.  If you need to add a definition operand (other 
+than the optional destination register), you must explicitly mark it as such.
+</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+  <a name="fixedregs">Fixed (aka preassigned) registers</a>
+</div>
+
+<div class="doc_text">
+
+<p>One important issue that the code generator needs to be aware of is the
+presence of fixed registers.  In particular, there are often places in the 
+instruction stream where the register allocator <em>must</em> arrange for a
+particular value to be in a particular register.  This can occur due to 
+limitations in the instruction set (e.g., the X86 can only do a 32-bit divide 
+with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like calling
+conventions.  In any case, the instruction selector should emit code that 
+copies a virtual register into or out of a physical register when needed.</p>
+
+<p>For example, consider this simple LLVM example:</p>
+
+<pre>
+  int %test(int %X, int %Y) {
+    %Z = div int %X, %Y
+    ret int %Z
+  }
+</pre>
+
+<p>The X86 instruction selector produces this machine code for the div 
+and ret (use 
+"<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to get this):</p>
+
+<pre>
+        ;; Start of div
+        %EAX = mov %reg1024           ;; Copy X (in reg1024) into EAX
+        %reg1027 = sar %reg1024, 31
+        %EDX = mov %reg1027           ;; Sign extend X into EDX
+        idiv %reg1025                 ;; Divide by Y (in reg1025)
+        %reg1026 = mov %EAX           ;; Read the result (Z) out of EAX
+
+        ;; Start of ret
+        %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
+        ret
+</pre>
+
+<p>By the end of code generation, the register allocator has coallesced
+the registers and deleted the resultant identity moves, producing the
+following code:</p>
+
+<pre>
+        ;; X is in EAX, Y is in ECX
+        mov %EAX, %EDX
+        sar %EDX, 31
+        idiv %ECX
+        ret 
+</pre>
+
+<p>This approach is extremely general (if it can handle the X86 architecture, 
+it can handle anything!) and allows all of the target specific
+knowledge about the instruction stream to be isolated in the instruction 
+selector.  Note that physical registers should have a short lifetime for good 
+code generation, and all physical registers are assumed dead on entry and
+exit of basic blocks (before register allocation).  Thus if you need a value
+to be live across basic block boundaries, it <em>must</em> live in a virtual 
+register.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+  <a name="ssa">Machine code SSA form</a>
+</div>
+
+<div class="doc_text">
+
+<p><tt>MachineInstr</tt>'s are initially instruction selected in SSA-form, and
+are maintained in SSA-form until register allocation happens.  For the most 
+part, this is trivially simple since LLVM is already in SSA form: LLVM PHI nodes
+become machine code PHI nodes, and virtual registers are only allowed to have a
+single definition.</p>
+
+<p>After register allocation, machine code is no longer in SSA-form, as there 
+are no virtual registers left in the code.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section">
+  <a name="targetimpls">Target description implementations</a>
+</div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>This section of the document explains any features or design decisions that
+are specific to the code generator for a particular target.</p>
+
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+  <a name="x86">The X86 backend</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The X86 code generator lives in the <tt>lib/Target/X86</tt> directory.  This
+code generator currently targets a generic P6-like processor.  As such, it
+produces a few P6-and-above instructions (like conditional moves), but it does
+not make use of newer features like MMX or SSE.  In the future, the X86 backend
+will have subtarget support added for specific processor families and 
+implementations.</p>
  
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+  <a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The x86 has a very, uhm, flexible, way of accessing memory.  It is capable of
+forming memory addresses of the following expression directly in integer
+instructions (which use ModR/M addressing):</p>
+
+<pre>
+   Base+[1,2,4,8]*IndexReg+Disp32
+</pre>
+
+<p>Wow, that's crazy.  In order to represent this, LLVM tracks no less that 4
+operands for each memory operand of this form.  This means that the "load" form
+of 'mov' has the following "Operands" in this order:</p>
+
+<pre>
+Index:        0     |    1        2       3           4
+Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement
+OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm
+</pre>
+
+<p>Stores and all other instructions treat the four memory operands in the same
+way, in the same order.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+  <a name="x86_names">Instruction naming</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+An instruction name consists of the base name, a default operand size
+followed by a character per operand with an optional special size. For
+example:</p>
+
+<p>
+<tt>ADD8rr</tt> -&gt; add, 8-bit register, 8-bit register<br>
+<tt>IMUL16rmi</tt> -&gt; imul, 16-bit register, 16-bit memory, 16-bit immediate<br>
+<tt>IMUL16rmi8</tt> -&gt; imul, 16-bit register, 16-bit memory, 8-bit immediate<br>
+<tt>MOVSX32rm16</tt> -&gt; movsx, 32-bit register, 16-bit memory
+</p>
+
+</div>
  
  <!-- *********************************************************************** -->
  <hr>