+<div class="doc_code">
+<pre>
+ADD8rr -> add, 8-bit register, 8-bit register
+IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
+IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
+MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
+</pre>
+</div>
+
+</div>
+
+</div>
+
+<!-- ======================================================================= -->
+<h3>
+ <a name="ppc">The PowerPC backend</a>
+</h3>
+
+<div>
+
+<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
+ code generation is retargetable to several variations or <i>subtargets</i> of
+ the PowerPC ISA, including ppc32, ppc64 and altivec.</p>
+
+<!-- _______________________________________________________________________ -->
+<h4>
+ <a name="ppc_abi">LLVM PowerPC ABI</a>
+</h4>
+
+<div>
+
+<p>LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses
+ PC-relative (PIC) or static addressing for accessing global values, so no TOC
+ (r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
+ of a stack frame. LLVM takes advantage of having no TOC to provide space to
+ save the frame pointer in the PowerPC linkage area of the caller frame.
+ Other details of the PowerPC ABI can be found at <a href=
+ "http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
+ >PowerPC ABI</a>. Note: This link describes the 32 bit ABI. The 64 bit ABI
+ is similar except that the space for each GPR is 8 bytes wide (not 4) and r13
+ is reserved for system use.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4>
+ <a name="ppc_frame">Frame Layout</a>
+</h4>
+
+<div>
+
+<p>The size of a PowerPC frame is usually fixed for the duration of a
+ function's invocation. Since the frame is a fixed size, all references
+ into the frame can be accessed via fixed offsets from the stack pointer. The
+ exception is when dynamic alloca or variable sized arrays are present; in
+ that case a base pointer (r31) is used as a proxy for the stack pointer
+ and the stack pointer is free to grow or shrink. A base pointer is also used
+ if llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
+ always aligned to 16 bytes, so that space allocated for altivec vectors will
+ be properly aligned.</p>
+
+<p>An invocation frame is laid out as follows (low memory at top):</p>
+
+<table class="layout">
+ <tr>
+ <td>Linkage<br><br></td>
+ </tr>
+ <tr>
+ <td>Parameter area<br><br></td>
+ </tr>
+ <tr>
+ <td>Dynamic area<br><br></td>
+ </tr>
+ <tr>
+ <td>Locals area<br><br></td>
+ </tr>
+ <tr>
+ <td>Saved registers area<br><br></td>
+ </tr>
+ <tr style="border-style: none hidden none hidden;">
+ <td><br></td>
+ </tr>
+ <tr>
+ <td>Previous Frame<br><br></td>
+ </tr>
+</table>
+
+<p>The <i>linkage</i> area is used by a callee to save special registers prior
+ to allocating its own frame. Only three entries are relevant to LLVM. The
+ first entry is the previous stack pointer (sp), also known as the link. This
+ allows probing tools like gdb or exception handlers to quickly scan the
+ frames in the stack. A function epilog can also use the link to pop the
+ frame from the stack. The third entry in the linkage area is used to save
+ the return address from the lr register. Finally, as mentioned above, the
+ last entry is used to save the previous frame pointer (r31). The entries in
+ the linkage area are each the size of a GPR, so the linkage area is 24 bytes
+ long in 32 bit mode and 48 bytes long in 64 bit mode.</p>
+
+<p>32 bit linkage area</p>
+
+<table class="layout">
+ <tr>
+ <td>0</td>
+ <td>Saved SP (r1)</td>
+ </tr>
+ <tr>
+ <td>4</td>
+ <td>Saved CR</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Saved LR</td>
+ </tr>
+ <tr>
+ <td>12</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>20</td>
+ <td>Saved FP (r31)</td>
+ </tr>
+</table>
+
+<p>64 bit linkage area</p>
+
+<table class="layout">
+ <tr>
+ <td>0</td>
+ <td>Saved SP (r1)</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Saved CR</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>Saved LR</td>
+ </tr>
+ <tr>
+ <td>24</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>32</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>40</td>
+ <td>Saved FP (r31)</td>
+ </tr>
+</table>
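+
+<p>Because every entry is the size of a GPR, the offsets in both tables follow
+ from the slot index alone. The C++ sketch below illustrates that arithmetic.
+ The enum and function names are hypothetical, chosen for this example, and
+ are not taken from the LLVM sources.</p>
+
```cpp
#include <cassert>

// Hypothetical names for illustration only; not part of the LLVM sources.
// Slot order follows the 32 bit and 64 bit linkage area tables.
enum LinkageSlot {
  SavedSP = 0,    // previous stack pointer (r1), the "link"
  SavedCR = 1,    // saved condition register
  SavedLR = 2,    // return address from lr
  Reserved0 = 3,
  Reserved1 = 4,
  SavedFP = 5,    // previous frame pointer (r31)
  NumLinkageSlots = 6
};

// Size in bytes of one GPR-sized slot.
constexpr unsigned gprSize(bool is64Bit) { return is64Bit ? 8u : 4u; }

// Byte offset of a slot from the start of the linkage area.
constexpr unsigned linkageOffset(LinkageSlot slot, bool is64Bit) {
  return static_cast<unsigned>(slot) * gprSize(is64Bit);
}

// Total size of the linkage area in bytes.
constexpr unsigned linkageSize(bool is64Bit) {
  return static_cast<unsigned>(NumLinkageSlots) * gprSize(is64Bit);
}

// Usage example: the computed values agree with the tables.
inline void checkLinkageLayout() {
  assert(linkageOffset(SavedLR, false) == 8u);
  assert(linkageOffset(SavedFP, false) == 20u);
  assert(linkageOffset(SavedFP, true) == 40u);
  assert(linkageSize(false) == 24u);
  assert(linkageSize(true) == 48u);
}
```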
+
+<p>The <i>parameter area</i> is used to store arguments being passed to a callee
+ function. Following the PowerPC ABI, the first few arguments are actually
+ passed in registers, with the space in the parameter area unused. However,
+ if there are not enough registers or the callee is a thunk or vararg
+ function, these register arguments can be spilled into the parameter area.
+ Thus, the parameter area must be large enough to store all the parameters for
+ the largest call sequence made by the caller. The size must also be
+ minimally large enough to spill registers r3-r10. This allows callees blind
+ to the call signature, such as thunks and vararg functions, enough space to
+ cache the argument registers. Therefore, the parameter area is minimally 32
+ bytes (64 bytes in 64 bit mode). Also note that since the parameter area is
+ at a fixed offset from the top of the frame, a callee can access its spilt
+ arguments using fixed offsets from the stack pointer (or base pointer).</p>
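+
+<p>The sizing rule above amounts to taking the larger of two quantities: the
+ bytes needed by the largest outgoing call, and the space needed to spill
+ r3-r10. A minimal C++ sketch follows. The helper name
+ <code>parameterAreaSize</code> and its parameters are assumptions made for
+ this example, not an LLVM API.</p>
+
```cpp
#include <algorithm>
#include <cassert>

// r3 through r10 are the argument registers that may need to be spilled.
constexpr unsigned kNumArgGPRs = 8;

// Hypothetical helper, for illustration only (not an LLVM API).
// maxCallArgBytes: bytes of parameters needed by the largest call sequence
// made by the caller.
inline unsigned parameterAreaSize(unsigned maxCallArgBytes, bool is64Bit) {
  const unsigned gprBytes = is64Bit ? 8u : 4u;
  // Minimum: room to spill r3-r10 (32 bytes, or 64 bytes in 64 bit mode).
  const unsigned minSize = kNumArgGPRs * gprBytes;
  return std::max(maxCallArgBytes, minSize);
}

// Usage example: small calls still reserve the minimum spill space.
inline void checkParameterArea() {
  assert(parameterAreaSize(0, false) == 32u);
  assert(parameterAreaSize(0, true) == 64u);
  assert(parameterAreaSize(100, false) == 100u);
}
```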
+
+<p>Combining the sizes of the linkage and parameter areas with the alignment
+ requirement, a stack frame is minimally 64 bytes in 32 bit mode and 128 bytes
+ in 64 bit mode.</p>
+
+<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
+ alloca, then space is added to the stack, the linkage and parameter areas are
+ shifted to the top of the stack, and the new space is available immediately
+ below the linkage and parameter areas. The cost of shifting the linkage and
+ parameter areas is minor since only the link value needs to be copied. The
+ link value can be easily fetched by adding the original frame size to the
+ base pointer. Note that allocations in the dynamic space need to observe 16
+ byte alignment.</p>
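+
+<p>One way to satisfy the 16 byte alignment requirement is to round each
+ requested size up to the next multiple of 16 before growing the stack. The
+ C++ sketch below shows that rounding. The function name is hypothetical, and
+ the sketch does not describe how LLVM itself performs the adjustment.</p>
+
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only: round a dynamic allocation
// size up to the next multiple of 16 bytes.
inline std::uint64_t alignDynamicAlloc(std::uint64_t bytes) {
  const std::uint64_t align = 16; // must be a power of two for the mask trick
  return (bytes + align - 1) & ~(align - 1);
}

// Usage example.
inline void checkDynamicAlignment() {
  assert(alignDynamicAlloc(0) == 0);
  assert(alignDynamicAlloc(1) == 16);
  assert(alignDynamicAlloc(16) == 16);
  assert(alignDynamicAlloc(17) == 32);
}
```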
+
+<p>The <i>locals area</i> is where the LLVM compiler reserves space for local
+ variables.</p>
+
+<p>The <i>saved registers area</i> is where the LLVM compiler spills callee
+ saved registers on entry to the callee.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4>
+ <a name="ppc_prolog">Prolog/Epilog</a>
+</h4>
+
+<div>
+
+<p>The LLVM prolog and epilog are the same as described in the PowerPC ABI, with
+ the following exceptions. Callee saved registers are spilled after the frame
+ is created. This allows the LLVM epilog/prolog support to be common with
+ other targets. The base pointer callee saved register r31 is saved in the
+ TOC slot of the linkage area. This simplifies allocation of space for the
+ base pointer and makes it convenient to locate programmatically and during
+ debugging.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4>
+ <a name="ppc_dynamic">Dynamic Allocation</a>
+</h4>
+
+<div>
+
+<p><i>TODO - More to come.</i></p>
+
+</div>
+
+</div>
+
+<!-- ======================================================================= -->
+<h3>
+ <a name="ptx">The PTX backend</a>
+</h3>
+
+<div>
+
+<p>The PTX code generator lives in the lib/Target/PTX directory. It is
+ currently a work-in-progress, but already supports most of the code
+ generation functionality needed to generate correct PTX kernels for
+ CUDA devices.</p>
+
+<p>The code generator can target PTX 2.0+ and shader model 1.0+. The
+ PTX ISA Reference Manual is used as the primary source of ISA
+ information, though an effort is made to make the output of the code
+ generator match the output of the NVIDIA nvcc compiler whenever
+ possible.</p>
+
+<p>Code Generator Options:</p>
+<table border="1" cellspacing="0">
+ <tr>
+ <th>Option</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td><code>double</code></td>
+ <td align="left">If enabled, the map_f64_to_f32 directive is
+ disabled in the PTX output, allowing native double-precision
+ arithmetic</td>
+ </tr>
+ <tr>
+ <td><code>no-fma</code></td>
+ <td align="left">Disable generation of fused multiply-add (FMA)
+ instructions, which may be beneficial for some devices</td>
+ </tr>
+ <tr>
+ <td><code>smxy / computexy</code></td>
+ <td align="left">Set shader model/compute capability to x.y,
+ e.g. sm20 or compute13</td>
+ </tr>
+</table>
+
+<p>Working:</p>
+<ul>
+ <li>Arithmetic instruction selection (including combo FMA)</li>
+ <li>Bitwise instruction selection</li>
+ <li>Control-flow instruction selection</li>
+ <li>Function calls (only on SM 2.0+ and no return arguments)</li>
+ <li>Address spaces (0 = global, 1 = constant, 2 = local, 4 =
+ shared)</li>
+ <li>Thread synchronization (bar.sync)</li>
+ <li>Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)</li>
+</ul>
+
+<p>In Progress:</p>
+<ul>
+ <li>Robust call instruction selection</li>
+ <li>Stack frame allocation</li>
+ <li>Device-specific instruction scheduling optimizations</li>
+</ul>
+
+
+</div>