+<p>While it has many strengths, the system currently has some limitations,
+ primarily because it is still a work in progress:</p>
+
+<ul>
+ <li>Overall, there is no way to define or match SelectionDAG nodes that define
+ multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
+ etc). This is the biggest reason that you currently still <em>have
+ to</em> write custom C++ code for your instruction selector.</li>
+
+ <li>There is no great way to support matching complex addressing modes yet.
+ In the future, we will extend pattern fragments to allow them to define
+ multiple values (e.g. the four operands of the <a href="#x86_memory">X86
+ addressing mode</a>, which are currently matched with custom C++ code).
+ In addition, we'll extend fragments so that a fragment can match multiple
+ different patterns.</li>
+
+ <li>We don't automatically infer flags like isStore/isLoad yet.</li>
+
+ <li>We don't automatically generate the set of supported registers and
+ operations for the <a href="#selectiondag_legalize">Legalizer</a>
+ yet.</li>
+
+ <li>We don't have a way of tying in custom legalized nodes yet.</li>
+</ul>
+
+<p>Despite these limitations, the instruction selector generator is still quite
+ useful for most of the binary and logical operations in typical instruction
+ sets. If you run into any problems or can't figure out how to do something,
+ please let Chris know!</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
+</div>
+
+<div class="doc_text">
+
+<p>The scheduling phase takes the DAG of target instructions from the selection
+ phase and assigns an order to them. The scheduler can pick an order depending
+ on various constraints of the machine (e.g. an order that minimizes register
+ pressure, or one that covers instruction latencies). Once an order is
+ established, the DAG is converted to a list
+ of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
+ destroyed.</p>
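<p>As a standalone illustration of the ordering step (this is not the actual
LLVM scheduler, which also weighs register pressure and latency), a dependence
DAG can be linearized with Kahn's topological sort:</p>

```cpp
#include <queue>
#include <vector>

// Linearize a dependence DAG: repeatedly emit a node once all of its
// predecessors have been emitted. succs[u] lists the nodes depending on u.
std::vector<int> scheduleDAG(int n, const std::vector<std::vector<int>> &succs) {
  std::vector<int> indeg(n, 0);
  for (int u = 0; u < n; ++u)
    for (int v : succs[u]) ++indeg[v];
  std::queue<int> ready;
  for (int u = 0; u < n; ++u)
    if (indeg[u] == 0) ready.push(u);
  std::vector<int> order;
  while (!ready.empty()) {
    int u = ready.front(); ready.pop();
    order.push_back(u);
    for (int v : succs[u])
      if (--indeg[v] == 0) ready.push(v);
  }
  return order;  // a DAG always yields all n nodes
}
```

<p>Any order the ready queue produces is legal; a real scheduler chooses among
the ready nodes using a priority function.</p>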
+
+<p>Note that this phase is logically separate from the instruction selection
+ phase, but is tied to it closely in the code because it operates on
+ SelectionDAGs.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="selectiondag_future">Future directions for the SelectionDAG</a>
+</div>
+
+<div class="doc_text">
+
+<ol>
+ <li>Optional function-at-a-time selection.</li>
+
+ <li>Auto-generate entire selector from <tt>.td</tt> file.</li>
+</ol>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="ssamco">SSA-based Machine Code Optimizations</a>
+</div>
+<div class="doc_text"><p>To Be Written</p></div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="liveintervals">Live Intervals</a>
+</div>
+
+<div class="doc_text">
+
+<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
+ They are used by some <a href="#regalloc">register allocator</a> passes to
+ determine if two or more virtual registers which require the same physical
+ register are live at the same point in the program (i.e., they conflict).
+ When this situation occurs, one virtual register must be <i>spilled</i>.</p>
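<p>The conflict test itself is simple once the intervals are known; a minimal
sketch using half-open intervals (not LLVM's <tt>LiveInterval</tt> class):</p>

```cpp
// A live interval is half-open: [start, end). Two virtual registers conflict
// exactly when their intervals overlap, in which case they cannot share a
// physical register.
struct LiveInterval { int start, end; };

bool conflict(LiveInterval a, LiveInterval b) {
  return a.start < b.end && b.start < a.end;
}
```

<p>Note that intervals that merely touch, such as [1, 4) and [4, 9), do not
conflict: the first value dies exactly where the second is defined.</p>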
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="livevariable_analysis">Live Variable Analysis</a>
+</div>
+
+<div class="doc_text">
+
+<p>The first step in determining the live intervals of variables is to calculate
+ the set of registers that are immediately dead after the instruction (i.e.,
+ the instruction calculates the value, but it is never used) and the set of
+ registers that are used by the instruction, but are never used after the
+ instruction (i.e., they are killed). Live variable information is computed
+ for each <i>virtual</i> register and <i>register allocatable</i> physical
+ register in the function. This is done in a very efficient manner because it
+ uses SSA to sparsely compute lifetime information for virtual registers
+ (which are in SSA form) and only has to track physical registers within a
+ block. Before register allocation, LLVM can assume that physical registers
+ are only live within a single basic block. This allows it to do a single,
+ local analysis to resolve physical register lifetimes within each basic
+ block. If a physical register is not register allocatable (e.g., a stack
+ pointer or condition codes), it is not tracked.</p>
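<p>The backward-scan idea behind computing "dead" and "killed" sets can be
sketched in standalone C++ (the <tt>Inst</tt> record and register numbers here
are invented for illustration):</p>

```cpp
#include <set>
#include <vector>

// One machine instruction: the registers it defines and the ones it uses.
struct Inst { std::vector<int> defs, uses; };

// Scan one block backwards: a def whose register is never read later is dead
// at that instruction; a use not yet seen on the way up is the register's kill.
void computeKillsAndDeads(const std::vector<Inst> &block,
                          std::vector<std::vector<int>> &kills,
                          std::vector<std::vector<int>> &deads) {
  std::set<int> live;  // registers read by some later instruction
  kills.assign(block.size(), {});
  deads.assign(block.size(), {});
  for (int i = (int)block.size() - 1; i >= 0; --i) {
    for (int d : block[i].defs) {
      if (!live.count(d)) deads[i].push_back(d);  // value is never read
      live.erase(d);                              // liveness ends at the def
    }
    for (int u : block[i].uses) {
      if (!live.count(u)) kills[i].push_back(u);  // last use of the register
      live.insert(u);
    }
  }
}

bool demoKillsAndDeads() {
  //   i0: r1, r3 = ...      (r3 is never read: dead at i0)
  //   i1: r2 = op r1        (last use of r1: killed at i1)
  //   i2:      op r2        (last use of r2: killed at i2)
  std::vector<Inst> block = {{{1, 3}, {}}, {{2}, {1}}, {{}, {2}}};
  std::vector<std::vector<int>> kills, deads;
  computeKillsAndDeads(block, kills, deads);
  return deads[0] == std::vector<int>{3} && kills[1] == std::vector<int>{1} &&
         kills[2] == std::vector<int>{2};
}
```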
+
+<p>Physical registers may be live in to or out of a function. Live in values are
+ typically arguments in registers. Live out values are typically return values
+ in registers. Live in values are marked as such, and are given a dummy
+ "defining" instruction during live intervals analysis. If the last basic
+ block of a function is a <tt>return</tt>, then it's marked as using all live
+ out values in the function.</p>
+
+<p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
+ the live variable information from a depth first traversal of the CFG of the
+ function won't guarantee that a virtual register used by the <tt>PHI</tt>
+ node is defined before it's used. When a <tt>PHI</tt> node is encountered,
+ only the definition is handled, because the uses will be handled in other
+ basic blocks.</p>
+
+<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
+ assignment at the end of the current basic block and traverse the successor
+ basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
+ the <tt>PHI</tt> node's operands is coming from the current basic block, then
+ the variable is marked as <i>alive</i> within the current basic block and all
+ of its predecessor basic blocks, until the basic block with the defining
+ instruction is encountered.</p>
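<p>A standalone sketch of this predecessor walk (the block numbering and data
structures are hypothetical, not LLVM's):</p>

```cpp
#include <set>
#include <vector>

// Walk backwards from the block that feeds a PHI operand, marking the value
// live-in to every block on the way, stopping at the defining block.
void markAliveUpTo(int fromBB, int defBB,
                   const std::vector<std::vector<int>> &preds,
                   std::set<int> &liveIn) {
  if (fromBB == defBB || liveIn.count(fromBB)) return;
  liveIn.insert(fromBB);
  for (int p : preds[fromBB])
    markAliveUpTo(p, defBB, preds, liveIn);
}

bool demoPhiLiveness() {
  // Straight-line CFG 0 -> 1 -> 2 -> 3; the value is defined in block 0 and
  // reaches a PHI through block 2, so it is live in blocks 1 and 2.
  std::vector<std::vector<int>> preds = {{}, {0}, {1}, {2}};
  std::set<int> live;
  markAliveUpTo(2, 0, preds, live);
  return live == std::set<int>{1, 2};
}
```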
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="liveintervals_analysis">Live Intervals Analysis</a>
+</div>
+
+<div class="doc_text">
+
+<p>We now have the information available to perform the live intervals analysis
+ and build the live intervals themselves. We start off by numbering the basic
+ blocks and machine instructions. We then handle the "live-in" values. These
+ are in physical registers, so the physical register is assumed to be killed
+ by the end of the basic block. Live intervals for virtual registers are
+ computed for some ordering of the machine instructions <tt>[1, N]</tt>. A
+ live interval is an interval <tt>[i, j)</tt>, where <tt>1 &lt;= i &lt;= j
+ &lt; N</tt>, for which a variable is live.</p>
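<p>A minimal sketch of interval construction under this numbering (assuming,
for illustration only, a single block and one def per register):</p>

```cpp
#include <map>
#include <utility>
#include <vector>

struct NumInst { std::vector<int> defs, uses; };

// Number the instructions 1..N and record, per virtual register, the
// half-open interval [first def, last use + 1).
std::map<int, std::pair<int, int>> buildIntervals(const std::vector<NumInst> &f) {
  std::map<int, std::pair<int, int>> iv;
  for (int n = 0; n < (int)f.size(); ++n) {
    int num = n + 1;  // instructions are numbered from 1
    for (int d : f[n].defs)
      if (!iv.count(d)) iv[d] = {num, num + 1};
    for (int u : f[n].uses)
      iv[u].second = num + 1;  // extend past the last use
  }
  return iv;
}

bool demoIntervals() {
  // 1: v1 = ...    2: v2 = op v1    3: op v2
  std::vector<NumInst> f = {{{1}, {}}, {{2}, {1}}, {{}, {2}}};
  std::map<int, std::pair<int, int>> iv = buildIntervals(f);
  return iv[1] == std::make_pair(1, 3) && iv[2] == std::make_pair(2, 4);
}
```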
+
+<p><i><b>More to come...</b></i></p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="regalloc">Register Allocation</a>
+</div>
+
+<div class="doc_text">
+
+<p>The <i>Register Allocation problem</i> consists of mapping a program
+ <i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
+ to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
+ number of physical registers. Each target architecture has a different number
+ of physical registers. If the number of physical registers is not enough to
+ accommodate all the virtual registers, some of them will have to be mapped
+ into memory. These virtuals are called <i>spilled virtuals</i>.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+
+<div class="doc_subsubsection">
+ <a name="regAlloc_represent">How registers are represented in LLVM</a>
+</div>
+
+<div class="doc_text">
+
+<p>In LLVM, physical registers are denoted by integer numbers that normally
+ range from 1 to 1023. To see how this numbering is defined for a particular
+ architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
+ architecture. For instance, by
+ inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the
+ 32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register
+ <tt>MM0</tt> is mapped to 48.</p>
+
+<p>Some architectures contain registers that share the same physical location. A
+ notable example is the X86 platform. For instance, in the X86 architecture,
+ the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight
+ bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
+ particular architecture, you can check which registers are aliased by
+ inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method
+ <tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
+ all the physical registers aliased to the register <tt>p_reg</tt>.</p>
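<p>The alias relation can be pictured with a small standalone table (the
register numbers here are invented for the sketch, not the real encoding):</p>

```cpp
#include <map>
#include <set>

// A tiny alias table: each register number maps to the registers that share
// bits with it. Invented numbering: 1 = EAX, 2 = AX, 3 = AL, 4 = EBX.
bool regsAlias(int a, int b, const std::map<int, std::set<int>> &aliases) {
  if (a == b) return true;
  std::map<int, std::set<int>>::const_iterator it = aliases.find(a);
  return it != aliases.end() && it->second.count(b) != 0;
}

bool demoAliases() {
  std::map<int, std::set<int>> aliases = {
      {1, {2, 3}}, {2, {1, 3}}, {3, {1, 2}}};
  return regsAlias(1, 3, aliases) &&   // EAX and AL overlap
         !regsAlias(1, 4, aliases);    // EAX and EBX do not
}
```

<p>A register allocator must treat aliased registers as a unit: assigning one
of them clobbers the whole set.</p>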
+
+<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
+ Elements in the same register class are functionally equivalent, and can be
+ used interchangeably. Each virtual register can only be mapped to physical
+ registers of a particular class. For instance, in the X86 architecture, some
+ virtuals can only be allocated to 8 bit registers. A register class is
+ described by <tt>TargetRegisterClass</tt> objects. To discover if a virtual
+ register is compatible with a given physical register, this code can be used:</p>
+
+<div class="doc_code">
+<pre>
+bool RegMapping_Fer::compatible_class(MachineFunction &amp;mf,
+                                      unsigned v_reg,
+                                      unsigned p_reg) {
+  assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
+         "Target register must be physical");
+  const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
+  return trc->contains(p_reg);
+}
+</pre>
+</div>
+
+<p>Sometimes, mostly for debugging purposes, it is useful to change the number
+ of physical registers available in the target architecture. This must be done
+ statically, inside the <tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt>
+ for <tt>RegisterClass</tt>, the last parameter of which is a list of
+ registers. Commenting some of them out is one simple way to avoid them being
+ used. A more polite way is to explicitly exclude some registers from
+ the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
+ class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
+ </p>
+
+<p>Virtual registers are also denoted by integer numbers. Contrary to physical
+ registers, different virtual registers never share the same number. The
+ smallest virtual register is normally assigned the number 1024. This may
+ change, so, in order to know which is the first virtual register, you should
+ access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
+ number is greater than or equal
+ to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
+ register. Whereas physical registers are statically defined in
+ a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
+ application developer, that is not the case with virtual registers. In order
+ to create new virtual registers, use the
+ method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
+ returns a new virtual register, numbered above all existing ones.</p>
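<p>A toy model of this numbering scheme (the constant 1024 follows the text;
the function names merely echo the LLVM ones and everything else is
illustrative):</p>

```cpp
// Physical registers occupy numbers below FirstVirtualRegister (1024, per the
// text); virtual registers are handed out sequentially above it.
const unsigned FirstVirtualRegister = 1024;

unsigned createVirtualRegister() {
  static unsigned next = FirstVirtualRegister;
  return next++;  // fresh, highest-so-far number on every call
}

bool isVirtualRegister(unsigned reg) { return reg >= FirstVirtualRegister; }
```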
+
+<p>Before register allocation, the operands of an instruction are mostly virtual
+ registers, although physical registers may also be used. In order to check if
+ a given machine operand is a register, use the boolean
+ function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
+ a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
+ or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
+ defines register 1026, and uses registers 1025 and 1024. Given a
+ register operand, the method <tt>MachineOperand::isUse()</tt> informs if that
+ register is being used by the instruction. The
+ method <tt>MachineOperand::isDef()</tt> informs if that register is being
+ defined.</p>
+
+<p>We will call physical registers present in the LLVM bitcode before register
+ allocation <i>pre-colored registers</i>. Pre-colored registers are used in
+ many different situations, for instance, to pass parameters of function
+ calls, and to store results of particular instructions. There are two types
+ of pre-colored registers: the ones <i>implicitly</i> defined, and
+ those <i>explicitly</i> defined. Explicitly defined registers are normal
+ operands, and can be accessed
+ with <tt>MachineInstr::getOperand(int)::getReg()</tt>. In order to check
+ which registers are implicitly defined by an instruction, use
+ the <tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>,
+ where <tt>opcode</tt> is the opcode of the target instruction. One important
+ difference between explicit and implicit physical registers is that the
+ latter are defined statically for each instruction, whereas the former may
+ vary depending on the program being compiled. For example, an instruction
+ that represents a function call will always implicitly define or use the same
+ set of physical registers. To read the registers implicitly used by an
+ instruction,
+ use <tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored
+ registers impose constraints on any register allocation algorithm. The
+ register allocator must make sure that none of them are overwritten by
+ the values of virtual registers while they are still alive.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+
+<div class="doc_subsubsection">
+ <a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
+</div>
+
+<div class="doc_text">
+
+<p>There are two ways to map virtual registers to physical registers (or to
+ memory slots). The first way, that we will call <i>direct mapping</i>, is
+ based on the use of methods of the classes <tt>TargetRegisterInfo</tt>,
+ and <tt>MachineOperand</tt>. The second way, that we will call <i>indirect
+ mapping</i>, relies on the <tt>VirtRegMap</tt> class to insert loads
+ and stores that move values to and from memory.</p>
+
+<p>The direct mapping provides more flexibility to the developer of the register
+ allocator; however, it is more error prone, and demands more implementation
+ work. Basically, the programmer has to specify where load and store
+ instructions should be inserted in the target function being compiled in
+ order to move values to and from memory. To assign a physical register to a
+ virtual register present in a given operand,
+ use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
+ use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a
+ load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot</tt>.</p>
+
+<p>The indirect mapping shields the application developer from the complexities
+ of inserting load and store instructions. In order to map a virtual register
+ to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In
+ order to map a certain virtual register to memory,
+ use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
+ the stack slot where <tt>vreg</tt>'s value will be located. If it is
+ necessary to map another virtual register to the same stack slot,
+ use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
+ important point to consider when using the indirect mapping, is that even if
+ a virtual register is mapped to memory, it still needs to be mapped to a
+ physical register. This physical register is the location where the virtual
+ register is supposed to be found before being stored or after being
+ reloaded.</p>
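<p>A toy stand-in for this interface (this is not the real
<tt>VirtRegMap</tt>, whose API differs in detail; it only mirrors the shape of
the calls described above):</p>

```cpp
#include <map>

// Toy version of the indirect mapping: every virtual register is assigned a
// physical register, and those that spill also get a stack slot.
struct ToyVirtRegMap {
  std::map<unsigned, unsigned> virtToPhys;
  std::map<unsigned, int> virtToSlot;
  int nextSlot = 0;

  void assignVirt2Phys(unsigned vreg, unsigned preg) { virtToPhys[vreg] = preg; }
  int assignVirt2StackSlot(unsigned vreg) {            // allocate a new slot
    return virtToSlot[vreg] = nextSlot++;
  }
  void assignVirt2StackSlot(unsigned vreg, int slot) { // share an existing slot
    virtToSlot[vreg] = slot;
  }
  bool isSpilled(unsigned vreg) const { return virtToSlot.count(vreg) != 0; }
};

bool demoVirtRegMap() {
  ToyVirtRegMap vrm;
  vrm.assignVirt2Phys(1024, 15);               // %v1024 lives in preg 15
  int slot = vrm.assignVirt2StackSlot(1025);   // %v1025 spills to a new slot
  vrm.assignVirt2StackSlot(1026, slot);        // %v1026 shares that slot
  return slot == 0 && vrm.isSpilled(1025) && vrm.isSpilled(1026) &&
         !vrm.isSpilled(1024);
}
```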
+
+<p>If the indirect strategy is used, after all the virtual registers have been
+ mapped to physical registers or stack slots, it is necessary to use a spiller
+ object to place load and store instructions in the code. Every virtual that
+ has been mapped to a stack slot will be stored to memory after being defined
+ and will be loaded before being used. The implementation of the spiller tries
+ to recycle load/store instructions, avoiding unnecessary instructions. For an
+ example of how to invoke the spiller,
+ see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
+ in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="regAlloc_twoAddr">Handling two address instructions</a>
+</div>
+
+<div class="doc_text">
+
+<p>With very rare exceptions (e.g., function calls), the LLVM machine code
+ instructions are three address instructions. That is, each instruction is
+ expected to define at most one register, and to use at most two registers.
+ However, some architectures use two address instructions. In this case, the
+ defined register is also one of the used registers. For instance, an X86
+ instruction such as <tt>ADD %EAX, %EBX</tt> is actually equivalent
+ to <tt>%EAX = %EAX + %EBX</tt>.</p>
+
+<p>In order to produce correct code, LLVM must convert three address
+ instructions that represent two address instructions into true two address
+ instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
+ this specific purpose. It must be run before register allocation takes
+ place. After its execution, the resulting code may no longer be in SSA
+ form. This happens, for instance, in situations where an instruction such
+ as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
+
+<div class="doc_code">
+<pre>
+%a = MOVE %b
+%a = ADD %a %c
+</pre>
+</div>
+
+<p>Notice that, internally, the second instruction is represented as
+ <tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
+ used and defined by the instruction.</p>
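<p>The conversion can be sketched as a standalone rewrite rule (the
instruction record here is invented for illustration, not LLVM's
<tt>MachineInstr</tt>):</p>

```cpp
#include <string>
#include <vector>

// A toy three-address instruction: dst = op src1, src2.
struct TAInst { std::string op; int dst, src1, src2; };

// When dst != src1, "dst = ADD src1, src2" becomes the two-address pair
// "dst = MOVE src1; dst = ADD dst, src2" (the MOVE's src2 is unused, -1).
std::vector<TAInst> toTwoAddress(const TAInst &i) {
  if (i.dst == i.src1) return {i};  // already in two-address form
  return {{"MOVE", i.dst, i.src1, -1}, {i.op, i.dst, i.dst, i.src2}};
}

bool demoTwoAddress() {
  std::vector<TAInst> out = toTwoAddress({"ADD", 1, 2, 3});  // %1 = ADD %2, %3
  return out.size() == 2 && out[0].op == "MOVE" && out[0].src1 == 2 &&
         out[1].dst == 1 && out[1].src1 == 1 && out[1].src2 == 3;
}
```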
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
+</div>
+
+<div class="doc_text">
+
+<p>An important transformation that happens during register allocation is called
+ the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
+ that are performed on the control flow graph of programs. However,
+ traditional instruction sets do not implement PHI instructions. Thus, in
+ order to generate executable code, compilers must replace PHI instructions
+ with other instructions that preserve their semantics.</p>
+
+<p>There are many ways in which PHI instructions can safely be removed from the
+ target code. The most traditional PHI deconstruction algorithm replaces PHI
+ instructions with copy instructions. That is the strategy adopted by
+ LLVM. The SSA deconstruction algorithm is implemented
+ in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
+ identifier <tt>PHIEliminationID</tt> must be marked as required in the code
+ of the register allocator.</p>
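<p>A minimal sketch of the copy-insertion strategy (data structures are
hypothetical; the real pass also deals with critical edges and interference
between copies):</p>

```cpp
#include <map>
#include <utility>
#include <vector>

struct Copy { int dst, src; };

// "dst = phi [(vreg, predBlock), ...]" becomes one copy per predecessor:
// each incoming block gets "dst = COPY vreg" appended at its end.
std::map<int, std::vector<Copy>>
eliminatePHI(int dst, const std::vector<std::pair<int, int>> &incoming) {
  std::map<int, std::vector<Copy>> copiesPerBlock;
  for (const std::pair<int, int> &in : incoming)
    copiesPerBlock[in.second].push_back({dst, in.first});
  return copiesPerBlock;
}

bool demoPhiElim() {
  // %5 = phi [ %1 from block 10 ], [ %2 from block 11 ]
  std::map<int, std::vector<Copy>> copies = eliminatePHI(5, {{1, 10}, {2, 11}});
  return copies[10].size() == 1 && copies[10][0].dst == 5 &&
         copies[10][0].src == 1 && copies[11][0].src == 2;
}
```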
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="regAlloc_fold">Instruction folding</a>
+</div>
+
+<div class="doc_text">
+
+<p><i>Instruction folding</i> is an optimization performed during register
+ allocation that removes unnecessary copy instructions. For instance, a
+ sequence of instructions such as:</p>
+
+<div class="doc_code">
+<pre>
+%EBX = LOAD %mem_address
+%EAX = COPY %EBX
+</pre>
+</div>
+
+<p>can be safely substituted by the single instruction:</p>
+
+<div class="doc_code">
+<pre>
+%EAX = LOAD %mem_address
+</pre>
+</div>
+
+<p>Instructions can be folded with
+ the <tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
+ taken when folding instructions; a folded instruction can be quite different
+ from the original
+ instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
+ in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
+ use.</p>
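<p>The idea can be sketched as a standalone peephole over a toy instruction
list (the real <tt>foldMemoryOperand</tt> works on <tt>MachineInstr</tt>s and
checks many more conditions):</p>

```cpp
#include <string>
#include <vector>

struct FoldInst { std::string op, dst, src; };

// Peephole sketch: a LOAD immediately followed by a COPY of its result folds
// into one LOAD targeting the copy's destination. The real pass also checks
// that the intermediate register has no other uses; this sketch assumes so.
std::vector<FoldInst> foldLoadCopy(const std::vector<FoldInst> &code) {
  std::vector<FoldInst> out;
  for (size_t i = 0; i < code.size(); ++i) {
    if (i + 1 < code.size() && code[i].op == "LOAD" &&
        code[i + 1].op == "COPY" && code[i + 1].src == code[i].dst) {
      out.push_back({"LOAD", code[i + 1].dst, code[i].src});
      ++i;  // the COPY has been folded away
    } else {
      out.push_back(code[i]);
    }
  }
  return out;
}

bool demoFold() {
  std::vector<FoldInst> code = {{"LOAD", "EBX", "mem_address"},
                                {"COPY", "EAX", "EBX"}};
  std::vector<FoldInst> out = foldLoadCopy(code);
  return out.size() == 1 && out[0].op == "LOAD" && out[0].dst == "EAX" &&
         out[0].src == "mem_address";
}
```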
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+
+<div class="doc_subsubsection">
+ <a name="regAlloc_builtIn">Built in register allocators</a>
+</div>
+
+<div class="doc_text">
+
+<p>The LLVM infrastructure provides the application developer with three
+ different register allocators:</p>
+
+<ul>
+ <li><i>Linear Scan</i> — <i>The default allocator</i>. This is the
+ well-known linear scan register allocator. Whereas the
+ <i>Simple</i> and <i>Local</i> algorithms use a direct mapping
+ implementation technique, the <i>Linear Scan</i> implementation
+ uses a spiller in order to place loads and stores.</li>
+
+ <li><i>Fast</i> — This register allocator is the default for debug
+ builds. It allocates registers on a basic block level, attempting to keep
+ values in registers and reusing registers as appropriate.</li>
+
+ <li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP)
+ based register allocator. This allocator works by constructing a PBQP
+ problem representing the register allocation problem under consideration,
+ solving this using a PBQP solver, and mapping the solution back to a
+ register assignment.</li>
+
+</ul>
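<p>For reference, the classic linear scan algorithm (after Poletto and Sarkar)
can be sketched standalone; LLVM's implementation differs considerably in
detail, so treat this only as a picture of the idea:</p>

```cpp
#include <algorithm>
#include <map>
#include <vector>

struct Interval { int vreg, start, end; };  // live over [start, end)

// Visit intervals in start order, retire the ones that have ended, then grab
// a free register or spill the active interval ending last. Spilled vregs
// map to -1.
std::map<int, int> linearScan(std::vector<Interval> ivs, int numRegs) {
  std::sort(ivs.begin(), ivs.end(),
            [](const Interval &a, const Interval &b) { return a.start < b.start; });
  std::map<int, int> assign;
  std::vector<Interval> active;  // kept sorted by increasing end point
  std::vector<int> freeRegs;
  for (int r = 0; r < numRegs; ++r) freeRegs.push_back(r);
  for (const Interval &cur : ivs) {
    while (!active.empty() && active.front().end <= cur.start) {
      freeRegs.push_back(assign[active.front().vreg]);  // register freed
      active.erase(active.begin());
    }
    if (!freeRegs.empty()) {
      assign[cur.vreg] = freeRegs.back();
      freeRegs.pop_back();
      active.push_back(cur);
    } else if (active.back().end > cur.end) {
      assign[cur.vreg] = assign[active.back().vreg];  // steal its register
      assign[active.back().vreg] = -1;                // and spill it
      active.pop_back();
      active.push_back(cur);
    } else {
      assign[cur.vreg] = -1;  // cur itself is the best spill candidate
    }
    std::sort(active.begin(), active.end(),
              [](const Interval &a, const Interval &b) { return a.end < b.end; });
  }
  return assign;
}

bool demoLinearScan() {
  // One register, two overlapping intervals: the longer-lived one is spilled.
  std::map<int, int> a = linearScan({{1, 0, 10}, {2, 1, 3}}, 1);
  return a[1] == -1 && a[2] == 0;
}
```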
+
+<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
+ command line option <tt>-regalloc=...</tt>:</p>
+
+<div class="doc_code">
+<pre>
+$ llc -regalloc=linearscan file.bc -o ln.s
+$ llc -regalloc=fast file.bc -o fa.s
+$ llc -regalloc=pbqp file.bc -o pbqp.s
+</pre>
+</div>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="proepicode">Prolog/Epilog Code Insertion</a>
+</div>
+<div class="doc_text"><p>To Be Written</p></div>
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="latemco">Late Machine Code Optimizations</a>
+</div>
+<div class="doc_text"><p>To Be Written</p></div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="codeemit">Code Emission</a>
+</div>
+
+<div class="doc_text">
+
+<p>The code emission step of code generation is responsible for lowering from
+the code generator abstractions (like <a
+href="#machinefunction">MachineFunction</a>, <a
+href="#machineinstr">MachineInstr</a>, etc) down
+to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
+<a href="#mcstreamer">MCStreamer</a>, etc). This is
+done with a combination of several different classes: the (misnamed)
+target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
+(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
+
+<p>Since the MC layer works at the level of abstraction of object files, it
+doesn't have a notion of functions, global variables etc. Instead, it thinks
+about labels, directives, and instructions. A key class used at this time is
+the MCStreamer class. This is an abstract API that is implemented in different
+ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively
+an "assembler API". MCStreamer has one method per directive, such as EmitLabel,
+EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly
+level directives.
+</p>
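<p>The shape of this API can be sketched with a toy streamer that renders text
(the real <tt>MCStreamer</tt> has many more methods and several backends; the
names below only echo the ones mentioned above):</p>

```cpp
#include <sstream>
#include <string>

// The "assembler API" shape: one method per directive. This toy streamer
// renders textual assembly; an object-file streamer would encode bytes
// instead, behind the same interface.
class TextStreamer {
  std::ostringstream os;
public:
  void SwitchSection(const std::string &name) {
    os << "\t.section " << name << "\n";
  }
  void EmitLabel(const std::string &label) { os << label << ":\n"; }
  void EmitInstruction(const std::string &text) { os << "\t" << text << "\n"; }
  std::string str() const { return os.str(); }
};

bool demoStreamer() {
  TextStreamer s;
  s.SwitchSection(".text");
  s.EmitLabel("main");
  s.EmitInstruction("ret");
  return s.str() == "\t.section .text\nmain:\n\tret\n";
}
```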
+
+<p>If you are interested in implementing a code generator for a target, there
+are three important things that you have to implement for your target:</p>
+
+<ol>
+<li>First, you need a subclass of AsmPrinter for your target. This class
+implements the general lowering process converting MachineFunctions into MC
+label constructs. The AsmPrinter base class provides a number of useful methods
+and routines, and also allows you to override the lowering process in some
+important ways. You should get much of the lowering for free if you are
+implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
+class implements much of the common logic.</li>
+
+<li>Second, you need to implement an instruction printer for your target. The
+instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
+raw_ostream as text. Most of this is automatically generated from the .td file
+(when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
+instructions), but you need to implement routines to print operands.</li>
+
+<li>Third, you need to implement code that lowers a <a
+href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
+"&lt;target&gt;MCInstLower.cpp". This lowering process is often target
+specific, and is responsible for turning jump table entries, constant pool
+indices, global variable addresses, etc into MCLabels as appropriate. This
+translation layer is also responsible for expanding pseudo ops used by the code
+generator into the actual machine instructions they correspond to. The MCInsts
+that are generated by this are fed into the instruction printer or the encoder.
+</li>
+
+</ol>
+
+<p>Finally, at your choosing, you can also implement a subclass of
+MCCodeEmitter which lowers MCInsts into machine code bytes and relocations.
+This is important if you want to support direct .o file emission, or would like
+to implement an assembler for your target.</p>
+
+</div>
+
+
+<!-- *********************************************************************** -->
+<div class="doc_section">
+ <a name="nativeassembler">Implementing a Native Assembler</a>
+</div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Though you're probably reading this because you want to write or maintain a
+compiler backend, LLVM also fully supports building native assemblers.
+We've tried hard to automate the generation of the assembler from the .td files
+(in particular the instruction syntax and encodings), which means that a large
+part of the manual and repetitive data entry can be factored and shared with the
+compiler.</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection" id="na_instparsing">Instruction Parsing</div>
+
+<div class="doc_text"><p>To Be Written</p></div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection" id="na_instaliases">
+ Instruction Alias Processing
+</div>
+
+<div class="doc_text">
+<p>Once the instruction is parsed, it enters the MatchInstructionImpl function.
+The MatchInstructionImpl function performs alias processing and then does
+actual matching.</p>
+
+<p>Alias processing is the phase that canonicalizes different lexical forms of
+the same instructions down to one representation. There are several different
+kinds of aliases that can be implemented; they are listed below in the
+order that they are processed (which is in order from simplest/weakest to most
+complex/powerful). Generally you want to use the first alias mechanism that
+meets the needs of your instruction, because it will allow a more concise
+description.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">Mnemonic Aliases</div>
+
+<div class="doc_text">
+
+<p>The first phase of alias processing is simple instruction mnemonic
+remapping for classes of instructions which are allowed with two different
+mnemonics. This phase is a simple and unconditional remapping from one input
+mnemonic to one output mnemonic. It isn't possible for this form of alias to
+look at the operands at all, so the remapping must apply for all forms of a
+given mnemonic. Mnemonic aliases are defined simply, for example X86 has: