X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FCodeGenerator.html;h=bc82b46735b307eca8187098d0795493db4ab4a5;hb=e3b989d4a4ba47f77d5d38c35ff17e9673d9f87b;hp=f45a98c7241b00c420669226111cc608a47f1ad2;hpb=16448772804d15550c435adcdbe687f4a38a9b8b;p=oota-llvm.git diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html index f45a98c7241..bc82b46735b 100644 --- a/docs/CodeGenerator.html +++ b/docs/CodeGenerator.html @@ -2,6 +2,7 @@ "http://www.w3.org/TR/html4/strict.dtd"> + The LLVM Target-Independent Code Generator @@ -58,6 +59,22 @@
  • Future directions for the SelectionDAG
  • +
  • Live Intervals +
  • +
  • Register Allocation +
  • Code Emission
  • -

    Written by Chris Lattner & - Bill Wendling

    +

    Written by Chris Lattner, + Bill Wendling, + Fernando Magno Quintao + Pereira and + Jim Laskey

    @@ -356,7 +382,7 @@ operations. Among other things, this class indicates:

  • the type to use for shift amounts
  • various high-level characteristics, like whether it is profitable to turn division by a constant into a multiplication sequence
  • - +
    @@ -1089,7 +1115,7 @@ primarily because it is a work in progress and is not yet finished:

    fragment can match multiple different patterns.
  • We don't automatically infer flags like isStore/isLoad yet.
  • We don't automatically generate the set of supported registers and - operations for the Legalizer yet.
  • + operations for the Legalizer yet.
  • We don't have a way of tying in custom legalized nodes yet.
  • @@ -1130,7 +1156,6 @@ SelectionDAGs.

    1. Optional function-at-a-time selection.
    2. Auto-generate entire selector from .td file.
    3. -
    @@ -1140,11 +1165,417 @@ SelectionDAGs.

    SSA-based Machine Code Optimizations

    To Be Written

    + + +
    + Live Intervals +
    + +
    + +

    Live Intervals are the ranges (intervals) where a variable is live. +They are used by some register allocator passes to +determine if two or more virtual registers which require the same physical +register are live at the same point in the program (i.e., they conflict). When +this situation occurs, one virtual register must be spilled.

    + +
    + + +
    + Live Variable Analysis +
    + +
    + +

    The first step in determining the live intervals of variables is to +calculate the set of registers that are immediately dead after the +instruction (i.e., the instruction calculates the value, but it is +never used) and the set of registers that are used by the instruction, +but are never used after the instruction (i.e., they are killed). Live +variable information is computed for each virtual register and +register allocatable physical register in the function. This +is done in a very efficient manner because it uses SSA to sparsely +compute lifetime information for virtual registers (which are in SSA +form) and only has to track physical registers within a block. Before +register allocation, LLVM can assume that physical registers are only +live within a single basic block. This allows it to do a single, +local analysis to resolve physical register lifetimes within each +basic block. If a physical register is not register allocatable (e.g., +a stack pointer or condition codes), it is not tracked.

    + +

    Physical registers may be live in to or out of a function. Live in values +are typically arguments in registers. Live out values are typically return +values in registers. Live in values are marked as such, and are given a dummy +"defining" instruction during live intervals analysis. If the last basic block +of a function is a return, then it's marked as using all live out +values in the function.

    + +

    PHI nodes need to be handled specially, because the calculation +of the live variable information from a depth first traversal of the CFG of +the function won't guarantee that a virtual register used by the PHI +node is defined before it's used. When a PHI node is encounted, only +the definition is handled, because the uses will be handled in other basic +blocks.

    + +

    For each PHI node of the current basic block, we simulate an +assignment at the end of the current basic block and traverse the successor +basic blocks. If a successor basic block has a PHI node and one of +the PHI node's operands is coming from the current basic block, +then the variable is marked as alive within the current basic block +and all of its predecessor basic blocks, until the basic block with the +defining instruction is encountered.

    + +
    + + +
    + Live Intervals Analysis +
    + +
    + +

    We now have the information available to perform the live intervals analysis +and build the live intervals themselves. We start off by numbering the basic +blocks and machine instructions. We then handle the "live-in" values. These +are in physical registers, so the physical register is assumed to be killed by +the end of the basic block. Live intervals for virtual registers are computed +for some ordering of the machine instructions [1, N]. A live interval +is an interval [i, j), where 1 <= i <= j < N, for which a +variable is live.

    + +

    More to come...

    + +
    +
    Register Allocation
    -

    To Be Written

    + +
    + +

    The Register Allocation problem consists in mapping a program +Pv, that can use an unbounded number of virtual +registers, to a program Pp that contains a finite +(possibly small) number of physical registers. Each target architecture has +a different number of physical registers. If the number of physical +registers is not enough to accommodate all the virtual registers, some of +them will have to be mapped into memory. These virtuals are called +spilled virtuals.

    + +
    + + + +
    + How registers are represented in LLVM +
    + +
    + +

    In LLVM, physical registers are denoted by integer numbers that +normally range from 1 to 1023. To see how this numbering is defined +for a particular architecture, you can read the +GenRegisterNames.inc file for that architecture. For +instance, by inspecting +lib/Target/X86/X86GenRegisterNames.inc we see that the 32-bit +register EAX is denoted by 15, and the MMX register +MM0 is mapped to 48.

    + +

    Some architectures contain registers that share the same physical +location. A notable example is the X86 platform. For instance, in the +X86 architecture, the registers EAX, AX and +AL share the first eight bits. These physical registers are +marked as aliased in LLVM. Given a particular architecture, you +can check which registers are aliased by inspecting its +RegisterInfo.td file. Moreover, the method +MRegisterInfo::getAliasSet(p_reg) returns an array containing +all the physical registers aliased to the register p_reg.

    + +

    Physical registers, in LLVM, are grouped in Register Classes. +Elements in the same register class are functionally equivalent, and can +be interchangeably used. Each virtual register can only be mapped to +physical registers of a particular class. For instance, in the X86 +architecture, some virtuals can only be allocated to 8 bit registers. +A register class is described by TargetRegisterClass objects. +To discover if a virtual register is compatible with a given physical, +this code can be used: +

    + +
    +
    +bool RegMapping_Fer::compatible_class(MachineFunction &mf,
    +                                      unsigned v_reg,
    +                                      unsigned p_reg) {
    +  assert(MRegisterInfo::isPhysicalRegister(p_reg) &&
    +         "Target register must be physical");
    +  const TargetRegisterClass *trc = mf.getSSARegMap()->getRegClass(v_reg);
    +  return trc->contains(p_reg);
    +}
    +
    +
    + +

    Sometimes, mostly for debugging purposes, it is useful to change +the number of physical registers available in the target +architecture. This must be done statically, inside the +TargetRegsterInfo.td file. Just grep for +RegisterClass, the last parameter of which is a list of +registers. Just commenting some out is one simple way to avoid them +being used. A more polite way is to explicitly exclude some registers +from the allocation order. See the definition of the +GR register class in +lib/Target/IA64/IA64RegisterInfo.td for an example of this +(e.g., numReservedRegs registers are hidden.)

    + +

    Virtual registers are also denoted by integer numbers. Contrary to +physical registers, different virtual registers never share the same +number. The smallest virtual register is normally assigned the number +1024. This may change, so, in order to know which is the first virtual +register, you should access +MRegisterInfo::FirstVirtualRegister. Any register whose +number is greater than or equal to +MRegisterInfo::FirstVirtualRegister is considered a virtual +register. Whereas physical registers are statically defined in a +TargetRegisterInfo.td file and cannot be created by the +application developer, that is not the case with virtual registers. +In order to create new virtual registers, use the method +SSARegMap::createVirtualRegister(). This method will return a +virtual register with the highest code. +

    + +

    Before register allocation, the operands of an instruction are +mostly virtual registers, although physical registers may also be +used. In order to check if a given machine operand is a register, use +the boolean function MachineOperand::isRegister(). To obtain +the integer code of a register, use +MachineOperand::getReg(). An instruction may define or use a +register. For instance, ADD reg:1026 := reg:1025 reg:1024 +defines the registers 1024, and uses registers 1025 and 1026. Given a +register operand, the method MachineOperand::isUse() informs +if that register is being used by the instruction. The method +MachineOperand::isDef() informs if that registers is being +defined.

    + +

    We will call physical registers present in the LLVM bitcode before +register allocation pre-colored registers. Pre-colored +registers are used in many different situations, for instance, to pass +parameters of functions calls, and to store results of particular +instructions. There are two types of pre-colored registers: the ones +implicitly defined, and those explicitly +defined. Explicitly defined registers are normal operands, and can be +accessed with MachineInstr::getOperand(int)::getReg(). In +order to check which registers are implicitly defined by an +instruction, use the +TargetInstrInfo::get(opcode)::ImplicitDefs, where +opcode is the opcode of the target instruction. One important +difference between explicit and implicit physical registers is that +the latter are defined statically for each instruction, whereas the +former may vary depending on the program being compiled. For example, +an instruction that represents a function call will always implicitly +define or use the same set of physical registers. To read the +registers implicitly used by an instruction, use +TargetInstrInfo::get(opcode)::ImplicitUses. Pre-colored +registers impose constraints on any register allocation algorithm. The +register allocator must make sure that none of them is been +overwritten by the values of virtual registers while still alive.

    + +
    + + + +
    + Mapping virtual registers to physical registers +
    + +
    + +

    There are two ways to map virtual registers to physical registers (or to +memory slots). The first way, that we will call direct mapping, +is based on the use of methods of the classes MRegisterInfo, +and MachineOperand. The second way, that we will call +indirect mapping, relies on the VirtRegMap class in +order to insert loads and stores sending and getting values to and from +memory.

    + +

    The direct mapping provides more flexibility to the developer of +the register allocator; however, it is more error prone, and demands +more implementation work. Basically, the programmer will have to +specify where load and store instructions should be inserted in the +target function being compiled in order to get and store values in +memory. To assign a physical register to a virtual register present in +a given operand, use MachineOperand::setReg(p_reg). To insert +a store instruction, use +MRegisterInfo::storeRegToStackSlot(...), and to insert a load +instruction, use MRegisterInfo::loadRegFromStackSlot.

    + +

    The indirect mapping shields the application developer from the +complexities of inserting load and store instructions. In order to map +a virtual register to a physical one, use +VirtRegMap::assignVirt2Phys(vreg, preg). In order to map a +certain virtual register to memory, use +VirtRegMap::assignVirt2StackSlot(vreg). This method will +return the stack slot where vreg's value will be located. If +it is necessary to map another virtual register to the same stack +slot, use VirtRegMap::assignVirt2StackSlot(vreg, +stack_location). One important point to consider when using the +indirect mapping, is that even if a virtual register is mapped to +memory, it still needs to be mapped to a physical register. This +physical register is the location where the virtual register is +supposed to be found before being stored or after being reloaded.

    + +

    If the indirect strategy is used, after all the virtual registers +have been mapped to physical registers or stack slots, it is necessary +to use a spiller object to place load and store instructions in the +code. Every virtual that has been mapped to a stack slot will be +stored to memory after been defined and will be loaded before being +used. The implementation of the spiller tries to recycle load/store +instructions, avoiding unnecessary instructions. For an example of how +to invoke the spiller, see +RegAllocLinearScan::runOnMachineFunction in +lib/CodeGen/RegAllocLinearScan.cpp.

    + +
    + + +
    + Handling two address instructions +
    + +
    + +

    With very rare exceptions (e.g., function calls), the LLVM machine +code instructions are three address instructions. That is, each +instruction is expected to define at most one register, and to use at +most two registers. However, some architectures use two address +instructions. In this case, the defined register is also one of the +used register. For instance, an instruction such as ADD %EAX, +%EBX, in X86 is actually equivalent to %EAX = %EAX + +%EBX.

    + +

    In order to produce correct code, LLVM must convert three address +instructions that represent two address instructions into true two +address instructions. LLVM provides the pass +TwoAddressInstructionPass for this specific purpose. It must +be run before register allocation takes place. After its execution, +the resulting code may no longer be in SSA form. This happens, for +instance, in situations where an instruction such as %a = ADD %b +%c is converted to two instructions such as:

    + +
    +
    +%a = MOVE %b
    +%a = ADD %a %b
    +
    +
    + +

    Notice that, internally, the second instruction is represented as +ADD %a[def/use] %b. I.e., the register operand %a is +both used and defined by the instruction.

    + +
    + + +
    + The SSA deconstruction phase +
    + +
    + +

    An important transformation that happens during register allocation is called +the SSA Deconstruction Phase. The SSA form simplifies many +analyses that are performed on the control flow graph of +programs. However, traditional instruction sets do not implement +PHI instructions. Thus, in order to generate executable code, compilers +must replace PHI instructions with other instructions that preserve their +semantics.

    + +

    There are many ways in which PHI instructions can safely be removed +from the target code. The most traditional PHI deconstruction +algorithm replaces PHI instructions with copy instructions. That is +the strategy adopted by LLVM. The SSA deconstruction algorithm is +implemented in nlib/CodeGen/>PHIElimination.cpp. In order to +invoke this pass, the identifier PHIEliminationID must be +marked as required in the code of the register allocator.

    + +
    + + +
    + Instruction folding +
    + +
    + +

    Instruction folding is an optimization performed during +register allocation that removes unnecessary copy instructions. For +instance, a sequence of instructions such as:

    + +
    +
    +%EBX = LOAD %mem_address
    +%EAX = COPY %EBX
    +
    +
    + +

    can be safely substituted by the single instruction: + +

    +
    +%EAX = LOAD %mem_address
    +
    +
    + +

    Instructions can be folded with the +MRegisterInfo::foldMemoryOperand(...) method. Care must be +taken when folding instructions; a folded instruction can be quite +different from the original instruction. See +LiveIntervals::addIntervalsForSpills in +lib/CodeGen/LiveIntervalAnalysis.cpp for an example of its use.

    + +
    + + + +
    + Built in register allocators +
    + +
    + +

    The LLVM infrastructure provides the application developer with +three different register allocators:

    + + + +

    The type of register allocator used in llc can be chosen with the +command line option -regalloc=...:

    + +
    +
    +$ llc -f -regalloc=simple file.bc -o sp.s;
    +$ llc -f -regalloc=local file.bc -o lc.s;
    +$ llc -f -regalloc=linearscan file.bc -o ln.s;
    +
    +
    + +
    +
    Prolog/Epilog Code Insertion @@ -1221,11 +1652,31 @@ that people test.

  • i386-unknown-freebsd5.3 - FreeBSD 5.3
  • i686-pc-cygwin - Cygwin on Win32
  • i686-pc-mingw32 - MingW on Win32
  • +
  • i386-pc-mingw32msvc - MingW crosscompiler on Linux
  • i686-apple-darwin* - Apple Darwin on X86
  • + +
    + X86 Calling Conventions supported +
    + + +
    + +

    The folowing target-specific calling conventions are known to backend:

    + + + +
    +
    Representing X86 addressing modes in MachineInstrs @@ -1277,6 +1728,223 @@ a character per operand with an optional special size. For example:

    + +
    + The PowerPC backend +
    + +
    +

    The PowerPC code generator lives in the lib/Target/PowerPC directory. The +code generation is retargetable to several variations or subtargets of +the PowerPC ISA; including ppc32, ppc64 and altivec. +

    +
    + + +
    + LLVM PowerPC ABI +
    + +
    +

    LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC +relative (PIC) or static addressing for accessing global values, so no TOC (r2) +is used. Second, r31 is used as a frame pointer to allow dynamic growth of a +stack frame. LLVM takes advantage of having no TOC to provide space to save +the frame pointer in the PowerPC linkage area of the caller frame. Other +details of PowerPC ABI can be found at PowerPC ABI. Note: This link describes the 32 bit ABI. The +64 bit ABI is similar except space for GPRs are 8 bytes wide (not 4) and r13 is +reserved for system use.

    +
    + + +
    + Frame Layout +
    + +
    +

    The size of a PowerPC frame is usually fixed for the duration of a +function’s invocation. Since the frame is fixed size, all references into +the frame can be accessed via fixed offsets from the stack pointer. The +exception to this is when dynamic alloca or variable sized arrays are present, +then a base pointer (r31) is used as a proxy for the stack pointer and stack +pointer is free to grow or shrink. A base pointer is also used if llvm-gcc is +not passed the -fomit-frame-pointer flag. The stack pointer is always aligned to +16 bytes, so that space allocated for altivec vectors will be properly +aligned.

    +

    An invocation frame is layed out as follows (low memory at top);

    +
    + +
    + + + + + + + + + + + + + + + + + + + + + + +
    Linkage

    Parameter area

    Dynamic area

    Locals area

    Saved registers area


    Previous Frame

    +
    + +
    +

    The linkage area is used by a callee to save special registers prior +to allocating its own frame. Only three entries are relevant to LLVM. The +first entry is the previous stack pointer (sp), aka link. This allows probing +tools like gdb or exception handlers to quickly scan the frames in the stack. A +function epilog can also use the link to pop the frame from the stack. The +third entry in the linkage area is used to save the return address from the lr +register. Finally, as mentioned above, the last entry is used to save the +previous frame pointer (r31.) The entries in the linkage area are the size of a +GPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64 +bit mode.

    +
    + +
    +

    32 bit linkage area

    + + + + + + + + + + + + + + + + + + + + + + + + + +
    0Saved SP (r1)
    4Saved CR
    8Saved LR
    12Reserved
    16Reserved
    20Saved FP (r31)
    +
    + +
    +

    64 bit linkage area

    + + + + + + + + + + + + + + + + + + + + + + + + + +
    0Saved SP (r1)
    8Saved CR
    16Saved LR
    24Reserved
    32Reserved
    40Saved FP (r31)
    +
    + +
    +

    The parameter area is used to store arguments being passed to a callee +function. Following the PowerPC ABI, the first few arguments are actually +passed in registers, with the space in the parameter area unused. However, if +there are not enough registers or the callee is a thunk or vararg function, +these register arguments can be spilled into the parameter area. Thus, the +parameter area must be large enough to store all the parameters for the largest +call sequence made by the caller. The size must also be mimimally large enough +to spill registers r3-r10. This allows callees blind to the call signature, +such as thunks and vararg functions, enough space to cache the argument +registers. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64 +bit mode.) Also note that since the parameter area is a fixed offset from the +top of the frame, that a callee can access its spilt arguments using fixed +offsets from the stack pointer (or base pointer.)

    +
    + +
    +

    Combining the information about the linkage, parameter areas and alignment. A +stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit +mode.

    +
    + +
    +

    The dynamic area starts out as size zero. If a function uses dynamic +alloca then space is added to the stack, the linkage and parameter areas are +shifted to top of stack, and the new space is available immediately below the +linkage and parameter areas. The cost of shifting the linkage and parameter +areas is minor since only the link value needs to be copied. The link value can +be easily fetched by adding the original frame size to the base pointer. Note +that allocations in the dynamic space need to observe 16 byte aligment.

    +
    + +
    +

    The locals area is where the llvm compiler reserves space for local +variables.

    +
    + +
    +

    The saved registers area is where the llvm compiler spills callee saved +registers on entry to the callee.

    +
    + + +
    + Prolog/Epilog +
    + +
    +

    The llvm prolog and epilog are the same as described in the PowerPC ABI, with +the following exceptions. Callee saved registers are spilled after the frame is +created. This allows the llvm epilog/prolog support to be common with other +targets. The base pointer callee saved register r31 is saved in the TOC slot of +linkage area. This simplifies allocation of space for the base pointer and +makes it convenient to locate programatically and during debugging.

    +
    + + +
    + Dynamic Allocation +
    + +
    +

    +
    + +
    +

    TODO - More to come.

    +
    + +