X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FCodeGenerator.html;h=32a3a971a32e088cc89c883362b6b01ba701f236;hb=cd37fd51fcb3ed39d51fce5f5435d94ef85586f5;hp=53c2b54d36a1557f8e7e2dd8dbefe269ed083ae5;hpb=75471d698fe35d42ea230019e7c7761afead74a4;p=oota-llvm.git diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html index 53c2b54d36a..32a3a971a32 100644 --- a/docs/CodeGenerator.html +++ b/docs/CodeGenerator.html @@ -50,6 +50,7 @@
  • The MachineBasicBlock class
  • The MachineFunction class
  • +
  • MachineInstr Bundles
  • The "MC" Layer @@ -97,6 +98,14 @@
  • Built in register allocators
  • Code Emission
  • +
  • VLIW Packetizer + +
  • Implementing a Native Assembler
  • @@ -114,6 +123,7 @@
  • Prolog/Epilog
  • Dynamic Allocation
  • +
  • The PTX backend
  • @@ -697,6 +707,21 @@ ret + +

    + Call-clobbered registers +

    + +
    + +

    Some machine instructions, like calls, clobber a large number of physical + registers. Rather than adding <def,dead> operands for + all of them, it is possible to use an MO_RegisterMask operand + instead. The register mask operand holds a bit mask of preserved registers, + and everything else is considered to be clobbered by the instruction.
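
    The mask semantics described above can be sketched with a small self-contained model. This is a toy illustration, not the LLVM implementation: the class name ToyRegMask and its methods are hypothetical, and the only property taken from the text is that a set bit means "preserved" and a clear bit means "clobbered".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a register mask operand (not the LLVM class itself).
// One bit per physical register; a set bit means the register is preserved.
class ToyRegMask {
public:
  explicit ToyRegMask(unsigned NumRegs)
      : NumRegs(NumRegs), Words((NumRegs + 31) / 32, 0) {}

  // Mark a register as preserved across the instruction.
  void setPreserved(unsigned Reg) {
    assert(Reg < NumRegs && "register number out of range");
    Words[Reg / 32] |= uint32_t(1) << (Reg % 32);
  }

  // Every register whose bit is clear is treated as clobbered.
  bool clobbers(unsigned Reg) const {
    assert(Reg < NumRegs && "register number out of range");
    return (Words[Reg / 32] & (uint32_t(1) << (Reg % 32))) == 0;
  }

private:
  unsigned NumRegs;
  std::vector<uint32_t> Words;
};
```

    A call that preserves only a few registers needs just those bits set. Every other register is reported as clobbered without a separate operand for each one.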

    + +
    +

    Machine code in SSA form @@ -752,6 +777,88 @@ ret + +

    + MachineInstr Bundles +

    + +
    + +

    The LLVM code generator can model sequences of instructions as MachineInstr + bundles. An MI bundle can model a VLIW group / pack which contains an + arbitrary number of parallel instructions. It can also be used to model + a sequential list of instructions (potentially with data dependencies) that + cannot be legally separated (e.g. ARM Thumb2 IT blocks).

    + +

    Conceptually, an MI bundle is an MI with a number of other MIs nested within: +

    + +
    +
    +--------------
    +|   Bundle   | ---------
    +--------------          \
    +       |           ----------------
    +       |           |      MI      |
    +       |           ----------------
    +       |                   |
    +       |           ----------------
    +       |           |      MI      |
    +       |           ----------------
    +       |                   |
    +       |           ----------------
    +       |           |      MI      |
    +       |           ----------------
    +       |
    +--------------
    +|   Bundle   | --------
    +--------------         \
    +       |           ----------------
    +       |           |      MI      |
    +       |           ----------------
    +       |                   |
    +       |           ----------------
    +       |           |      MI      |
    +       |           ----------------
    +       |                   |
    +       |                  ...
    +       |
    +--------------
    +|   Bundle   | --------
    +--------------         \
    +       |
    +      ...
    +
    +
    + +

    MI bundle support does not change the physical representations of + MachineBasicBlock and MachineInstr. All the MIs (including top level and + nested ones) are stored as a sequential list of MIs. The "bundled" MIs are + marked with the 'InsideBundle' flag. A top level MI with the special BUNDLE + opcode is used to represent the start of a bundle. It's legal to mix BUNDLE + MIs with individual MIs that are neither inside bundles nor represent bundles. +

    + +

    MachineInstr passes should operate on a MI bundle as a single unit. Member + methods have been taught to correctly handle bundles and MIs inside bundles. + The MachineBasicBlock iterator has been modified to skip over bundled MIs to + enforce the bundle-as-a-single-unit concept. An alternative iterator + instr_iterator has been added to MachineBasicBlock to allow passes to + iterate over all of the MIs in a MachineBasicBlock, including those which + are nested inside bundles. The top level BUNDLE instruction must have the + correct set of register MachineOperand's that represent the cumulative + inputs and outputs of the bundled MIs.
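
    The two iteration views can be illustrated with a toy model of the flat instruction list. This is only a sketch under stated assumptions: ToyMI, topLevelOpcodes, and bundleSize are hypothetical names, and a plain std::vector stands in for the real MachineBasicBlock list. Walking the vector directly corresponds to the instr_iterator view; filtering on the flag corresponds to the default bundle-skipping view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: an opcode name plus the 'InsideBundle' flag.
struct ToyMI {
  std::string Opcode;
  bool InsideBundle;
};

// Bundle-as-a-unit view: visit only top-level instructions, skipping
// every instruction that is nested inside a bundle.
std::vector<std::string> topLevelOpcodes(const std::vector<ToyMI> &Block) {
  std::vector<std::string> Result;
  for (const ToyMI &MI : Block)
    if (!MI.InsideBundle)
      Result.push_back(MI.Opcode);
  return Result;
}

// Count the instructions nested under the BUNDLE header at HeaderIdx.
// They are the consecutive flagged entries that follow the header.
std::size_t bundleSize(const std::vector<ToyMI> &Block, std::size_t HeaderIdx) {
  assert(HeaderIdx < Block.size() && Block[HeaderIdx].Opcode == "BUNDLE");
  std::size_t Count = 0;
  for (std::size_t I = HeaderIdx + 1;
       I < Block.size() && Block[I].InsideBundle; ++I)
    ++Count;
  return Count;
}
```

    For a block holding ADD, BUNDLE, MUL (inside), SUB (inside), RET, the top-level view yields ADD, BUNDLE, RET, while the full list still contains all five entries.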

    + +

    Packing / bundling of MachineInstr's should be done as part of the register + allocation super-pass. More specifically, the pass which determines what + MIs should be bundled together must be done after the code generator exits SSA + form (i.e. after the two-address pass, PHI elimination, and copy coalescing). + Bundles should only be finalized (i.e. adding BUNDLE MIs and input and + output register MachineOperands) after virtual registers have been + rewritten into physical registers. This requirement eliminates the need to + add virtual register operands to BUNDLE instructions which would effectively + double the virtual register def and use lists.

    +
    @@ -1768,22 +1875,28 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf, different register allocators:

    The type of register allocator used in llc can be chosen with the @@ -1806,6 +1919,8 @@ $ llc -regalloc=pbqp file.bc -o pbqp.s; Prolog/Epilog Code Insertion +

    +

    Compact Unwind @@ -1824,7 +1939,7 @@ $ llc -regalloc=pbqp file.bc -o pbqp.s;

    The compact unwind encoding is a 32-bit value, which is encoded in an architecture-specific way. It specifies which registers to restore and from - where, and how to unwind out of the funciton. When the linker creates a final + where, and how to unwind out of the function. When the linker creates a final linked image, it will create a __TEXT,__unwind_info section. This section is a small and fast way for the runtime to access unwind info for any given function. If we emit compact unwind info for the @@ -1920,6 +2035,8 @@ $ llc -regalloc=pbqp file.bc -o pbqp.s;

    + +

    Late Machine Code Optimizations @@ -1990,6 +2107,73 @@ to implement an assembler for your target.

    + +

    + VLIW Packetizer +

    + +
    + +

    In a Very Long Instruction Word (VLIW) architecture, the compiler is + responsible for mapping instructions to functional-units available on + the architecture. To that end, the compiler creates groups of instructions + called packets or bundles. The VLIW packetizer in LLVM is + a target-independent mechanism to enable the packetization of machine + instructions.

    + + + +

    + Mapping from instructions to functional units +

    + +
    + +

    Instructions in a VLIW target can typically be mapped to multiple functional +units. During the process of packetizing, the compiler must be able to reason +about whether an instruction can be added to a packet. This decision can be +complex since the compiler has to examine all possible mappings of instructions +to functional units. Therefore to alleviate compilation-time complexity, the +VLIW packetizer parses the instruction classes of a target and generates tables +at compiler build time. These tables can then be queried by the provided +machine-independent API to determine if an instruction can be accommodated in a +packet.

    +
    + + +

    + + How the packetization tables are generated and used + +

    + +
    + +

    The packetizer reads instruction classes from a target's itineraries and +creates a deterministic finite automaton (DFA) to represent the state of a +packet. A DFA consists of three major elements: inputs, states, and +transitions. The set of inputs for the generated DFA represents the instruction +being added to a packet. The states represent the possible consumption +of functional units by instructions in a packet. In the DFA, transitions from +one state to another occur on the addition of an instruction to an existing +packet. If there is a legal mapping of functional units to instructions, then +the DFA contains a corresponding transition. The absence of a transition +indicates that a legal mapping does not exist and that the instruction cannot +be added to the packet.
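
    The state and transition idea can be sketched in a few lines. This is a simplified model, not the generated tables: the names UnitMask, State, and transition are hypothetical, and transitions are computed on demand instead of being precomputed at compiler build time. A state is the set of functional-unit assignments that are still possible for the packet, and an empty result means the DFA has no transition for that input.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using UnitMask = uint32_t;         // one bit per functional unit
using State = std::set<UnitMask>;  // all unit assignments still possible

// Compute the state reached by adding an instruction whose class may run on
// any single unit in InsnClassUnits. An empty result means "no transition",
// i.e. the instruction cannot be added to the packet.
State transition(const State &Cur, UnitMask InsnClassUnits) {
  State Next;
  for (UnitMask Used : Cur) {
    for (unsigned Bit = 0; Bit < 32; ++Bit) {
      UnitMask Unit = UnitMask(1) << Bit;
      // The unit must be allowed for this class and still free.
      if ((InsnClassUnits & Unit) && !(Used & Unit))
        Next.insert(Used | Unit);
    }
  }
  return Next;
}
```

    With two units, a class that may use either unit (mask 3) followed by a class restricted to unit 0 (mask 1) leads from {0} to {1, 2} and then to {3}. Any further instruction then has no transition, so the packet is full.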

    + +

    To generate tables for a VLIW target, add TargetGenDFAPacketizer.inc +as a target to the Makefile in the target directory. The exported API provides +three functions: DFAPacketizer::clearResources(), +DFAPacketizer::reserveResources(MachineInstr *MI), and +DFAPacketizer::canReserveResources(MachineInstr *MI). These functions +allow a target packetizer to add an instruction to an existing packet and to +check whether an instruction can be added to a packet. See +llvm/CodeGen/DFAPacketizer.h for more information.
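
    A greedy packetizing loop built on these three operations might look like the sketch below. It is a self-contained toy, not code that uses llvm/CodeGen/DFAPacketizer.h. ToyPacketizer and packetize are hypothetical, and the toy methods take a functional-unit bit mask where the real ones take a MachineInstr pointer. The sketch assumes every instruction fits into an empty packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using UnitMask = uint32_t;  // one bit per functional unit (toy input)

// Toy resource tracker with the same three operations as the exported API.
class ToyPacketizer {
public:
  ToyPacketizer() { clearResources(); }

  // Start an empty packet: no functional unit is in use.
  void clearResources() {
    State.clear();
    State.insert(0);
  }

  // True if some free unit can still execute the instruction.
  bool canReserveResources(UnitMask Units) const {
    return !next(Units).empty();
  }

  // Add the instruction to the current packet.
  void reserveResources(UnitMask Units) {
    std::set<UnitMask> Next = next(Units);
    assert(!Next.empty() && "instruction does not fit in the packet");
    State = Next;
  }

private:
  // All unit assignments that remain possible after adding the instruction.
  std::set<UnitMask> next(UnitMask Units) const {
    std::set<UnitMask> Next;
    for (UnitMask Used : State)
      for (unsigned Bit = 0; Bit < 32; ++Bit) {
        UnitMask Unit = UnitMask(1) << Bit;
        if ((Units & Unit) && !(Used & Unit))
          Next.insert(Used | Unit);
      }
    return Next;
  }

  std::set<UnitMask> State;
};

// Greedily group instruction indices into packets, in program order.
std::vector<std::vector<std::size_t>>
packetize(const std::vector<UnitMask> &Insns) {
  std::vector<std::vector<std::size_t>> Packets;
  std::vector<std::size_t> Current;
  ToyPacketizer Resources;
  for (std::size_t I = 0; I < Insns.size(); ++I) {
    if (!Resources.canReserveResources(Insns[I])) {
      // Close the current packet and start a fresh one.
      if (!Current.empty())
        Packets.push_back(Current);
      Current.clear();
      Resources.clearResources();
    }
    Resources.reserveResources(Insns[I]);
    Current.push_back(I);
  }
  if (!Current.empty())
    Packets.push_back(Current);
  return Packets;
}
```

    The loop only asks whether the next instruction fits and ends the packet when it does not. A real target packetizer would also have to respect dependencies between instructions, which this sketch ignores.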

    + +
    + +
    + @@ -2201,16 +2385,14 @@ is the key:

    Feature ARM - Alpha - Blackfin CellSPU + Hexagon MBlaze MSP430 Mips PTX PowerPC Sparc - SystemZ X86 XCore @@ -2218,16 +2400,14 @@ is the key:

    is generally reliable - - + - + - @@ -2235,16 +2415,14 @@ is the key:

    assembly parser - - + - @@ -2252,16 +2430,14 @@ is the key:

    disassembler - - + - @@ -2269,33 +2445,29 @@ is the key:

    inline asm - - + - - * + jit * - - + - + - @@ -2303,16 +2475,14 @@ is the key:

    .o file writing - - + - @@ -2320,20 +2490,33 @@ is the key:

    tail calls - - + - + + segmented stacks + + + + + + + + + + * + + + @@ -2375,9 +2558,6 @@ disassembling machine opcode bytes into MCInst's.

    This box indicates whether the target supports most popular inline assembly constraints and modifiers.

    -

    X86 lacks reliable support for inline assembly -constraints relating to the X86 floating point stack.

    - @@ -2420,6 +2600,22 @@ more more details.

    + +

    Segmented Stacks

    + +
    + +

    This box indicates whether the target supports segmented stacks. This +replaces the traditional large C stack with many linked segments. It +is compatible with the gcc +implementation used by the Go front end.

    + +

    Basic support exists on the X86 backend. Currently +vararg doesn't work and the object files are not marked the way the gold +linker expects, but simple Go programs can be built by dragonegg.

    + +
    + @@ -2906,6 +3102,70 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory + + + +

    + The PTX backend +

    + +
    + +

    The PTX code generator lives in the lib/Target/PTX directory. It is + currently a work-in-progress, but already supports most of the code + generation functionality needed to generate correct PTX kernels for + CUDA devices.

    + +

    The code generator can target PTX 2.0+, and shader model 1.0+. The + PTX ISA Reference Manual is used as the primary source of ISA + information, though an effort is made to make the output of the code + generator match the output of the NVidia nvcc compiler, whenever + possible.

    + +

    Code Generator Options:

    + + + + + + + + + + + + + + + + + +
    Option              Description
    double              If enabled, the map_f64_to_f32 directive is + disabled in the PTX output, allowing native double-precision + arithmetic
    no-fma              Disable generation of Fused-Multiply Add + instructions, which may be beneficial for some devices
    smxy / computexy    Set shader model/compute capability to x.y, + e.g. sm20 or compute13
    + +

    Working:

    + + +

    In Progress:

    + + +