X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;ds=sidebyside;f=docs%2FCodeGenerator.rst;h=d54df0f6f4b242408be4ea7da114b08e6757fdaf;hb=0a230e0d985625a3909cb78fd867a3abaf434565;hp=900fb8a81f2156987e40b4406b46fa6fbb03241c;hpb=97d6abee58da99c41b26b99d724d026d4c73791a;p=oota-llvm.git
diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst
index 900fb8a81f2..d54df0f6f4b 100644
--- a/docs/CodeGenerator.rst
+++ b/docs/CodeGenerator.rst
@@ -1,5 +1,3 @@
-.. _code_generator:
-
 ==========================================
 The LLVM Target-Independent Code Generator
 ==========================================
@@ -17,6 +15,8 @@ The LLVM Target-Independent Code Generator
     .partial { background-color: #F88017 }
     .yes { background-color: #0F0; }
     .yes:before { content: "Y" }
+    .na { background-color: #6666FF; }
+    .na:before { content: "N/A" }
 
 .. contents::
 
@@ -172,7 +172,7 @@ architecture. These target descriptions often have a large amount of common
 information (e.g., an ``add`` instruction is almost identical to a ``sub``
 instruction). In order to allow the maximum amount of commonality to be
 factored out, the LLVM code generator uses the
-`TableGen `_ tool to describe big chunks of the
+:doc:`TableGen ` tool to describe big chunks of the
 target machine, which allows the use of domain-specific and target-specific
 abstractions to reduce the amount of repetition.
 
@@ -224,13 +224,13 @@ The ``DataLayout`` class
 ------------------------
 
 The ``DataLayout`` class is the only required target description class, and it
-is the only class that is not extensible (you cannot derived a new class from
+is the only class that is not extensible (you cannot derive a new class from
 it). ``DataLayout`` specifies information about how the target lays out memory
 for structures, the alignment requirements for various data types, the size of
 pointers in the target, and whether the target is little-endian or
 big-endian.
 
-.. _targetlowering:
+.. _TargetLowering:
 
 The ``TargetLowering`` class
 ----------------------------
@@ -248,7 +248,9 @@ operations. Among other things, this class indicates:
 * the type to use for shift amounts, and
 
 * various high-level characteristics, like whether it is profitable to turn
-  division by a constant into a multiplication sequence
+  division by a constant into a multiplication sequence.
+
+.. _TargetRegisterInfo:
 
 The ``TargetRegisterInfo`` class
 --------------------------------
@@ -283,12 +285,10 @@ The ``TargetInstrInfo`` class
 -----------------------------
 
 The ``TargetInstrInfo`` class is used to describe the machine instructions
-supported by the target. It is essentially an array of ``TargetInstrDescriptor``
-objects, each of which describes one instruction the target
-supports. Descriptors define things like the mnemonic for the opcode, the number
-of operands, the list of implicit register uses and defs, whether the
-instruction has certain target-independent properties (accesses memory, is
-commutable, etc), and holds any target-specific flags.
+supported by the target. Descriptions define things like the mnemonic for
+the opcode, the number of operands, the list of implicit register uses and defs,
+whether the instruction has certain target-independent properties (accesses
+memory, is commutable, etc), and holds any target-specific flags.
 
 The ``TargetFrameInfo`` class
 -----------------------------
@@ -771,6 +771,8 @@ value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a
 SREM or UREM operation. The `legalize types`_ and `legalize operations`_ phases
 are responsible for turning an illegal DAG into a legal DAG.
 
+.. _SelectionDAG-Process:
+
 SelectionDAG Instruction Selection Process
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -838,8 +840,7 @@ Initial SelectionDAG Construction
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The initial SelectionDAG is na\ :raw-html:`ï`\ vely peephole expanded from
-the LLVM input by the ``SelectionDAGLowering`` class in the
-``lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp`` file. The intent of this pass
+the LLVM input by the ``SelectionDAGBuilder`` class. The intent of this pass
 is to expose as much low-level, target-specific details to the SelectionDAG as
 possible. This pass is mostly hard-coded (e.g. an LLVM ``add`` turns into an
 ``SDNode add`` while a ``getelementptr`` is expanded into the obvious
@@ -875,7 +876,7 @@ found, the elements are converted to scalars ("scalarizing").
 
 A target implementation tells the legalizer which types are supported (and which
 register class to use for them) by calling the ``addRegisterClass`` method in
-its TargetLowering constructor.
+its ``TargetLowering`` constructor.
 
 .. _legalize operations:
 .. _Legalizer:
@@ -969,7 +970,8 @@ The ``FADDS`` instruction is a simple binary single-precision add instruction.
 To perform this pattern match, the PowerPC backend includes the following
 instruction definitions:
 
-::
+.. code-block:: text
+  :emphasize-lines: 4-5,9
 
   def FMADDS : AForm_1<59, 29,
                       (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
@@ -981,10 +983,10 @@ instruction definitions:
                       "fadds $FRT, $FRA, $FRB",
                       [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
 
-The portion of the instruction definition in bold indicates the pattern used to
-match the instruction. The DAG operators (like ``fmul``/``fadd``) are defined
-in the ``include/llvm/Target/TargetSelectionDAG.td`` file. " ``F4RC``" is the
-register class of the input and result values.
+The highlighted portion of the instruction definitions indicates the pattern
+used to match the instructions. The DAG operators (like ``fmul``/``fadd``)
+are defined in the ``include/llvm/Target/TargetSelectionDAG.td`` file.
+"``F4RC``" is the register class of the input and result values.
 
 The TableGen DAG instruction selector generator reads the instruction patterns
 in the ``.td`` file and automatically builds parts of the pattern matching code
@@ -1036,6 +1038,24 @@ for your target. It has the following strengths:
   are used to manipulate the input immediate (in this case, take the high or low
   16-bits of the immediate).
 
+* When using the 'Pat' class to map a pattern to an instruction that has one
+  or more complex operands (like e.g. `X86 addressing mode`_), the pattern may
+  either specify the operand as a whole using a ``ComplexPattern``, or else it
+  may specify the components of the complex operand separately. The latter is
+  done e.g. for pre-increment instructions by the PowerPC back end:
+
+  ::
+
+    def STWU : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
+                    "stwu $rS, $dst", LdStStoreUpd, []>,
+                    RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;
+
+    def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
+              (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;
+
+  Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the
+  complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction.
+
 * While the system does automate a lot, it still allows you to write custom C++
   code to match special cases if there is something that is hard to express.
 
@@ -1728,6 +1748,8 @@ This section of the document explains features or design decisions that are
 specific to the code generator for a particular target. First we start with a
 table that summarizes what features are supported by each target.
 
+.. _target-feature-matrix:
+
 Target Feature Matrix
 ---------------------
 
@@ -1742,12 +1764,14 @@ the key:
 :raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
@@ -1763,14 +1787,14 @@ Here is the table:
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
@@ -1778,29 +1802,29 @@ Here is the table:
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
@@ -1808,59 +1832,59 @@ Here is the table:
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
@@ -1868,29 +1892,29 @@ Here is the table:
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
 :raw-html:``
@@ -1992,8 +2016,8 @@ Tail call optimization
 Tail call optimization, callee reusing the stack of the caller, is currently
 supported on x86/x86-64 and PowerPC. It is performed if:
 
-* Caller and callee have the calling convention ``fastcc`` or ``cc 10`` (GHC
-  call convention).
+* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC
+  calling convention) or ``cc 11`` (HiPE calling convention).
 
 * The call is a tail call - in tail position (ret immediately follows call and
   ret uses value of call or is void).
@@ -2370,17 +2394,17 @@ Dynamic Allocation
 
 TODO - More to come.
 
-The PTX backend
----------------
+The NVPTX backend
+-----------------
 
-The PTX code generator lives in the lib/Target/PTX directory. It is currently a
-work-in-progress, but already supports most of the code generation functionality
-needed to generate correct PTX kernels for CUDA devices.
+The NVPTX code generator under lib/Target/NVPTX is an open-source version of
+the NVIDIA NVPTX code generator for LLVM. It is contributed by NVIDIA and is
+a port of the code generator used in the CUDA compiler (nvcc). It targets the
+PTX 3.0/3.1 ISA and can target any compute capability greater than or equal to
+2.0 (Fermi).
 
-The code generator can target PTX 2.0+, and shader model 1.0+. The PTX ISA
-Reference Manual is used as the primary source of ISA information, though an
-effort is made to make the output of the code generator match the output of the
-NVidia nvcc compiler, whenever possible.
+This target is of production quality and should be completely compatible with
+the official NVIDIA toolchain.
 
 Code Generator Options:
 
@@ -2390,39 +2414,28 @@ Code Generator Options:
 :raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
-:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
-:raw-html:``
+:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:``
-:raw-html:``
-:raw-html:``
+:raw-html:``
+:raw-html:``
 :raw-html:``
 :raw-html:`
Unknown | Not Applicable | No support | Partial Support | Complete Support
Feature | ARM | CellSPU | Hexagon | MBlaze | MSP430 | Mips | PTX | NVPTX | PowerPC | Sparc | SystemZ | X86 | XCore
is generally reliable
assembly parser
disassembler
inline asm
jit*
.o file writing
tail calls
segmented stacks *
Description
``double`` | If enabled, the map_f64_to_f32 directive is disabled in the PTX output, allowing native double-precision arithmetic
sm_20 | Set shader model/compute capability to 2.0
sm_21 | Set shader model/compute capability to 2.1
sm_30 | Set shader model/compute capability to 3.0
sm_35 | Set shader model/compute capability to 3.5
``no-fma`` | Disable generation of Fused-Multiply Add instructions, which may be beneficial for some devices
ptx30 | Target PTX 3.0
``smxy / computexy`` | Set shader model/compute capability to x.y, e.g. sm20 or compute13
ptx31 | Target PTX 3.1
`
-Working:
-
-* Arithmetic instruction selection (including combo FMA)
-
-* Bitwise instruction selection
-
-* Control-flow instruction selection
-
-* Function calls (only on SM 2.0+ and no return arguments)
-
-* Addresses spaces (0 = global, 1 = constant, 2 = local, 4 = shared)
-
-* Thread synchronization (bar.sync)
-
-* Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)
-
-In Progress:
-
-* Robust call instruction selection
-
-* Stack frame allocation
-
-* Device-specific instruction scheduling optimizations
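This change updates the tail-call conditions in the hunk at old line 1992. After the change, the listed calling conventions are ``fastcc``, ``cc 10`` (GHC) and ``cc 11`` (HiPE). The sketch below restates the two conditions visible in that hunk as a compile-time predicate. It is a minimal illustration under stated assumptions, not LLVM's implementation:

* `CallConv`, `CallInfo`, `isTailCallOptCC` and `meetsListedTailCallConditions` are hypothetical names, not LLVM APIs.
* The sketch assumes caller and callee must share the same convention.
* It does not model any platform-specific checks or options that the backends may also apply.

```cpp
// Hypothetical calling-convention tags for illustration; not LLVM's CallingConv.
enum class CallConv { C, Fast, GHC, HiPE };  // GHC = "cc 10", HiPE = "cc 11"

// Hypothetical summary of a call site and the instruction that follows it.
struct CallInfo {
  CallConv callerCC;
  CallConv calleeCC;
  bool retImmediatelyFollows;   // a ret is the next instruction after the call
  bool retUsesCallValueOrVoid;  // that ret returns the call's value, or is void
};

// Conventions listed after this change: fastcc, cc 10 (GHC), cc 11 (HiPE).
constexpr bool isTailCallOptCC(CallConv cc) {
  return cc == CallConv::Fast || cc == CallConv::GHC || cc == CallConv::HiPE;
}

// Both listed conditions. Assumes caller and callee share one listed
// convention; platform-specific constraints are deliberately not modeled.
constexpr bool meetsListedTailCallConditions(CallInfo ci) {
  return ci.callerCC == ci.calleeCC && isTailCallOptCC(ci.calleeCC) &&
         ci.retImmediatelyFollows && ci.retUsesCallValueOrVoid;
}
```

For example, `meetsListedTailCallConditions(CallInfo{CallConv::HiPE, CallConv::HiPE, true, true})` is `true`. The same call with `CallConv::C` on both sides is `false`, because the C convention is not in the list.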