X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FCodeGenerator.rst;h=d54df0f6f4b242408be4ea7da114b08e6757fdaf;hb=da07e9df843015f8c306ed7863dbb8c8055fd85f;hp=11174b7bee1fe38090e5be465943da403aac34b5;hpb=2d4a477b48d75b349e0834b6dacbb6fa3aaf87f9;p=oota-llvm.git diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst index 11174b7bee1..d54df0f6f4b 100644 --- a/docs/CodeGenerator.rst +++ b/docs/CodeGenerator.rst @@ -1,5 +1,3 @@ -.. _code_generator: - ========================================== The LLVM Target-Independent Code Generator ========================================== @@ -17,6 +15,8 @@ The LLVM Target-Independent Code Generator .partial { background-color: #F88017 } .yes { background-color: #0F0; } .yes:before { content: "Y" } + .na { background-color: #6666FF; } + .na:before { content: "N/A" } .. contents:: @@ -172,7 +172,7 @@ architecture. These target descriptions often have a large amount of common information (e.g., an ``add`` instruction is almost identical to a ``sub`` instruction). In order to allow the maximum amount of commonality to be factored out, the LLVM code generator uses the -`TableGen `_ tool to describe big chunks of the +:doc:`TableGen ` tool to describe big chunks of the target machine, which allows the use of domain-specific and target-specific abstractions to reduce the amount of repetition. @@ -230,7 +230,7 @@ for structures, the alignment requirements for various data types, the size of pointers in the target, and whether the target is little-endian or big-endian. -.. _targetlowering: +.. _TargetLowering: The ``TargetLowering`` class ---------------------------- @@ -250,6 +250,8 @@ operations. Among other things, this class indicates: * various high-level characteristics, like whether it is profitable to turn division by a constant into a multiplication sequence. +.. _TargetRegisterInfo: + The ``TargetRegisterInfo`` class -------------------------------- @@ -283,12 +285,10 @@ The ``TargetInstrInfo`` class ----------------------------- The ``TargetInstrInfo`` class is used to describe the machine instructions -supported by the target. It is essentially an array of ``TargetInstrDescriptor`` -objects, each of which describes one instruction the target -supports. Descriptors define things like the mnemonic for the opcode, the number -of operands, the list of implicit register uses and defs, whether the -instruction has certain target-independent properties (accesses memory, is -commutable, etc), and holds any target-specific flags. +supported by the target. Descriptions define things like the mnemonic for +the opcode, the number of operands, the list of implicit register uses and defs, +whether the instruction has certain target-independent properties (accesses +memory, is commutable, etc), and holds any target-specific flags. The ``TargetFrameInfo`` class ----------------------------- @@ -771,6 +771,8 @@ value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a SREM or UREM operation. The `legalize types`_ and `legalize operations`_ phases are responsible for turning an illegal DAG into a legal DAG. +.. _SelectionDAG-Process: + SelectionDAG Instruction Selection Process ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -874,7 +876,7 @@ found, the elements are converted to scalars ("scalarizing"). A target implementation tells the legalizer which types are supported (and which register class to use for them) by calling the ``addRegisterClass`` method in -its TargetLowering constructor. +its ``TargetLowering`` constructor. .. _legalize operations: .. _Legalizer: @@ -1036,6 +1038,24 @@ for your target. It has the following strengths: are used to manipulate the input immediate (in this case, take the high or low 16-bits of the immediate). +* When using the 'Pat' class to map a pattern to an instruction that has one + or more complex operands (like e.g. `X86 addressing mode`_), the pattern may + either specify the operand as a whole using a ``ComplexPattern``, or else it + may specify the components of the complex operand separately. The latter is + done e.g. for pre-increment instructions by the PowerPC back end: + + :: + + def STWU : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst), + "stwu $rS, $dst", LdStStoreUpd, []>, + RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">; + + def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff), + (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>; + + Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the + complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction. + * While the system does automate a lot, it still allows you to write custom C++ code to match special cases if there is something that is hard to express. @@ -1728,6 +1748,8 @@ This section of the document explains features or design decisions that are specific to the code generator for a particular target. First we start with a table that summarizes what features are supported by each target. +.. _target-feature-matrix: + Target Feature Matrix --------------------- @@ -1742,12 +1764,14 @@ the key: :raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` @@ -1767,9 +1791,10 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` @@ -1781,11 +1806,12 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` @@ -1795,9 +1821,10 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` @@ -1809,11 +1836,12 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` @@ -1823,11 +1851,12 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` @@ -1837,11 +1866,12 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` @@ -1851,9 +1881,10 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` @@ -1865,11 +1896,12 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` @@ -1879,9 +1911,10 @@ Here is the table: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` :raw-html:`` @@ -2361,17 +2394,17 @@ Dynamic Allocation TODO - More to come. -The PTX backend ---------------- +The NVPTX backend +----------------- -The PTX code generator lives in the lib/Target/PTX directory. It is currently a -work-in-progress, but already supports most of the code generation functionality -needed to generate correct PTX kernels for CUDA devices. +The NVPTX code generator under lib/Target/NVPTX is an open-source version of +the NVIDIA NVPTX code generator for LLVM. It is contributed by NVIDIA and is +a port of the code generator used in the CUDA compiler (nvcc). It targets the +PTX 3.0/3.1 ISA and can target any compute capability greater than or equal to +2.0 (Fermi). -The code generator can target PTX 2.0+, and shader model 1.0+. The PTX ISA -Reference Manual is used as the primary source of ISA information, though an -effort is made to make the output of the code generator match the output of the -NVidia nvcc compiler, whenever possible. +This target is of production quality and should be completely compatible with +the official NVIDIA toolchain. Code Generator Options: @@ -2381,39 +2414,28 @@ Code Generator Options: :raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` -:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` -:raw-html:`` +:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`` -:raw-html:`` -:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` +:raw-html:`` :raw-html:`` :raw-html:`
UnknownNot ApplicableNo supportPartial SupportComplete Support
MBlazeMSP430MipsPTXNVPTXPowerPCSparcSystemZX86XCore
*
Description
``double``If enabled, the map_f64_to_f32 directive is disabled in the PTX output, allowing native double-precision arithmeticsm_20Set shader model/compute capability to 2.0
sm_21Set shader model/compute capability to 2.1
``no-fma``Disable generation of Fused-Multiply Add instructions, which may be beneficial for some devicessm_30Set shader model/compute capability to 3.0
``smxy / computexy``Set shader model/compute capability to x.y, e.g. sm20 or compute13sm_35Set shader model/compute capability to 3.5
ptx30Target PTX 3.0
ptx31Target PTX 3.1
` -Working: - -* Arithmetic instruction selection (including combo FMA) - -* Bitwise instruction selection - -* Control-flow instruction selection - -* Function calls (only on SM 2.0+ and no return arguments) - -* Addresses spaces (0 = global, 1 = constant, 2 = local, 4 = shared) - -* Thread synchronization (bar.sync) - -* Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.) - -In Progress: - -* Robust call instruction selection - -* Stack frame allocation - -* Device-specific instruction scheduling optimizations