========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information.  Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out.  This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator` and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen a lot and use Emacs or Vim, you can find an Emacs
"TableGen mode" and a Vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:

The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in the ``bin`` directory of your build tree.  It is not installed
on the system (or wherever your sysroot points), since it has no use beyond
LLVM's build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool.  The first (optional) argument
specifies the file to read.  If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used.  These backends are
selectable on the command line (run ``llvm-tblgen -help`` for a list).  For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.
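
To try ``-print-enums`` on something smaller than ``X86.td``, assume a
hypothetical input file ``regs.td`` that defines a class ``MyReg`` (both names
are made up for this sketch).  Running
``llvm-tblgen regs.td -print-enums -class=MyReg`` on it should list ``R0``,
``R1`` and ``R2``.  Because ``llvm-tblgen`` reads standard input when no file is
given, ``llvm-tblgen -print-enums -class=MyReg < regs.td`` should behave the
same way.

.. code-block:: text

  // regs.td: a hypothetical, minimal input file.
  class MyReg<string n> {
    string AsmName = n;   // assembly name of the register
  }

  // Three definitions that subclass MyReg.
  def R0 : MyReg<"r0">;
  def R1 : MyReg<"r1">;
  def R2 : MyReg<"r2">;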

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints out
all of the classes, then all of the definitions.  This is a good way to see what
the various definitions expand to fully.  Running this on the ``X86.td`` file
prints this (at the time of this writing):

.. code-block:: text

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    bit neverHasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture.  ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.  The body of the record contains all of the data that
TableGen assembled for the record: that the instruction is part of the "X86"
namespace, the pattern that tells the code generator how to select it, that it
is a two-address instruction, that it has a particular encoding, and so on.
The contents and semantics of the information in the record are specific to
the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place.  Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: text

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share.  A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Each ``def`` record has a special entry called ``NAME``.  This is the name of
the record (``ADD32rr`` above).  In the general case, ``def`` names can be
formed from various kinds of string processing expressions, and ``NAME``
resolves to the final value obtained after resolving all of those expressions.
You may refer to ``NAME`` anywhere you want to use the ultimate name of the
``def``.  ``NAME`` should not be defined anywhere else in user code, to avoid
conflicts.
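
Here is a minimal sketch of ``NAME`` in use.  The class ``Named``, its field
``FullName`` and the multiclass ``LoHi`` are hypothetical.  It assumes that a
class can read ``NAME`` through the implicit template argument of that name.
It also assumes that ``def`` names inside a multiclass are prefixed with the
name given to the ``defm``:

.. code-block:: text

  class Named {
    // NAME is bound to the final name of the record inheriting from Named.
    string FullName = NAME;
  }

  def ADD32rr : Named;   // FullName is expected to be "ADD32rr"

  multiclass LoHi {
    // Inside a multiclass, these names are prefixed with the defm name.
    def _lo : Named;
    def _hi : Named;
  }

  defm R0 : LoHi;        // expected to define R0_lo and R0_hi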

Syntax
======

TableGen's syntax is loosely based on C++ templates, with its own built-in
types.  In addition, it introduces some automation concepts such as
multiclasses, ``foreach``, and ``let``.
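
The sketch below illustrates two of these concepts.  The class ``Reg`` and the
records are hypothetical.  ``foreach`` creates one record per list element,
pasting the loop variable into the record name with ``#``.  A ``let ... in``
block overrides a field for every record defined inside it:

.. code-block:: text

  class Reg<int num> {
    int Num = num;
    bit IsReserved = 0;   // default value
  }

  // Defines R0, R1, R2 and R3, with Num = 0, 1, 2 and 3.
  foreach i = [0, 1, 2, 3] in
    def R#i : Reg<i>;

  // Both records get IsReserved = 1 instead of the class default.
  let IsReserved = 1 in {
    def SP : Reg<4>;
    def FP : Reg<5>;
  }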

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses.  The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application.  The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'.  These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: text

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values.  Classes such as ``SubtargetFeature`` are defined with the
``class`` keyword, either in the same file or in another file that it includes.
Most target TableGen files include the generic ones in ``include/llvm/Target``.
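
To show how the values in a ``def`` reach the record, here is a simplified,
hypothetical stand-in for ``SubtargetFeature`` named ``SimpleFeature``.  The
real class has additional fields.  Each template argument is copied into a
field of the record being defined:

.. code-block:: text

  // Hypothetical, simplified stand-in for SubtargetFeature.
  class SimpleFeature<string n, string a, string v, string d> {
    string Name      = n;   // feature name
    string Attribute = a;   // subtarget attribute set by the feature
    string Value     = v;   // value assigned to that attribute
    string Desc      = d;   // human-readable description
  }

  def FeatureFPARMv8 : SimpleFeature<"fp-armv8", "HasFPARMv8", "true",
                                     "Enable ARMv8 FP">;
  // The record now holds Name = "fp-armv8", Attribute = "HasFPARMv8",
  // Value = "true" and Desc = "Enable ARMv8 FP".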

**TableGen classes** are abstract records that are used to build and describe
other records.  These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend).  TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: text

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin``, which receives a parameter ``Name`` of type
``string`` and a list of target features, specializes the class ``Processor``
by passing these arguments down and hard-coding ``NoItineraries``.
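
Here is a minimal sketch of such a hierarchy, using the hypothetical classes
``Inst`` and ``FPInst``.  ``FPInst`` inherits every field of ``Inst`` and
changes one default with ``let``, and a ``def`` can override a field in its own
body.  Because TableGen records every superclass, a backend asking for all
definitions of ``Inst`` (for example, with ``-print-enums -class=Inst``) would
be expected to find both ``ADD`` and ``FADD``:

.. code-block:: text

  class Inst<string asm> {
    string AsmString = asm;
    bit isCommutable = 0;
    bit isFloatingPoint = 0;
  }

  // Specializes Inst: same fields, different default for isFloatingPoint.
  class FPInst<string asm> : Inst<asm> {
    let isFloatingPoint = 1;
  }

  def ADD  : Inst<"add">;
  def FADD : FPInst<"fadd"> {
    let isCommutable = 1;   // overrides the default inherited from Inst
  }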

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once.  Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
parent multiclass become part of the current multiclass, as if they were
declared in the current multiclass.

.. code-block:: text

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;
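
The example above depends on definitions that are not shown here.  As a
smaller, self-contained sketch (all names are hypothetical), the multiclass
``AllForms`` below inherits from ``RegForms``.  A single ``defm`` should
therefore produce all three records, each prefixed with the ``defm`` name:

.. code-block:: text

  class AsmInst<string asm> {
    string AsmString = asm;
  }

  multiclass RegForms<string mnemonic> {
    def rr : AsmInst<mnemonic # " $dst, $src">;
    def ri : AsmInst<mnemonic # " $dst, $imm">;
  }

  // Inherits rr and ri from RegForms, and adds rm.
  multiclass AllForms<string mnemonic> : RegForms<mnemonic> {
    def rm : AsmInst<mnemonic # " $dst, $mem">;
  }

  // Expected to define ADDrr, ADDri and ADDrm.
  defm ADD : AllForms<"add">;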

See the `TableGen Language Reference <LangRef.html>`_ for more information.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a backend.  The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that is only useful for debugging the TableGen files themselves.  The power of
TableGen, however, lies in interpreting the source files into an internal
representation that can be generated into anything you want.

Currently, TableGen is used to create huge include files containing tables that
you can either include directly (if the output is in the language you're coding
in) or use in pre-processing, via macros defined around the ``#include`` of the
file.

Direct output can be used if the backend already prints a table in C format or
if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used in
different contexts (like instruction names); in that case, your backend should
print a meta-information list that can be shaped into different compile-time
formats.

See `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times.  The common theme is that, while TableGen allows
you to build domain-specific languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give virtually any meaning to the
basic concepts via custom-made backends, which can pervert the original design
and make it very hard for newcomers to understand the evil TableGen file.

Some are in favour of extending the semantics even more, but making sure
backends adhere to strict rules.  Others suggest we should move to fewer, more
powerful DSLs designed for specific purposes, or even reuse existing DSLs.

Either way, this discussion is likely to span several years, if not decades.
You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.