===================================
Customizing LLVMC: Reference Manual
===================================

LLVMC is a generic compiler driver, designed to be customizable and
extensible. It plays the same role for LLVM as the ``gcc`` program
does for GCC - LLVMC's job is essentially to transform a set of input
files into a set of targets depending on configuration rules and user
options. What makes LLVMC different is that these transformation rules
are completely customizable - in fact, LLVMC knows nothing about the
specifics of transformation (even the command-line options are mostly
not hard-coded) and regards the transformation structure as an
abstract graph. This makes it possible to adapt LLVMC for other
purposes - for example, as a build tool for game resources.

Because LLVMC employs TableGen [1]_ as its configuration language, you
need to be familiar with it to customize LLVMC.

LLVMC tries hard to be as compatible with ``gcc`` as possible,
although there are some small differences. Most of the time, however,
you shouldn't be able to notice them::

    $ # This works as expected:
    $ llvmc2 -O3 -Wall hello.cpp

One nice feature of LLVMC is that you don't have to distinguish
between different compilers for different languages (think ``g++`` and
``gcc``) - the right toolchain is chosen automatically based on input
language names (which are, in turn, determined from file
extensions). If you want to override the language inferred from a
file's extension, use the ``-x`` option, just as you would with
``gcc``::

    $ llvmc2 -x c hello.cpp
    $ # hello.cpp is really a C file

On the other hand, when using LLVMC as a linker to combine several C++
object files, you should provide the ``--linker`` option, since it's
impossible for LLVMC to choose the right linker in that case::

    [A lot of link-time errors skipped]
    $ llvmc2 --linker=c++ hello.o

LLVMC has some built-in options that can't be overridden in the
configuration files:

* ``-o FILE`` - Output file name.

* ``-x LANGUAGE`` - Specify the language of the following input files
  until the next ``-x`` option.

* ``-v`` - Enable verbose mode, i.e. print out all executed commands.

* ``--view-graph`` - Show a graphical representation of the compilation
  graph. Requires that you have the ``dot`` and ``gv`` commands
  installed. Hidden option, useful for debugging.

* ``--write-graph`` - Write a ``compilation-graph.dot`` file in the
  current directory with the compilation graph description in the
  Graphviz format. Hidden option, useful for debugging.

* ``--save-temps`` - Write temporary files to the current directory
  and do not delete them on exit. Hidden option, useful for debugging.

* ``--help``, ``--help-hidden``, ``--version`` - These options have
  their standard meaning.

Customizing LLVMC: the compilation graph
========================================

At the time of writing, LLVMC does not support on-the-fly reloading of
configuration, so to customize LLVMC you'll have to recompile the
source code (which lives under ``$LLVM_DIR/tools/llvmc2``). The
default configuration files are ``Common.td`` (common definitions;
don't forget to ``include`` it in your configuration files),
``Tools.td`` (tool descriptions) and ``Graph.td`` (compilation
graph definition).

To compile LLVMC with your own configuration file (say, ``MyGraph.td``),
run ``make`` like this::

    $ cd $LLVM_DIR/tools/llvmc2
    $ make GRAPH=MyGraph.td TOOLNAME=my_llvmc

This will build an executable named ``my_llvmc``. There are also
several sample configuration files in the ``llvmc2/examples``
subdirectory that should help to get you started.

Internally, LLVMC stores information about possible source
transformations in the form of a graph. Nodes in this graph represent
tools, and edges between two nodes represent a transformation path. A
special "root" node is used to mark entry points for the
transformations. LLVMC also assigns a weight to each edge (more on
this later) to choose between several alternative edges.

The definition of the compilation graph (see file ``Graph.td``) is
just a list of edges::

    def CompilationGraph : CompilationGraph<[
        Edge<root, llvm_gcc_c>,
        Edge<root, llvm_gcc_assembler>,
        ...

        Edge<llvm_gcc_c, llc>,
        Edge<llvm_gcc_cpp, llc>,
        ...

        OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
        OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
        ...

        OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
            (case (input_languages_contain "c++"), (inc_weight),
                  (or (parameter_equals "linker", "g++"),
                      (parameter_equals "linker", "c++")), (inc_weight))>,
        ...

        ]>;

As you can see, edges can be either default or optional. Optional
edges are distinguished by a condition, such as a ``case`` expression,
that is used to calculate the edge's weight.

The default edges are assigned a weight of 1, and optional edges get a
weight of 0 + 2*N where N is the number of tests that evaluated to
true in the ``case`` expression. It is also possible to provide an
integer parameter to ``inc_weight`` and ``dec_weight`` - in this case,
the weight is increased (or decreased) by the provided value instead
of the default 2.

When passing an input file through the graph, LLVMC picks the edge
with the maximum weight. To avoid ambiguity, there should be only one
default edge between two nodes (with the exception of the root node,
which gets special treatment - there you are allowed to specify one
default edge *per language*).

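The weighting rule can be modeled in a few lines of C++. This is only an illustrative sketch, not LLVMC's implementation: ``EdgeModel``, ``edgeWeight`` and ``pickEdge`` are hypothetical names, each test is assumed to contribute the default increment of 2, and ties are resolved in favor of the first edge purely for the sake of the example.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one outgoing edge; not LLVMC's real data structure.
struct EdgeModel {
    std::string target;             // name of the tool the edge points to
    bool optional;                  // false for Edge, true for OptionalEdge
    std::vector<bool> testResults;  // outcome of each test on an optional edge
};

// Default edges weigh 1; optional edges weigh 0 + 2*N, where N is the
// number of tests that evaluated to true.
int edgeWeight(const EdgeModel& e) {
    if (!e.optional) return 1;
    int n = 0;
    for (bool passed : e.testResults)
        if (passed) ++n;
    return 2 * n;
}

// Returns the target of the maximum-weight edge, or "" if there are no
// edges. Keeping the first edge on a tie is an assumption of this sketch.
std::string pickEdge(const std::vector<EdgeModel>& edges) {
    std::string best;
    int bestWeight = 0;
    bool found = false;
    for (const EdgeModel& e : edges) {
        int w = edgeWeight(e);
        if (!found || w > bestWeight) {
            best = e.target;
            bestWeight = w;
            found = true;
        }
    }
    return best;
}
```

With a default edge to ``llc`` and an optional edge to ``opt`` guarded by ``(switch_on "opt")``, this model selects ``opt`` (weight 2) when the switch is given and ``llc`` (weight 1) otherwise.
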
To get a visual representation of the compilation graph (useful for
debugging), run ``llvmc2 --view-graph``. You will need ``dot`` and a
PostScript viewer (``gv`` or ``gsview``) installed for this to work
properly.

Writing a tool description
==========================

As was said earlier, nodes in the compilation graph represent tools,
which are described separately. A tool definition looks like this
(taken from the ``Tools.td`` file)::

    def llvm_gcc_cpp : Tool<[
        (in_language "c++"),
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
        (sink)
        ]>;

This defines a new tool called ``llvm_gcc_cpp``, which wraps the
``llvm-g++`` command. As you can see, a tool definition is just a list
of properties; most of them should be self-explanatory. The ``sink``
property means that this tool should be passed all command-line
options that lack explicit descriptions.

The complete list of the currently implemented tool properties follows:

* Possible tool properties:

  - ``in_language`` - input language name.

  - ``out_language`` - output language name.

  - ``output_suffix`` - output file suffix.

  - ``cmd_line`` - the actual command used to run the tool. You can
    use ``$INFILE`` and ``$OUTFILE`` variables, output redirection
    with ``>``, hook invocations (``$CALL``), environment variables
    (via ``$ENV``) and the ``case`` construct (more on this below).

  - ``join`` - this tool is a "join node" in the graph, i.e. it gets a
    list of input files and joins them together. Used for linkers.

  - ``sink`` - all command-line options that are not handled by other
    tools are passed to this tool.

The next tool definition is slightly more complex::

    def llvm_gcc_linker : Tool<[
        (in_language "object-code"),
        (out_language "executable"),
        (output_suffix "out"),
        (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
        (join),
        (prefix_list_option "L", (forward),
                            (help "add a directory to link path")),
        (prefix_list_option "l", (forward),
                            (help "search a library when linking")),
        (prefix_list_option "Wl", (unpack_values),
                            (help "pass options to linker"))
        ]>;

This tool has a ``join`` property, which means that it behaves like a
linker. This tool also defines several command-line options: ``-l``,
``-L`` and ``-Wl``, which have their usual meaning. An option has two
attributes: a name and a (possibly empty) list of properties. All
currently implemented option types and properties are described below:

* Possible option types:

  - ``switch_option`` - a simple boolean switch, for example ``-time``.

  - ``parameter_option`` - an option that takes an argument.

  - ``parameter_list_option`` - same as the above, but more than one
    occurrence of the option is allowed.

  - ``prefix_option`` - same as ``parameter_option``, but the option name
    and parameter value are not separated.

  - ``prefix_list_option`` - same as the above, but more than one
    occurrence of the option is allowed; example: ``-lm -lpthread``.

  - ``alias_option`` - a special option type for creating
    aliases. Unlike other option types, aliases are not allowed to
    have any properties besides the aliased option name. Usage
    example: ``(alias_option "preprocess", "E")``

* Possible option properties:

  - ``append_cmd`` - append a string to the tool invocation command.

  - ``forward`` - forward this option unchanged.

  - ``output_suffix`` - modify the output suffix of this
    tool. Example: ``(switch_option "E", (output_suffix "i"))``.

  - ``stop_compilation`` - stop compilation after this phase.

  - ``unpack_values`` - used for splitting and forwarding
    comma-separated lists of options, e.g. ``-Wa,-foo=bar,-baz`` is
    converted to ``-foo=bar -baz`` and appended to the tool invocation
    command.

  - ``help`` - help string associated with this option. Used for
    ``--help`` output.

  - ``required`` - this option is obligatory.

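The splitting performed by ``unpack_values`` can be pictured with the following C++ sketch. It is an illustration of the described behavior only; ``unpackValues`` is a hypothetical helper, not a function from the LLVMC sources, and it assumes the option name (with its leading dash) is known to the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits an argument such as "-Wa,-foo=bar,-baz" into the options to be
// forwarded. `prefix` is the option name including the dash, e.g. "-Wa".
// Returns an empty vector if `arg` does not start with `prefix`.
std::vector<std::string> unpackValues(const std::string& arg,
                                      const std::string& prefix) {
    std::vector<std::string> values;
    if (arg.compare(0, prefix.size(), prefix) != 0)
        return values;  // a different option altogether
    std::size_t pos = prefix.size();
    while (pos < arg.size()) {
        if (arg[pos] == ',') {  // skip separators
            ++pos;
            continue;
        }
        std::size_t end = arg.find(',', pos);
        if (end == std::string::npos) end = arg.size();
        values.push_back(arg.substr(pos, end - pos));
        pos = end;
    }
    return values;
}
```

For ``-Wa,-foo=bar,-baz`` with the prefix ``-Wa`` the helper yields the two strings ``-foo=bar`` and ``-baz``, matching the example in the property description.
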
Option list - specifying all options in a single place
======================================================

It can be handy to have all information about options gathered in a
single place to provide an overview. This can be achieved by using a
so-called ``OptionList``::

    def Options : OptionList<[
        (switch_option "E", (help "Help string")),
        (alias_option "quiet", "q")
        ]>;

``OptionList`` is also a good place to specify option aliases.

Tool-specific option properties like ``append_cmd`` have (obviously)
no meaning in the context of ``OptionList``, so the only properties
allowed there are ``help`` and ``required``.

Option lists are used at the file scope. See file
``examples/Clang.td`` for an example of ``OptionList`` usage.

Using hooks and environment variables in the ``cmd_line`` property
==================================================================

Normally, LLVMC executes programs from the system ``PATH``. Sometimes,
this is not sufficient: for example, we may want to specify tool names
in the configuration file. This can be achieved via the mechanism of
hooks - to compile LLVMC with your hooks, just drop a ``.cpp`` file
into the ``tools/llvmc2`` directory. Hooks should live in the ``hooks``
namespace and have the signature
``std::string hooks::MyHookName (void)``. They can be used from the
``cmd_line`` tool property::

    (cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")

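A hook file matching the ``cmd_line`` above might look like the sketch below. The hook names come from that example; the returned strings are placeholders invented for illustration, and it is assumed that each ``$CALL(...)`` is replaced by the string the hook returns.

```cpp
#include <string>

namespace hooks {

// Hypothetical hook: returns the directory prefix substituted for
// $CALL(MyHook). The path is a placeholder chosen for this example.
std::string MyHook() {
    return "/opt/llvm/bin";
}

// Hypothetical hook: returns the output file name substituted for
// $CALL(AnotherHook). The name is likewise a placeholder.
std::string AnotherHook() {
    return "a.out";
}

}  // namespace hooks
```

Under those assumptions the command line would expand to ``/opt/llvm/bin/path/to/file -o a.out``.
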
It is also possible to use environment variables in the same manner::

    (cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")

To change the command line string based on user-provided options, use
the ``case`` expression (documented below)::

    (cmd_line
      (case
        (switch_on "E"),
           "llvm-g++ -E -x c $INFILE -o $OUTFILE",
        (default),
           "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))

Conditional evaluation: the ``case`` expression
===============================================

The ``case`` construct can be used to calculate weights of the optional
edges and to choose between several alternative command line strings
in the ``cmd_line`` tool property. It is modeled after the
similarly-named construct in functional languages and takes the form
``(case (test_1), statement_1, (test_2), statement_2, ... (test_N),
statement_N)``. The statements are evaluated only if the corresponding
tests evaluate to true. Examples::

    // Increases edge weight by 5 if "-A" is provided on the
    // command-line, and by 5 more if "-B" is also provided.
    (case
        (switch_on "A"), (inc_weight 5),
        (switch_on "B"), (inc_weight 5))

    // Evaluates to "cmdline1" if option "-A" is provided on the
    // command line; otherwise to "cmdline2" if "-B" is provided;
    // otherwise to "cmdline3".
    (case
        (switch_on "A"), "cmdline1",
        (switch_on "B"), "cmdline2",
        (default), "cmdline3")

Note the slight difference in ``case`` expression handling in contexts
of edge weights and command line specification - in the second example
the value of the ``"B"`` switch is never checked when switch ``"A"`` is
enabled, and the whole expression always evaluates to ``"cmdline1"`` in
that case.

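The two evaluation modes can be contrasted with a small C++ model. This is a sketch of the semantics described in this section, not LLVMC code: ``caseWeight`` and ``caseCmdLine`` are hypothetical names, and each clause is represented as an already-evaluated test paired with its statement.

```cpp
#include <string>
#include <utility>
#include <vector>

// Edge-weight context: every clause whose test is true contributes its
// increment, so several clauses can fire for the same edge.
int caseWeight(const std::vector<std::pair<bool, int>>& clauses) {
    int weight = 0;
    for (const auto& clause : clauses)
        if (clause.first) weight += clause.second;
    return weight;
}

// cmd_line context: the first clause whose test is true wins and the
// remaining clauses are not consulted.
std::string caseCmdLine(
        const std::vector<std::pair<bool, std::string>>& clauses) {
    for (const auto& clause : clauses)
        if (clause.first) return clause.second;
    return std::string();  // nothing matched; a (default) clause avoids this
}
```

With both ``-A`` and ``-B`` given, the weight example accumulates 5 + 5 = 10, while the command-line example stops at the first match and produces ``cmdline1``.
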
Case expressions can also be nested, i.e. the following is legal::

    (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...)
          (default), ...)

You should, however, try to avoid doing that because it hurts
readability. It is usually better to split tool descriptions and/or
use TableGen inheritance instead.

* Possible tests are:

  - ``switch_on`` - Returns true if a given command-line option is
    provided by the user. Example: ``(switch_on "opt")``. Note that
    you have to define all possible command-line options separately in
    the tool descriptions. See the section on writing a tool
    description above for the discussion of different kinds of
    command-line options.

  - ``parameter_equals`` - Returns true if a command-line parameter equals
    a given value. Example: ``(parameter_equals "W", "all")``.

  - ``element_in_list`` - Returns true if a command-line parameter list
    includes a given value. Example: ``(element_in_list "l", "pthread")``.

  - ``input_languages_contain`` - Returns true if a given language
    belongs to the current input language set. Example:
    ``(input_languages_contain "c++")``.

  - ``in_language`` - Evaluates to true if the language of the input
    file equals the argument. Valid only when using a ``case``
    expression in a ``cmd_line`` tool property. Example:
    ``(in_language "c++")``.

  - ``not_empty`` - Returns true if a given option (which should be
    either a parameter or a parameter list) is set by the
    user. Example: ``(not_empty "o")``.

  - ``default`` - Always evaluates to true. Should always be the last
    test in the ``case`` expression.

  - ``and`` - A standard logical combinator that returns true iff all
    of its arguments return true. Used like this: ``(and (test1),
    (test2), ... (testN))``. Nesting of ``and`` and ``or`` is allowed.

  - ``or`` - Another logical combinator that returns true iff at least
    one of its arguments returns true. Example: ``(or (test1),
    (test2), ... (testN))``.

One last thing that you will need to modify when adding support for a
new language to LLVMC is the language map, which defines mappings from
file extensions to language names. It is used to choose the proper
toolchain(s) for a given input file set. The language map definition
is located in the file ``Tools.td`` and looks like this::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,
         ...
        ]>;

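Conceptually, the language map is a lookup from file suffix to language name. The C++ sketch below models that lookup using only the two ``LangToSuffixes`` entries shown above; ``suffixToLanguage`` and ``languageOf`` are hypothetical names, and returning an empty string for an unknown suffix is an assumption of this example rather than documented LLVMC behavior.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Suffix-to-language table built from the "c++" and "c" entries above.
const std::map<std::string, std::string>& suffixToLanguage() {
    static const std::map<std::string, std::string> table = {
        {"cc", "c++"},  {"cp", "c++"},  {"cxx", "c++"}, {"cpp", "c++"},
        {"CPP", "c++"}, {"c++", "c++"}, {"C", "c++"},
        {"c", "c"},
    };
    return table;
}

// Returns the language name for a file, based on the text after the last
// dot, or an empty string when there is no suffix or it is not in the map.
std::string languageOf(const std::string& fileName) {
    std::size_t dot = fileName.rfind('.');
    if (dot == std::string::npos) return std::string();
    auto it = suffixToLanguage().find(fileName.substr(dot + 1));
    return it == suffixToLanguage().end() ? std::string() : it->second;
}
```

Note that the lookup is case-sensitive, which is what lets ``hello.C`` map to ``c++`` while ``hello.c`` maps to ``c``.
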
.. [1] TableGen Fundamentals
       http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html