===================================
Customizing LLVMC: Reference Manual
===================================

LLVMC is a generic compiler driver, designed to be customizable and
extensible. It plays the same role for LLVM as the ``gcc`` program
does for GCC - LLVMC's job is essentially to transform a set of input
files into a set of targets depending on configuration rules and user
options. What makes LLVMC different is that these transformation rules
are completely customizable - in fact, LLVMC knows nothing about the
specifics of transformation (even the command-line options are mostly
not hard-coded) and regards the transformation structure as an
abstract graph. This makes it possible to adapt LLVMC for other
purposes - for example, as a build tool for game resources.

Because LLVMC employs TableGen [1]_ as its configuration language, you
need to be familiar with it to customize LLVMC.

LLVMC tries hard to be as compatible with ``gcc`` as possible,
although there are some small differences. Most of the time, however,
you shouldn't be able to notice them::

    $ # This works as expected:
    $ llvmc2 -O3 -Wall hello.cpp

One nice feature of LLVMC is that one doesn't have to distinguish
between different compilers for different languages (think ``g++`` and
``gcc``) - the right toolchain is chosen automatically based on input
language names (which are, in turn, determined from file
extensions). If you want to force a file ending with ".cpp" to compile
as C, use the ``-x`` option, just as you would with ``gcc``::

    $ llvmc2 -x c hello.cpp
    $ # hello.cpp is really a C file

On the other hand, when using LLVMC as a linker to combine several C++
object files you should provide the ``--linker`` option since it's
impossible for LLVMC to choose the right linker in that case::

    $ llvmc2 hello.o
    [A lot of link-time errors skipped]
    $ llvmc2 --linker=c++ hello.o

LLVMC has some built-in options that can't be overridden in the
configuration files:

* ``-o FILE`` - Output file name.

* ``-x LANGUAGE`` - Specify the language of the following input files
  until the next ``-x`` option.

* ``-v`` - Enable verbose mode, i.e. print out all executed commands.

* ``--view-graph`` - Show a graphical representation of the compilation
  graph. Requires that you have ``dot`` and ``gv`` commands
  installed. Hidden option, useful for debugging.

* ``--write-graph`` - Write a ``compilation-graph.dot`` file in the
  current directory with the compilation graph description in the
  Graphviz format. Hidden option, useful for debugging.

Customizing LLVMC: the compilation graph
========================================

At the time of writing LLVMC does not support on-the-fly reloading of
configuration, so to customize LLVMC you'll have to recompile the
source code (which lives under ``$LLVM_DIR/tools/llvmc2``). The
default configuration files are ``Common.td`` (contains common
definitions, don't forget to ``include`` it in your configuration
files), ``Tools.td`` (tool descriptions) and ``Graph.td`` (compilation
graph definition).

To compile LLVMC with your own configuration file (say, ``MyGraph.td``),
run ``make`` like this::

    $ cd $LLVM_DIR/tools/llvmc2
    $ make GRAPH=MyGraph.td TOOLNAME=my_llvmc

This will build an executable named ``my_llvmc``. There are also
several sample configuration files in the ``llvmc2/examples``
subdirectory that should help to get you started.

Internally, LLVMC stores information about possible source
transformations in the form of a graph. Nodes in this graph represent
tools, and edges between two nodes represent a transformation path. A
special "root" node is used to mark entry points for the
transformations. LLVMC also assigns a weight to each edge (more on
this later) to choose between several alternative edges.

The definition of the compilation graph (see file ``Graph.td``) is
just a list of edges::

    def CompilationGraph : CompilationGraph<[
        Edge<root, llvm_gcc_c>,
        Edge<root, llvm_gcc_assembler>,
        ...

        Edge<llvm_gcc_c, llc>,
        Edge<llvm_gcc_cpp, llc>,
        ...

        OptionalEdge<llvm_gcc_c, opt, (case (switch_on "opt"), (inc_weight))>,
        OptionalEdge<llvm_gcc_cpp, opt, (case (switch_on "opt"), (inc_weight))>,
        ...

        OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
            (case (input_languages_contain "c++"), (inc_weight),
                  (or (parameter_equals "linker", "g++"),
                      (parameter_equals "linker", "c++")), (inc_weight))>,
        ...

        ]>;

As you can see, the edges can be either default or optional, where
optional edges are differentiated by sporting a ``case`` expression
used to calculate the edge's weight.

The default edges are assigned a weight of 1, and optional edges get a
weight of 0 + 2*N where N is the number of tests that evaluated to
true in the ``case`` expression. It is also possible to provide an
integer parameter to ``inc_weight`` and ``dec_weight`` - in this case,
the weight is increased (or decreased) by the provided value instead
of the default 2.

When passing an input file through the graph, LLVMC picks the edge
with the maximum weight. To avoid ambiguity, there should be only one
default edge between two nodes (with the exception of the root node,
which gets a special treatment - there you are allowed to specify one
default edge *per language*).

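The weighting rule can be modelled in a few lines of Python. This is only an illustrative sketch of the arithmetic described above, not LLVMC's actual implementation; the names ``optional_edge_weight`` and ``pick_edge`` are hypothetical.

```python
# Illustrative model of the edge-weight rule; not LLVMC's real code.
DEFAULT_EDGE_WEIGHT = 1   # weight of a default edge, as stated in the text
DEFAULT_INCREMENT = 2     # step applied by a bare (inc_weight)


def optional_edge_weight(test_results, increment=DEFAULT_INCREMENT):
    """Start from 0 and add one increment per test that evaluated to true."""
    return sum(increment for passed in test_results if passed)


def pick_edge(edges):
    """Return the target of the (target, weight) pair with the largest weight."""
    return max(edges, key=lambda edge: edge[1])[0]


# A default edge to `llc` competes with an optional edge to `opt`
# whose single test (the "opt" switch) evaluated to true.
edges = [
    ("llc", DEFAULT_EDGE_WEIGHT),
    ("opt", optional_edge_weight([True])),
]
print(pick_edge(edges))
```

With the switch off, the optional edge keeps a weight of 0 and the default edge wins, which is why an optional edge needs at least one true test to be taken.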
To get a visual representation of the compilation graph (useful for
debugging), run ``llvmc2 --view-graph``. You will need ``dot`` and
``gsview`` installed for this to work properly.

Writing a tool description
==========================

As was said earlier, nodes in the compilation graph represent tools,
which are described separately. A tool definition looks like this
(taken from the ``Tools.td`` file)::

    def llvm_gcc_cpp : Tool<[
        (in_language "c++"),
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
        (sink)
        ]>;

This defines a new tool called ``llvm_gcc_cpp``, which is an alias for
``llvm-g++``. As you can see, a tool definition is just a list of
properties; most of them should be self-explanatory. The ``sink``
property means that this tool should be passed all command-line
options that lack explicit descriptions.

The complete list of the currently implemented tool properties follows:

* Possible tool properties:

  - ``in_language`` - input language name.

  - ``out_language`` - output language name.

  - ``output_suffix`` - output file suffix.

  - ``cmd_line`` - the actual command used to run the tool. You can
    use ``$INFILE`` and ``$OUTFILE`` variables, output redirection
    with ``>``, hook invocations (``$CALL``), environment variables
    (via ``$ENV``) and the ``case`` construct (more on this below).

  - ``join`` - this tool is a "join node" in the graph, i.e. it gets a
    list of input files and joins them together. Used for linkers.

  - ``sink`` - all command-line options that are not handled by other
    tools are passed to this tool.

The next tool definition is slightly more complex::

    def llvm_gcc_linker : Tool<[
        (in_language "object-code"),
        (out_language "executable"),
        (output_suffix "out"),
        (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
        (join),
        (prefix_list_option "L", (forward),
                            (help "add a directory to link path")),
        (prefix_list_option "l", (forward),
                            (help "search a library when linking")),
        (prefix_list_option "Wl", (unpack_values),
                            (help "pass options to linker"))
        ]>;

This tool has a "join" property, which means that it behaves like a
linker. This tool also defines several command-line options: ``-l``,
``-L`` and ``-Wl``, which have their usual meaning. An option has two
attributes: a name and a (possibly empty) list of properties. All
currently implemented option types and properties are described below:

* Possible option types:

  - ``switch_option`` - a simple boolean switch, for example ``-time``.

  - ``parameter_option`` - option that takes an argument.

  - ``parameter_list_option`` - same as the above, but more than one
    occurrence of the option is allowed.

  - ``prefix_option`` - same as ``parameter_option``, but the option name
    and parameter value are not separated.

  - ``prefix_list_option`` - same as the above, but more than one
    occurrence of the option is allowed; example: ``-lm -lpthread``.

  - ``alias_option`` - a special option type for creating
    aliases. Unlike other option types, aliases are not allowed to
    have any properties besides the aliased option name. Usage
    example: ``(alias_option "preprocess", "E")``

* Possible option properties:

  - ``append_cmd`` - append a string to the tool invocation command.

  - ``forward`` - forward this option unchanged.

  - ``output_suffix`` - modify the output suffix of this
    tool. Example: ``(switch_option "E", (output_suffix "i"))``.

  - ``stop_compilation`` - stop compilation after this phase.

  - ``unpack_values`` - used for splitting and forwarding
    comma-separated lists of options, e.g. ``-Wa,-foo=bar,-baz`` is
    converted to ``-foo=bar -baz`` and appended to the tool invocation
    command.

  - ``help`` - help string associated with this option.

  - ``required`` - this option is obligatory.

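The splitting performed by ``unpack_values`` can be pictured with a short Python sketch. It only mirrors the ``-Wa,-foo=bar,-baz`` example given above; the function name and its error handling are assumptions made for illustration, not part of LLVMC.

```python
# Illustrative model of the `unpack_values` property; not LLVMC's real code.
def unpack_values(option_name, argument):
    """Split '-<name>,<v1>,<v2>,...' into the list [<v1>, <v2>, ...]."""
    prefix = "-" + option_name + ","
    if not argument.startswith(prefix):
        raise ValueError("expected an argument starting with " + prefix)
    # Everything after the option name is a comma-separated list of
    # values that are forwarded to the tool one by one.
    return argument[len(prefix):].split(",")


forwarded = unpack_values("Wa", "-Wa,-foo=bar,-baz")
print(" ".join(forwarded))
```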
Option list - specifying all options in a single place
======================================================

It can be handy to have all information about options gathered in a
single place to provide an overview. This can be achieved by using a
so-called ``OptionList``::

    def Options : OptionList<[
        (switch_option "E", (help "Help string")),
        (alias_option "quiet", "q")
        ]>;

``OptionList`` is also a good place to specify option aliases.

Tool-specific option properties like ``append_cmd`` have (obviously)
no meaning in the context of ``OptionList``, so the only properties
allowed there are ``help`` and ``required``.

Option lists are used at the file scope. See file
``examples/Clang.td`` for an example of ``OptionList`` usage.

Using hooks and environment variables in the ``cmd_line`` property
==================================================================

Normally, LLVMC executes programs from the system ``PATH``. Sometimes,
this is not sufficient: for example, we may want to specify tool names
in the configuration file. This can be achieved via the mechanism of
hooks - to compile LLVMC with your hooks, just drop a ``.cpp`` file into
the ``tools/llvmc2`` directory. Hooks should live in the ``hooks``
namespace and have the signature
``std::string hooks::MyHookName (void)``. They can be used from the
``cmd_line`` tool property::

    (cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")

It is also possible to use environment variables in the same manner::

    (cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")

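To make the substitution concrete, here is a minimal Python model of how ``$CALL(...)`` and ``$ENV(...)`` placeholders could be expanded. It is a sketch under stated assumptions: real hooks are C++ functions compiled into LLVMC, whereas here they are stand-in Python callables, and the ``expand`` function, the regular expression, and the sample values are all hypothetical.

```python
# Illustrative model of $CALL/$ENV expansion; not LLVMC's real code.
import re

# Matches $CALL(Name) and $ENV(Name); the identifier syntax is an assumption.
_PLACEHOLDER = re.compile(r"\$(CALL|ENV)\((\w+)\)")


def expand(cmd_line, hooks, env):
    """Replace each placeholder with a hook's return value or an env value."""
    def substitute(match):
        kind, name = match.group(1), match.group(2)
        if kind == "CALL":
            return hooks[name]()   # call the hook, use the string it returns
        return env[name]           # look the variable up in the environment
    return _PLACEHOLDER.sub(substitute, cmd_line)


hooks = {"MyHook": lambda: "/opt/tools"}   # stand-in for hooks::MyHook
env = {"VAR2": "out.bc"}                   # stand-in for the process environment
print(expand("$CALL(MyHook)/path/to/file -o $ENV(VAR2)", hooks, env))
```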
To change the command line string based on user-provided options use
the ``case`` expression (documented below)::

    (cmd_line
      (case
        (switch_on "E"),
           "llvm-g++ -E -x c $INFILE -o $OUTFILE",
        (default),
           "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))

Conditional evaluation: the ``case`` expression
===============================================

The ``case`` construct can be used to calculate weights of the optional
edges and to choose between several alternative command line strings
in the ``cmd_line`` tool property. It is modeled after the
similarly-named construct in functional languages and takes the form
``(case (test_1), statement_1, (test_2), statement_2, ... (test_N),
statement_N)``. The statements are evaluated only if the corresponding
tests evaluate to true.

Examples::

    // Increases edge weight by 5 if "-A" is provided on the
    // command-line, and by 5 more if "-B" is also provided.
    (case
        (switch_on "A"), (inc_weight 5),
        (switch_on "B"), (inc_weight 5))

    // Evaluates to "cmdline1" if option "-A" is provided on the
    // command line, to "cmdline2" if only "-B" is provided, and to
    // "cmdline3" otherwise.
    (case
        (switch_on "A"), "cmdline1",
        (switch_on "B"), "cmdline2",
        (default), "cmdline3")

Note the slight difference in ``case`` expression handling in the contexts
of edge weights and command line specification: in the second example
the value of the ``"B"`` switch is never checked when switch ``"A"`` is
enabled, and the whole expression always evaluates to ``"cmdline1"`` in
that case.

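The two evaluation modes can be contrasted in a small Python model. This is a sketch of the semantics described above rather than LLVMC's implementation; the function names are hypothetical, and each clause is represented as a ``(test_result, statement)`` pair with ``(default)`` modelled as a test that is always true.

```python
# Illustrative model of the two `case` evaluation modes; not LLVMC's real code.
def case_weight(clauses, start=0):
    """Edge-weight context: every clause whose test is true contributes."""
    return start + sum(delta for test, delta in clauses if test)


def case_value(clauses):
    """cmd_line context: the first clause whose test is true wins."""
    for test, value in clauses:
        if test:
            return value
    raise ValueError("no clause matched; add a (default) clause")


switch_a, switch_b = True, True   # both "-A" and "-B" given by the user

# Both increments are applied.
print(case_weight([(switch_a, 5), (switch_b, 5)]))

# "-B" is never consulted, because "-A" already matched.
print(case_value([(switch_a, "cmdline1"), (switch_b, "cmdline2"), (True, "cmdline3")]))
```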
Case expressions can also be nested, i.e. the following is legal::

    (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...)
          (default), ...)

You should, however, try to avoid doing that because it hurts
readability. It is usually better to split tool descriptions and/or
use TableGen inheritance instead.

* Possible tests are:

  - ``switch_on`` - Returns true if a given command-line option is
    provided by the user. Example: ``(switch_on "opt")``. Note that
    you have to define all possible command-line options separately in
    the tool descriptions. See the section on tool descriptions for a
    discussion of the different kinds of command-line options.

  - ``parameter_equals`` - Returns true if a command-line parameter equals
    a given value. Example: ``(parameter_equals "W", "all")``.

  - ``element_in_list`` - Returns true if a command-line parameter list
    includes a given value. Example: ``(element_in_list "l", "pthread")``.

  - ``input_languages_contain`` - Returns true if a given language
    belongs to the current input language set. Example:
    ``(input_languages_contain "c++")``.

  - ``in_language`` - Evaluates to true if the language of the input
    file equals the argument. Valid only when using a ``case``
    expression in a ``cmd_line`` tool property. Example:
    ``(in_language "c++")``.

  - ``not_empty`` - Returns true if a given option (which should be
    either a parameter or a parameter list) is set by the
    user. Example: ``(not_empty "o")``.

  - ``default`` - Always evaluates to true. Should always be the last
    test in the ``case`` expression.

  - ``and`` - A standard logical combinator that returns true iff all
    of its arguments return true. Used like this: ``(and (test1),
    (test2), ... (testN))``. Nesting of ``and`` and ``or`` is allowed.

  - ``or`` - Another logical combinator that returns true iff at least
    one of its arguments returns true. Example: ``(or (test1),
    (test2), ... (testN))``.

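How these tests compose, including nested ``and``/``or``, can be sketched with a tiny Python evaluator. This is a toy model under stated assumptions: tests are encoded as nested tuples, user options are passed in as a set of switches and a dictionary of parameters, and the language-related tests are left out. None of these representations come from LLVMC itself.

```python
# Illustrative evaluator for a subset of the tests; not LLVMC's real code.
def evaluate(test, switches, parameters):
    """Evaluate a test encoded as a tuple such as ("switch_on", "opt")."""
    kind, *args = test
    if kind == "switch_on":
        return args[0] in switches
    if kind == "parameter_equals":
        name, value = args
        return parameters.get(name) == value
    if kind == "element_in_list":
        name, value = args
        return value in parameters.get(name, [])
    if kind == "not_empty":
        return bool(parameters.get(args[0]))
    if kind == "default":
        return True
    if kind == "and":
        return all(evaluate(t, switches, parameters) for t in args)
    if kind == "or":
        return any(evaluate(t, switches, parameters) for t in args)
    raise ValueError("unknown test: " + kind)


# Mirrors the linker test used in the compilation graph example.
linker_test = ("or",
               ("parameter_equals", "linker", "g++"),
               ("parameter_equals", "linker", "c++"))
print(evaluate(linker_test, set(), {"linker": "c++"}))
```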
One last thing that you will need to modify when adding support for a
new language to LLVMC is the language map, which defines mappings from
file extensions to language names. It is used to choose the proper
toolchain(s) for a given input file set. The language map definition is
located in the file ``Tools.td`` and looks like this::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,
         ...
        ]>;

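The lookup that the language map enables can be illustrated in Python. The two entries below are copied from the definition above; the inverted dictionary and the ``language_of`` helper are hypothetical and only show the idea of mapping a file suffix back to a language name.

```python
# Illustrative suffix-to-language lookup; not LLVMC's real code.
import os

# The two entries shown in the LanguageMap definition.
LANGUAGE_MAP = {
    "c++": ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"],
    "c": ["c"],
}

# Invert the map so that a suffix can be looked up directly.
SUFFIX_TO_LANGUAGE = {
    suffix: language
    for language, suffixes in LANGUAGE_MAP.items()
    for suffix in suffixes
}


def language_of(filename):
    """Return the language for a file name, or None if the suffix is unknown."""
    suffix = os.path.splitext(filename)[1].lstrip(".")
    return SUFFIX_TO_LANGUAGE.get(suffix)


for name in ("hello.cpp", "hello.c", "hello.C"):
    print(name, language_of(name))
```

Note that the lookup is case-sensitive, which is what lets ``C`` and ``c`` map to different languages.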
.. [1] TableGen Fundamentals
       http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html