Customizing LLVMC: Reference Manual


Note: This document is a work-in-progress. Additions and clarifications - are welcome.


Contents

Introduction
Compiling with llvmc
Predefined options
Compiling LLVMC-based drivers
Customizing LLVMC: the compilation graph
Describing options
Conditional evaluation
Writing a tool description
- Actions
Language map
Option preprocessor
More advanced topics

Introduction

LLVMC is a generic compiler driver, designed to be customizable and -extensible. It plays the same role for LLVM as the gcc program -does for GCC - LLVMC's job is essentially to transform a set of input -files into a set of targets depending on configuration rules and user -options. What makes LLVMC different is that these transformation rules -are completely customizable - in fact, LLVMC knows nothing about the -specifics of transformation (even the command-line options are mostly -not hard-coded) and regards the transformation structure as an -abstract graph. This makes it possible to adapt LLVMC for other +extensible. It plays the same role for LLVM as the gcc program does for +GCC - LLVMC's job is essentially to transform a set of input files into a set of +targets depending on configuration rules and user options. What makes LLVMC +different is that these transformation rules are completely customizable - in +fact, LLVMC knows nothing about the specifics of transformation (even the +command-line options are mostly not hard-coded) and regards the transformation +structure as an abstract graph. The structure of this graph is described in +high-level TableGen code, from which an efficient C++ representation is +automatically derived. This makes it possible to adapt LLVMC for other purposes - for example, as a build tool for game resources.

Because LLVMC employs TableGen [1] as its configuration language, you +

Because LLVMC employs TableGen as its configuration language, you need to be familiar with it to customize LLVMC.

Compiling with LLVMC
Predefined options
Customizing LLVMC: the compilation graph
Writing a tool description
Option list - specifying all options in a single place
Using hooks and environment variables in the cmd_line property
Conditional evaluation: the case expression
Language map
References

- -

Written by Mikhail Glushenkov

- -

Compiling with LLVMC

LLVMC tries hard to be as compatible with gcc as possible, +

Compiling with `llvmc`

LLVMC tries hard to be as compatible with gcc as possible, although there are some small differences. Most of the time, however, you shouldn't be able to notice them:

 $ # This works as expected:
-$ llvmc2 -O3 -Wall hello.cpp
+$ llvmc -O3 -Wall hello.cpp
 $ ./a.out
 hello

One nice feature of LLVMC is that one doesn't have to distinguish -between different compilers for different languages (think g++ and -gcc) - the right toolchain is chosen automatically based on input -language names (which are, in turn, determined from file -extensions). If you want to force files ending with ".c" to compile as -C++, use the -x option, just like you would do it with gcc:

One nice feature of LLVMC is that one doesn't have to distinguish between +different compilers for different languages (think g++ vs. gcc) - the +right toolchain is chosen automatically based on input language names (which +are, in turn, determined from file extensions). If you want to force files +ending with ".c" to compile as C++, use the -x option, just like you would +do it with gcc:

-$ llvmc2 -x c hello.cpp
-$ # hello.cpp is really a C file
+$ # hello.c is really a C++ file
+$ llvmc -x c++ hello.c
 $ ./a.out
 hello
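The language selection described above can be sketched as a small lookup. This is only an illustration, not LLVMC's actual code: the function name `languageFor` and the contents of the extension table are assumptions made for the example.

```cpp
#include <map>
#include <string>

// Hypothetical extension-to-language table; the entries are illustrative only.
static const std::map<std::string, std::string> kLanguageMap = {
    {"c", "c"}, {"cpp", "c++"}, {"cc", "c++"}};

// Returns the language name for `fileName`. A non-empty `forcedLanguage`
// (the value of a preceding -x option) overrides the extension lookup.
// Returns an empty string when the extension is unknown.
std::string languageFor(const std::string &fileName,
                        const std::string &forcedLanguage = "") {
  if (!forcedLanguage.empty())
    return forcedLanguage;
  const std::string::size_type dot = fileName.rfind('.');
  if (dot == std::string::npos)
    return std::string();
  const auto it = kLanguageMap.find(fileName.substr(dot + 1));
  return it == kLanguageMap.end() ? std::string() : it->second;
}
```

With this table, `languageFor("hello.c", "c++")` yields `"c++"`, mirroring the `-x c++ hello.c` invocation.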

@@ -72,76 +83,125 @@ hello
object files you should provide the --linker option since it's impossible for LLVMC to choose the right linker in that case:

-$ llvmc2 -c hello.cpp
-$ llvmc2 hello.o
+$ llvmc -c hello.cpp
+$ llvmc hello.o
 [A lot of link-time errors skipped]
-$ llvmc2 --linker=c++ hello.o
+$ llvmc --linker=c++ hello.o
 $ ./a.out
 hello

By default, LLVMC uses llvm-gcc to compile the source code. It is also +possible to choose the clang compiler with the -clang option.

Predefined options

LLVMC has some built-in options that can't be overridden in the -configuration files:

Predefined options

LLVMC has some built-in options that can't be overridden in the TableGen code:

-o FILE - Output file name.
-x LANGUAGE - Specify the language of the following input files +
-o FILE - Output file name.
-x LANGUAGE - Specify the language of the following input files until the next -x option.
-v - Enable verbose mode, i.e. print out all executed commands.
--view-graph - Show a graphical representation of the compilation -graph. Requires that you have dot and gv commands -installed. Hidden option, useful for debugging.
--write-graph - Write a compilation-graph.dot file in the -current directory with the compilation graph description in the -Graphviz format. Hidden option, useful for debugging.
--save-temps - Write temporary files to the current directory -and do not delete them on exit. Hidden option, useful for debugging.
--save-temps - Write temporary files to the current directory and do not +delete them on exit. This option can also take an argument: the +--save-temps=obj switch will write files into the directory specified with +the -o option. The --save-temps=cwd and --save-temps switches are +both synonyms for the default behaviour.
--temp-dir DIRECTORY - Store temporary files in the given directory. This +directory is deleted on exit unless --save-temps is specified. If +--save-temps=obj is also specified, --temp-dir is given the +precedence.
--check-graph - Check the compilation graph for common errors like mismatched +output/input language names, multiple default edges and cycles. Exit with code +zero if no errors were found, and return the number of errors found +otherwise. Hidden option, useful for debugging.
--view-graph - Show a graphical representation of the compilation graph +and exit. Requires that you have dot and gv programs installed. Hidden +option, useful for debugging.
--write-graph - Write a compilation-graph.dot file in the current +directory with the compilation graph description in Graphviz format (identical +to the file used by the --view-graph option). The -o option can be +used to set the output file name. Hidden option, useful for debugging.
--help, --help-hidden, --version - These options have their standard meaning.
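
The interaction between --save-temps and --temp-dir can be summarised in a few lines of C++. This is a hedged sketch, not the driver's real logic: the names `SaveTemps`, `TempPolicy` and `resolveTempPolicy` are hypothetical, the fallback location used when neither option is given is an assumption, and so is the choice to let --temp-dir also win over --save-temps=cwd (the text only states the precedence over --save-temps=obj).

```cpp
#include <string>

// Hypothetical model of the three --save-temps states described above.
enum SaveTemps { SaveTempsOff, SaveTempsCwd, SaveTempsObj };

struct TempPolicy {
  std::string directory;  // where temporary files are written
  bool deleteOnExit;      // whether they are removed when the driver exits
};

// `tempDir` is the --temp-dir argument ("" if absent), `outputDir` the
// directory named via -o, and `systemTmp` a fallback location (an assumption:
// the text does not say where temporaries go when neither option is given).
TempPolicy resolveTempPolicy(SaveTemps saveTemps, const std::string &tempDir,
                             const std::string &outputDir,
                             const std::string &systemTmp) {
  TempPolicy policy;
  // Temporaries are kept whenever any form of --save-temps is present.
  policy.deleteOnExit = (saveTemps == SaveTempsOff);
  if (!tempDir.empty())
    policy.directory = tempDir;  // --temp-dir takes precedence
  else if (saveTemps == SaveTempsObj)
    policy.directory = outputDir;
  else if (saveTemps == SaveTempsCwd)
    policy.directory = ".";
  else
    policy.directory = systemTmp;
  return policy;
}
```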

Customizing LLVMC: the compilation graph

At the time of writing LLVMC does not support on-the-fly reloading of -configuration, so to customize LLVMC you'll have to recompile the -source code (which lives under $LLVM_DIR/tools/llvmc2). The -default configuration files are Common.td (contains common -definitions, don't forget to include it in your configuration -files), Tools.td (tool descriptions) and Graph.td (compilation -graph definition).

To compile LLVMC with your own configuration file (say,``MyGraph.td``), -run make like this:

Compiling LLVMC-based drivers

It's easiest to start working on your own LLVMC driver by copying the skeleton +project which lives under $LLVMC_DIR/examples/Skeleton:

+$ cd $LLVMC_DIR/examples
+$ cp -r Skeleton MyDriver
+$ cd MyDriver
+$ ls
+AutoGenerated.td  Hooks.cpp  Main.cpp  Makefile
+

As you can see, our basic driver consists of only three files (not counting the +build script). AutoGenerated.td contains the TableGen description of the +compilation graph; its format is documented in the following +sections. Hooks.cpp is an empty file that should be used for hook +definitions (see below). Main.cpp is just a helper used to compile the +auto-generated C++ code produced from the TableGen source.

The first thing that you should do is to change the LLVMC_BASED_DRIVER +variable in the Makefile:

-$ cd $LLVM_DIR/tools/llvmc2
-$ make GRAPH=MyGraph.td TOOLNAME=my_llvmc
+LLVMC_BASED_DRIVER=MyDriver

This will build an executable named my_llvmc. There are also -several sample configuration files in the llvmc2/examples -subdirectory that should help to get you started.

Internally, LLVMC stores information about possible source -transformations in form of a graph. Nodes in this graph represent -tools, and edges between two nodes represent a transformation path. A -special "root" node is used to mark entry points for the -transformations. LLVMC also assigns a weight to each edge (more on -this later) to choose between several alternative edges.

The definition of the compilation graph (see file Graph.td) is -just a list of edges:

It can also be a good idea to put your TableGen code into a file with a less +generic name:

+$ touch MyDriver.td
+$ vim AutoGenerated.td
+[...]
+include "MyDriver.td"
+

If you have more than one TableGen source file, they all should be included from +AutoGenerated.td, since this file is used by the build system to generate +C++ code.

To build your driver, just cd to its source directory and run make. The +resulting executable will be put into $LLVM_OBJ_DIR/$(BuildMode)/bin.

If you're compiling LLVM with different source and object directories, then you +must perform the following additional steps before running make:

+# LLVMC_SRC_DIR = $LLVM_SRC_DIR/tools/llvmc/
+# LLVMC_OBJ_DIR = $LLVM_OBJ_DIR/tools/llvmc/
+$ mkdir $LLVMC_OBJ_DIR/examples/MyDriver/
+$ cp $LLVMC_SRC_DIR/examples/MyDriver/Makefile \
+  $LLVMC_OBJ_DIR/examples/MyDriver/
+$ cd $LLVMC_OBJ_DIR/examples/MyDriver
+$ make
+

Customizing LLVMC: the compilation graph

Each TableGen configuration file should include the common definitions:

+include "llvm/CompilerDriver/Common.td"
+

Internally, LLVMC stores information about possible source transformations in +the form of a graph. Nodes in this graph represent tools, and edges between two +nodes represent a transformation path. A special "root" node is used to mark +entry points for the transformations. LLVMC also assigns a weight to each edge +(more on this later) to choose between several alternative edges.

The definition of the compilation graph (see file llvmc/src/Base.td for an +example) is just a list of edges:

 def CompilationGraph : CompilationGraph<[
-    Edge<root, llvm_gcc_c>,
-    Edge<root, llvm_gcc_assembler>,
+    Edge<"root", "llvm_gcc_c">,
+    Edge<"root", "llvm_gcc_assembler">,
     ...
 
-    Edge<llvm_gcc_c, llc>,
-    Edge<llvm_gcc_cpp, llc>,
+    Edge<"llvm_gcc_c", "llc">,
+    Edge<"llvm_gcc_cpp", "llc">,
     ...
 
-    OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
-    OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
+    OptionalEdge<"llvm_gcc_c", "opt", (case (switch_on "opt"),
+                                      (inc_weight))>,
+    OptionalEdge<"llvm_gcc_cpp", "opt", (case (switch_on "opt"),
+                                              (inc_weight))>,
     ...
 
-    OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
+    OptionalEdge<"llvm_gcc_assembler", "llvm_gcc_cpp_linker",
         (case (input_languages_contain "c++"), (inc_weight),
               (or (parameter_equals "linker", "g++"),
                   (parameter_equals "linker", "c++")), (inc_weight))>,
@@ -149,245 +209,343 @@ def CompilationGraph : CompilationGraph<[
 
     ]>;

As you can see, the edges can be either default or optional, where -optional edges are differentiated by sporting a case expression -used to calculate the edge's weight.

The default edges are assigned a weight of 1, and optional edges get a -weight of 0 + 2*N where N is the number of tests that evaluated to -true in the case expression. It is also possible to provide an -integer parameter to inc_weight and dec_weight - in this case, -the weight is increased (or decreased) by the provided value instead -of the default 2.

When passing an input file through the graph, LLVMC picks the edge -with the maximum weight. To avoid ambiguity, there should be only one -default edge between two nodes (with the exception of the root node, -which gets a special treatment - there you are allowed to specify one -default edge per language).

To get a visual representation of the compilation graph (useful for -debugging), run llvmc2 --view-graph. You will need dot and -gsview installed for this to work properly.

As you can see, the edges can be either default or optional, where optional +edges are differentiated by an additional case expression used to calculate +the weight of this edge. Notice also that we refer to tools via their names (as +strings). This makes it possible to add edges to an existing compilation graph +without having to know about all tool definitions used in the graph.

The default edges are assigned a weight of 1, and optional edges get a weight of +0 + 2*N where N is the number of tests that evaluated to true in the case +expression. It is also possible to provide an integer parameter to +inc_weight and dec_weight - in this case, the weight is increased (or +decreased) by the provided value instead of the default 2. The default weight of an +optional edge can be changed by using the default clause of the case +construct.

When passing an input file through the graph, LLVMC picks the edge with the +maximum weight. To avoid ambiguity, there should be only one default edge +between two nodes (with the exception of the root node, which gets a special +treatment - there you are allowed to specify one default edge per language).
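
The weight rule and the choice of the heaviest edge can be sketched as follows. This is a simplified model rather than the generated driver code: the `Edge` struct and both function names are hypothetical, only bare `(inc_weight)` clauses are modelled (so an optional edge weighs exactly 2*N), and breaking ties in favour of the earliest edge is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an outgoing edge after its tests have been evaluated.
struct Edge {
  std::string target;  // name of the tool the edge leads to
  bool isDefault;      // true for Edge<>, false for OptionalEdge<>
  int trueTests;       // N: tests in the case expression that held
};

// Default edges weigh 1; optional edges weigh 0 + 2*N when every clause
// uses a bare (inc_weight).
int edgeWeight(const Edge &edge) {
  return edge.isDefault ? 1 : 2 * edge.trueTests;
}

// Returns the index of the heaviest edge, or edges.size() if there are none.
// Ties go to the earliest edge, which is an assumption of this sketch.
std::size_t pickEdge(const std::vector<Edge> &edges) {
  std::size_t best = edges.size();
  for (std::size_t i = 0; i < edges.size(); ++i)
    if (best == edges.size() || edgeWeight(edges[i]) > edgeWeight(edges[best]))
      best = i;
  return best;
}
```

In this model an optional edge to `opt` with one true test (weight 2) beats a default edge to `llc` (weight 1); with no true tests it weighs 0 and the default edge is taken.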

When multiple compilation graphs are defined, they are merged together. Multiple +edges with the same end nodes are not allowed (i.e. the graph is not a +multigraph), and will lead to a compile-time error.
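
A minimal sketch of the merge rule, assuming edges are identified purely by their end nodes. The names `EdgeKey` and `mergeGraphs` are hypothetical, and the sketch reports the duplicate with a run-time exception only for illustration; in LLVMC the error surfaces when the driver is built.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> EdgeKey;  // (from, to)

// Merges several edge lists into one graph. A second edge between the same
// two nodes is rejected, since the merged graph must not be a multigraph.
std::set<EdgeKey> mergeGraphs(
    const std::vector<std::vector<EdgeKey> > &graphs) {
  std::set<EdgeKey> merged;
  for (const std::vector<EdgeKey> &graph : graphs) {
    for (const EdgeKey &edge : graph) {
      if (!merged.insert(edge).second)
        throw std::invalid_argument("duplicate edge: " + edge.first + " -> " +
                                    edge.second);
    }
  }
  return merged;
}
```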

To get a visual representation of the compilation graph (useful for debugging), +run llvmc --view-graph. You will need dot and gsview installed for +this to work properly.

Writing a tool description

As was said earlier, nodes in the compilation graph represent tools, -which are described separately. A tool definition looks like this -(taken from the Tools.td file):

Describing options

Command-line options supported by the driver are defined by using an +OptionList:

-def llvm_gcc_cpp : Tool<[
-    (in_language "c++"),
-    (out_language "llvm-assembler"),
-    (output_suffix "bc"),
-    (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
-    (sink)
-    ]>;
-

This defines a new tool called llvm_gcc_cpp, which is an alias for -llvm-g++. As you can see, a tool definition is just a list of -properties; most of them should be self-explanatory. The sink -property means that this tool should be passed all command-line -options that lack explicit descriptions.

The complete list of the currently implemented tool properties follows:

Possible tool properties:
- in_language - input language name.
- out_language - output language name.
- output_suffix - output file suffix.
- cmd_line - the actual command used to run the tool. You can -use $INFILE and $OUTFILE variables, output redirection -with >, hook invocations ($CALL), environment variables -(via $ENV) and the case construct (more on this below).
- join - this tool is a "join node" in the graph, i.e. it gets a -list of input files and joins them together. Used for linkers.
- sink - all command-line options that are not handled by other -tools are passed to this tool.
-

The next tool definition is slightly more complex:

-def llvm_gcc_linker : Tool<[
-    (in_language "object-code"),
-    (out_language "executable"),
-    (output_suffix "out"),
-    (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
-    (join),
-    (prefix_list_option "L", (forward),
-                        (help "add a directory to link path")),
-    (prefix_list_option "l", (forward),
-                        (help "search a library when linking")),
-    (prefix_list_option "Wl", (unpack_values),
-                        (help "pass options to linker"))
-    ]>;
+def Options : OptionList<[
+(switch_option "E", (help "Help string")),
+(alias_option "quiet", "q")
+...
+]>;

This tool has a "join" property, which means that it behaves like a -linker. This tool also defines several command-line options: -l, --L and -Wl which have their usual meaning. An option has two -attributes: a name and a (possibly empty) list of properties. All -currently implemented option types and properties are described below:

As you can see, the option list is just a list of DAGs, where each DAG is an +option description consisting of the option name and some properties. More than +one option list can be defined (they are all merged together in the end), which +can be handy if one wants to separate option groups syntactically.

Possible option types:
- switch_option - a simple boolean switch, for example -time.
- parameter_option - option that takes an argument, for example --std=c99;
- parameter_list_option - same as the above, but more than one -occurence of the option is allowed.
- prefix_option - same as the parameter_option, but the option name -and parameter value are not separated.
- prefix_list_option - same as the above, but more than one -occurence of the option is allowed; example: -lm -lpthread.
- alias_option - a special option type for creating -aliases. Unlike other option types, aliases are not allowed to -have any properties besides the aliased option name. Usage -example: (alias_option "preprocess", "E")
- switch_option - a simple boolean switch without arguments, for example +-O2 or -time. At most one occurrence is allowed by default.
- parameter_option - option that takes one argument, for example +-std=c99. It is also allowed to use spaces instead of the equality +sign: -std c99. At most one occurrence is allowed.
- parameter_list_option - same as the above, but more than one option +occurrence is allowed.
- prefix_option - same as the parameter_option, but the option name and +argument do not have to be separated. Example: -ofile. This can also be +specified as -o file; however, -o=file will be parsed incorrectly +(=file will be interpreted as option value). At most one occurrence is +allowed.
- prefix_list_option - same as the above, but more than one occurrence of +the option is allowed; example: -lm -lpthread.
- alias_option - a special option type for creating aliases. Unlike other +option types, aliases are not allowed to have any properties besides the +aliased option name. +Usage example: (alias_option "preprocess", "E")
- switch_list_option - like switch_option with the zero_or_more +property, but remembers how many times the switch was turned on. Useful +mostly for forwarding. Example: when -foo is a switch option (with the +zero_or_more property), the command driver -foo -foo is forwarded +as some-tool -foo, but when -foo is a switch list, the same command +is forwarded as some-tool -foo -foo.
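
The three spellings of a prefix_option mentioned in the list can be illustrated with a small parser. This is a sketch of the described behaviour only, not LLVMC's option parser; the function name `parsePrefixOption` and its signature are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Tries to read prefix option `name` (given without the leading dash) at
// args[pos]. On success stores the argument in `value`, advances `pos` past
// the consumed words, and returns true.
bool parsePrefixOption(const std::vector<std::string> &args, std::size_t &pos,
                       const std::string &name, std::string &value) {
  const std::string flag = "-" + name;
  if (pos >= args.size() || args[pos].compare(0, flag.size(), flag) != 0)
    return false;
  if (args[pos].size() > flag.size()) {
    // Joined form: everything after the name is the value, '=' included.
    value = args[pos].substr(flag.size());
    pos += 1;
    return true;
  }
  if (pos + 1 >= args.size())
    return false;  // separated form, but no argument follows
  value = args[pos + 1];
  pos += 2;
  return true;
}
```

Under this model both `-ofile` and `-o file` produce the value `file`, while `-o=file` produces `=file`, which is the pitfall the list warns about.
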
Possible option properties:
- append_cmd - append a string to the tool invocation command.
- forward - forward this option unchanged.
- output_suffix - modify the output suffix of this -tool. Example : (switch "E", (output_suffix "i").
- stop_compilation - stop compilation after this phase.
- unpack_values - used for for splitting and forwarding -comma-separated lists of options, e.g. -Wa,-foo=bar,-baz is -converted to -foo=bar -baz and appended to the tool invocation -command.
- help - help string associated with this option. Used for ---help output.
- required - this option is obligatory.
- help - help string associated with this option. Used for --help +output.
- required - this option must be specified exactly once (or, in case of +the list options without the multi_val property, at least +once). Incompatible with optional and one_or_more.
- optional - the option can be specified either zero times or exactly +once. The default for switch options. Useful only for list options in +conjunction with multi_val. Incompatible with required, +zero_or_more and one_or_more.
- one_or_more - the option must be specified at least once. Can be useful +to allow switch options to be both obligatory and specified multiple +times. For list options it is useful only in conjunction with multi_val; +for ordinary options it is synonymous with required. Incompatible with +required, optional and zero_or_more.
- zero_or_more - the option can be specified zero or more times. Useful +to allow a single switch option to be specified more than +once. Incompatible with required, optional and one_or_more.
- hidden - the description of this option will not appear in +the --help output (but will appear in the --help-hidden +output).
- really_hidden - the option will not be mentioned in any help +output.
- comma_separated - Indicates that any commas specified for an option's +value should be used to split the value up into multiple values for the +option. This property is valid only for list options. In conjunction with +forward_value, it can be used to implement option forwarding in the style of +gcc's -Wa,.
- multi_val n - this option takes n arguments (can be useful in some +special cases). Usage example: (parameter_list_option "foo", (multi_val +3)); the command-line syntax is '-foo a b c'. Only list options can have +this attribute; you can, however, use the one_or_more, optional +and required properties.
- init - this option has a default value, either a string (if it is a +parameter), or a boolean (if it is a switch; as in C++, boolean constants +are called true and false). List options can't have the init +attribute. +Usage examples: (switch_option "foo", (init true)); (prefix_option +"bar", (init "baz")).
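
The splitting performed for a comma_separated list option can be sketched like this. The helper name `splitCommaSeparated` is hypothetical, and keeping empty pieces (for example from two adjacent commas) is an assumption of the sketch rather than documented behaviour.

```cpp
#include <string>
#include <vector>

// Splits an option value on commas, e.g. the part of "-Wa,-foo=bar,-baz"
// that follows the "Wa," prefix. Empty pieces are kept as-is.
std::vector<std::string> splitCommaSeparated(const std::string &value) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type comma = value.find(',', start);
    if (comma == std::string::npos) {
      parts.push_back(value.substr(start));
      break;
    }
    parts.push_back(value.substr(start, comma - start));
    start = comma + 1;
  }
  return parts;
}
```

For the value `-foo=bar,-baz` this yields the two elements `-foo=bar` and `-baz`.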

Option list - specifying all options in a single place

It can be handy to have all information about options gathered in a -single place to provide an overview. This can be achieved by using a -so-called OptionList:

-def Options : OptionList<[
-(switch_option "E", (help "Help string")),
-(alias_option "quiet", "q")
-...
-]>;
-

OptionList is also a good place to specify option aliases.

Tool-specific option properties like append_cmd have (obviously) -no meaning in the context of OptionList, so the only properties -allowed there are help and required.

Option lists are used at the file scope. See file -examples/Clang.td for an example of OptionList usage.

Using hooks and environment variables in the cmd_line property

Normally, LLVMC executes programs from the system PATH. Sometimes, -this is not sufficient: for example, we may want to specify tool names -in the configuration file. This can be achieved via the mechanism of -hooks - to compile LLVMC with your hooks, just drop a .cpp file into -tools/llvmc2 directory. Hooks should live in the hooks -namespace and have the signature std::string hooks::MyHookName -(void). They can be used from the cmd_line tool property:

-(cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")
-

It is also possible to use environment variables in the same manner:

-(cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")
-

To change the command line string based on user-provided options use -the case expression (documented below):

-(cmd_line
-  (case
-    (switch_on "E"),
-       "llvm-g++ -E -x c $INFILE -o $OUTFILE",
-    (default),
-       "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))
-

Conditional evaluation: the case expression

The 'case' construct can be used to calculate weights of the optional -edges and to choose between several alternative command line strings -in the cmd_line tool property. It is designed after the -similarly-named construct in functional languages and takes the form -(case (test_1), statement_1, (test_2), statement_2, ... (test_N), -statement_N). The statements are evaluated only if the corresponding -tests evaluate to true.

Conditional evaluation

The 'case' construct is the main means by which programmability is achieved in +LLVMC. It can be used to calculate edge weights, program actions and modify the +shell commands to be executed. The 'case' expression is designed after the +similarly-named construct in functional languages and takes the form (case +(test_1), statement_1, (test_2), statement_2, ... (test_N), statement_N). The +statements are evaluated only if the corresponding tests evaluate to true.

Examples:

+// Edge weight calculation
+
 // Increases edge weight by 5 if "-A" is provided on the
 // command-line, and by 5 more if "-B" is also provided.
 (case
     (switch_on "A"), (inc_weight 5),
     (switch_on "B"), (inc_weight 5))
 
-// Evaluates to "cmdline1" if option "-A" is provided on the
-// command line, otherwise to "cmdline2"
+
+// Tool command line specification
+
+// Evaluates to "cmdline1" if the option "-A" is provided on the
+// command line; to "cmdline2" if "-B" is provided;
+// otherwise to "cmdline3".
+
 (case
     (switch_on "A"), "cmdline1",
     (switch_on "B"), "cmdline2",
     (default), "cmdline3")

Note the slight difference in 'case' expression handling in contexts -of edge weights and command line specification - in the second example -the value of the "B" switch is never checked when switch "A" is -enabled, and the whole expression always evaluates to "cmdline1" in -that case.

Note the slight difference in 'case' expression handling in contexts of edge +weights and command line specification - in the second example the value of the +"B" switch is never checked when switch "A" is enabled, and the whole +expression always evaluates to "cmdline1" in that case.
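
The difference between the two contexts can be made concrete with a small model. It is a sketch under stated assumptions: the clause structs and function names are hypothetical, each test is assumed to be already evaluated to a boolean, and `(default)` is modelled as a test that is always true.

```cpp
#include <string>
#include <vector>

struct WeightClause { bool test; int delta; };             // (test), (inc_weight delta)
struct CommandClause { bool test; std::string command; };  // (test), "command"

// Edge-weight context: every clause whose test holds contributes.
int evalWeightCase(const std::vector<WeightClause> &clauses) {
  int weight = 0;
  for (const WeightClause &clause : clauses)
    if (clause.test)
      weight += clause.delta;
  return weight;
}

// Command-line context: the first clause whose test holds wins and the
// remaining tests are never consulted. Returns "" if no clause matches.
std::string evalCommandCase(const std::vector<CommandClause> &clauses) {
  for (const CommandClause &clause : clauses)
    if (clause.test)
      return clause.command;
  return std::string();
}
```

With both "-A" and "-B" given, the weight example evaluates to 10, whereas the command-line example evaluates to "cmdline1".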

Case expressions can also be nested, i.e. the following is legal:

 (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
       (default), ...)

You should, however, try to avoid doing that because it hurts -readability. It is usually better to split tool descriptions and/or -use TableGen inheritance instead.

You should, however, try to avoid doing that because it hurts readability. It is +usually better to split tool descriptions and/or use TableGen inheritance +instead.

Possible tests are:
- switch_on - Returns true if a given command-line option is -provided by the user. Example: (switch_on "opt"). Note that -you have to define all possible command-line options separately in -the tool descriptions. See the next doc_text for the discussion of -different kinds of command-line options.
- parameter_equals - Returns true if a command-line parameter equals -a given value. Example: (parameter_equals "W", "all").
- element_in_list - Returns true if a command-line parameter list -includes a given value. Example: (parameter_in_list "l", "pthread").
- input_languages_contain - Returns true if a given language -belongs to the current input language set. Example: -`(input_languages_contain "c++").
- in_language - Evaluates to true if the language of the input -file equals to the argument. Valid only when using case -expression in a cmd_line tool property. Example: -`(in_language "c++").
- not_empty - Returns true if a given option (which should be -either a parameter or a parameter list) is set by the -user. Example: `(not_empty "o").
- default - Always evaluates to true. Should always be the last -test in the case expression.
- and - A standard logical combinator that returns true iff all -of its arguments return true. Used like this: (and (test1), -(test2), ... (testN)). Nesting of and and or is allowed, -but not encouraged.
- or - Another logical combinator that returns true only if any -one of its arguments returns true. Example: (or (test1), -(test2), ... (testN)).
- switch_on - Returns true if a given command-line switch is provided by +the user. Can be given multiple arguments, in that case (switch_on "foo", +"bar", "baz") is equivalent to (and (switch_on "foo"), (switch_on +"bar"), (switch_on "baz")). +Example: (switch_on "opt").
- any_switch_on - Given a number of switch options, returns true if any of +the switches is turned on. +Example: (any_switch_on "foo", "bar", "baz") is equivalent to (or +(switch_on "foo"), (switch_on "bar"), (switch_on "baz")).
- parameter_equals - Returns true if a command-line parameter (first +argument) equals a given value (second argument). +Example: (parameter_equals "W", "all").
- element_in_list - Returns true if a command-line parameter list (first +argument) contains a given value (second argument). +Example: (element_in_list "l", "pthread").
- input_languages_contain - Returns true if a given language +belongs to the current input language set. +Example: (input_languages_contain "c++").
- in_language - Evaluates to true if the input file language is equal to +the argument. At the moment works only with command and actions (on +non-join nodes). +Example: (in_language "c++").
- not_empty - Returns true if a given option (which should be either a +parameter or a parameter list) is set by the user. Like switch_on, it can +also be given multiple arguments. +Examples: (not_empty "o"), (not_empty "o", "l").
- any_not_empty - Returns true if not_empty returns true for any of +the provided options. +Example: (any_not_empty "foo", "bar", "baz") is equivalent to (or +(not_empty "foo"), (not_empty "bar"), (not_empty "baz")).
- empty - The opposite of not_empty. Equivalent to (not (not_empty +X)). Can be given multiple arguments.
- any_empty - Returns true if empty returns true for any of +the provided options. +Example: (any_empty "foo", "bar", "baz") is equivalent to (or +(empty "foo"), (empty "bar"), (empty "baz")).
- single_input_file - Returns true if there was only one input file +provided on the command-line. Used without arguments: +(single_input_file).
- multiple_input_files - Equivalent to (not (single_input_file)) (the +case of zero input files is considered an error).
- default - Always evaluates to true. Should always be the last +test in the case expression.
- and - A standard logical combinator that returns true iff all of +its arguments return true. Used like this: (and (test1), (test2), +... (testN)). Nesting of and and or is allowed, but not +encouraged.
- or - A logical combinator that returns true iff any of its arguments +returns true. +Example: (or (test1), (test2), ... (testN)).
- not - Standard unary logical combinator that negates its +argument. +Example: (not (or (test1), (test2), ... (testN))).
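
A few of the tests above can be modelled in C++ to show how the multi-argument forms reduce to `and` and `or`. The `CommandLine` struct and the function names are hypothetical stand-ins; the real tests are evaluated by code generated from TableGen.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical snapshot of what the user passed on the command line.
struct CommandLine {
  std::set<std::string> switches;                           // switches turned on
  std::map<std::string, std::vector<std::string> > lists;   // parameter lists
};

// (switch_on "a", "b", ...): true only if every named switch is on.
bool switchOn(const CommandLine &cl, const std::vector<std::string> &names) {
  for (const std::string &name : names)
    if (cl.switches.count(name) == 0)
      return false;
  return true;
}

// (any_switch_on "a", "b", ...): true if at least one named switch is on.
bool anySwitchOn(const CommandLine &cl, const std::vector<std::string> &names) {
  for (const std::string &name : names)
    if (cl.switches.count(name) != 0)
      return true;
  return false;
}

// (element_in_list "l", "pthread"): true if the list option holds the value.
bool elementInList(const CommandLine &cl, const std::string &name,
                   const std::string &value) {
  const auto it = cl.lists.find(name);
  if (it == cl.lists.end())
    return false;
  return std::find(it->second.begin(), it->second.end(), value) !=
         it->second.end();
}
```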

Language map

One last thing that you will need to modify when adding support for a -new language to LLVMC is the language map, which defines mappings from -file extensions to language names. It is used to choose the proper -toolchain(s) for a given input file set. Language map definition is -located in the file Tools.td and looks like this:

Writing a tool description

As was said earlier, nodes in the compilation graph represent tools, which are +described separately. A tool definition looks like this (taken from the +llvmc/src/Base.td file):

+def llvm_gcc_cpp : Tool<[
+    (in_language "c++"),
+    (out_language "llvm-assembler"),
+    (output_suffix "bc"),
+    (command "llvm-g++ -c -emit-llvm"),
+    (sink)
+    ]>;
+

This defines a new tool called llvm_gcc_cpp, which is an alias for +llvm-g++. As you can see, a tool definition is just a list of properties; +most of them should be self-explanatory. The sink property means that this +tool should be passed all command-line options that aren't mentioned in the +option list.

The complete list of all currently implemented tool properties follows.

Possible tool properties:
- in_language - input language name. Can be given multiple arguments, in +case the tool supports multiple input languages. Used for typechecking and +mapping file extensions to tools.
- out_language - output language name. Multiple output languages are +allowed. Used for typechecking the compilation graph.
- output_suffix - output file suffix. Can also be changed dynamically, see +documentation on actions.
- command - the actual command used to run the tool. You can use output +redirection with >, hook invocations ($CALL), environment variables +(via $ENV) and the case construct.
- join - this tool is a "join node" in the graph, i.e. it gets a list of +input files and joins them together. Used for linkers.
- sink - all command-line options that are not handled by other tools are +passed to this tool.
- actions - A single big case expression that specifies how this tool +reacts to command-line options (described in more detail below).
- out_file_option, in_file_option - Options appended to the +command string to designate output and input files. Default values are +"-o" and "", respectively.
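As a rough illustration of how these properties combine, the following sketch describes a hypothetical assembler tool. The tool name my_assembler, the command string "as", and the "assembler" language name are assumptions made for this example and are not taken from Base.td. The out_file_option and in_file_option properties are simply set to their documented defaults.

```tablegen
// Hypothetical tool description; names and command are illustrative only.
def my_assembler : Tool<[
    (in_language "assembler"),
    (out_language "object-code"),
    (output_suffix "o"),
    // The command used to run the tool.
    (command "as"),
    // Spelled out here for clarity; these are the default values.
    (out_file_option "-o"),
    (in_file_option "")
    ]>;
```

Because the sketch has no sink property, options that no tool handles would not be passed to it.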

Actions

A tool often needs to react to command-line options, and this is precisely what +the actions property is for. The next example illustrates this feature:

+def llvm_gcc_linker : Tool<[
+    (in_language "object-code"),
+    (out_language "executable"),
+    (output_suffix "out"),
+    (command "llvm-gcc"),
+    (join),
+    (actions (case (not_empty "L"), (forward "L"),
+                   (not_empty "l"), (forward "l"),
+                   (not_empty "dummy"),
+                             [(append_cmd "-dummy1"), (append_cmd "-dummy2")]))
+    ]>;
+

The actions tool property is implemented on top of the omnipresent case +expression. It associates one or more actions with given +conditions - in the example, the actions are forward, which forwards a given +option unchanged, and append_cmd, which appends a given string to the tool +execution command. Multiple actions can be associated with a single condition by +using a list of actions (used in the example to append some dummy options). The +same case construct can also be used in the command property to modify +the tool command line.

The "join" property used in the example means that this tool behaves like a +linker.

The list of all possible actions follows.

Possible actions:
+
+
- append_cmd - Append a string to the tool invocation command. +Example: (case (switch_on "pthread"), (append_cmd "-lpthread")).
- error - Exit with error. +Example: (error "Mixing -c and -S is not allowed!").
- warning - Print a warning. +Example: (warning "Specifying both -O1 and -O2 is meaningless!").
- forward - Forward the option unchanged. +Example: (forward "Wall").
- forward_as - Change the option's name, but forward the argument +unchanged. +Example: (forward_as "O0", "--disable-optimization").
- forward_value - Forward only the option's value. Cannot be used with switch +options (since they don't have values), but works fine with lists. +Example: (forward_value "Wa,").
- forward_transformed_value - As above, but applies a hook to the +option's value before forwarding (see below). When +forward_transformed_value is applied to a list +option, the hook must have the signature +std::string hooks::HookName (const std::vector<std::string>&). +Example: (forward_transformed_value "m", "ConvertToMAttr").
- output_suffix - Modify the output suffix of this tool. +Example: (output_suffix "i").
- stop_compilation - Stop compilation after this tool processes its +input. Used without arguments. +Example: (stop_compilation).
+
+
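The sketch below combines several of the actions listed above in a single actions property. It is illustrative only: the tool name is hypothetical, and it assumes that the options c, S, E, "Wa," and m are declared elsewhere in the option list and that a hook named ConvertToMAttr (the name used in the list above) is available.

```tablegen
// Hypothetical tool; option names are assumed to be declared elsewhere.
def my_cpp_tool : Tool<[
    (in_language "c++"),
    (out_language "llvm-assembler"),
    (output_suffix "bc"),
    (command "llvm-g++ -c -emit-llvm"),
    (actions (case
        // Reject a contradictory combination of switches.
        (and (switch_on "c"), (switch_on "S")),
            (error "Mixing -c and -S is not allowed!"),
        // Two actions for one condition, given as a list.
        (switch_on "E"),
            [(stop_compilation), (output_suffix "i")],
        // Forward only the value of a list option.
        (not_empty "Wa,"), (forward_value "Wa,"),
        // Pass the option's value through a hook before forwarding.
        (not_empty "m"), (forward_transformed_value "m", "ConvertToMAttr")))
    ]>;
```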

Language map

If you are adding support for a new language to LLVMC, you'll need to modify the +language map, which defines mappings from file extensions to language names. It +is used to choose the proper toolchain(s) for a given input file set. The language +map definition looks like this:

 def LanguageMap : LanguageMap<
     [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
@@ -395,26 +553,135 @@ def LanguageMap : LanguageMap<
      ...
     ]>;

For example, without those definitions the following command wouldn't work:

+$ llvmc hello.cpp
+llvmc: Unknown suffix: cpp
+

The language map entries are needed only for the tools that are linked from the +root node. A tool can have multiple output languages.
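As a sketch of what adding a language might look like, the fragment below extends the map with one extra entry. The language name "my-lang" and the suffixes "ml1" and "ml2" are hypothetical and chosen only for illustration; the "c++" entry is copied from the example above.

```tablegen
def LanguageMap : LanguageMap<
    [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
     // Hypothetical entry: files ending in .ml1 or .ml2 are treated as "my-lang".
     LangToSuffixes<"my-lang", ["ml1", "ml2"]>
    ]>;
```

For such an entry to be useful, some tool linked from the root node would also need to declare (in_language "my-lang").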

Option preprocessor

It is sometimes useful to run error-checking code before processing the +compilation graph. For example, if optimization options "-O1" and "-O2" are +implemented as switches, we might want to output a warning if the user invokes +the driver with both of these options enabled.

The OptionPreprocessor feature is intended specifically for such +cases. Example (adapted from llvm/src/Base.td.in):

+def Preprocess : OptionPreprocessor<
+(case (not (any_switch_on "O0", "O1", "O2", "O3")),
+           (set_option "O2"),
+      (and (switch_on "O3"), (any_switch_on "O0", "O1", "O2")),
+           (unset_option "O0", "O1", "O2"),
+      (and (switch_on "O2"), (any_switch_on "O0", "O1")),
+           (unset_option "O0", "O1"),
+      (and (switch_on "O1"), (switch_on "O0")),
+           (unset_option "O0"))
+>;
+

Here, OptionPreprocessor is used to unset all spurious -O options so +that they are not forwarded to the compiler. If no optimization options are +specified, -O2 is enabled.

OptionPreprocessor is basically a single big case expression, which is +evaluated only once right after the driver is started. The only allowed actions +in OptionPreprocessor are error, warning, and two special actions: +unset_option and set_option. As their names suggest, they can be used to +set or unset a given option. To set an option with set_option, use the +two-argument form: (set_option "parameter", VALUE). Here, VALUE can be +either a string, a string list, or a boolean constant.

For convenience, set_option and unset_option also work with multiple +arguments. That is, instead of [(unset_option "A"), (unset_option "B")] you +can use (unset_option "A", "B"). Obviously, (set_option "A", "B") is +only valid if both A and B are switches.
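The following sketch shows the remaining OptionPreprocessor actions together: a warning, the multi-argument form of unset_option, and the two-argument form of set_option with a string value. The option names "my-switch", "A", "B" and "my-param" are hypothetical and are assumed to be declared elsewhere, with A and B as switches and my-param as a parameter option.

```tablegen
// Illustrative only; all option names except O1 and O2 are made up.
def Preprocess : OptionPreprocessor<
(case (and (switch_on "O1"), (switch_on "O2")),
           (warning "Specifying both -O1 and -O2 is meaningless!"),
      (switch_on "my-switch"),
           // Unset two switches at once, then give a parameter a value.
           [(unset_option "A", "B"), (set_option "my-param", "value")])
>;
```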
