Customizing LLVMC: Reference Manual


Contents

Introduction
Compiling with LLVMC
Predefined options
Compiling LLVMC plugins
Compiling standalone LLVMC-based drivers
Customizing LLVMC: the compilation graph
Describing options
- External options
Conditional evaluation
Writing a tool description
- Actions
Language map
Option preprocessor
More advanced topics

Introduction

LLVMC is a generic compiler driver, designed to be customizable and extensible. It plays the same role for LLVM as the gcc program does for GCC: LLVMC's job is essentially to transform a set of input files into a set of targets depending on configuration rules and user options. What makes LLVMC different is that these transformation rules are completely customizable. In fact, LLVMC knows nothing about the specifics of transformation (even the command-line options are mostly not hard-coded) and regards the transformation structure as an abstract graph. The structure of this graph is completely determined by plugins, which can be either statically or dynamically linked. This makes it possible to easily adapt LLVMC for other purposes, for example, as a build tool for game resources.

Because LLVMC employs TableGen as its configuration language, you need to be familiar with it to customize LLVMC.


Abstract


This document describes the requirements, design, and configuration of the LLVM compiler driver, llvmc. The compiler driver knows about LLVM's tool set and can be configured to know about a variety of compilers for source languages. It uses this knowledge to execute the tools necessary to accomplish general compilation, optimization, and linking tasks. The main purpose of llvmc is to provide a simple and consistent interface to all compilation tasks. This reduces the burden on the end user, who can just learn to use llvmc instead of the entire LLVM tool set and all the source language compilers compatible with LLVM.

Compiling with LLVMC

LLVMC tries hard to be as compatible with gcc as possible, although there are some small differences. Most of the time, however, you shouldn't be able to notice them:

+$ # This works as expected:
+$ llvmc -O3 -Wall hello.cpp
+$ ./a.out
+hello
+

One nice feature of LLVMC is that one doesn't have to distinguish between different compilers for different languages (think g++ vs. gcc): the right toolchain is chosen automatically based on input language names (which are, in turn, determined from file extensions). If you want to force files ending with ".c" to compile as C++, use the -x option, just as you would with gcc:

+$ # hello.c is really a C++ file
+$ llvmc -x c++ hello.c
+$ ./a.out
+hello
+

On the other hand, when using LLVMC as a linker to combine several C++ object files, you should provide the --linker option, since it's impossible for LLVMC to choose the right linker in that case:

+$ llvmc -c hello.cpp
+$ llvmc hello.o
+[A lot of link-time errors skipped]
+$ llvmc --linker=c++ hello.o
+$ ./a.out
+hello
+

By default, LLVMC uses llvm-gcc to compile the source code. It is also possible to choose the clang compiler with the -clang option.


Introduction


The llvmc tool is a configurable compiler driver. As such, it isn't a compiler, optimizer, or a linker itself, but it drives (invokes) other software that performs those tasks. If you are familiar with the GNU Compiler Collection's gcc tool, llvmc is very similar.

The following introductory sections will help you understand why this tool is necessary and what it does.

Predefined options

LLVMC has some built-in options that can't be overridden in the configuration libraries:

-o FILE - Output file name.
-x LANGUAGE - Specify the language of the following input files until the next -x option.
-load PLUGIN_NAME - Load the specified plugin DLL. Example: -load $LLVM_DIR/Release/lib/LLVMCSimple.so.
-v - Enable verbose mode, i.e. print out all executed commands.
--save-temps - Write temporary files to the current directory and do not delete them on exit. This option can also take an argument: the --save-temps=obj switch will write files into the directory specified with the -o option. The --save-temps=cwd and --save-temps switches are both synonyms for the default behaviour.
--temp-dir DIRECTORY - Store temporary files in the given directory. This directory is deleted on exit unless --save-temps is specified. If --save-temps=obj is also specified, --temp-dir takes precedence.
--check-graph - Check the compilation graph for common errors like mismatched output/input language names, multiple default edges and cycles. Because of plugins, these checks can't be performed at compile-time. Exit with code zero if no errors were found, and return the number of found errors otherwise. Hidden option, useful for debugging LLVMC plugins.
--view-graph - Show a graphical representation of the compilation graph and exit. Requires that you have dot and gv programs installed. Hidden option, useful for debugging LLVMC plugins.
--write-graph - Write a compilation-graph.dot file in the current directory with the compilation graph description in Graphviz format (identical to the file used by the --view-graph option). The -o option can be used to set the output file name. Hidden option, useful for debugging LLVMC plugins.
-help, -help-hidden, --version - These options have their standard meaning.


Purpose

llvmc was invented to make compilation of user programs with LLVM-based tools easier. To accomplish this, llvmc strives to:

Be the single point of access to most of the LLVM tool set.
Hide the complexities of the LLVM tools through a single interface.
Provide a consistent interface for compiling all languages.

Additionally, llvmc makes it easier to write a compiler for use with LLVM, because it:

Makes integration of existing non-LLVM tools simple.
Extends the capabilities of minimal compiler tools by optimizing their output.
Reduces the number of interfaces a compiler writer must know about before a working compiler can be completed (essentially only the VMCore interfaces need to be understood).
Supports source language translator invocation via both dynamically loadable shared objects and invocation of an executable.

Compiling LLVMC plugins

It's easiest to start working on your own LLVMC plugin by copying the skeleton project which lives under $LLVMC_DIR/plugins/Simple:

+$ cd $LLVMC_DIR/plugins
+$ cp -r Simple MyPlugin
+$ cd MyPlugin
+$ ls
+Makefile PluginMain.cpp Simple.td
+

As you can see, our basic plugin consists of only two files (not counting the build script). Simple.td contains a TableGen description of the compilation graph; its format is documented in the following sections. PluginMain.cpp is just a helper file used to compile the auto-generated C++ code produced from TableGen source. It can also contain hook definitions (see below).

The first thing that you should do is to change the LLVMC_PLUGIN variable in the Makefile to avoid conflicts (since this variable is used to name the resulting library):

+LLVMC_PLUGIN=MyPlugin
+

It is also a good idea to rename Simple.td to something less generic:

+$ mv Simple.td MyPlugin.td
+

To build your plugin as a dynamic library, just cd to its source directory and run make. The resulting file will be called plugin_llvmc_$(LLVMC_PLUGIN).$(DLL_EXTENSION) (in our case, plugin_llvmc_MyPlugin.so). This library can then be loaded with the -load option. Example:

+$ cd $LLVMC_DIR/plugins/Simple
+$ make
+$ llvmc -load $LLVM_DIR/Release/lib/plugin_llvmc_Simple.so
+


Operation

At a high level, llvmc operation is very simple. The basic action taken by llvmc is to simply invoke some tool or set of tools to fill the user's request for compilation. Every execution of llvmc takes the following sequence of steps:

Collect Command Line Options: The command line options provide the marching orders to llvmc on what actions it should perform. This is the request the user is making of llvmc and it is interpreted first. See the llvmc manual page for details on the options.
Read Configuration Files: Based on the options and the suffixes of the filenames presented, a set of configuration files is read to configure the actions llvmc will take. Configuration files are provided by either LLVM or the compiler tools that llvmc invokes. These files determine what actions llvmc will take in response to the user's request. See the section on configuration for more details.
Determine Phases To Execute: Based on the command line options and configuration files, llvmc determines the compilation phases that must be executed by the user's request. This is the primary work of llvmc.
Determine Actions To Execute: Each phase to be executed can result in the invocation of one or more actions. An action is either a whole program or a function in a dynamically linked shared library. In this step, llvmc determines the sequence of actions that must be executed. Actions will always be executed in a deterministic order.
Execute Actions: The actions necessary to support the user's original request are executed sequentially and deterministically. All actions result in either the invocation of a whole program to perform the action or the loading of a dynamically linkable shared library and invocation of a standard interface function within that library.
Termination: If any action fails (returns a non-zero result code), llvmc also fails and returns the result code from the failing action. If everything succeeds, llvmc will return a zero result code.

llvmc's operation must be simple, regular and predictable. Developers need to be able to rely on it to take a consistent approach to compilation. For example, the invocation:


-    llvmc -O2 x.c y.c z.c -o xyz

must produce exactly the same results as:


-    llvmc -O2 x.c -o x.o
-    llvmc -O2 y.c -o y.o
-    llvmc -O2 z.c -o z.o
-    llvmc -O2 x.o y.o z.o -o xyz

To accomplish this, llvmc uses a very simple goal-oriented procedure to do its work. The overall goal is to produce a functioning executable. To accomplish this, llvmc always attempts to execute a series of compilation phases in the same sequence. However, the user's options to llvmc can cause the sequence of phases to start in the middle or finish early.

Compiling standalone LLVMC-based drivers

By default, the llvmc executable consists of a driver core plus several statically linked plugins (Base and Clang at the moment). You can produce a standalone LLVMC-based driver executable by linking the core with your own plugins. The recommended way to do this is by starting with the provided Skeleton example ($LLVMC_DIR/example/Skeleton):

+$ cd $LLVMC_DIR/example/
+$ cp -r Skeleton mydriver
+$ cd mydriver
+$ vim Makefile
+[...]
+$ make
+

If you're compiling LLVM with different source and object directories, then you must perform the following additional steps before running make:

+# LLVMC_SRC_DIR = $LLVM_SRC_DIR/tools/llvmc/
+# LLVMC_OBJ_DIR = $LLVM_OBJ_DIR/tools/llvmc/
+$ cp $LLVMC_SRC_DIR/example/mydriver/Makefile \
+  $LLVMC_OBJ_DIR/example/mydriver/
+$ cd $LLVMC_OBJ_DIR/example/mydriver
+$ make
+

Another way to do the same thing is by using the following command:

+$ cd $LLVMC_DIR
+$ make LLVMC_BUILTIN_PLUGINS=MyPlugin LLVMC_BASED_DRIVER_NAME=mydriver
+

This works with both srcdir == objdir and srcdir != objdir, but assumes that the plugin source directory was placed under $LLVMC_DIR/plugins.

Sometimes, you will want a 'bare-bones' version of LLVMC that has no built-in plugins. It can be compiled with the following command:

+$ cd $LLVMC_DIR
+$ make LLVMC_BUILTIN_PLUGINS=""
+


Phases

llvmc breaks every compilation task into the following five distinct phases:

Preprocessing: Not all languages support preprocessing, but for those that do, this phase can be invoked. This phase is for languages that provide combining, filtering, or otherwise altering the source language input before the translator parses it. Although C and C++ are the most common users of this phase, other languages may provide their own preprocessor (whether it's the C pre-processor or not).

Translation: The translation phase converts the source language input into something that LLVM can interpret and use for downstream phases. The translation is essentially from "non-LLVM form" to "LLVM form".

Optimization: Once an LLVM Module has been obtained from the translation phase, the program enters the optimization phase. This phase attempts to optimize all of the input provided on the command line according to the options provided.

Linking: The inputs are combined to form a complete program.

The following table shows the inputs, outputs, and command line options applicable to each phase.


Phase	Inputs	Outputs	Options
Preprocessing	Source Language File	Source Language File	`-E`: Stops the compilation after preprocessing.
Translation	Source Language File	LLVM Assembly, LLVM Bitcode, LLVM C++ IR	`-c`: Stops the compilation after translation so that optimization and linking are not done. `-S`: Stops the compilation before object code is written so that only assembly code remains.
Optimization	LLVM Assembly, LLVM Bitcode	LLVM Bitcode	`-Ox`: This group of options controls the amount of optimization performed.
Linking	LLVM Bitcode, Native Object Code, LLVM Library, Native Library	LLVM Bitcode Executable, Native Executable	`-L`: Specifies a path for library search. `-l`: Specifies a library to link in.

Customizing LLVMC: the compilation graph

Each TableGen configuration file should include the common definitions:

+include "llvm/CompilerDriver/Common.td"
+

Internally, LLVMC stores information about possible source transformations in the form of a graph. Nodes in this graph represent tools, and edges between two nodes represent a transformation path. A special "root" node is used to mark entry points for the transformations. LLVMC also assigns a weight to each edge (more on this later) to choose between several alternative edges.

The definition of the compilation graph (see file plugins/Base/Base.td for an example) is just a list of edges:

+def CompilationGraph : CompilationGraph<[
+    Edge<"root", "llvm_gcc_c">,
+    Edge<"root", "llvm_gcc_assembler">,
+    ...
+
+    Edge<"llvm_gcc_c", "llc">,
+    Edge<"llvm_gcc_cpp", "llc">,
+    ...
+
+    OptionalEdge<"llvm_gcc_c", "opt", (case (switch_on "opt"),
+                                      (inc_weight))>,
+    OptionalEdge<"llvm_gcc_cpp", "opt", (case (switch_on "opt"),
+                                              (inc_weight))>,
+    ...
+
+    OptionalEdge<"llvm_gcc_assembler", "llvm_gcc_cpp_linker",
+        (case (input_languages_contain "c++"), (inc_weight),
+              (or (parameter_equals "linker", "g++"),
+                  (parameter_equals "linker", "c++")), (inc_weight))>,
+    ...
+
+    ]>;
+

As you can see, the edges can be either default or optional, where optional edges are differentiated by an additional case expression used to calculate the weight of this edge. Notice also that we refer to tools via their names (as strings). This makes it possible to add edges to an existing compilation graph in plugins without having to know about all tool definitions used in the graph.
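Because tools are referenced by name, a plugin can hook a new tool into a graph defined elsewhere. The following is a minimal sketch under stated assumptions: my_checker is a hypothetical tool defined by the plugin itself, "check" is a hypothetical switch declared in the plugin's option list, and llvm_gcc_c and llc are assumed to be provided by another plugin, as in the Base.td excerpt above.

```tablegen
include "llvm/CompilerDriver/Common.td"

// Hypothetical plugin graph: "my_checker" and the "check" switch are
// made-up names; "llvm_gcc_c" and "llc" are referenced purely by name
// and are assumed to be defined in another plugin.
def CompilationGraph : CompilationGraph<[
    // Taken only when -check raises the weight of this optional edge.
    OptionalEdge<"llvm_gcc_c", "my_checker",
        (case (switch_on "check"), (inc_weight))>,
    // Default edge leading back into the existing pipeline.
    Edge<"my_checker", "llc">
    ]>;
```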

The default edges are assigned a weight of 1, and optional edges get a weight of 0 + 2*N where N is the number of tests that evaluated to true in the case expression. It is also possible to provide an integer parameter to inc_weight and dec_weight. In this case, the weight is increased (or decreased) by the provided value instead of the default 2. It is also possible to change the default weight of an optional edge by using the default clause of the case construct.

When passing an input file through the graph, LLVMC picks the edge with the maximum weight. To avoid ambiguity, there should be only one default edge between two nodes (with the exception of the root node, which gets a special treatment: there you are allowed to specify one default edge per language).
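The weight rules can be worked through on a small example. This is only a sketch: tool_a, tool_b, tool_c and the switches X and Y are hypothetical names, and the weights in the comments are derived from the rules stated above (default edge = 1, optional edge starts at 0, inc_weight adds 2 unless given an explicit value).

```tablegen
// Hypothetical graph with one default and one optional edge leaving
// the made-up tool "tool_a".
def CompilationGraph : CompilationGraph<[
    // Default edge: weight 1.
    Edge<"tool_a", "tool_b">,

    // Optional edge: weight starts at 0.
    //   neither -X nor -Y : 0          -> tool_b wins (1 > 0)
    //   -X only           : 0 + 2 = 2  -> tool_c wins (2 > 1)
    //   -Y only           : 0 + 5 = 5  -> tool_c wins (5 > 1)
    //   -X and -Y         : 2 + 5 = 7  -> tool_c wins (7 > 1)
    OptionalEdge<"tool_a", "tool_c",
        (case (switch_on "X"), (inc_weight),
              (switch_on "Y"), (inc_weight 5))>
    ]>;
```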

When multiple plugins are loaded, their compilation graphs are merged together. Since multiple edges that have the same end nodes are not allowed (i.e. the graph is not a multigraph), an edge defined in several plugins will be replaced by the definition from the plugin that was loaded last. Plugin load order can be controlled by using the plugin priority feature described above.

To get a visual representation of the compilation graph (useful for debugging), run llvmc --view-graph. You will need dot and gsview installed for this to work properly.


Actions

An action, with regard to llvmc, is a basic operation that it takes in order to fulfill the user's request. Each phase of compilation will invoke zero or more actions in order to accomplish that phase.

Actions come in two forms:

Invokable Executables
Functions in a shared library

Describing options

Command-line options that the plugin supports are defined by using an OptionList:

+def Options : OptionList<[
+(switch_option "E", (help "Help string")),
+(alias_option "quiet", "q")
+...
+]>;
+

As you can see, the option list is just a list of DAGs, where each DAG is an option description consisting of the option name and some properties. A plugin can define more than one option list (they are all merged together in the end), which can be handy if one wants to separate option groups syntactically.

Possible option types:
- switch_option - a simple boolean switch without arguments, for example -O2 or -time. At most one occurrence is allowed.
- parameter_option - option that takes one argument, for example -std=c99. It is also allowed to use spaces instead of the equality sign: -std c99. At most one occurrence is allowed.
- parameter_list_option - same as the above, but more than one option occurrence is allowed.
- prefix_option - same as the parameter_option, but the option name and argument do not have to be separated. Example: -ofile. This can also be specified as -o file; however, -o=file will be parsed incorrectly (=file will be interpreted as the option value). At most one occurrence is allowed.
- prefix_list_option - same as the above, but more than one occurrence of the option is allowed; example: -lm -lpthread.
- alias_option - a special option type for creating aliases. Unlike other option types, aliases are not allowed to have any properties besides the aliased option name. Usage example: (alias_option "preprocess", "E")
Possible option properties:
- help - help string associated with this option. Used for -help output.
- required - this option must be specified exactly once (or, in case of the list options without the multi_val property, at least once). Incompatible with optional and one_or_more.
- one_or_more - the option must be specified at least one time. Useful only for list options in conjunction with multi_val; for ordinary lists it is synonymous with required. Incompatible with required and optional.
- optional - the option can be specified zero or one times. Useful only for list options in conjunction with multi_val. Incompatible with required and one_or_more.
- hidden - the description of this option will not appear in the -help output (but will appear in the -help-hidden output).
- really_hidden - the option will not be mentioned in any help output.
- comma_separated - indicates that any commas specified for an option's value should be used to split the value up into multiple values for the option. This property is valid only for list options. In conjunction with forward_value, it can be used to implement option forwarding in the style of gcc's -Wa,.
- multi_val n - this option takes n arguments (can be useful in some special cases). Usage example: (parameter_list_option "foo", (multi_val 3)); the command-line syntax is '-foo a b c'. Only list options can have this attribute; you can, however, use the one_or_more, optional and required properties.
- init - this option has a default value, either a string (if it is a parameter), or a boolean (if it is a switch; as in C++, boolean constants are called true and false). List options can't have the init attribute. Usage examples: (switch_option "foo", (init true)); (prefix_option "bar", (init "baz")).
- extern - this option is defined in some other plugin, see below.
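The option types and properties listed above can be combined in a single list. The sketch below is illustrative only: the list name, the help strings, and the -foo and -bar options are made up, and each entry simply restates in a comment the behaviour described in the lists above.

```tablegen
// Hypothetical option list; names and help strings are assumptions.
def ExampleOptions : OptionList<[
    // -E : plain boolean switch.
    (switch_option "E", (help "Stop after preprocessing")),
    // -preprocess : alias for -E; aliases take no other properties.
    (alias_option "preprocess", "E"),
    // -std=c99 or -std c99 : one argument, at most one occurrence.
    (parameter_option "std", (help "Language standard")),
    // -lm -lpthread : name and argument fused, may be repeated.
    (prefix_list_option "l", (help "Library to link in")),
    // -Wa,x,y : value is split on commas (list options only).
    (prefix_list_option "Wa,", (comma_separated),
        (help "Options to pass to the assembler")),
    // -foo a b c : three arguments per occurrence; shown only in
    // the -help-hidden output.
    (parameter_list_option "foo", (multi_val 3), (hidden)),
    // -barVALUE : value defaults to "baz" when the option is absent.
    (prefix_option "bar", (init "baz"))
]>;
```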

External options

Sometimes, when linking several plugins together, one plugin needs to access options defined in some other plugin. Because of the way options are implemented, such options must be marked as extern. This is what the extern option property is for. Example:

+...
+(switch_option "E", (extern))
+...
+

If an external option has additional attributes besides 'extern', they are ignored. See also the section on plugin priorities.


Configuration


This section of the document describes the configuration files used by llvmc. Configuration information is relatively static for a given release of LLVM and a compiler tool. However, the details may change from release to release of either. Users are encouraged to simply use the various options of the llvmc command and ignore the configuration of the tool. These configuration files are for compiler writers and LLVM developers. Those wishing to simply use llvmc don't need to understand this section, but it may be instructive on how the tool works.


Overview

llvmc is highly configurable both on the command line and in configuration files. The options it understands are generic, consistent and simple by design. Furthermore, the llvmc options apply to the compilation of any LLVM-enabled programming language. To be enabled as a supported source language compiler, a compiler writer must provide a configuration file that tells llvmc how to invoke the compiler and what its capabilities are. The purpose of the configuration files then is to allow compiler writers to specify to llvmc how the compiler should be invoked. Users may, but are not advised to, alter the compiler's llvmc configuration.


Because llvmc just invokes other programs, it must deal with the available command line options for those programs regardless of whether they were written for LLVM or not. Furthermore, not all compiler tools will have the same capabilities. Some compiler tools will simply generate LLVM assembly code, others will be able to generate fully optimized bitcode. In general, llvmc doesn't make any assumptions about the capabilities or command line options of a sub-tool. It simply uses the details found in the configuration files and leaves it to the compiler writer to specify the configuration correctly.


This approach means that new compiler tools can be up and working very quickly. As a first cut, a tool can simply compile its source to raw (unoptimized) bitcode or LLVM assembly and llvmc can be configured to pick up the slack (translate LLVM assembly to bitcode, optimize the bitcode, generate native assembly, link, etc.). In fact, a compiler tool need not use any LLVM libraries, and it could be written in any language (instead of C++). The configuration data will allow the full range of optimization, assembly, and linking capabilities that LLVM provides to be added to these kinds of tools. Enabling the rapid development of front-ends is one of the primary goals of llvmc.


As a compiler tool matures, it may utilize the LLVM libraries and tools to more efficiently produce optimized bitcode directly in a single compilation and optimization program. In these cases, multiple tools would not be needed and the configuration data for the compiler would change.


Configuring llvmc to the needs and capabilities of a source language compiler is relatively straightforward. A compiler writer must provide a definition of what to do for each of the five compilation phases for each of the optimization levels. The specification consists simply of prototypical command lines into which llvmc can substitute command line arguments and file names. Note that any given phase can be completely blank if the source language's compiler combines multiple phases into a single program. For example, quite often pre-processing, translation, and optimization are combined into a single program. The specification for such a compiler would have blank entries for pre-processing and translation but a full command line for optimization.

Conditional evaluation

The 'case' construct is the main means by which programmability is achieved in LLVMC. It can be used to calculate edge weights, program actions and modify the shell commands to be executed. The 'case' expression is designed after the similarly-named construct in functional languages and takes the form (case (test_1), statement_1, (test_2), statement_2, ... (test_N), statement_N). The statements are evaluated only if the corresponding tests evaluate to true.

Examples:

+// Edge weight calculation
+
+// Increases edge weight by 5 if "-A" is provided on the
+// command-line, and by 5 more if "-B" is also provided.
+(case
+    (switch_on "A"), (inc_weight 5),
+    (switch_on "B"), (inc_weight 5))
+
+
+// Tool command line specification
+
+// Evaluates to "cmdline1" if the option "-A" is provided on the
+// command line; to "cmdline2" if "-B" is provided;
+// otherwise to "cmdline3".
+
+(case
+    (switch_on "A"), "cmdline1",
+    (switch_on "B"), "cmdline2",
+    (default), "cmdline3")
+

Note the slight difference in 'case' expression handling in contexts of edge weights and command line specification: in the second example the value of the "B" switch is never checked when switch "A" is enabled, and the whole expression always evaluates to "cmdline1" in that case.

Case expressions can also be nested, i.e. the following is legal:

+(case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
+      (default), ...)
+

You should, however, try to avoid doing that because it hurts readability. It is usually better to split tool descriptions and/or use TableGen inheritance instead.

Possible tests are:
- switch_on - Returns true if a given command-line switch is provided by the user. Can be given a list as argument, in which case (switch_on ["foo", "bar", "baz"]) is equivalent to (and (switch_on "foo"), (switch_on "bar"), (switch_on "baz")). Example: (switch_on "opt").
- any_switch_on - Given a list of switch options, returns true if any of the switches is turned on. Example: (any_switch_on ["foo", "bar", "baz"]) is equivalent to (or (switch_on "foo"), (switch_on "bar"), (switch_on "baz")).
- parameter_equals - Returns true if a command-line parameter equals a given value. Example: (parameter_equals "W", "all").
- element_in_list - Returns true if a command-line parameter list contains a given value. Example: (element_in_list "l", "pthread").
- input_languages_contain - Returns true if a given language belongs to the current input language set. Example: (input_languages_contain "c++").
- in_language - Evaluates to true if the input file language is equal to the argument. At the moment works only with cmd_line and actions (on non-join nodes). Example: (in_language "c++").
- not_empty - Returns true if a given option (which should be either a parameter or a parameter list) is set by the user. Like switch_on, can also be given a list as argument. Example: (not_empty "o").
- any_not_empty - Returns true if not_empty returns true for any of the options in the list. Example: (any_not_empty ["foo", "bar", "baz"]) is equivalent to (or (not_empty "foo"), (not_empty "bar"), (not_empty "baz")).
- empty - The opposite of not_empty. Equivalent to (not (not_empty X)). Provided for convenience. Can be given a list as argument.
- any_empty - Returns true if empty returns true for any of the options in the list. Example: (any_empty ["foo", "bar", "baz"]) is equivalent to (not (and (not_empty "foo"), (not_empty "bar"), (not_empty "baz"))).
- single_input_file - Returns true if there was only one input file provided on the command-line. Used without arguments: (single_input_file).
- multiple_input_files - Equivalent to (not (single_input_file)) (the case of zero input files is considered an error).
- default - Always evaluates to true. Should always be the last test in the case expression.
- and - A standard logical combinator that returns true iff all of its arguments return true. Used like this: (and (test1), (test2), ... (testN)). Nesting of and and or is allowed, but not encouraged.
- or - A logical combinator that returns true iff any of its arguments returns true. Example: (or (test1), (test2), ... (testN)).
- not - Standard unary logical combinator that negates its argument. Example: (not (or (test1), (test2), ... (testN))).
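Several of the tests above can be combined in one case expression. The following sketch shows a command-line selection; the option names and the "cmdline_..." strings are placeholders in the style of the earlier examples, not real commands. Since this is the command-line context, the first test that evaluates to true decides the result.

```tablegen
// Hypothetical command-line selection built from the documented tests.
// Option names and result strings are placeholders.
(case
    // -E together with an explicit -o value.
    (and (switch_on "E"), (not_empty "o")), "cmdline_preprocess_to_file",
    // Either -S or -c was given.
    (any_switch_on ["S", "c"]),             "cmdline_stop_early",
    // --linker=c++ was given.
    (parameter_equals "linker", "c++"),     "cmdline_cxx_link",
    // -lpthread appears among the -l values.
    (element_in_list "l", "pthread"),       "cmdline_threads",
    // Fallback; must come last.
    (default),                              "cmdline_default")
```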


Configuration Files

File Contents

Each configuration file provides the details for a single source language that is to be compiled. This configuration information tells llvmc how to invoke the language's pre-processor, translator, optimizer, assembler and linker. Note that a given source language needn't provide all these tools, as many of them exist in llvm currently.

Writing a tool description

As was said earlier, nodes in the compilation graph represent tools, which are described separately. A tool definition looks like this (taken from the include/llvm/CompilerDriver/Tools.td file):

+def llvm_gcc_cpp : Tool<[
+    (in_language "c++"),
+    (out_language "llvm-assembler"),
+    (output_suffix "bc"),
+    (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
+    (sink)
+    ]>;
+

This defines a new tool called llvm_gcc_cpp, which is an alias for llvm-g++. As you can see, a tool definition is just a list of properties; most of them should be self-explanatory. The sink property means that this tool should be passed all command-line options that aren't mentioned in the option list.

The complete list of all currently implemented tool properties follows.

Possible tool properties:
- in_language - input language name. Can be either a string or a list, in case the tool supports multiple input languages.
- out_language - output language name. Multiple output languages are not allowed.
- output_suffix - output file suffix. Can also be changed dynamically, see documentation on actions.
- cmd_line - the actual command used to run the tool. You can use $INFILE and $OUTFILE variables, output redirection with >, hook invocations ($CALL), environment variables (via $ENV) and the case construct.
- join - this tool is a "join node" in the graph, i.e. it gets a list of input files and joins them together. Used for linkers.
- sink - all command-line options that are not handled by other tools are passed to this tool.
- actions - A single big case expression that specifies how this tool reacts to command-line options (described in more detail below).
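To show how these properties fit together, here is a sketch of a linker-like tool. Everything specific in it is an assumption made for illustration: the tool name example_linker, the example-ld command, the "static" switch, and the "static-library" language name are hypothetical, and the list form of in_language is written with the same bracket syntax the manual uses for switch_on lists.

```tablegen
// Hypothetical join tool; all names below are made up for illustration.
def example_linker : Tool<[
    // List form: the tool accepts more than one input language
    // (bracket syntax assumed to match other lists in this manual).
    (in_language ["object-code", "static-library"]),
    (out_language "executable"),
    (output_suffix "out"),
    // The case construct picks the command line; first match wins.
    (cmd_line (case
        (switch_on "static"), "example-ld -static $INFILE -o $OUTFILE",
        (default),            "example-ld $INFILE -o $OUTFILE")),
    // Receives all input files at once, like a linker.
    (join),
    // Also receives options not handled by any other tool.
    (sink)
    ]>;
```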

Actions

A tool often needs to react to command-line options, and this is precisely what the actions property is for. The next example illustrates this feature:

+def llvm_gcc_linker : Tool<[
+    (in_language "object-code"),
+    (out_language "executable"),
+    (output_suffix "out"),
+    (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
+    (join),
+    (actions (case (not_empty "L"), (forward "L"),
+                   (not_empty "l"), (forward "l"),
+                   (not_empty "dummy"),
+                             [(append_cmd "-dummy1"), (append_cmd "-dummy2")]))
+    ]>;
+

The actions tool property is implemented on top of the omnipresent case expression. It associates one or more different actions with given conditions. In the example, the actions are forward, which forwards a given option unchanged, and append_cmd, which appends a given string to the tool execution command. Multiple actions can be associated with a single condition by using a list of actions (used in the example to append some dummy options). The same case construct can also be used in the cmd_line property to modify the tool command line.

The "join" property used in the example means that this tool behaves like a linker.

The list of all possible actions follows.

Possible actions:
+
+
- append_cmd - Append a string to the tool invocation command. +Example: (case (switch_on "pthread"), (append_cmd "-lpthread")).
- error - Exit with error. +Example: (error "Mixing -c and -S is not allowed!").
- warning - Print a warning. +Example: (warning "Specifying both -O1 and -O2 is meaningless!").
- forward - Forward the option unchanged. +Example: (forward "Wall").
- forward_as - Change the option's name, but forward the argument +unchanged. +Example: (forward_as "O0", "--disable-optimization").
- forward_value - Forward only the option's value. Cannot be used with switch +options (since they don't have values), but works fine with lists. +Example: (forward_value "Wa,").
- forward_transformed_value - As above, but applies a hook to the +option's value before forwarding (see below). When +forward_transformed_value is applied to a list +option, the hook must have signature +std::string hooks::HookName (const std::vector<std::string>&). +Example: (forward_transformed_value "m", "ConvertToMAttr").
- output_suffix - Modify the output suffix of this tool. +Example: (output_suffix "i").
- stop_compilation - Stop compilation after this tool processes its +input. Used without arguments. +Example: (stop_compilation).
+
+
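
The forward_transformed_value action above relies on a hook with the list-option signature. A minimal sketch of such a hook follows, assuming (purely for illustration) that each value of the option names a target feature and that a "no-" prefix disables it; the real ConvertToMAttr hook may behave differently.

```cpp
#include <string>
#include <vector>

namespace hooks {

// Hypothetical hook for a list option. Each value becomes a "+feature" or
// "-feature" entry, and all entries are joined into a single -mattr= argument.
// The "no-" prefix convention is an assumption made for this sketch.
std::string ConvertToMAttr(const std::vector<std::string>& Opts) {
  std::string Out;
  for (const std::string& Opt : Opts) {
    if (Opt.empty())
      continue;
    // The first entry opens the argument, later entries are comma-separated.
    Out += Out.empty() ? "-mattr=" : ",";
    if (Opt.compare(0, 3, "no-") == 0)
      Out += "-" + Opt.substr(3);
    else
      Out += "+" + Opt;
  }
  return Out;
}

} // end namespace hooks
```

The hook returns an empty string when the list is empty, so nothing is appended to the command in that case.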

- - -

Directory Search

llvmc always looks for files of a specific name. It uses the - first file with the name it's looking for by searching directories in the - following order:
-

Any directory specified by the -config-dir option will be - checked first.
If the environment variable LLVM_CONFIG_DIR is set, and it contains - the name of a valid directory, that directory will be searched next.
If the user's home directory (typically /home/user) contains - a sub-directory named .llvm and that directory contains a - sub-directory named etc then that directory will be tried - next.
If the LLVM installation directory (typically /usr/local/llvm) - contains a sub-directory named etc then that directory will be - tried last.
A standard "system" directory will be searched next. This is typically - /etc/llvm on UNIX™ and C:\WINNT on Microsoft - Windows™.
If the configuration file sought still can't be found, llvmc - will print an error message and exit.

The first file found in this search will be used. Other files with the - same name will be ignored even if they exist in one of the subsequent search - locations.

- -

File Names

In the directories searched, each configuration file is given a specific - name to foster faster lookup (so llvmc doesn't have to do directory searches). - The name of a given language specific configuration file is simply the same - as the suffix used to identify files containing source in that language. - For example, a configuration file for C++ source might be named - cpp, C, or cxx. For languages that support multiple - file suffixes, multiple (probably identical) files (or symbolic links) will - need to be provided.

Language map

If you are adding support for a new language to LLVMC, you'll need to +modify the language map, which defines mappings from file extensions +to language names. It is used to choose the proper toolchain(s) for a +given input file set. A language map definition looks like this:

+def LanguageMap : LanguageMap<
+    [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
+     LangToSuffixes<"c", ["c"]>,
+     ...
+    ]>;
+

For example, without those definitions the following command wouldn't work:

+$ llvmc hello.cpp
+llvmc: Unknown suffix: cpp
+

The language map entries are needed only for the tools that are linked from the +root node. Since a tool can't have multiple output languages, for inner nodes of +the graph the input and output languages should match. This is enforced at +compile-time.

- -

What Gets Read

Which configuration files are read depends on the command line options and - the suffixes of the file names provided on llvmc's command line. Note - that the -x LANGUAGE option alters the language that llvmc - uses for the subsequent files on the command line. Only the configuration - files actually needed to complete llvmc's task are read. Other - language specific files will be ignored.

Option preprocessor

It is sometimes useful to run error-checking code before processing the +compilation graph. For example, if optimization options "-O1" and "-O2" are +implemented as switches, we might want to output a warning if the user invokes +the driver with both of these options enabled.

The OptionPreprocessor feature is reserved specially for these +occasions. Example (adapted from the built-in Base plugin):

+def Preprocess : OptionPreprocessor<
+(case (not (any_switch_on ["O0", "O1", "O2", "O3"])),
+           (set_option "O2"),
+      (and (switch_on "O3"), (any_switch_on ["O0", "O1", "O2"])),
+           (unset_option ["O0", "O1", "O2"]),
+      (and (switch_on "O2"), (any_switch_on ["O0", "O1"])),
+           (unset_option ["O0", "O1"]),
+      (and (switch_on "O1"), (switch_on "O0")),
+           (unset_option "O0"))
+>;
+

Here, OptionPreprocessor is used to unset all spurious -O options so +that they are not forwarded to the compiler. If no optimization options are +specified, -O2 is enabled.

OptionPreprocessor is basically a single big case expression, which is +evaluated only once right after the plugin is loaded. The only allowed actions +in OptionPreprocessor are error, warning, and two special actions: +unset_option and set_option. As their names suggest, they can be used to +set or unset a given option. To set an option with set_option, use the +two-argument form: (set_option "parameter", VALUE). Here, VALUE can be +either a string, a string list, or a boolean constant.

For convenience, set_option and unset_option also work on lists. That +is, instead of [(unset_option "A"), (unset_option "B")] you can use +(unset_option ["A", "B"]). Obviously, (set_option ["A", "B"]) is valid +only if both A and B are switches.

- - -

Syntax

The syntax of the configuration files is very simple and somewhat - compatible with Java's property files. Here are the syntax rules:

The file encoding is ASCII.
The file is line oriented. There should be one configuration definition - per line. Lines are terminated by the newline (0x0A) and/or carriage return - characters (0x0D)
A backslash (\) before a newline causes the newline to be - ignored. This is useful for line continuation of long definitions. A - backslash anywhere else is recognized as a backslash.
A configuration item consists of a name, an = and a value.
A name consists of a sequence of identifiers separated by period.
An identifier consists of specific keywords made up of only lower case - and upper case letters (e.g. lang.name).
Values come in four flavors: booleans, integers, commands and - strings.
Valid "false" boolean values are false False FALSE no No NO - off Off and OFF.
Valid "true" boolean values are true True TRUE yes Yes YES - on On and ON.
Integers are simply sequences of digits.
Commands start with a program name and are followed by a sequence of - words that are passed to that program as command line arguments. Program - arguments that begin and end with the % sign will have their value - substituted. Program names beginning with / are considered to be - absolute. Otherwise the PATH will be applied to find the program to - execute.
Strings are composed of multiple sequences of characters from the - character class [-A-Za-z0-9_:%+/\\|,] separated by white - space.
White space on a line is folded. Multiple blanks or tabs will be - reduced to a single blank.
White space before the configuration item's name is ignored.
White space on either side of the = is ignored.
White space in a string value is used to separate the individual - components of the string value but otherwise ignored.
Comments are introduced by the # character. Everything after a - # and before the end of line is ignored.

More advanced topics

Hooks and environment variables

Normally, LLVMC executes programs from the system PATH. Sometimes, +this is not sufficient: for example, we may want to specify tool paths +or names in the configuration file. This can be easily achieved via +the hooks mechanism. To write your own hooks, just add their +definitions to PluginMain.cpp or drop a .cpp file into +your plugin directory. Hooks should live in the hooks namespace +and have the signature std::string hooks::MyHookName ([const char* +Arg0 [, const char* Arg1 [, ...]]]). They can be used from the +cmd_line tool property:

+(cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")
+

To pass arguments to hooks, use the following syntax:

+(cmd_line "$CALL(MyHook, 'Arg1', 'Arg2', 'Arg # 3')/path/to/file -o1 -o2")
+
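
A hook that accepts arguments could look like the following sketch. The body is hypothetical: it simply joins its three arguments into a directory prefix, which then takes the place of the $CALL(...) expression in the command line.

```cpp
#include <string>

namespace hooks {

// Hypothetical three-argument hook matching the $CALL(MyHook, ...) form.
// Empty or null arguments are skipped; the rest are joined with '/'.
std::string MyHook(const char* Arg0, const char* Arg1, const char* Arg2) {
  std::string Path;
  const char* Parts[] = {Arg0, Arg1, Arg2};
  for (const char* Part : Parts) {
    if (Part == nullptr || *Part == '\0')
      continue;
    // Avoid doubling the separator when the prefix already ends in '/'.
    if (!Path.empty() && Path.back() != '/')
      Path += '/';
    Path += Part;
  }
  return Path;
}

} // end namespace hooks
```

Arguments arrive as plain C strings, so any conversion (for example to an integer) is up to the hook itself.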

It is also possible to use environment variables in the same manner:

+(cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")
+
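
When a fallback is wanted for a variable that may be unset, the lookup can be wrapped in a hook instead. The sketch below is only an illustration: the hook name ToolDir, the variable LLVMGCC_DIR and the default /usr/bin are all made up for this example.

```cpp
#include <cstdlib>
#include <string>

namespace hooks {

// Hypothetical hook: returns the value of LLVMGCC_DIR when it is set and
// non-empty, and a fixed default directory otherwise.
std::string ToolDir() {
  const char* Dir = std::getenv("LLVMGCC_DIR");
  if (Dir != nullptr && *Dir != '\0')
    return Dir;
  return "/usr/bin";  // assumed default, chosen for the example only
}

} // end namespace hooks
```

Such a hook would be used as $CALL(ToolDir)/llvm-gcc in place of $ENV(LLVMGCC_DIR)/llvm-gcc.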

To change the command line string based on user-provided options use +the case expression (documented above):

+(cmd_line
+  (case
+    (switch_on "E"),
+       "llvm-g++ -E -x c $INFILE -o $OUTFILE",
+    (default),
+       "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))
+

- - -

Configuration Items

The table below provides definitions of the allowed configuration items - that may appear in a configuration file. Every item has a default value and - does not need to appear in the configuration file. Missing items will have the - default value. Each identifier may appear as all lower case, first letter - capitalized or all upper case.


Name	Value Type	Description	Default
LLVMC ITEMS
version	string	Provides the version string for the contents of this - configuration file. What is accepted as a legal configuration file - will change over time and this item tells `llvmc` which version - should be expected.	b
LANG ITEMS
lang.name	string	Provides the common name for a language definition. - For example "C++", "Pascal", "FORTRAN", etc.	blank
lang.opt1	string	Specifies the parameters to give the optimizer when - `-O1` is specified on the `llvmc` command line.	`-simplifycfg -instcombine -mem2reg`
lang.opt2	string	Specifies the parameters to give the optimizer when - `-O2` is specified on the `llvmc` command line.	TBD
lang.opt3	string	Specifies the parameters to give the optimizer when - `-O3` is specified on the `llvmc` command line.	TBD
lang.opt4	string	Specifies the parameters to give the optimizer when - `-O4` is specified on the `llvmc` command line.	TBD
lang.opt5	string	Specifies the parameters to give the optimizer when - `-O5` is specified on the `llvmc` command line.	TBD
PREPROCESSOR ITEMS
preprocessor.command	command	This provides the command prototype that will be used - to run the preprocessor. This is generally only used with the - `-E` option.	<blank>
preprocessor.required	boolean	This item specifies whether the pre-processing phase - is required by the language. If the value is true, then the - `preprocessor.command` value must not be blank. With this option, - `llvmc` will always run the preprocessor as it assumes that the - translation and optimization phases don't know how to pre-process their - input.	false
TRANSLATOR ITEMS
translator.command	command	This provides the command prototype that will be used - to run the translator. Valid substitutions are `%in%` for the - input file and `%out%` for the output file.	<blank>
translator.output	`bitcode` or `assembly`	This item specifies the kind of output the language's - translator generates.	`bitcode`
translator.preprocesses	boolean	Indicates that the translator also preprocesses. If - this is true, then `llvmc` will skip the pre-processing phase - whenever the final phase is not pre-processing.	`false`
OPTIMIZER ITEMS
optimizer.command	command	This provides the command prototype that will be used - to run the optimizer. Valid substitutions are `%in%` for the - input file and `%out%` for the output file.	<blank>
optimizer.output	`bitcode` or `assembly`	This item specifies the kind of output the language's - optimizer generates. Valid values are "assembly" and "bitcode"	`bitcode`
optimizer.preprocesses	boolean	Indicates that the optimizer also preprocesses. If - this is true, then `llvmc` will skip the pre-processing phase - whenever the final phase is optimization or later.	`false`
optimizer.translates	boolean	Indicates that the optimizer also translates. If - this is true, then `llvmc` will skip the translation phase - whenever the final phase is optimization or later.	`false`
ASSEMBLER ITEMS
assembler.command	command	This provides the command prototype that will be used - to run the assembler. Valid substitutions are `%in%` for the - input file and `%out%` for the output file.	<blank>

How plugins are loaded

It is possible for LLVMC plugins to depend on each other. For example, +one can create edges between nodes defined in some other plugin. To +make this work, however, that plugin should be loaded first. To +achieve this, the concept of plugin priority was introduced. By +default, every plugin has priority zero; to specify the priority +explicitly, put the following line in your plugin's TableGen file:

+def Priority : PluginPriority<$PRIORITY_VALUE>;
+// where $PRIORITY_VALUE is some integer > 0
+

Plugins are loaded in order of their (increasing) priority, starting +with 0. Therefore, the plugin with the highest priority value will be +loaded last.
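
The ordering rule can be sketched in a few lines. This is not LLVMC's loader: the PluginInfo type and LoadOrder function are hypothetical, and keeping the original order for plugins with equal priority is an assumption of the sketch, since the text above does not say how ties are resolved.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record describing one plugin.
struct PluginInfo {
  std::string Name;
  int Priority;
};

// Returns plugin names in load order: increasing priority, with ties kept
// in their original order (an assumption, implemented via stable_sort).
std::vector<std::string> LoadOrder(std::vector<PluginInfo> Plugins) {
  std::stable_sort(Plugins.begin(), Plugins.end(),
                   [](const PluginInfo& A, const PluginInfo& B) {
                     return A.Priority < B.Priority;
                   });
  std::vector<std::string> Names;
  for (const PluginInfo& P : Plugins)
    Names.push_back(P.Name);
  return Names;
}
```

A plugin that adds edges to nodes defined by the Base plugin would therefore pick a priority greater than the one Base uses.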

- - -

Substitutions

On any configuration item that ends in command, you must - specify substitution tokens. Substitution tokens begin and end with a percent - sign (%) and are replaced by the corresponding text. Any substitution - token may be given on any command line but some are more useful than - others. In particular each command should have both an %in% - and an %out% substitution. The table below provides definitions of - each of the allowed substitution tokens.


Substitution Token	Replacement Description
`%args%`	Replaced with all the tool-specific arguments given - to `llvmc` via the `-T` set of options. This just allows - you to place these arguments in the correct place on the command line. - If the `%args%` option does not appear on your command line, - then you are explicitly disallowing the `-T` option for your - tool. -
`%force%`	Replaced with the `-f` option if it was - specified on the `llvmc` command line. This is intended to tell - the compiler tool to force the overwrite of output files. -
`%in%`	Replaced with the full path of the input file. You - needn't worry about the cascading of file names. `llvmc` will - create temporary files and ensure that the output of one phase is the - input to the next phase.
`%opt%`	Replaced with the optimization options for the - tool. If the tool understands the `-O` options then that will - be passed. Otherwise, the `lang.optN` series of configuration - items will specify which arguments are to be given.
`%out%`	Replaced with the full path of the output file. - Note that this is not necessarily the output file specified with the - `-o` option on `llvmc`'s command line. It might be a - temporary file that will be passed to a subsequent phase's input. -
`%stats%`	If your command accepts the `-stats` option, - use this substitution token. If the user requested `-stats` - from the `llvmc` command line then this token will be replaced - with `-stats`, otherwise it will be ignored. -
`%target%`	Replaced with the name of the target "machine" for - which code should be generated. The value used here is taken from the - `llvmc` option `-march`. -
`%time%`	If your command accepts the `-time-passes` - option, use this substitution token. If the user requested - `-time-passes` from the `llvmc` command line then this - token will be replaced with `-time-passes`, otherwise it will - be ignored. -

Debugging

When writing LLVMC plugins, it can be useful to get a visual representation of +the resulting compilation graph. This can be achieved via the command +line option --view-graph. This option assumes that Graphviz and +Ghostview are installed. There is also a --write-graph option that +creates a Graphviz source file (compilation-graph.dot) in the +current directory.

Another useful llvmc option is --check-graph. It checks the +compilation graph for common errors like mismatched output/input +language names, multiple default edges and cycles. These checks can't +be performed at compile-time because the plugins can load code +dynamically. When invoked with --check-graph, llvmc doesn't +perform any compilation tasks and returns the number of encountered +errors as its status code.
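
One of the checks mentioned above, mismatched output/input language names, can be illustrated as follows. This is a simplified sketch rather than LLVMC's implementation; the ToolNode and Edge types and the function name are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified view of a tool node in the compilation graph.
struct ToolNode {
  std::vector<std::string> InLanguages;  // a tool may accept several languages
  std::string OutLanguage;               // but produces exactly one
};

// Hypothetical edge between two tools, given as indices into the tool list.
struct Edge {
  int From;
  int To;
};

// Counts edges whose source output language is not accepted by the target.
// The count could then serve as the status code, as --check-graph does.
int CountLanguageMismatches(const std::vector<ToolNode>& Tools,
                            const std::vector<Edge>& Edges) {
  int Errors = 0;
  for (const Edge& E : Edges) {
    const ToolNode& Src = Tools[E.From];
    const ToolNode& Dst = Tools[E.To];
    if (std::find(Dst.InLanguages.begin(), Dst.InLanguages.end(),
                  Src.OutLanguage) == Dst.InLanguages.end())
      ++Errors;
  }
  return Errors;
}
```

Checks for cycles and for multiple default edges would walk the same structure; they are left out to keep the sketch short.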

Conditioning on the executable name

For now, the executable name (the value passed to the driver in argv[0]) is +accessible only in the C++ code (i.e. hooks). Use the following code:

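+#include <cstring>
+#include <string>
+
+namespace llvmc {
+extern const char* ProgramName;
+}
+
+namespace hooks {
+
+std::string MyHook() {
+  //...
+  // ProgramName holds argv[0] as passed to the driver.
+  if (std::strcmp(llvmc::ProgramName, "mydriver") == 0) {
+    //...
+  }
+  return ""; // placeholder: return the string to substitute
+}
+
+} // end namespace hooks
+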

In general, you're encouraged not to make the behaviour dependent on the +executable file name, and to use command-line switches instead. See, for example, how +the Base plugin behaves when it needs to choose the correct linker options +(think g++ vs. gcc).

+ +

+ +

+ +Mikhail Glushenkov
+LLVM Compiler Infrastructure
- -

Sample Config File

Since an example is always instructive, here's how the Stacker language - configuration file looks.


-# Stacker Configuration File For llvmc
-
-##########################################################
-# Language definitions
-##########################################################
-  lang.name=Stacker 
-  lang.opt1=-simplifycfg -instcombine -mem2reg
-  lang.opt2=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp 
-  lang.opt3=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp -branch-combine -adce \
-    -globaldce -inline -licm 
-  lang.opt4=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp -ipconstprop \
-    -branch-combine -adce -globaldce -inline -licm 
-  lang.opt5=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp -ipconstprop \
-    -branch-combine -adce -globaldce -inline -licm \
-    -block-placement
-
-##########################################################
-# Pre-processor definitions
-##########################################################
-
-  # Stacker doesn't have a preprocessor but the following
-  # allows the -E option to be supported
-  preprocessor.command=cp %in% %out%
-  preprocessor.required=false
-
-##########################################################
-# Translator definitions
-##########################################################
-
-  # To compile stacker source, we just run the stacker
-  # compiler with a default stack size of 2048 entries.
-  translator.command=stkrc -s 2048 %in% -o %out% %time% \
-    %stats% %force% %args%
-
-  # stkrc doesn't preprocess but we set this to true so
-  # that we don't run the cp command by default.
-  translator.preprocesses=true
-
-  # The translator is required to run.
-  translator.required=true
-
-  # stkrc doesn't handle the -On options
-  translator.output=bitcode
-
-##########################################################
-# Optimizer definitions
-##########################################################
-  
-  # For optimization, we use the LLVM "opt" program
-  optimizer.command=opt %in% -o %out% %opt% %time% %stats% \
-    %force% %args%
-
-  optimizer.required = true
-
-  # opt doesn't translate
-  optimizer.translates = no
-
-  # opt doesn't preprocess
-  optimizer.preprocesses=no
-
-  # opt produces bitcode
-  optimizer.output = bitcode
-
-##########################################################
-# Assembler definitions
-##########################################################
-  assembler.command=llc %in% -o %out% %target% %time% %stats%
-

- - -

Glossary

- -

This document uses precise terms in reference to the various artifacts and - concepts related to compilation. The terms used throughout this document are - defined below.

assembly: A compilation phase in which LLVM bitcode or - LLVM assembly code is assembled to a native code format (either target - specific assembly language or the platform's native object file format). -
compiler: Refers to any program that can be invoked by llvmc to accomplish - the work of one or more compilation phases.
driver: Refers to llvmc itself.
linking: A compilation phase in which LLVM bitcode files - and (optionally) native system libraries are combined to form a complete - executable program.
optimization: A compilation phase in which LLVM bitcode is - optimized.
phase: Refers to any one of the five compilation phases that - llvmc supports. The five phases are: - preprocessing, - translation, - optimization, - assembly, - linking.
source language: Any common programming language (e.g. C, C++, Java, Stacker, ML, - FORTRAN). These languages are distinguished from any of the lower level - languages (such as LLVM or native assembly), by the fact that a - translation phase - is required before LLVM can be applied.
tool: Refers to any program in the LLVM tool set.
translation: A compilation phase in which - source language code is translated into - either LLVM assembly language or LLVM bitcode.

- -

Reid Spencer
-The LLVM Compiler Infrastructure
Last modified: $Date$ -
