X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FCompilerDriver.html;h=dd7526753df6e7b0e33e8b42416e74390c695cc2;hb=52bb2db70998c42c99d22069ac66eb7bbb492f3a;hp=a0579bc97372ac0396a40f9e6eec8229e677a71a;hpb=24fdc1d1dfcc0bb089ceea1f6582764eb0f814a8;p=oota-llvm.git diff --git a/docs/CompilerDriver.html b/docs/CompilerDriver.html index a0579bc9737..dd7526753df 100644 --- a/docs/CompilerDriver.html +++ b/docs/CompilerDriver.html @@ -1,821 +1,420 @@ - - + + + - - The LLVM Compiler Driver (llvmc) - - - + + +Customizing LLVMC: Reference Manual + -
The LLVM Compiler Driver (llvmc)
-

NOTE: This document is a work in progress!

-
    -
  1. Abstract
  2. -
  3. Introduction -
      -
    1. Purpose
    2. -
    3. Operation
    4. -
    5. Phases
    6. -
    7. Actions
    8. -
    -
  4. -
  5. Configuration -
      -
    1. Overview
    2. -
    3. Configuration Files
    4. -
    5. Syntax
    6. -
    7. Substitutions
    8. -
    9. Sample Config File
    10. -
    -
  6. Glossary -
-
-

Written by Reid Spencer -

-
+
- -
Abstract
- -
-

This document describes the requirements, design, and configuration of the - LLVM compiler driver, llvmc. The compiler driver knows about LLVM's - tool set and can be configured to know about a variety of compilers for - source languages. It uses this knowledge to execute the tools necessary - to accomplish general compilation, optimization, and linking tasks. The main - purpose of llvmc is to provide a simple and consistent interface to - all compilation tasks. This reduces the burden on the end user who can just - learn to use llvmc instead of the entire LLVM tool set and all the - source language compilers compatible with LLVM.

-
- -
Introduction
- -
-

The llvmc tool is a configurable compiler - driver. As such, it isn't a compiler, optimizer, - or a linker itself but it drives (invokes) other software that perform those - tasks. If you are familiar with the GNU Compiler Collection's gcc - tool, llvmc is very similar.

-

The following introductory sections will help you understand why this tool - is necessary and what it does.

-
+
Customizing LLVMC: Reference Manual
- -
Purpose
-
-

llvmc was invented to make compilation of user programs with - LLVM-based tools easier. To accomplish this, llvmc strives to:

- -

Additionally, llvmc makes it easier to write a compiler for use - with LLVM, because it:

- +
+

Note: This document is a work-in-progress. Additions and clarifications + are welcome.

- - -
-

At a high level, llvmc operation is very simple. The basic action - taken by llvmc is to simply invoke some tool or set of tools to fulfill - the user's request for compilation. Every execution of llvmc takes the - following sequence of steps:

-
-
Collect Command Line Options
-
The command line options provide the marching orders to llvmc - on what actions it should perform. This is the request the user is making - of llvmc and it is interpreted first. See the llvmc - manual page for details on the - options.
-
Read Configuration Files
-
Based on the options and the suffixes of the filenames presented, a set - of configuration files are read to configure the actions llvmc will - take. Configuration files are provided by either LLVM or the - compiler tools that llvmc invokes. These files determine what - actions llvmc will take in response to the user's request. See - the section on configuration for more details. -
-
Determine Phases To Execute
-
Based on the command line options and configuration files, - llvmc determines the compilation phases that - must be executed by the user's request. This is the primary work of - llvmc.
-
Determine Actions To Execute
-
Each phase to be executed can result in the - invocation of one or more actions. An action is - either a whole program or a function in a dynamically linked shared library. - In this step, llvmc determines the sequence of actions that must be - executed. Actions will always be executed in a deterministic order.
-
Execute Actions
-
The actions necessary to support the user's - original request are executed sequentially and deterministically. All - actions result in either the invocation of a whole program to perform the - action or the loading of a dynamically linkable shared library and invocation - of a standard interface function within that library.
-
Termination
-
If any action fails (returns a non-zero result code), llvmc - also fails and returns the result code from the failing action. If - everything succeeds, llvmc will return a zero result code.
-
-

llvmc's operation must be simple, regular and predictable. - Developers need to be able to rely on it to take a consistent approach to - compilation. For example, the invocation:

- - llvmc -O2 x.c y.c z.c -o xyz -

must produce exactly the same results as:

-

-    llvmc -O2 x.c -o x.o
-    llvmc -O2 y.c -o y.o
-    llvmc -O2 z.c -o z.o
-    llvmc -O2 x.o y.o z.o -o xyz
-

To accomplish this, llvmc uses a very simple goal oriented - procedure to do its work. The overall goal is to produce a functioning - executable. To accomplish this, llvmc always attempts to execute a - series of compilation phases in the same sequence. - However, the user's options to llvmc can cause the sequence of phases - to start in the middle or finish early.

+

LLVMC is a generic compiler driver, designed to be customizable and +extensible. It plays the same role for LLVM as the gcc program +does for GCC - LLVMC's job is essentially to transform a set of input +files into a set of targets depending on configuration rules and user +options. What makes LLVMC different is that these transformation rules +are completely customizable - in fact, LLVMC knows nothing about the +specifics of transformation (even the command-line options are mostly +not hard-coded) and regards the transformation structure as an +abstract graph. This makes it possible to adapt LLVMC for other +purposes - for example, as a build tool for game resources.

+

Because LLVMC employs TableGen [1] as its configuration language, you +need to be familiar with it to customize LLVMC.

+ - -
Phases
-
-

llvmc breaks every compilation task into the following five - distinct phases:

-
Preprocessing
Not all languages support preprocessing; - but for those that do, this phase can be invoked. This phase is for - languages that provide combining, filtering, or otherwise altering of the - source language input before the translator parses it. Although C and C++ - are the most common users of this phase, other languages may provide their - own preprocessor (whether it's the C pre-processor or not).
-
-
Translation
The translation phase converts the source - language input into something that LLVM can interpret and use for - downstream phases. The translation is essentially from "non-LLVM form" to - "LLVM form".
-
-
Optimization
Once an LLVM Module has been obtained from - the translation phase, the program enters the optimization phase. This phase - attempts to optimize all of the input provided on the command line according - to the options provided.
-
-
Linking
The inputs are combined to form a complete - program.
-
-

The following table shows the inputs, outputs, and command line options - applicable to each phase.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Phase | Inputs | Outputs | Options
Preprocessing | Source Language File | Source Language File | -E: Stops the compilation after preprocessing.
Translation | Source Language File | LLVM Assembly; LLVM Bytecode; LLVM C++ IR | -c: Stops the compilation after translation so that optimization and linking are not done. -S: Stops the compilation before object code is written so that only assembly code remains.
Optimization | LLVM Assembly; LLVM Bytecode | LLVM Bytecode | -Ox: This group of options controls the amount of optimization performed.
Linking | LLVM Bytecode; Native Object Code; LLVM Library; Native Library | LLVM Bytecode Executable; Native Executable | -L: Specifies a path for library search. -l: Specifies a library to link in.
+
Written by Mikhail Glushenkov
- -
Actions
-

An action, with regard to llvmc is a basic operation that it takes - in order to fulfill the user's request. Each phase of compilation will invoke - zero or more actions in order to accomplish that phase.

-

Actions come in two forms:

-
    -
  • Invokable Executables
  • -
  • Functions in a shared library
  • -
+ +

LLVMC tries hard to be as compatible with gcc as possible, +although there are some small differences. Most of the time, however, +you shouldn't be able to notice them:

+
+$ # This works as expected:
+$ llvmc2 -O3 -Wall hello.cpp
+$ ./a.out
+hello
+
+

One nice feature of LLVMC is that one doesn't have to distinguish +between different compilers for different languages (think g++ and +gcc) - the right toolchain is chosen automatically based on input +language names (which are, in turn, determined from file +extensions). If you want to override the language inferred from the extension (for example, to compile a file ending with ".cpp" as +C), use the -x option, just like you would with gcc:

+
+$ llvmc2 -x c hello.cpp
+$ # hello.cpp is really a C file
+$ ./a.out
+hello
+
+

On the other hand, when using LLVMC as a linker to combine several C++ +object files you should provide the --linker option since it's +impossible for LLVMC to choose the right linker in that case:

+
+$ llvmc2 -c hello.cpp
+$ llvmc2 hello.o
+[A lot of link-time errors skipped]
+$ llvmc2 --linker=c++ hello.o
+$ ./a.out
+hello
+
- - - -
-

This section of the document describes the configuration files used by - llvmc. Configuration information is relatively static for a - given release of LLVM and a compiler tool. However, the details may - change from release to release of either. Users are encouraged to simply use - the various options of the llvmc command and ignore the configuration - of the tool. These configuration files are for compiler writers and LLVM - developers. Those wishing to simply use llvmc don't need to understand - this section but it may be instructive on how the tool works.

+ +

LLVMC has some built-in options that can't be overridden in the +configuration files:

+
    +
  • -o FILE - Output file name.
  • +
  • -x LANGUAGE - Specify the language of the following input files +until the next -x option.
  • +
  • -v - Enable verbose mode, i.e. print out all executed commands.
  • +
  • --view-graph - Show a graphical representation of the compilation +graph. Requires that you have dot and gv commands +installed. Hidden option, useful for debugging.
  • +
  • --write-graph - Write a compilation-graph.dot file in the +current directory with the compilation graph description in the +Graphviz format. Hidden option, useful for debugging.
  • +
  • --save-temps - Write temporary files to the current directory +and do not delete them on exit. Hidden option, useful for debugging.
  • +
  • --help, --help-hidden, --version - These options have +their standard meaning.
  • +
- - -
Overview
-

llvmc is highly configurable both on the command line and in -configuration files. The options it understands are generic, consistent and -simple by design. Furthermore, the llvmc options apply to the -compilation of any LLVM enabled programming language. To be enabled as a -supported source language compiler, a compiler writer must provide a -configuration file that tells llvmc how to invoke the compiler -and what its capabilities are. The purpose of the configuration files then -is to allow compiler writers to specify to llvmc how the compiler -should be invoked. Users may but are not advised to alter the compiler's -llvmc configuration.

- -

Because llvmc just invokes other programs, it must deal with the -available command line options for those programs regardless of whether they -were written for LLVM or not. Furthermore, not all compiler tools will -have the same capabilities. Some compiler tools will simply generate LLVM assembly -code, others will be able to generate fully optimized byte code. In general, -llvmc doesn't make any assumptions about the capabilities or command -line options of a sub-tool. It simply uses the details found in the -configuration files and leaves it to the compiler writer to specify the -configuration correctly.

- -

This approach means that new compiler tools can be up and working very -quickly. As a first cut, a tool can simply compile its source to raw -(unoptimized) bytecode or LLVM assembly and llvmc can be configured -to pick up the slack (translate LLVM assembly to bytecode, optimize the -bytecode, generate native assembly, link, etc.). In fact, the compiler tools -need not use any LLVM libraries, and it could be written in any language -(instead of C++). The configuration data will allow the full range of -optimization, assembly, and linking capabilities that LLVM provides to be added -to these kinds of tools. Enabling the rapid development of front-ends is one -of the primary goals of llvmc.

- -

As a compiler tool matures, it may utilize the LLVM libraries and tools -to more efficiently produce optimized bytecode directly in a single compilation -and optimization program. In these cases, multiple tools would not be needed -and the configuration data for the compiler would change.

- -

Configuring llvmc to the needs and capabilities of a source language -compiler is relatively straight-forward. A compiler writer must provide a -definition of what to do for each of the five compilation phases for each of -the optimization levels. The specification consists simply of prototypical -command lines into which llvmc can substitute command line -arguments and file names. Note that any given phase can be completely blank if -the source language's compiler combines multiple phases into a single program. -For example, quite often pre-processing, translation, and optimization are -combined into a single program. The specification for such a compiler would have -blank entries for pre-processing and translation but a full command line for -optimization.

-
- - - - -
-

Each configuration file provides the details for a single source language - that is to be compiled. This configuration information tells llvmc - how to invoke the language's pre-processor, translator, optimizer, assembler - and linker. Note that a given source language needn't provide all these tools - as many of them exist in llvm currently.

+ +

At the time of writing LLVMC does not support on-the-fly reloading of +configuration, so to customize LLVMC you'll have to recompile the +source code (which lives under $LLVM_DIR/tools/llvmc2). The +default configuration files are Common.td (contains common +definitions, don't forget to include it in your configuration +files), Tools.td (tool descriptions) and Graph.td (compilation +graph definition).

+

To compile LLVMC with your own configuration file (say, MyGraph.td), +run make like this:

+
+$ cd $LLVM_DIR/tools/llvmc2
+$ make GRAPH=MyGraph.td TOOLNAME=my_llvmc
+
+

This will build an executable named my_llvmc. There are also +several sample configuration files in the llvmc2/examples +subdirectory that should help to get you started.

+

Internally, LLVMC stores information about possible source +transformations in the form of a graph. Nodes in this graph represent +tools, and edges between two nodes represent a transformation path. A +special "root" node is used to mark entry points for the +transformations. LLVMC also assigns a weight to each edge (more on +this later) to choose between several alternative edges.

+

The definition of the compilation graph (see file Graph.td) is +just a list of edges:

+
+def CompilationGraph : CompilationGraph<[
+    Edge<root, llvm_gcc_c>,
+    Edge<root, llvm_gcc_assembler>,
+    ...
+
+    Edge<llvm_gcc_c, llc>,
+    Edge<llvm_gcc_cpp, llc>,
+    ...
+
+    OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
+    OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
+    ...
+
+    OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
+        (case (input_languages_contain "c++"), (inc_weight),
+              (or (parameter_equals "linker", "g++"),
+                  (parameter_equals "linker", "c++")), (inc_weight))>,
+    ...
+
+    ]>;
+
+

As you can see, the edges can be either default or optional, where +optional edges are differentiated by sporting a case expression +used to calculate the edge's weight.

+

The default edges are assigned a weight of 1, and optional edges get a +weight of 0 + 2*N where N is the number of tests that evaluated to +true in the case expression. It is also possible to provide an +integer parameter to inc_weight and dec_weight - in this case, +the weight is increased (or decreased) by the provided value instead +of the default 2.

+

When passing an input file through the graph, LLVMC picks the edge +with the maximum weight. To avoid ambiguity, there should be only one +default edge between two nodes (with the exception of the root node, +which gets a special treatment - there you are allowed to specify one +default edge per language).
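
The weighting rule described above can be sketched in C++ as follows. This only illustrates the arithmetic stated in the text and is not LLVMC's actual implementation: the `Edge` struct and the `edgeWeight` and `pickEdge` functions are hypothetical names, and breaking ties in favour of the first edge is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an outgoing edge; not an LLVMC type.
struct Edge {
    std::string target;      // name of the tool the edge leads to
    bool optional;           // false for a default edge
    std::vector<bool> tests; // results of the tests in the case expression
};

// Default edges weigh 1; optional edges start at 0 and gain 2
// (the default inc_weight step) for every test that evaluated to true.
int edgeWeight(const Edge& e) {
    if (!e.optional) return 1;
    int weight = 0;
    for (bool passed : e.tests)
        if (passed) weight += 2;
    return weight;
}

// Returns the index of the edge with the maximum weight.
// Assumption: the first edge wins on ties. Precondition: edges is non-empty.
std::size_t pickEdge(const std::vector<Edge>& edges) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < edges.size(); ++i)
        if (edgeWeight(edges[i]) > edgeWeight(edges[best])) best = i;
    return best;
}
```

With one default edge and one optional edge whose single test is true, the optional edge (weight 2) is preferred over the default one (weight 1); if the test is false, the default edge wins.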

+

To get a visual representation of the compilation graph (useful for +debugging), run llvmc2 --view-graph. You will need dot and +gsview installed for this to work properly.

-
-

llvmc always looks for files of a specific name. It uses the - first file with the name it's looking for by searching directories in the - following order:
-

    -
  1. Any directory specified by the -config-dir option will be - checked first.
  2. -
  3. If the environment variable LLVM_CONFIG_DIR is set, and it contains - the name of a valid directory, that directory will be searched next.
  4. -
  5. If the user's home directory (typically /home/user) contains - a sub-directory named .llvm and that directory contains a - sub-directory named etc then that directory will be tried - next.
  6. -
  7. If the LLVM installation directory (typically /usr/local/llvm) - contains a sub-directory named etc then that directory will be - tried last.
  8. -
  9. A standard "system" directory will be searched next. This is typically - /etc/llvm on UNIX™ and C:\WINNT on Microsoft - Windows™.
  10. -
  11. If the configuration file sought still can't be found, llvmc - will print an error message and exit.
  12. -
-

The first file found in this search will be used. Other files with the - same name will be ignored even if they exist in one of the subsequent search - locations.

+ +

As was said earlier, nodes in the compilation graph represent tools, +which are described separately. A tool definition looks like this +(taken from the Tools.td file):

+
+def llvm_gcc_cpp : Tool<[
+    (in_language "c++"),
+    (out_language "llvm-assembler"),
+    (output_suffix "bc"),
+    (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
+    (sink)
+    ]>;
+
+

This defines a new tool called llvm_gcc_cpp, which is an alias for +llvm-g++. As you can see, a tool definition is just a list of +properties; most of them should be self-explanatory. The sink +property means that this tool should be passed all command-line +options that lack explicit descriptions.

+

The complete list of the currently implemented tool properties follows:

+
    +
  • Possible tool properties:
      +
    • in_language - input language name.
    • +
    • out_language - output language name.
    • +
    • output_suffix - output file suffix.
    • +
    • cmd_line - the actual command used to run the tool. You can +use $INFILE and $OUTFILE variables, output redirection +with >, hook invocations ($CALL), environment variables +(via $ENV) and the case construct (more on this below).
    • +
    • join - this tool is a "join node" in the graph, i.e. it gets a +list of input files and joins them together. Used for linkers.
    • +
    • sink - all command-line options that are not handled by other +tools are passed to this tool.
    • +
    +
  • +
+

The next tool definition is slightly more complex:

+
+def llvm_gcc_linker : Tool<[
+    (in_language "object-code"),
+    (out_language "executable"),
+    (output_suffix "out"),
+    (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
+    (join),
+    (prefix_list_option "L", (forward),
+                        (help "add a directory to link path")),
+    (prefix_list_option "l", (forward),
+                        (help "search a library when linking")),
+    (prefix_list_option "Wl", (unpack_values),
+                        (help "pass options to linker"))
+    ]>;
+
+

This tool has a "join" property, which means that it behaves like a +linker. This tool also defines several command-line options: -l, +-L and -Wl which have their usual meaning. An option has two +attributes: a name and a (possibly empty) list of properties. All +currently implemented option types and properties are described below:

+
    +
  • Possible option types:

    +
    +
      +
    • switch_option - a simple boolean switch, for example -time.
    • +
    • parameter_option - option that takes an argument, for example +-std=c99.
    • +
    • parameter_list_option - same as the above, but more than one +occurrence of the option is allowed.
    • +
    • prefix_option - same as the parameter_option, but the option name +and parameter value are not separated.
    • +
    • prefix_list_option - same as the above, but more than one +occurrence of the option is allowed; example: -lm -lpthread.
    • +
    • alias_option - a special option type for creating +aliases. Unlike other option types, aliases are not allowed to +have any properties besides the aliased option name. Usage +example: (alias_option "preprocess", "E")
    • +
    +
    +
  • +
  • Possible option properties:

    +
    +
      +
    • append_cmd - append a string to the tool invocation command.
    • +
    • forward - forward this option unchanged.
    • +
    • output_suffix - modify the output suffix of this +tool. Example: (switch_option "E", (output_suffix "i")).
    • +
    • stop_compilation - stop compilation after this phase.
    • +
    • unpack_values - used for splitting and forwarding +comma-separated lists of options, e.g. -Wa,-foo=bar,-baz is +converted to -foo=bar -baz and appended to the tool invocation +command.
    • +
    • help - help string associated with this option. Used for +--help output.
    • +
    • required - this option is obligatory.
    • +
    +
    +
  • +
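
The splitting performed by unpack_values can be pictured with a small C++ sketch. It is not taken from the LLVMC sources; the function name `unpackValues` is hypothetical, and skipping empty items is an assumption made for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a value such as "-Wa,-foo=bar,-baz" on commas and drops the
// first element (the option name itself), leaving {"-foo=bar", "-baz"}.
// Assumption: empty items (e.g. from ",,") are skipped.
std::vector<std::string> unpackValues(const std::string& option) {
    std::vector<std::string> parts;
    std::istringstream in(option);
    std::string item;
    bool first = true;
    while (std::getline(in, item, ',')) {
        if (first) { first = false; continue; } // skip the option name
        if (!item.empty()) parts.push_back(item);
    }
    return parts;
}
```

The resulting strings are what would be appended, space-separated, to the tool invocation command.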
- -
-

In the directories searched, each configuration file is given a specific - name to foster faster lookup (so llvmc doesn't have to do directory searches). - The name of a given language specific configuration file is simply the same - as the suffix used to identify files containing source in that language. - For example, a configuration file for C++ source might be named - cpp, C, or cxx. For languages that support multiple - file suffixes, multiple (probably identical) files (or symbolic links) will - need to be provided.

+ +

It can be handy to have all information about options gathered in a +single place to provide an overview. This can be achieved by using a +so-called OptionList:

+
+def Options : OptionList<[
+(switch_option "E", (help "Help string")),
+(alias_option "quiet", "q")
+...
+]>;
+
+

OptionList is also a good place to specify option aliases.

+

Tool-specific option properties like append_cmd have (obviously) +no meaning in the context of OptionList, so the only properties +allowed there are help and required.

+

Option lists are used at the file scope. See file +examples/Clang.td for an example of OptionList usage.

- -
-

Which configuration files are read depends on the command line options and - the suffixes of the file names provided on llvmc's command line. Note - that the -x LANGUAGE option alters the language that llvmc - uses for the subsequent files on the command line. Only the configuration - files actually needed to complete llvmc's task are read. Other - language specific files will be ignored.

+ +

Normally, LLVMC executes programs from the system PATH. Sometimes, +this is not sufficient: for example, we may want to specify tool names +in the configuration file. This can be achieved via the mechanism of +hooks - to compile LLVMC with your hooks, just drop a .cpp file into +the tools/llvmc2 directory. Hooks should live in the hooks +namespace and have the signature std::string hooks::MyHookName +(void). They can be used from the cmd_line tool property:

+
+(cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")
+
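
A hook file matching the cmd_line example above might look like the following sketch. The hook names come from that example, and the signature follows the one given in the text; the returned strings are placeholders invented for illustration.

```cpp
#include <string>

namespace hooks {

// Hypothetical hook: its return value is substituted for $CALL(MyHook).
// The directory is a placeholder, not a real installation path.
std::string MyHook(void) {
    return "/opt/mytools";
}

// Hypothetical hook: its return value is substituted for
// $CALL(AnotherHook). The file name is a placeholder.
std::string AnotherHook(void) {
    return "result.out";
}

} // namespace hooks
```

Under these assumptions the command line in the example would expand to "/opt/mytools/path/to/file -o result.out".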
+

It is also possible to use environment variables in the same manner:

+
+(cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")
+
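
The effect of $ENV substitution can be approximated with the C++ sketch below. It is not LLVMC code: `expandEnv` is a hypothetical helper, and expanding an unset variable to an empty string is an assumption, since the text does not say how that case is handled.

```cpp
#include <cstdlib>
#include <string>

// Replaces every "$ENV(NAME)" in cmd with the value of the environment
// variable NAME. Assumption: an unset variable expands to an empty string.
std::string expandEnv(const std::string& cmd) {
    const std::string marker = "$ENV(";
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = cmd.find(marker, pos);
        if (start == std::string::npos) break;
        std::size_t close = cmd.find(')', start);
        if (close == std::string::npos) break; // unterminated: copy verbatim
        out.append(cmd, pos, start - pos);     // text before the marker
        std::string name = cmd.substr(start + marker.size(),
                                      close - start - marker.size());
        if (const char* value = std::getenv(name.c_str())) out += value;
        pos = close + 1;
    }
    out.append(cmd, pos, std::string::npos);   // remaining tail
    return out;
}
```

A command line without any $ENV reference passes through unchanged.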
+

To change the command line string based on user-provided options use +the case expression (documented below):

+
+(cmd_line
+  (case
+    (switch_on "E"),
+       "llvm-g++ -E -x c $INFILE -o $OUTFILE",
+    (default),
+       "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))
+
- - -
Syntax
-

The syntax of the configuration files is very simple and somewhat - compatible with Java's property files. Here are the syntax rules:

-
    -
  • The file encoding is ASCII.
  • -
  • The file is line oriented. There should be one configuration definition - per line. Lines are terminated by the newline (0x0A) and/or carriage return - characters (0x0D)
  • -
  • A backslash (\) before a newline causes the newline to be - ignored. This is useful for line continuation of long definitions. A - backslash anywhere else is recognized as a backslash.
  • -
  • A configuration item consists of a name, an = and a value.
  • -
  • A name consists of a sequence of identifiers separated by period.
  • -
  • An identifier consists of specific keywords made up of only lower case - and upper case letters (e.g. lang.name).
  • -
  • Values come in four flavors: booleans, integers, commands and - strings.
  • -
  • Valid "false" boolean values are false False FALSE no No NO - off Off and OFF.
  • -
  • Valid "true" boolean values are true True TRUE yes Yes YES - on On and ON.
  • -
  • Integers are simply sequences of digits.
  • -
  • Commands start with a program name and are followed by a sequence of - words that are passed to that program as command line arguments. Program - arguments that begin and end with the % sign will have their value - substituted. Program names beginning with / are considered to be - absolute. Otherwise the PATH will be applied to find the program to - execute.
  • -
  • Strings are composed of multiple sequences of characters from the - character class [-A-Za-z0-9_:%+/\\|,] separated by white - space.
  • -
  • White space on a line is folded. Multiple blanks or tabs will be - reduced to a single blank.
  • -
  • White space before the configuration item's name is ignored.
  • -
  • White space on either side of the = is ignored.
  • -
  • White space in a string value is used to separate the individual - components of the string value but otherwise ignored.
  • -
  • Comments are introduced by the # character. Everything after a - # and before the end of line is ignored.
  • -
+ +

The 'case' construct can be used to calculate weights of the optional +edges and to choose between several alternative command line strings +in the cmd_line tool property. It is designed after the +similarly-named construct in functional languages and takes the form +(case (test_1), statement_1, (test_2), statement_2, ... (test_N), +statement_N). The statements are evaluated only if the corresponding +tests evaluate to true.

+

Examples:

+
+// Increases edge weight by 5 if "-A" is provided on the
+// command-line, and by 5 more if "-B" is also provided.
+(case
+    (switch_on "A"), (inc_weight 5),
+    (switch_on "B"), (inc_weight 5))
+
+// Evaluates to "cmdline1" if option "-A" is provided on the
+// command line, otherwise to "cmdline2" if "-B" is provided,
+// and to "cmdline3" in all other cases.
+(case
+    (switch_on "A"), "cmdline1",
+    (switch_on "B"), "cmdline2",
+    (default), "cmdline3")
+
+

Note the slight difference in 'case' expression handling in contexts +of edge weights and command line specification - in the second example +the value of the "B" switch is never checked when switch "A" is +enabled, and the whole expression always evaluates to "cmdline1" in +that case.
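
The two evaluation modes can be contrasted with a short C++ sketch. It models only the behaviour described in this section, not LLVMC's implementation; `caseWeight` and `caseCmdLine` are hypothetical names, and returning an empty string when no test is true is an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Edge-weight context: every branch whose test is true contributes
// its increment to the total.
int caseWeight(const std::vector<std::pair<bool, int>>& branches) {
    int weight = 0;
    for (const auto& b : branches)
        if (b.first) weight += b.second;
    return weight;
}

// Command-line context: only the first branch whose test is true is
// used; later tests are never consulted.
// Assumption: an empty string is returned if no test is true.
std::string caseCmdLine(
    const std::vector<std::pair<bool, std::string>>& branches) {
    for (const auto& b : branches)
        if (b.first) return b.second;
    return "";
}
```

With both "-A" and "-B" given, the first function yields 10 for the weights in the first example, while the second yields "cmdline1" for the branches in the second example.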

+

Case expressions can also be nested, i.e. the following is legal:

+
+(case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
+      (default), ...)
+
+

You should, however, try to avoid doing that because it hurts +readability. It is usually better to split tool descriptions and/or +use TableGen inheritance instead.

+
    +
  • Possible tests are:
      +
    • switch_on - Returns true if a given command-line option is +provided by the user. Example: (switch_on "opt"). Note that +you have to define all possible command-line options separately in +the tool descriptions. See the tool description section for the discussion of +different kinds of command-line options.
    • +
    • parameter_equals - Returns true if a command-line parameter equals +a given value. Example: (parameter_equals "W", "all").
    • +
    • element_in_list - Returns true if a command-line parameter list +includes a given value. Example: (element_in_list "l", "pthread").
    • +
    • input_languages_contain - Returns true if a given language +belongs to the current input language set. Example: +(input_languages_contain "c++").
    • +
    • in_language - Evaluates to true if the language of the input +file equals the argument. Valid only when using a case +expression in a cmd_line tool property. Example: +(in_language "c++").
    • +
    • not_empty - Returns true if a given option (which should be +either a parameter or a parameter list) is set by the +user. Example: (not_empty "o").
    • +
    • default - Always evaluates to true. Should always be the last +test in the case expression.
    • +
    • and - A standard logical combinator that returns true iff all +of its arguments return true. Used like this: (and (test1), +(test2), ... (testN)). Nesting of and and or is allowed, +but not encouraged.
    • +
    • or - Another logical combinator that returns true if any +of its arguments returns true. Example: (or (test1), +(test2), ... (testN)).
    • +
    +
  • +
- - -
-

The table below provides definitions of the allowed configuration items - that may appear in a configuration file. Every item has a default value and - does not need to appear in the configuration file. Missing items will have the - default value. Each identifier may appear as all lower case, first letter - capitalized or all upper case.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name | Value Type | Description | Default

LLVMC ITEMS

version | string | Provides the version string for the contents of this configuration file. What is accepted as a legal configuration file will change over time and this item tells llvmc which version should be expected. | b

LANG ITEMS

lang.name | string | Provides the common name for a language definition. For example "C++", "Pascal", "FORTRAN", etc. | blank
lang.opt1 | string | Specifies the parameters to give the optimizer when -O1 is specified on the llvmc command line. | -simplifycfg -instcombine -mem2reg
lang.opt2 | string | Specifies the parameters to give the optimizer when -O2 is specified on the llvmc command line. | TBD
lang.opt3 | string | Specifies the parameters to give the optimizer when -O3 is specified on the llvmc command line. | TBD
lang.opt4 | string | Specifies the parameters to give the optimizer when -O4 is specified on the llvmc command line. | TBD
lang.opt5 | string | Specifies the parameters to give the optimizer when -O5 is specified on the llvmc command line. | TBD

PREPROCESSOR ITEMS

preprocessor.command | command | This provides the command prototype that will be used to run the preprocessor. This is generally only used with the -E option. | <blank>
preprocessor.required | boolean | This item specifies whether the pre-processing phase is required by the language. If the value is true, then the preprocessor.command value must not be blank. With this option, llvmc will always run the preprocessor as it assumes that the translation and optimization phases don't know how to pre-process their input. | false

TRANSLATOR ITEMS

translator.command | command | This provides the command prototype that will be used to run the translator. Valid substitutions are %in% for the input file and %out% for the output file. | <blank>
translator.output | bytecode or assembly | This item specifies the kind of output the language's translator generates. | bytecode
translator.preprocesses | boolean | Indicates that the translator also preprocesses. If this is true, then llvmc will skip the pre-processing phase whenever the final phase is not pre-processing. | false

OPTIMIZER ITEMS

optimizer.command | command | This provides the command prototype that will be used to run the optimizer. Valid substitutions are %in% for the input file and %out% for the output file. | <blank>
optimizer.output | bytecode or assembly | This item specifies the kind of output the language's optimizer generates. Valid values are "assembly" and "bytecode". | bytecode
optimizer.preprocesses | boolean | Indicates that the optimizer also preprocesses. If this is true, then llvmc will skip the pre-processing phase whenever the final phase is optimization or later. | false
optimizer.translates | boolean | Indicates that the optimizer also translates. If this is true, then llvmc will skip the translation phase whenever the final phase is optimization or later. | false

ASSEMBLER ITEMS

assembler.command | command | This provides the command prototype that will be used to run the assembler. Valid substitutions are %in% for the input file and %out% for the output file. | <blank>
+ +

One last thing that you will need to modify when adding support for a +new language to LLVMC is the language map, which defines mappings from +file extensions to language names. It is used to choose the proper +toolchain(s) for a given input file set. The language map definition is +located in the file Tools.td and looks like this:

+
+def LanguageMap : LanguageMap<
+    [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
+     LangToSuffixes<"c", ["c"]>,
+     ...
+    ]>;
+
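To make the suffix-to-language lookup concrete, here is a minimal Python sketch of how such a map could be consulted. It is an illustration under stated assumptions, not LLVMC's implementation: the names LANGUAGE_MAP and language_for are hypothetical, and only the two entries shown in the definition above are included (the elided "..." entries are not filled in).

```python
# Hypothetical sketch: choosing a language name from a file suffix.
# Mirrors only the two LangToSuffixes entries visible in the definition;
# this is not how LLVMC itself is implemented.
import os
from typing import Optional

LANGUAGE_MAP = {
    "c++": ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"],
    "c": ["c"],
}

def language_for(path: str) -> Optional[str]:
    """Return the language whose suffix list contains the file's extension."""
    suffix = os.path.splitext(path)[1].lstrip(".")
    for lang, suffixes in LANGUAGE_MAP.items():
        if suffix in suffixes:  # case-sensitive: "C" maps to c++, "c" to c
            return lang
    return None  # no suffix, or a suffix the map does not know
```

Note that the comparison is case-sensitive, which is what lets the map send hello.C to the C++ toolchain and hello.c to the C toolchain.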
- - -
-

On any configuration item that ends in command, you must - specify substitution tokens. Substitution tokens begin and end with a percent - sign (%) and are replaced by the corresponding text. Any substitution - token may be given on any command line but some are more useful than - others. In particular each command should have both an %in% - and an %out% substitution. The table below provides definitions of - each of the allowed substitution tokens.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Substitution Token | Replacement Description
%args% | Replaced with all the tool-specific arguments given to llvmc via the -T set of options. This just allows you to place these arguments in the correct place on the command line. If the %args% option does not appear on your command line, then you are explicitly disallowing the -T option for your tool.
%force% | Replaced with the -f option if it was specified on the llvmc command line. This is intended to tell the compiler tool to force the overwrite of output files.
%in% | Replaced with the full path of the input file. You needn't worry about the cascading of file names. llvmc will create temporary files and ensure that the output of one phase is the input to the next phase.
%opt% | Replaced with the optimization options for the tool. If the tool understands the -O options then that will be passed. Otherwise, the lang.optN series of configuration items will specify which arguments are to be given.
%out% | Replaced with the full path of the output file. Note that this is not necessarily the output file specified with the -o option on llvmc's command line. It might be a temporary file that will be passed to a subsequent phase's input.
%stats% | If your command accepts the -stats option, use this substitution token. If the user requested -stats from the llvmc command line then this token will be replaced with -stats, otherwise it will be ignored.
%target% | Replaced with the name of the target "machine" for which code should be generated. The value used here is taken from the llvmc option -march.
%time% | If your command accepts the -time-passes option, use this substitution token. If the user requested -time-passes from the llvmc command line then this token will be replaced with -time-passes, otherwise it will be ignored.
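The replacement of %token% markers in a command prototype can be sketched as follows. This is a rough Python illustration, not llvmc's actual code: the function name expand, the file paths, and the choice to drop tokens that have no value are all assumptions made for the example.

```python
# Hypothetical sketch of %token% substitution; not the actual llvmc code.
import re

def expand(command: str, values: dict) -> str:
    """Replace each %name% with values[name]; tokens with no value vanish."""
    def repl(match):
        return values.get(match.group(1), "")
    expanded = re.sub(r"%(\w+)%", repl, command)
    return " ".join(expanded.split())  # collapse gaps left by empty tokens

# Paths and option values below are made up for illustration.
cmd = expand(
    "opt %in% -o %out% %opt% %time% %stats% %force% %args%",
    {"in": "/tmp/a.bc", "out": "/tmp/b.bc", "opt": "-simplifycfg", "stats": "-stats"},
)
```

Here %time%, %force% and %args% have no value and disappear from the result, which matches the table's description of tokens that are ignored when the user did not request the corresponding option.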
+ + + + + + +
[1] TableGen Fundamentals +http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html
- - - -
-

Since an example is always instructive, here's how the Stacker language - configuration file looks.

-

-# Stacker Configuration File For llvmc
-
-##########################################################
-# Language definitions
-##########################################################
-  lang.name=Stacker 
-  lang.opt1=-simplifycfg -instcombine -mem2reg
-  lang.opt2=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp 
-  lang.opt3=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp -branch-combine -adce \
-    -globaldce -inline -licm 
-  lang.opt4=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp -ipconstprop \
-    -branch-combine -adce -globaldce -inline -licm 
-  lang.opt5=-simplifycfg -instcombine -mem2reg -load-vn \
-    -gcse -dse -scalarrepl -sccp -ipconstprop \
-    -branch-combine -adce -globaldce -inline -licm \
-    -block-placement
-
-##########################################################
-# Pre-processor definitions
-##########################################################
-
-  # Stacker doesn't have a preprocessor but the following
-  # allows the -E option to be supported
-  preprocessor.command=cp %in% %out%
-  preprocessor.required=false
-
-##########################################################
-# Translator definitions
-##########################################################
-
-  # To compile stacker source, we just run the stacker
-  # compiler with a default stack size of 2048 entries.
-  translator.command=stkrc -s 2048 %in% -o %out% %time% \
-    %stats% %force% %args%
-
-  # stkrc doesn't preprocess but we set this to true so
-  # that we don't run the cp command by default.
-  translator.preprocesses=true
-
-  # The translator is required to run.
-  translator.required=true
-
-  # stkrc emits bytecode; it doesn't handle the -On options
-  translator.output=bytecode
-
-##########################################################
-# Optimizer definitions
-##########################################################
-  
-  # For optimization, we use the LLVM "opt" program
-  optimizer.command=opt %in% -o %out% %opt% %time% %stats% \
-    %force% %args%
-
-  optimizer.required = true
-
-  # opt doesn't translate
-  optimizer.translates = no
-
-  # opt doesn't preprocess
-  optimizer.preprocesses=no
-
-  # opt produces bytecode
-  optimizer.output = bytecode
-
-##########################################################
-# Assembler definitions
-##########################################################
-  assembler.command=llc %in% -o %out% %target% %time% %stats%
-
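For readers who want to experiment with the name=value format shown in the sample, here is a minimal Python sketch of a reader for it. The syntax rules it applies are inferred from the sample alone and are assumptions: lines starting with # are comments, a trailing backslash continues a value on the next line, and whitespace around = is ignored. The function name parse_config is hypothetical and this is not llvmc's parser.

```python
# Hypothetical sketch of reading the name=value format of the sample file.
# Assumed from the sample only: '#' starts a comment line, a trailing '\'
# continues the value on the next line, and spaces around '=' are ignored.
def parse_config(text: str) -> dict:
    items = {}
    pending = ""  # accumulated text of a value continued with '\'
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue  # skip blank lines and comments between items
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        name, _, value = (pending + line).partition("=")
        items[name.strip()] = value.strip()
        pending = ""
    return items

sample = """
# Stacker doesn't have a preprocessor
preprocessor.command=cp %in% %out%
optimizer.required = true
lang.opt2=-simplifycfg -instcombine -mem2reg -load-vn \\
    -gcse -dse -scalarrepl -sccp
"""
config = parse_config(sample)
```

Splitting on the first = only keeps values such as -o %out% intact even if they later contain an equals sign.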
- - - - - -
-

This document uses precise terms in reference to the various artifacts and - concepts related to compilation. The terms used throughout this document are - defined below.

-
-
assembly
-
A compilation phase in which LLVM bytecode or - LLVM assembly code is assembled to a native code format (either target - specific assembly language or the platform's native object file format). -
- -
compiler
-
Refers to any program that can be invoked by llvmc to accomplish - the work of one or more compilation phases.
- -
driver
-
Refers to llvmc itself.
- -
linking
-
A compilation phase in which LLVM bytecode files - and (optionally) native system libraries are combined to form a complete - executable program.
- -
optimization
-
A compilation phase in which LLVM bytecode is - optimized.
- -
phase
-
Refers to any one of the five compilation phases that - llvmc supports. The five phases are: - preprocessing, - translation, - optimization, - assembly, - linking.
- -
source language
-
Any common programming language (e.g. C, C++, Java, Stacker, ML, - FORTRAN). These languages are distinguished from any of the lower level - languages (such as LLVM or native assembly), by the fact that a - translation phase - is required before LLVM can be applied.
- -
tool
-
Refers to any program in the LLVM tool set.
- -
translation
-
A compilation phase in which - source language code is translated into - either LLVM assembly language or LLVM bytecode.
-
- -
-
Reid Spencer
-The LLVM Compiler Infrastructure
-Last modified: $Date$ +
+
+ The LLVM Compiler Infrastructure
+ Last modified: $Date$
-