X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FCompilerDriver.html;h=dd7526753df6e7b0e33e8b42416e74390c695cc2;hb=52bb2db70998c42c99d22069ac66eb7bbb492f3a;hp=a5ba1a68542493548c7b5be3c906645b9407f897;hpb=b1254a124796a77f88c6d5adc37e2e324d210bc2;p=oota-llvm.git diff --git a/docs/CompilerDriver.html b/docs/CompilerDriver.html index a5ba1a68542..dd7526753df 100644 --- a/docs/CompilerDriver.html +++ b/docs/CompilerDriver.html @@ -1,572 +1,420 @@ - - + + + - - The LLVM Compiler Driver (llvmc) - - - - + + +Customizing LLVMC: Reference Manual + -
The LLVM Compiler Driver (llvmc)
-

NOTE: This document is a work in progress!

-
    -
  1. Abstract
  2. -
  3. Introduction -
      -
    1. Purpose
    2. -
    3. Operation
    4. -
    5. Phases
    6. -
    7. Actions
    8. -
    -
  4. -
  5. Details -
  6. Configuration -
  7. Glossary -
-
-

Written by Reid Spencer -

-
+
- -
Abstract
- -
-

This document describes the requirements, design, and configuration of the - LLVM compiler driver, llvmc. The compiler driver knows about LLVM's - tool set and can be configured to know about a variety of compilers for - source languages. It uses this knowledge to execute the tools necessary - to accomplish general compilation, optimization, and linking tasks. The main - purpose of llvmc is to provide a simple and consistent interface to - all compilation tasks. This reduces the burden on the end user who can just - learn to use llvmc instead of the entire LLVM tool set and all the - source language compilers compatible with LLVM.

-
- -
Introduction
- -
-

The llvmc tool is a configurable compiler - driver. As such, it isn't the compiler, optimizer, - or linker itself but it drives (invokes) other software that perform those - tasks. If you are familiar with the GNU Compiler Collection's gcc - tool, llvmc is very similar.

-

The following introductory sections will help you understand why this tool - is necessary and what it does.

-
+
Customizing LLVMC: Reference Manual
- -
Purpose
-
-

llvmc was invented to make compilation with LLVM based compilers - easier. To accomplish this, llvmc strives to:

- -

Additionally, llvmc makes it easier to write a compiler for use - with LLVM, because it:

-

+ +

LLVMC tries hard to be as compatible with gcc as possible, +although there are some small differences. Most of the time, however, +you shouldn't be able to notice them:

+
+$ # This works as expected:
+$ llvmc2 -O3 -Wall hello.cpp
+$ ./a.out
+hello
+
+

One nice feature of LLVMC is that one doesn't have to distinguish +between different compilers for different languages (think g++ and +gcc): the right toolchain is chosen automatically based on input +language names (which are, in turn, determined from file +extensions). If you want to override this choice and force a file ending with ".cpp" to compile as +C, use the -x option, just like you would with gcc:

+
+$ llvmc2 -x c hello.cpp
+$ # hello.cpp is really a C file
+$ ./a.out
+hello
+
+

On the other hand, when using LLVMC as a linker to combine several C++ +object files you should provide the --linker option since it's +impossible for LLVMC to choose the right linker in that case:

+
+$ llvmc2 -c hello.cpp
+$ llvmc2 hello.o
+[A lot of link-time errors skipped]
+$ llvmc2 --linker=c++ hello.o
+$ ./a.out
+hello
+
- - -
Details
-
+ +

LLVMC has some built-in options that can't be overridden in the +configuration files:

+
- - -
Configuration
-
-

This section of the document describes the configuration files used by - llvmc. Configuration information is relatively static for a - given release of LLVM and a front end compiler. However, the details may - change from release to release of either. Users are encouraged to simply use - the various options of the llvmc command and ignore the configuration of - the tool. These configuration files are for compiler writers and LLVM - developers. Those wishing to simply use llvmc don't need to understand - this section but it may be instructive on how the tool works.

+ +

At the time of writing LLVMC does not support on-the-fly reloading of +configuration, so to customize LLVMC you'll have to recompile the +source code (which lives under $LLVM_DIR/tools/llvmc2). The +default configuration files are Common.td (contains common +definitions, don't forget to include it in your configuration +files), Tools.td (tool descriptions) and Graph.td (compilation +graph definition).

+

To compile LLVMC with your own configuration file (say, MyGraph.td), +run make like this:

+
+$ cd $LLVM_DIR/tools/llvmc2
+$ make GRAPH=MyGraph.td TOOLNAME=my_llvmc
+
+

This will build an executable named my_llvmc. There are also +several sample configuration files in the llvmc2/examples +subdirectory that should help to get you started.

+

Internally, LLVMC stores information about possible source +transformations in the form of a graph. Nodes in this graph represent +tools, and edges between two nodes represent a transformation path. A +special "root" node is used to mark entry points for the +transformations. LLVMC also assigns a weight to each edge (more on +this later) to choose between several alternative edges.

+

The definition of the compilation graph (see file Graph.td) is +just a list of edges:

+
+def CompilationGraph : CompilationGraph<[
+    Edge<root, llvm_gcc_c>,
+    Edge<root, llvm_gcc_assembler>,
+    ...
+
+    Edge<llvm_gcc_c, llc>,
+    Edge<llvm_gcc_cpp, llc>,
+    ...
+
+    OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
+    OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
+    ...
+
+    OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
+        (case (input_languages_contain "c++"), (inc_weight),
+              (or (parameter_equals "linker", "g++"),
+                  (parameter_equals "linker", "c++")), (inc_weight))>,
+    ...
+
+    ]>;
+
+

As you can see, the edges can be either default or optional, where +optional edges are differentiated by sporting a case expression +used to calculate the edge's weight.

+

The default edges are assigned a weight of 1, and optional edges get a +weight of 0 + 2*N where N is the number of tests that evaluated to +true in the case expression. It is also possible to provide an +integer parameter to inc_weight and dec_weight - in this case, +the weight is increased (or decreased) by the provided value instead +of the default 2.

+

When passing an input file through the graph, LLVMC picks the edge +with the maximum weight. To avoid ambiguity, there should be only one +default edge between two nodes (with the exception of the root node, +which gets a special treatment - there you are allowed to specify one +default edge per language).
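The weighting rule described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the `Edge` struct, `Weight` and `PickEdge` are hypothetical names, not LLVMC's real data structures, and breaking ties in favour of the first edge is an assumption the text does not confirm.

```cpp
#include <string>
#include <vector>

// Hypothetical model of an outgoing edge; not LLVMC's actual representation.
struct Edge {
    std::string target;  // name of the tool the edge leads to
    bool optional;       // false for Edge<...>, true for OptionalEdge<...>
    int true_tests;      // number of case tests that evaluated to true
};

// Default edges weigh 1; optional edges weigh 0 + 2*N.
int Weight(const Edge& e) {
    return e.optional ? 2 * e.true_tests : 1;
}

// Returns the edge with the maximum weight, or nullptr if there are none.
// Assumption: on a tie the first edge in the list wins.
const Edge* PickEdge(const std::vector<Edge>& edges) {
    const Edge* best = nullptr;
    for (const Edge& e : edges)
        if (best == nullptr || Weight(e) > Weight(*best))
            best = &e;
    return best;
}
```

With a default edge to llc and an optional edge to opt guarded by a single test, the optional edge wins (weight 2 against 1) when the test is true, and the default edge wins (weight 1 against 0) when it is not.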

+

To get a visual representation of the compilation graph (useful for +debugging), run llvmc2 --view-graph. You will need dot and +gsview installed for this to work properly.

- - -
Overview
-

llvmc is highly configurable both on the command line and in -configuration files. The options it understands are generic, consistent and -simple by design. Furthermore, the llvmc options apply to the -compilation of any LLVM enabled programming language. To be enabled as a -supported source language compiler, a compiler writer must provide a -configuration file that tells llvmc how to invoke the compiler -and what its capabilities are. The purpose of the configuration files then -is to allow compiler writers to specify to llvmc how the compiler -should be invoked. Users may but are not advised to alter the compiler's -llvmc configuration.

- -

Because llvmc just invokes other programs, it must deal with the -available command line options for those programs regardless of whether they -were written for LLVM or not. Furthermore, not all compilation front ends will -have the same capabilities. Some front ends will simply generate LLVM assembly -code, others will be able to generate fully optimized byte code. In general, -llvmc doesn't make any assumptions about the capabilities or command -line options of a sub-tool. It simply uses the details found in the configuration -files and leaves it to the compiler writer to specify the configuration -correctly.

- -

This approach means that new compiler front ends can be up and working very -quickly. As a first cut, a front end can simply compile its source to raw -(unoptimized) bytecode or LLVM assembly and llvmc can be configured -to pick up the slack (translate LLVM assembly to bytecode, optimize the -bytecode, generate native assembly, link, etc.). In fact, the front end need -not use any LLVM libraries, and it could be written in any language (instead of -C++). The configuration data will allow the full range of optimization, -assembly, and linking capabilities that LLVM provides to be added to these kinds -of tools. Enabling the rapid development of front-ends is one of the primary -goals of llvmc.

- -

As a compiler front end matures, it may utilize the LLVM libraries and tools -to more efficiently produce optimized bytecode directly in a single compilation -and optimization program. In these cases, multiple tools would not be needed -and the configuration data for the compiler would change.

- -

Configuring llvmc to the needs and capabilities of a source language -compiler is relatively straightforward. A compiler writer must provide a -definition of what to do for each of the five compilation phases for each of -the optimization levels. The specification consists simply of prototypical -command lines into which llvmc can substitute command line -arguments and file names. Note that any given phase can be completely blank if -the source language's compiler combines multiple phases into a single program. -For example, quite often pre-processing, translation, and optimization are -combined into a single program. The specification for such a compiler would have -blank entries for pre-processing and translation but a full command line for -optimization.

+ +

As was said earlier, nodes in the compilation graph represent tools, +which are described separately. A tool definition looks like this +(taken from the Tools.td file):

+
+def llvm_gcc_cpp : Tool<[
+    (in_language "c++"),
+    (out_language "llvm-assembler"),
+    (output_suffix "bc"),
+    (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
+    (sink)
+    ]>;
+
+

This defines a new tool called llvm_gcc_cpp, which is an alias for +llvm-g++. As you can see, a tool definition is just a list of +properties; most of them should be self-explanatory. The sink +property means that this tool should be passed all command-line +options that lack explicit descriptions.

+

The complete list of the currently implemented tool properties follows:

+ +

The next tool definition is slightly more complex:

+
+def llvm_gcc_linker : Tool<[
+    (in_language "object-code"),
+    (out_language "executable"),
+    (output_suffix "out"),
+    (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
+    (join),
+    (prefix_list_option "L", (forward),
+                        (help "add a directory to link path")),
+    (prefix_list_option "l", (forward),
+                        (help "search a library when linking")),
+    (prefix_list_option "Wl", (unpack_values),
+                        (help "pass options to linker"))
+    ]>;
+
+

This tool has a "join" property, which means that it behaves like a +linker. This tool also defines several command-line options: -l, +-L and -Wl which have their usual meaning. An option has two +attributes: a name and a (possibly empty) list of properties. All +currently implemented option types and properties are described below:

+
- - -
Configuration Files
-

Types of Files

-

There are two types of configuration files: the master configuration file - and the language specific configuration file. The master configuration file - contains the general configuration of llvmc itself and is supplied - with the tool. It contains information that is source language agnostic. - Language specific configuration files tell llvmc how to invoke the - language's compiler for a variety of different tasks and what other tools - are needed to backfill the compiler's missing features (e.g. - optimization).

- -

Directory Search

-

llvmc always looks for files of a specific name. It uses the - first file with the name it's looking for by searching directories in the - following order:
-

    -
  1. Any directory specified by the --config-dir option will be - checked first.
  2. -
  3. If the environment variable LLVM_CONFIG_DIR is set, and it contains - the name of a valid directory, that directory will be searched next.
  4. -
  5. If the user's home directory (typically /home/user) contains - a sub-directory named .llvm and that directory contains a - sub-directory named etc, then that directory will be tried - next.
  6. -
  7. If the LLVM installation directory (typically /usr/local/llvm) - contains a sub-directory named etc, then that directory will be - tried last.
  8. -
  9. If the configuration file sought still can't be found, llvmc - will print an error message and exit.
  10. -
- The first file found in this search will be used. Other files with the same - name will be ignored even if they exist in one of the subsequent search - locations.

- -

File Names

-

In the directories searched, a file named master will be - recognized as the master configuration file for llvmc. Note that - users may override the master file with a copy in their home directory - but they are advised not to. This capability is only useful for compiler - implementers needing to alter the master configuration while developing - their compiler front end. When reading the configuration files, the master - files are always read first.

-

Language specific configuration files are given specific names to foster - faster lookup. The name of a given language specific configuration file is - the same as the suffix used to identify files containing source in that - language. For example, a configuration file for C++ source might be named - cpp, C, or cxx.

- -

What Gets Read

-

The master configuration file is always read. Which language specific - configuration files are read depends on the command line options and the - suffixes of the file names provided on llvmc's command line. Note - that the --x LANGUAGE option alters the language that llvmc - uses for the subsequent files on the command line. Only the language - specific configuration files actually needed to complete llvmc's - task are read. Other language specific files will be ignored.

+ +

It can be handy to have all information about options gathered in a +single place to provide an overview. This can be achieved by using a +so-called OptionList:

+
+def Options : OptionList<[
+(switch_option "E", (help "Help string")),
+(alias_option "quiet", "q")
+...
+]>;
+
+

OptionList is also a good place to specify option aliases.

+

Tool-specific option properties like append_cmd have (obviously) +no meaning in the context of OptionList, so the only properties +allowed there are help and required.

+

Option lists are used at the file scope. See file +examples/Clang.td for an example of OptionList usage.

- - -
Syntax
-

The syntax of the configuration files is yet to be determined. There are - two viable options remaining:
-

-
- - -
- Master Configuration Items + +

Normally, LLVMC executes programs from the system PATH. Sometimes, +this is not sufficient: for example, we may want to specify tool names +in the configuration file. This can be achieved via the mechanism of +hooks - to compile LLVMC with your hooks, just drop a .cpp file into +the tools/llvmc2 directory. Hooks should live in the hooks +namespace and have the signature std::string hooks::MyHookName +(void). They can be used from the cmd_line tool property:

+
+(cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")
+
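For illustration, a hook with the signature given above might look like the following minimal sketch. The environment variable LLVMC_TOOLS_DIR and the fallback directory /usr/local/bin are assumptions made up for this example; only the hooks namespace and the std::string return type come from the text.

```cpp
#include <cstdlib>
#include <string>

namespace hooks {

// Hypothetical hook: returns the directory that tools are run from.
// LLVMC_TOOLS_DIR and the fallback path are illustrative assumptions.
std::string MyHook() {
    const char* dir = std::getenv("LLVMC_TOOLS_DIR");
    if (dir != nullptr && *dir != '\0')
        return std::string(dir);
    return std::string("/usr/local/bin");
}

}  // namespace hooks
```

A file like this would be dropped into tools/llvmc2 before rebuilding, after which $CALL(MyHook) in a cmd_line string would expand to the returned directory.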
+

It is also possible to use environment variables in the same manner:

+
+(cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")
+
+

To change the command line string based on user-provided options use +the case expression (documented below):

+
+(cmd_line
+  (case
+    (switch_on "E"),
+       "llvm-g++ -E -x c $INFILE -o $OUTFILE",
+    (default),
+       "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))
+
-
-
-=head3 Section: [lang=I]
-
-This section provides the master configuration data for a given language. The
-language specific data will be found in a file named I.
-
-=over
-
-=item CI
-
-This adds the I specified to the list of recognized suffixes for
-the I identified in the section. As many suffixes as are commonly used
-for source files for the I should be specified. 
-
-=back
-
-=begin html
-
-

For example, the following might appear for C++: -


-[lang=C++]
-suffix=.cpp
-suffix=.cxx
-suffix=.C
-

- -=end html + +

The 'case' construct can be used to calculate weights of the optional +edges and to choose between several alternative command line strings +in the cmd_line tool property. It is designed after the +similarly-named construct in functional languages and takes the form +(case (test_1), statement_1, (test_2), statement_2, ... (test_N), +statement_N). The statements are evaluated only if the corresponding +tests evaluate to true.

+

Examples:

+
+// Increases edge weight by 5 if "-A" is provided on the
+// command-line, and by 5 more if "-B" is also provided.
+(case
+    (switch_on "A"), (inc_weight 5),
+    (switch_on "B"), (inc_weight 5))
+
+// Evaluates to "cmdline1" if option "-A" is provided on the
+// command line, to "cmdline2" if only "-B" is provided,
+// and otherwise to "cmdline3"
+(case
+    (switch_on "A"), "cmdline1",
+    (switch_on "B"), "cmdline2",
+    (default), "cmdline3")
 
-
- - -
- Language Specific Configuration Items +

Note the slight difference in 'case' expression handling in contexts +of edge weights and command line specification - in the second example +the value of the "B" switch is never checked when switch "A" is +enabled, and the whole expression always evaluates to "cmdline1" in +that case.
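The two evaluation modes can be contrasted with a small C++ sketch. It is a model of the behaviour described above, not LLVMC's implementation: `Clause`, `EvalWeight` and `EvalCmdLine` are hypothetical names, and the tests are assumed to have been evaluated to booleans already.

```cpp
#include <string>
#include <utility>
#include <vector>

// One (test, statement) pair with the test already evaluated to a bool.
template <typename T>
using Clause = std::pair<bool, T>;

// Edge-weight context: every clause whose test is true contributes.
int EvalWeight(const std::vector<Clause<int>>& clauses) {
    int weight = 0;
    for (const auto& c : clauses)
        if (c.first) weight += c.second;
    return weight;
}

// Command-line context: the first clause whose test is true wins and the
// remaining tests are never consulted; fallback plays the role of (default).
std::string EvalCmdLine(const std::vector<Clause<std::string>>& clauses,
                        const std::string& fallback) {
    for (const auto& c : clauses)
        if (c.first) return c.second;
    return fallback;
}
```

With both "-A" and "-B" given, the weight form of the first example adds 5 twice, while the command-line form of the second example stops at "cmdline1".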

+

Case expressions can also be nested, i.e. the following is legal:

+
+(case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
+      (default), ...)
+
+

You should, however, try to avoid doing that because it hurts +readability. It is usually better to split tool descriptions and/or +use TableGen inheritance instead.

+
-
-=head3 Section: [general]
-
-=over
-
-=item C
-
-This item specifies whether the language has a pre-processing phase or not. This
-controls whether the B<-E> option works for the language or not.
-
-=item C
-
-This item specifies the kind of output the language's compiler generates. The
-choices are either bytecode (C) or LLVM assembly (C).
-
-=back
-
-=head3 Section: [-O0]
-
-=over
-
-=item CI
-
-This item specifies the I to use for pre-processing the input.
-
-=over
-
-Valid substitutions for this item are:
-
-=item %in%
-
-The input source file.
-
-=item %out%
-
-The output file.
-
-=item %options%
-
-Any pre-processing specific options (e.g. B<-I>).
-
-=back
-
-=item CI
-
-This item specifies the I to use for translating the source
-language input into the output format given by the C item.
-
-=item CI
-
-This item specifies the I for optimizing the translator's output.
-
-=back
+
+

One last thing that you will need to modify when adding support for a +new language to LLVMC is the language map, which defines mappings from +file extensions to language names. It is used to choose the proper +toolchain(s) for a given input file set. The language map definition is +located in the file Tools.td and looks like this:

+
+def LanguageMap : LanguageMap<
+    [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
+     LangToSuffixes<"c", ["c"]>,
+     ...
+    ]>;
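The lookup that such a map enables can be sketched in C++ as follows. The function names are hypothetical, the table contains only the entries shown above (the elided ones are left out), and returning an empty string for an unknown suffix is an assumption of this sketch.

```cpp
#include <map>
#include <string>

// Suffix-to-language table mirroring the LanguageMap entries shown above.
const std::map<std::string, std::string>& SuffixToLanguage() {
    static const std::map<std::string, std::string> table = {
        {"cc", "c++"}, {"cp", "c++"}, {"cxx", "c++"}, {"cpp", "c++"},
        {"CPP", "c++"}, {"c++", "c++"}, {"C", "c++"},
        {"c", "c"},
    };
    return table;
}

// Returns the language name for a file name, or an empty string when the
// file has no suffix or the suffix is not in the table (an assumption).
std::string LanguageFor(const std::string& file_name) {
    const std::string::size_type dot = file_name.rfind('.');
    if (dot == std::string::npos) return "";
    const auto& table = SuffixToLanguage();
    const auto it = table.find(file_name.substr(dot + 1));
    return it == table.end() ? "" : it->second;
}
```

Note that the lookup is case-sensitive, which is what lets "C" map to C++ while "c" maps to C.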
 
- - -
Glossary
-
-

This document uses precise terms in reference to the various artifacts and - concepts related to compilation. The terms used throughout this document are - defined below.

-
-
assembly
-
A compilation phase in which LLVM bytecode or - LLVM assembly code is assembled to a native code format (either target - specific assembly language or the platform's native object file format). -
- -
compiler
-
Refers to any program that can be invoked by llvmc to accomplish - the work of one or more compilation phases.
- -
driver
-
Refers to llvmc itself.
- -
linking
-
A compilation phase in which LLVM bytecode files - and (optionally) native system libraries are combined to form a complete - executable program.
- -
optimization
-
A compilation phase in which LLVM bytecode is - optimized.
- -
phase
-
Refers to any one of the five compilation phases that - llvmc supports. The five phases are: - preprocessing, - translation, - optimization, - assembly, - linking.
- -
source language
-
Any common programming language (e.g. C, C++, Java, Stacker, ML, - FORTRAN). These languages are distinguished from any of the lower level - languages (such as LLVM or native assembly), by the fact that a - translation phase - is required before LLVM can be applied.
- -
tool
-
Refers to any program in the LLVM tool set.
- -
translation
-
A compilation phase in which - source language code is translated into - either LLVM assembly language or LLVM bytecode.
-
+ + + + + + +
[1]TableGen Fundamentals +http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html
+
- -
-
Reid Spencer
-The LLVM Compiler Infrastructure
-Last modified: $Date$ +
+
+ The LLVM Compiler Infrastructure
+ Last modified: $Date$
-