X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FCompilerDriver.html;h=dd7526753df6e7b0e33e8b42416e74390c695cc2;hb=52bb2db70998c42c99d22069ac66eb7bbb492f3a;hp=a5ba1a68542493548c7b5be3c906645b9407f897;hpb=b1254a124796a77f88c6d5adc37e2e324d210bc2;p=oota-llvm.git
diff --git a/docs/CompilerDriver.html b/docs/CompilerDriver.html
index a5ba1a68542..dd7526753df 100644
--- a/docs/CompilerDriver.html
+++ b/docs/CompilerDriver.html
@@ -1,572 +1,420 @@
- -NOTE: This document is a work in progress!
-This document describes the requirements, design, and configuration of the - LLVM compiler driver, llvmc. The compiler driver knows about LLVM's - tool set and can be configured to know about a variety of compilers for - source languages. It uses this knowledge to execute the tools necessary - to accomplish general compilation, optimization, and linking tasks. The main - purpose of llvmc is to provide a simple and consistent interface to - all compilation tasks. This reduces the burden on the end user who can just - learn to use llvmc instead of the entire LLVM tool set and all the - source language compilers compatible with LLVM.
-The llvmc tool is a configurable compiler - driver. As such, it isn't the compiler, optimizer, - or linker itself but it drives (invokes) other software that perform those - tasks. If you are familiar with the GNU Compiler Collection's gcc - tool, llvmc is very similar.
-The following introductory sections will help you understand why this tool - is necessary and what it does.
-llvmc was invented to make compilation with LLVM based compilers - easier. To accomplish this, llvmc strives to:
-Additionally, llvmc makes it easier to write a compiler for use - with LLVM, because it:
-Note: This document is a work-in-progress. Additions and clarifications + are welcome.
At a high level, llvmc operation is very simple. The basic action
- taken by llvmc is to simply invoke some tool or set of tools to fill
- the user's request for compilation. Every execution of llvmc takes the
- following sequence of steps:
-
llvmc's operation must be simple, regular and predictable. - Developers need to be able to rely on it to take a consistent approach to - compilation. For example, the invocation:
-- llvmc -O2 x.c y.c z.c -o xyz-
must produce exactly the same results as:
-- llvmc -O2 x.c - llvmc -O2 y.c - llvmc -O2 z.c - llvmc -O2 x.o y.o z.o -o xyz-
To accomplish this, llvmc uses a very simple goal oriented - procedure to do its work. The overall goal is to produce a functioning - executable. To accomplish this, llvmc always attempts to execute a - series of compilation phases in the same sequence. - However, the user's options to llvmc can cause the sequence of phases - to start in the middle or finish early.
+LLVMC is a generic compiler driver, designed to be customizable and +extensible. It plays the same role for LLVM as the gcc program +does for GCC - LLVMC's job is essentially to transform a set of input +files into a set of targets depending on configuration rules and user +options. What makes LLVMC different is that these transformation rules +are completely customizable - in fact, LLVMC knows nothing about the +specifics of transformation (even the command-line options are mostly +not hard-coded) and regards the transformation structure as an +abstract graph. This makes it possible to adapt LLVMC for other +purposes - for example, as a build tool for game resources.
+Because LLVMC employs TableGen [1] as its configuration language, you +need to be familiar with it to customize LLVMC.
+llvmc breaks every compilation task into the following five - distinct phases:
-The following table shows the inputs, outputs, and command line options - applicable to each phase.
-Phase | Inputs | Outputs | Options
-Preprocessing | | |
-Translation | | |
-Optimization | | |
-Linking | | |
An action, with regard to llvmc is a basic operation that it takes - in order to fulfill the user's request. Each phase of compilation will invoke - zero or more actions in order to accomplish that phase.
-Actions come in two forms:
LLVMC tries hard to be as compatible with gcc as possible, +although there are some small differences. Most of the time, however, +you shouldn't be able to notice them:
++$ # This works as expected: +$ llvmc2 -O3 -Wall hello.cpp +$ ./a.out +hello ++
One nice feature of LLVMC is that one doesn't have to distinguish +between different compilers for different languages (think g++ and +gcc): the right toolchain is chosen automatically based on input +language names (which are, in turn, determined from file +extensions). If you want to override the language derived from the file +extension (for example, to compile a file ending in ".cpp" as C), use +the -x option, just as you would with gcc:
++$ llvmc2 -x c hello.cpp +$ # hello.cpp is really a C file +$ ./a.out +hello ++
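To make the selection rule concrete, here is a minimal C++ sketch of how an input language could be picked. If a -x value is present it wins; otherwise the file extension decides. The function name languageForFile and the extension table are hypothetical, and LLVMC's real mapping may use other suffixes and language names. The sketch also ignores argument order, although in gcc a -x option applies only to the files that follow it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical extension-to-language table, for illustration only.
const std::map<std::string, std::string> kExtensionLanguages = {
    {"c", "c"}, {"cpp", "c++"}, {"cxx", "c++"}, {"cc", "c++"},
    {"o", "object-code"}};

// Returns the input language of `file`. A non-empty `forced` value (from -x)
// overrides the extension; "" means the language could not be determined.
std::string languageForFile(const std::string& file,
                            const std::string& forced = "") {
  if (!forced.empty()) return forced;
  const std::string::size_type dot = file.rfind('.');
  if (dot == std::string::npos) return "";
  const auto it = kExtensionLanguages.find(file.substr(dot + 1));
  return it == kExtensionLanguages.end() ? "" : it->second;
}
```

With this table, languageForFile("hello.cpp") yields "c++", and languageForFile("hello.cpp", "c") yields "c", which matches the -x c example above.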
On the other hand, when using LLVMC as a linker to combine several C++ +object files, you should provide the --linker option, since an object +file's extension says nothing about its source language and LLVMC +therefore cannot choose the right linker on its own:
++$ llvmc2 -c hello.cpp +$ llvmc2 hello.o +[A lot of link-time errors skipped] +$ llvmc2 --linker=c++ hello.o +$ ./a.out +hello +
LLVMC has some built-in options that can't be overridden in the +configuration files:
+This section of the document describes the configuration files used by
- llvmc. Configuration information is relatively static for a
- given release of LLVM and a front end compiler. However, the details may
- change from release to release of either. Users are encouraged to simply use
- the various options of the B
At the time of writing, LLVMC does not support on-the-fly reloading of +configuration, so to customize LLVMC you'll have to recompile the +source code (which lives under $LLVM_DIR/tools/llvmc2). The +default configuration files are Common.td (common definitions; +don't forget to include it in your configuration files), Tools.td +(tool descriptions) and Graph.td (compilation graph definition).
+To compile LLVMC with your own configuration file (say, MyGraph.td), +run make like this:
++$ cd $LLVM_DIR/tools/llvmc2 +$ make GRAPH=MyGraph.td TOOLNAME=my_llvmc ++
This will build an executable named my_llvmc. There are also +several sample configuration files in the llvmc2/examples +subdirectory that should help to get you started.
+Internally, LLVMC stores information about possible source +transformations in the form of a graph. Nodes in this graph represent +tools, and edges between two nodes represent a transformation path. A +special "root" node is used to mark entry points for the +transformations. LLVMC also assigns a weight to each edge (more on +this later) to choose between several alternative edges.
+The definition of the compilation graph (see file Graph.td) is +just a list of edges:
++def CompilationGraph : CompilationGraph<[
+    Edge<root, llvm_gcc_c>,
+    Edge<root, llvm_gcc_assembler>,
+    ...
+
+    Edge<llvm_gcc_c, llc>,
+    Edge<llvm_gcc_cpp, llc>,
+    ...
+
+    OptionalEdge<llvm_gcc_c, opt, (case (switch_on "opt"), (inc_weight))>,
+    OptionalEdge<llvm_gcc_cpp, opt, (case (switch_on "opt"), (inc_weight))>,
+    ...
+
+    OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
+        (case (input_languages_contain "c++"), (inc_weight),
+              (or (parameter_equals "linker", "g++"),
+                  (parameter_equals "linker", "c++")), (inc_weight))>,
+    ...
+
+    ]>;
++
As you can see, the edges can be either default or optional, where +optional edges are distinguished by a case expression used to +calculate the edge's weight.
+The default edges are assigned a weight of 1, and optional edges get a +weight of 0 + 2*N where N is the number of tests that evaluated to +true in the case expression. It is also possible to provide an +integer parameter to inc_weight and dec_weight - in this case, +the weight is increased (or decreased) by the provided value instead +of the default 2.
+When passing an input file through the graph, LLVMC picks the edge +with the maximum weight. To avoid ambiguity, there should be only one +default edge between two nodes (with the exception of the root node, +which gets special treatment: there you are allowed to specify one +default edge per language).
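The weighting rules above can be sketched in C++ as follows. This is an illustration under stated assumptions, not LLVMC's code. The Edge and WeightTest types, the switchOn helper and the tie-breaking rule (the first listed edge wins) are all invented for the example. Only the weights come from the text: 1 for a default edge, and for an optional edge the sum of the increments of its true tests (2 each unless inc_weight is given a value).

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the user's command line: the set of switches given.
using Options = std::set<std::string>;

// One (test, (inc_weight N)) pair of an optional edge's case expression.
struct WeightTest {
  std::function<bool(const Options&)> test;
  int increment = 2;  // (inc_weight) without an argument adds 2
};

struct Edge {
  std::string target;
  bool optional = false;          // OptionalEdge<...> rather than Edge<...>
  std::vector<WeightTest> tests;  // only consulted for optional edges
};

// Builds a (switch_on "name") test.
std::function<bool(const Options&)> switchOn(const std::string& name) {
  return [name](const Options& opts) { return opts.count(name) > 0; };
}

// Default edges weigh 1; optional edges start at 0 and add the increment
// of every test that holds.
int edgeWeight(const Edge& edge, const Options& opts) {
  if (!edge.optional) return 1;
  int weight = 0;
  for (const WeightTest& t : edge.tests)
    if (t.test(opts)) weight += t.increment;
  return weight;
}

// Returns the target of the heaviest outgoing edge; on a tie the edge
// listed first wins (an assumption of this sketch).
std::string chooseEdge(const std::vector<Edge>& outgoing, const Options& opts) {
  assert(!outgoing.empty() && "a node to leave must have outgoing edges");
  const Edge* best = &outgoing.front();
  int bestWeight = edgeWeight(*best, opts);
  for (const Edge& edge : outgoing) {
    const int weight = edgeWeight(edge, opts);
    if (weight > bestWeight) {
      best = &edge;
      bestWeight = weight;
    }
  }
  return best->target;
}
```

Take a default edge to llc and an optional edge to opt guarded by (switch_on "opt"). The optional edge weighs 0 without the switch and 2 with it, so it beats the default edge (weight 1) only when the switch is given.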
+To get a visual representation of the compilation graph (useful for +debugging), run llvmc2 --view-graph. You will need dot and +gsview installed for this to work properly.
llvmc is highly configurable both on the command line and in -configuration files. The options it understands are generic, consistent and -simple by design. Furthermore, the llvmc options apply to the -compilation of any LLVM enabled programming language. To be enabled as a -supported source language compiler, a compiler writer must provide a -configuration file that tells llvmc how to invoke the compiler -and what its capabilities are. The purpose of the configuration files then -is to allow compiler writers to specify to llvmc how the compiler -should be invoked. Users may but are not advised to alter the compiler's -llvmc configuration.
- -Because llvmc just invokes other programs, it must deal with the -available command line options for those programs regardless of whether they -were written for LLVM or not. Furthermore, not all compilation front ends will -have the same capabilities. Some front ends will simply generate LLVM assembly -code, others will be able to generate fully optimized byte code. In general, -llvmc doesn't make any assumptions about the capabilities or command -line options of a sub-tool. It simply uses the details found in the configuration -files and leaves it to the compiler writer to specify the configuration -correctly.
- -This approach means that new compiler front ends can be up and working very -quickly. As a first cut, a front end can simply compile its source to raw -(unoptimized) bytecode or LLVM assembly and llvmc can be configured -to pick up the slack (translate LLVM assembly to bytecode, optimize the -bytecode, generate native assembly, link, etc.). In fact, the front end need -not use any LLVM libraries, and it could be written in any language (instead of -C++). The configuration data will allow the full range of optimization, -assembly, and linking capabilities that LLVM provides to be added to these kinds -of tools. Enabling the rapid development of front-ends is one of the primary -goals of llvmc.
- -As a compiler front end matures, it may utilize the LLVM libraries and tools -to more efficiently produce optimized bytecode directly in a single compilation -and optimization program. In these cases, multiple tools would not be needed -and the configuration data for the compiler would change.
- -Configuring llvmc to the needs and capabilities of a source language -compiler is relatively straight forward. A compiler writer must provide a -definition of what to do for each of the five compilation phases for each of -the optimization levels. The specification consists simply of prototypical -command lines into which llvmc can substitute command line -arguments and file names. Note that any given phase can be completely blank if -the source language's compiler combines multiple phases into a single program. -For example, quite often pre-processing, translation, and optimization are -combined into a single program. The specification for such a compiler would have -blank entries for pre-processing and translation but a full command line for -optimization.
+ +As was said earlier, nodes in the compilation graph represent tools, +which are described separately. A tool definition looks like this +(taken from the Tools.td file):
++def llvm_gcc_cpp : Tool<[ + (in_language "c++"), + (out_language "llvm-assembler"), + (output_suffix "bc"), + (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"), + (sink) + ]>; ++
This defines a new tool called llvm_gcc_cpp, which is an alias for +llvm-g++. As you can see, a tool definition is just a list of +properties; most of them should be self-explanatory. The sink +property means that this tool should be passed all command-line +options that lack explicit descriptions.
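The following rough C++ sketch shows what the cmd_line and output_suffix properties imply for a single input file: $INFILE and $OUTFILE are replaced by file names, and the output name takes the tool's suffix. The helper names are hypothetical. This document does not say where LLVMC actually places intermediate files, so the sketch assumes the output is written next to the input.

```cpp
#include <cassert>
#include <string>

// Replaces every occurrence of `from` in `s` with `to`.
std::string replaceAll(std::string s, const std::string& from,
                       const std::string& to) {
  for (std::string::size_type pos = s.find(from); pos != std::string::npos;
       pos = s.find(from, pos + to.size()))
    s.replace(pos, from.size(), to);
  return s;
}

// Hypothetical naming rule: drop the input's extension, append output_suffix.
std::string outputName(const std::string& input, const std::string& suffix) {
  const std::string::size_type slash = input.rfind('/');
  std::string::size_type dot = input.rfind('.');
  if (dot != std::string::npos && slash != std::string::npos && dot < slash)
    dot = std::string::npos;  // that dot belongs to a directory name
  return input.substr(0, dot) + "." + suffix;
}

// Fills in a cmd_line template such as
// "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm" for one input file.
std::string expandCmdLine(const std::string& cmdLine, const std::string& input,
                          const std::string& suffix) {
  const std::string withInput = replaceAll(cmdLine, "$INFILE", input);
  return replaceAll(withInput, "$OUTFILE", outputName(input, suffix));
}
```

Applied to the llvm_gcc_cpp template with input hello.cpp and suffix "bc", this gives "llvm-g++ -c hello.cpp -o hello.bc -emit-llvm".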
+The complete list of the currently implemented tool properties follows:
+The next tool definition is slightly more complex:
++def llvm_gcc_linker : Tool<[ + (in_language "object-code"), + (out_language "executable"), + (output_suffix "out"), + (cmd_line "llvm-gcc $INFILE -o $OUTFILE"), + (join), + (prefix_list_option "L", (forward), + (help "add a directory to link path")), + (prefix_list_option "l", (forward), + (help "search a library when linking")), + (prefix_list_option "Wl", (unpack_values), + (help "pass options to linker")) + ]>; ++
This tool has a "join" property, which means that it behaves like a +linker, combining all of its inputs into a single output. This tool +also defines several command-line options (-l, -L and -Wl), which +have their usual meaning. An option has two attributes: a name and a +(possibly empty) list of properties. All currently implemented option +types and properties are described below:
+Possible option types:
++++
+- switch_option - a simple boolean switch, for example -time.
+- parameter_option - option that takes an argument, for example +-std=c99.
+- parameter_list_option - same as the above, but more than one +occurrence of the option is allowed.
+- prefix_option - same as the parameter_option, but the option name +and parameter value are not separated.
+- prefix_list_option - same as the above, but more than one +occurrence of the option is allowed; example: -lm -lpthread.
+- alias_option - a special option type for creating +aliases. Unlike other option types, aliases are not allowed to +have any properties besides the aliased option name. Usage +example: (alias_option "preprocess", "E")
+
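To illustrate prefix_list_option semantics, the hypothetical helper below collects every value of a prefix option such as -l from an argument list. It only sketches the behaviour described above. It is not how LLVMC parses its command line, and ignoring a bare "-l" with nothing attached is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects the values of a prefix_list_option called `name` (for example
// "l"): "-lm" and "-lpthread" yield "m" and "pthread". A bare "-l" with no
// value attached is ignored here.
std::vector<std::string> collectPrefixList(const std::vector<std::string>& args,
                                           const std::string& name) {
  const std::string prefix = "-" + name;
  std::vector<std::string> values;
  for (const std::string& arg : args)
    if (arg.size() > prefix.size() && arg.compare(0, prefix.size(), prefix) == 0)
      values.push_back(arg.substr(prefix.size()));
  return values;
}
```

Because the name and value are not separated, each value is simply whatever follows the option name in the same argument.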
Possible option properties:
++++
+- append_cmd - append a string to the tool invocation command.
+- forward - forward this option unchanged.
+- output_suffix - modify the output suffix of this +tool. Example: (switch_option "E", (output_suffix "i")).
+- stop_compilation - stop compilation after this phase.
+- unpack_values - used for splitting and forwarding +comma-separated lists of options, e.g. -Wa,-foo=bar,-baz is +converted to -foo=bar -baz and appended to the tool invocation +command.
+- help - help string associated with this option. Used for +--help output.
+- required - this option is obligatory.
+
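The unpack_values transformation described above (-Wa,-foo=bar,-baz becoming -foo=bar -baz) can be sketched with two small C++ helpers. The names are hypothetical. Dropping the empty pieces produced by doubled commas is an assumption of this sketch, not documented LLVMC behaviour.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits "-Wa,-foo=bar,-baz" into {"-foo=bar", "-baz"}: the text before the
// first comma is the option name itself and is dropped, and empty pieces
// (e.g. from ",,") are skipped.
std::vector<std::string> unpackValues(const std::string& arg) {
  std::vector<std::string> values;
  std::istringstream in(arg);
  std::string piece;
  bool first = true;
  while (std::getline(in, piece, ',')) {
    if (first) {  // skip the option name, e.g. "-Wa"
      first = false;
      continue;
    }
    if (!piece.empty()) values.push_back(piece);
  }
  return values;
}

// Joins the unpacked values with spaces, ready to append to the command.
std::string joinForCommand(const std::vector<std::string>& values) {
  std::string out;
  for (const std::string& v : values) {
    if (!out.empty()) out += ' ';
    out += v;
  }
  return out;
}
```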
There are two types of configuration files: the master configuration file - and the language specific configuration file. The master configuration file - contains the general configuration of llvmc itself and is supplied - with the tool. It contains information that is source language agnostic. - Language specific configuration files tell llvmc how to invoke the - language's compiler for a variety of different tasks and what other tools - are needed to backfill the compiler's missing features (e.g. - optimization).
- -llvmc always looks for files of a specific name. It uses the
- first file with the name it's looking for by searching directories in the
- following order:
-
In the directories searched, a file named master will be - recognized as the master configuration file for llvmc. Note that - users may override the master file with a copy in their home directory - but they are advised not to. This capability is only useful for compiler - implementers needing to alter the master configuration while developing - their compiler front end. When reading the configuration files, the master - files are always read first.
-Language specific configuration files are given specific names to foster - faster lookup. The name of a given language specific configuration file is - the same as the suffix used to identify files containing source in that - language. For example, a configuration file for C++ source might be named - cpp, C, or cxx.
- -The master configuration file is always read. Which language specific - configuration files are read depends on the command line options and the - suffixes of the file names provided on llvmc's command line. Note - that the --x LANGUAGE option alters the language that llvmc - uses for the subsequent files on the command line. Only the language - specific configuration files actually needed to complete llvmc's - task are read. Other language specific files will be ignored.
+ +It can be handy to have all information about options gathered in a +single place to provide an overview. This can be achieved by using a +so-called OptionList:
++def Options : OptionList<[ +(switch_option "E", (help "Help string")), +(alias_option "quiet", "q") +... +]>; ++
OptionList is also a good place to specify option aliases.
+Tool-specific option properties like append_cmd have (obviously) +no meaning in the context of OptionList, so the only properties +allowed there are help and required.
+Option lists are used at the file scope. See file +examples/Clang.td for an example of OptionList usage.
The syntax of the configuration files is yet to be determined. There are
- two viable options remaining:
-
Normally, LLVMC executes programs from the system PATH. Sometimes, +this is not sufficient: for example, we may want to specify tool names +in the configuration file. This can be achieved via the mechanism of +hooks: to compile LLVMC with your hooks, just drop a .cpp file into the +tools/llvmc2 directory. Hooks should live in the hooks +namespace and have the signature std::string hooks::MyHookName +(void). They can be used from the cmd_line tool property:
++(cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)") ++
It is also possible to use environment variables in the same manner:
++(cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)") ++
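The following C++ sketch illustrates how $CALL and $ENV substitution could work. It defines one hook with the documented signature and expands both forms in a cmd_line string. The hook's return value, the kHooks registry and expandHooksAndEnv are hypothetical. Here unknown hooks and unset variables simply expand to an empty string, which may not match what LLVMC does.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <map>
#include <string>

namespace hooks {
// A hook with the documented signature; the directory it returns is a
// made-up placeholder.
std::string MyHook() { return "/opt/tools"; }
}  // namespace hooks

// Hypothetical registry of the hooks that $CALL(...) may name.
const std::map<std::string, std::function<std::string()>> kHooks = {
    {"MyHook", hooks::MyHook}};

// Expands $CALL(Name) and $ENV(VAR) in a cmd_line string. Unknown hooks and
// unset variables expand to "" in this sketch.
std::string expandHooksAndEnv(const std::string& cmd) {
  std::string out;
  std::string::size_type i = 0;
  while (i < cmd.size()) {
    const bool isCall = cmd.compare(i, 6, "$CALL(") == 0;
    const bool isEnv = cmd.compare(i, 5, "$ENV(") == 0;
    const std::string::size_type close = cmd.find(')', i);
    if ((isCall || isEnv) && close != std::string::npos) {
      const std::string::size_type start = i + (isCall ? 6 : 5);
      const std::string name = cmd.substr(start, close - start);
      if (isCall) {
        const auto hook = kHooks.find(name);
        if (hook != kHooks.end()) out += hook->second();
      } else if (const char* value = std::getenv(name.c_str())) {
        out += value;
      }
      i = close + 1;
    } else {
      out += cmd[i++];  // ordinary character: copy it through
    }
  }
  return out;
}
```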
To change the command line string based on user-provided options, use +the case expression (documented below):
++(cmd_line + (case + (switch_on "E"), + "llvm-g++ -E -x c $INFILE -o $OUTFILE", + (default), + "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm")) +
- -=head3 Section: [lang=I] - -This section provides the master configuration data for a given language. The -language specific data will be found in a file named I . - -=over - -=item C I - -This adds the I specified to the list of recognized suffixes for -the I identified in the section. As many suffixes as are commonly used -for source files for the I should be specified. - -=back - -=begin html - - For example, the following might appear for C++: -
-[lang=C++] -suffix=.cpp -suffix=.cxx -suffix=.C -- -=end html + +The 'case' construct can be used to calculate weights of the optional +edges and to choose between several alternative command line strings +in the cmd_line tool property. It is designed after the +similarly-named construct in functional languages and takes the form +(case (test_1), statement_1, (test_2), statement_2, ... (test_N), +statement_N). The statements are evaluated only if the corresponding +tests evaluate to true.
+Examples:
++// Increases edge weight by 5 if "-A" is provided on the
+// command-line, and by 5 more if "-B" is also provided.
+(case
+    (switch_on "A"), (inc_weight 5),
+    (switch_on "B"), (inc_weight 5))
+
+// Evaluates to "cmdline1" if option "-A" is provided on the
+// command line, to "cmdline2" if "-B" (but not "-A") is provided,
+// and to "cmdline3" otherwise.
+(case
+    (switch_on "A"), "cmdline1",
+    (switch_on "B"), "cmdline2",
+    (default), "cmdline3")-
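Note the slight difference in 'case' expression handling in contexts +of edge weights and command line specification: when computing an edge +weight, every test is checked and the increments of all true tests are +added up, while in a command line only the first true test counts. In +the second example, the value of the "B" switch is never checked when +switch "A" is enabled, and the whole expression always evaluates to +"cmdline1" in that case.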
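The difference between the two contexts fits in a short C++ sketch. In a cmd_line case, the first clause whose test holds wins. In an edge-weight case, the increments of all true tests are added up. The Options and Test types and the helper names are hypothetical, and (default) is modelled as a test that always holds.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: the set of switches given on the command line.
using Options = std::set<std::string>;
using Test = std::function<bool(const Options&)>;

Test switchOn(const std::string& name) {
  return [name](const Options& opts) { return opts.count(name) > 0; };
}

// (default) is simply a test that always holds.
Test defaultTest() {
  return [](const Options&) { return true; };
}

// cmd_line context: the first clause whose test holds wins, and the
// remaining tests are never evaluated. Returns "" if no clause matches.
std::string evalCommandCase(
    const std::vector<std::pair<Test, std::string>>& clauses,
    const Options& opts) {
  for (const auto& clause : clauses)
    if (clause.first(opts)) return clause.second;
  return "";
}

// Edge-weight context: every test is evaluated and the increments of all
// tests that hold are summed.
int evalWeightCase(const std::vector<std::pair<Test, int>>& clauses,
                   const Options& opts) {
  int weight = 0;
  for (const auto& clause : clauses)
    if (clause.first(opts)) weight += clause.second;
  return weight;
}
```

With both -A and -B given, the command-line case from the second example picks "cmdline1", while the weight case from the first example adds up to 10.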
+Case expressions can also be nested, i.e. the following is legal:
++(case (switch_on "E"), (case (switch_on "o"), ..., (default), ...) + (default), ...) ++
You should, however, try to avoid doing that because it hurts +readability. It is usually better to split tool descriptions and/or +use TableGen inheritance instead.
+-=head3 Section: [general] - -=over - -=item C- -This item specifies whether the language has a pre-processing phase or not. This -controls whether the B<-E> option works for the language or not. - -=item C
This document uses precise terms in reference to the various artifacts and - concepts related to compilation. The terms used throughout this document are - defined below.
[1] TableGen Fundamentals +http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html