+
+<p>...and it also shows a convention that we follow in this document. When
+demonstrating instructions, we will follow an instruction with a comment that
+gives the type and name of the value produced. Comments are shown in italic
+text.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"> <a name="highlevel">High Level Structure</a> </div>
+<!-- *********************************************************************** -->
+
+<!-- ======================================================================= -->
+<div class="doc_subsection"> <a name="modulestructure">Module Structure</a>
+</div>
+
+<div class="doc_text">
+
+<p>LLVM programs are composed of "Module"s, each of which is a
+translation unit of the input programs. Each module consists of
+functions, global variables, and symbol table entries. Modules may be
+combined with the LLVM linker, which merges function and global variable
+definitions, resolves forward declarations, and merges symbol table
+entries. Here is an example of the "hello world" module:</p>
+
+<pre><i>; Declare the string constant as a global constant...</i>
+<a href="#identifiers">%.LC0</a> = <a href="#linkage_internal">internal</a> <a
+ href="#globalvars">constant</a> <a href="#t_array">[13 x sbyte]</a> c"hello world\0A\00" <i>; [13 x sbyte]*</i>
+
+<i>; External declaration of the puts function</i>
+<a href="#functionstructure">declare</a> int %puts(sbyte*) <i>; int(sbyte*)* </i>
+
+<i>; Definition of main function</i>
+int %main() { <i>; int()* </i>
+ <i>; Convert [13 x sbyte]* to sbyte*...</i>
+ %cast210 = <a
+ href="#i_getelementptr">getelementptr</a> [13 x sbyte]* %.LC0, long 0, long 0 <i>; sbyte*</i>
+
+ <i>; Call puts function to write out the string to stdout...</i>
+ <a
+ href="#i_call">call</a> int %puts(sbyte* %cast210) <i>; int</i>
+ <a href="#i_ret">ret</a> int 0
+}
+</pre>
+
+<p>This example is made up of a <a href="#globalvars">global variable</a>
+named "<tt>.LC0</tt>", an external declaration of the "<tt>puts</tt>"
+function, and a <a href="#functionstructure">function definition</a>
+for "<tt>main</tt>".</p>
+
+<p>In general, a module is made up of a list of global values,
+where both functions and global variables are global values. Global values are
+represented by a pointer to a memory location (in this case, a pointer to an
+array of char, and a pointer to a function), and have one of the following <a
+href="#linkage">linkage types</a>.</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="linkage">Linkage Types</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+All global variables and functions have one of the following types of linkage:
+</p>
+
+<dl>
+
+ <dt><tt><b><a name="linkage_internal">internal</a></b></tt>: </dt>
+
+ <dd>Global values with internal linkage are only directly accessible by
+ objects in the current module. In particular, linking code into a module with
+ an internal global value may cause the internal global to be renamed as
+ necessary to avoid collisions. Because the symbol is internal to the module, all
+ references can be updated. This corresponds to the notion of the
+ '<tt>static</tt>' keyword in C, or the idea of "anonymous namespaces" in C++.
+ </dd>
+
+ <dt><tt><b><a name="linkage_linkonce">linkonce</a></b></tt>: </dt>
+
+ <dd>"<tt>linkonce</tt>" linkage is similar to <tt>internal</tt> linkage, with
+ the twist that linking together two modules defining the same
+ <tt>linkonce</tt> globals will cause one of the globals to be discarded. This
+ is typically used to implement inline functions. Unreferenced
+ <tt>linkonce</tt> globals are allowed to be discarded.
+ </dd>
+
+ <dt><tt><b><a name="linkage_weak">weak</a></b></tt>: </dt>
+
+ <dd>"<tt>weak</tt>" linkage is exactly the same as <tt>linkonce</tt> linkage,
+ except that unreferenced <tt>weak</tt> globals may not be discarded. This is
+ used to implement constructs in C such as "<tt>int X;</tt>" at global scope.
+ </dd>
+
+ <dt><tt><b><a name="linkage_appending">appending</a></b></tt>: </dt>
+
+ <dd>"<tt>appending</tt>" linkage may only be applied to global variables of
+ pointer to array type. When two global variables with appending linkage are
+ linked together, the two global arrays are appended together. This is the
+ type-safe LLVM equivalent of having the system linker append together
+ "sections" with identical names when .o files are linked.
+ </dd>
+
+ <dt><tt><b><a name="linkage_external">externally visible</a></b></tt>:</dt>
+
+ <dd>If none of the above linkage types is used, the global is externally
+ visible, meaning that it participates in linkage and can be used to resolve
+ external symbol references.
+ </dd>
+</dl>
+
+<p>For example, since the "<tt>.LC0</tt>"
+variable is defined to be internal, if another module defined a "<tt>.LC0</tt>"
+variable and was linked with this one, one of the two would be renamed,
+preventing a collision. Since "<tt>main</tt>" and "<tt>puts</tt>" are
+external (i.e., lacking any linkage declarations), they are accessible
+outside of the current module. It is illegal for a function <i>declaration</i>
+to have any linkage type other than "externally visible".</p>
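+
+<p>The sketch below uses the same syntax as the "hello world" example and
+shows how each linkage type might be spelled on a global variable or function.
+All names in it (<tt>%counter</tt>, <tt>%X</tt>, <tt>%entries</tt>,
+<tt>%square</tt>, <tt>%get_counter</tt>) are hypothetical and chosen only for
+illustration.</p>
+
+<pre><i>; Hypothetical names, for illustration only.</i>
+<i>; internal: only visible in this module, may be renamed when linked</i>
+%counter = internal global int 0                       <i>; int*</i>
+
+<i>; weak: like "int X;" at global scope in C, kept even if unreferenced</i>
+%X = weak global int 0                                 <i>; int*</i>
+
+<i>; appending: arrays from all linked modules are concatenated</i>
+%entries = appending global [2 x int] [ int 1, int 2 ] <i>; [2 x int]*</i>
+
+<i>; linkonce: one copy is kept when several modules define it,</i>
+<i>; and an unreferenced copy may be dropped</i>
+linkonce int %square(int %v) {                         <i>; int(int)*</i>
+  %r = mul int %v, %v                                  <i>; int</i>
+  ret int %r
+}
+
+<i>; no linkage keyword: externally visible</i>
+int %get_counter() {                                   <i>; int()*</i>
+  %c = load int* %counter                              <i>; int</i>
+  ret int %c
+}
+</pre>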
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="callingconv">Calling Conventions</a>
+</div>
+
+<div class="doc_text">
+
+<p>LLVM <a href="#functionstructure">functions</a>, <a href="#i_call">calls</a>
+and <a href="#i_invoke">invokes</a> can all specify an optional calling
+convention. The calling convention of any pair of dynamic
+caller/callee must match, or the behavior of the program is undefined. The
+following calling conventions are supported by LLVM, and more may be added in
+the future:</p>
+
+<dl>
+ <dt><b>"<tt>ccc</tt>" - The C calling convention</b>:</dt>
+
+ <dd>This calling convention (the default if no other calling convention is
+ specified) matches the target C calling convention. This calling convention
+ supports varargs function calls and tolerates some mismatch between the
+ declared prototype and the definition of the function (as does normal C).
+ </dd>
+
+ <dt><b>"<tt>fastcc</tt>" - The fast calling convention</b>:</dt>
+
+ <dd>This calling convention attempts to make calls as fast as possible
+ (e.g. by passing things in registers). This calling convention allows the
+ target to use whatever tricks it wants to produce fast code for the target,
+ without having to conform to an externally specified ABI. Implementations of
+ this convention should allow arbitrary tail call optimization to be supported.
+ This calling convention does not support varargs and requires the prototype of
+ all callees to exactly match the prototype of the function definition.
+ </dd>
+
+ <dt><b>"<tt>coldcc</tt>" - The cold calling convention</b>:</dt>
+
+ <dd>This calling convention attempts to make code in the caller as efficient
+ as possible under the assumption that the call is not commonly executed. As
+ such, these calls often preserve all registers so that the call does not break
+ any live ranges on the caller side. This calling convention does not support
+ varargs and requires the prototype of all callees to exactly match the
+ prototype of the function definition.
+ </dd>
+
+ <dt><b>"<tt>cc &lt;<em>n</em>&gt;</tt>" - Numbered convention</b>:</dt>
+
+ <dd>Any calling convention may be specified by number, allowing
+ target-specific calling conventions to be used. Target-specific calling
+ conventions start at 64.
+ </dd>
+</dl>
+
+<p>More calling conventions can be added or defined on an as-needed basis, to
+support Pascal conventions or any other well-known target-independent
+convention.</p>
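+
+<p>The following is a minimal sketch with hypothetical function names. It shows
+a definition and a call site that both use <tt>fastcc</tt>. Because caller and
+callee must agree, the call names the same convention as the definition.
+Calling <tt>%add_fast</tt> with the default <tt>ccc</tt> convention instead
+would be undefined behavior.</p>
+
+<pre><i>; Hypothetical names, for illustration only.</i>
+internal fastcc int %add_fast(int %a, int %b) {        <i>; int(int, int)*</i>
+  %sum = add int %a, %b                                <i>; int</i>
+  ret int %sum
+}
+
+int %caller() {                                        <i>; int()*</i>
+  <i>; The call site uses the same convention as the callee</i>
+  %r = call fastcc int %add_fast(int 1, int 2)         <i>; int</i>
+  ret int %r
+}
+
+<i>; A target-specific convention selected by number (64 or above)</i>
+declare cc 64 void %target_hook()                      <i>; void()*</i>
+</pre>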
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="globalvars">Global Variables</a>
+</div>
+
+<div class="doc_text">
+
+<p>Global variables define regions of memory allocated at compilation time
+instead of run-time. Global variables may optionally be initialized. A
+variable may be defined as a global "constant", which indicates that the
+contents of the variable will <b>never</b> be modified (enabling better
+optimization, allowing the global data to be placed in the read-only section of
+an executable, etc.). Note that variables that need runtime initialization
+cannot be marked "constant", because initializing them requires a store to the
+variable.</p>
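+
+<p>The declarations below use hypothetical names. They contrast a read-only
+<tt>constant</tt> with ordinary initialized globals. The <tt>%start_time</tt>
+global receives its real value at run time through a store, so it has to be
+declared with <tt>global</tt> rather than <tt>constant</tt>.</p>
+
+<pre><i>; Hypothetical names, for illustration only.</i>
+%ratio = constant double 1.25                          <i>; double*</i>
+%hits = global int 0                                   <i>; int*</i>
+%start_time = global long 0                            <i>; long*</i>
+
+void %init(long %now) {                                <i>; void(long)*</i>
+  <i>; This store is why %start_time cannot be marked constant</i>
+  store long %now, long* %start_time
+  ret void
+}
+</pre>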
+
+<p>
+LLVM explicitly allows <em>declarations</em> of global variables to be marked
+constant, even if the final definition of the global is not. This capability
+can be used to enable slightly better optimization of the program, but requires
+the language definition to guarantee that optimizations based on the
+'constantness' are valid for the translation units that do not include the
+definition.
+</p>
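+
+<p>The following is a minimal sketch of that situation. It assumes two separate
+modules and a hypothetical global named <tt>%limit</tt>. The first module only
+declares the global and marks the declaration constant. The second holds the
+actual definition, which is not constant.</p>
+
+<pre><i>; Module A (hypothetical): declaration only, treated as constant here</i>
+%limit = external constant int                         <i>; int*</i>
+
+<i>; Module B (hypothetical): the real, writable definition</i>
+%limit = global int 10                                 <i>; int*</i>
+</pre>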
+
+<p>As SSA values, global variables define pointer values that are in
+scope in (i.e., that dominate) all basic blocks in the program. Global
+variables always define a pointer to their "content" type because they
+describe a region of memory, and all memory objects in LLVM are
+accessed through pointers.</p>
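+
+<p>Because a global variable names a pointer, its contents are read and written
+with <tt>load</tt> and <tt>store</tt> rather than used directly. The following
+small sketch uses a hypothetical counter:</p>
+
+<pre><i>; Hypothetical names, for illustration only.</i>
+%count = global int 0                                  <i>; int*</i>
+
+void %bump() {                                         <i>; void()*</i>
+  %old = load int* %count                              <i>; int</i>
+  %new = add int %old, 1                               <i>; int</i>
+  store int %new, int* %count
+  ret void
+}
+</pre>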
+
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="functionstructure">Functions</a>
+</div>
+
+<div class="doc_text">
+
+<p>LLVM function definitions consist of an optional <a href="#linkage">linkage
+type</a>, an optional <a href="#callingconv">calling convention</a>, a return
+type, a function name, a (possibly empty) argument list, an opening curly brace,
+a list of basic blocks, and a closing curly brace. LLVM function declarations
+are defined with the "<tt>declare</tt>" keyword, an optional <a
+href="#callingconv">calling convention</a>, a return type, a function name, and
+a possibly empty list of arguments.</p>
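+
+<p>The sketch below uses hypothetical names to show both forms. The first is a
+definition with an optional linkage type and a body. The others are
+declarations introduced by <tt>declare</tt>, one of which also specifies a
+calling convention.</p>
+
+<pre><i>; Hypothetical names, for illustration only.</i>
+<i>; Definition: linkage, return type, name, arguments, body</i>
+internal int %twice(int %x) {                          <i>; int(int)*</i>
+  %y = add int %x, %x                                  <i>; int</i>
+  ret int %y
+}
+
+<i>; Declarations: no body, no linkage type</i>
+declare coldcc void %report_error(sbyte*)              <i>; void(sbyte*)*</i>
+declare int %read_input()                              <i>; int()*</i>
+</pre>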
+
+<p>A function definition contains a list of basic blocks, forming the CFG for
+the function. Each basic block may optionally start with a label (giving the
+basic block a symbol table entry), contains a list of instructions, and ends
+with a <a href="#terminators">terminator</a> instruction (such as a branch or
+function return).</p>
+
+<p>The first basic block in a function is special in two ways: it is immediately
+executed on entrance to the function, and it is not allowed to have predecessor
+basic blocks (i.e., there cannot be any branches to the entry block of a
+function). Because the block can have no predecessors, it also cannot have any
+<a href="#i_phi">PHI nodes</a>.</p>
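+
+<p>The following sketch makes the rules about basic blocks concrete. It uses a
+hypothetical function <tt>%sum_to</tt> that adds the integers from 0 to
+<tt>%n</tt> (for non-negative <tt>%n</tt>). Each block here has a label and
+ends in a terminator. The entry block has no predecessors and therefore no PHI
+nodes. The <tt>loop</tt> block is reached from both <tt>entry</tt> and itself,
+so it uses PHI nodes to merge the incoming values.</p>
+
+<pre><i>; Hypothetical function, for illustration only.</i>
+int %sum_to(int %n) {                                  <i>; int(int)*</i>
+entry:
+  br label %loop
+
+loop:
+  %i = phi int [ 0, %entry ], [ %i.next, %loop ]       <i>; int</i>
+  %acc = phi int [ 0, %entry ], [ %acc.next, %loop ]   <i>; int</i>
+  %acc.next = add int %acc, %i                         <i>; int</i>
+  %i.next = add int %i, 1                              <i>; int</i>
+  %done = setgt int %i.next, %n                        <i>; bool</i>
+  br bool %done, label %exit, label %loop
+
+exit:
+  ret int %acc.next
+}
+</pre>
+
+<p>For example, with <tt>%n</tt> equal to 3 the loop body runs four times and
+the function returns 6 (0 + 1 + 2 + 3).</p>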
+
+<p>LLVM functions are identified by their name and type signature. Hence, two
+functions with the same name but different parameter lists or return values are
+considered different functions, and LLVM will resolve references to each
+appropriately.</p>
+