X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FLangRef.html;h=47fc33ebbb7f59e08c2af983c7c4a044158143db;hb=812402019264506bea49749b23d0ead1be9fde7c;hp=b3d6d5212183a51e7df82a2090b261590f6219da;hpb=009505452b713ed2e3a8e99c5545a6e721c65495;p=oota-llvm.git diff --git a/docs/LangRef.html b/docs/LangRef.html index b3d6d521218..47fc33ebbb7 100644 --- a/docs/LangRef.html +++ b/docs/LangRef.html @@ -1,1376 +1,2599 @@ - -
llvm Assembly Language Reference Manual | -
-Abstract - |
- This document describes the LLVM assembly language IR/VM. LLVM is an SSA - based representation that attempts to be a useful midlevel IR by providing - type safety, low level operations, flexibility, and the capability to - represent 'all' high level languages cleanly. -- - - +
This document is a reference manual for the LLVM assembly language. +LLVM is an SSA based representation that provides type safety, +low-level operations, flexibility, and the capability of representing +'all' high-level languages cleanly. It is the common code +representation used throughout all phases of the LLVM compilation +strategy.
+-Introduction - |
- -This dual nature leads to three different representations of LLVM (the human readable assembly representation, the compact bytecode representation, and the in memory, pointer based, representation). This document describes the human readable representation and notation.
- -The LLVM representation aims to be a light weight and low level while being expressive, type safe, and extensible at the same time. It aims to be a "universal IR" of sorts, by being at a low enough level that high level ideas may be cleanly mapped to it. By providing type safety, LLVM can be used as the target of optimizations: for example, through pointer analysis, it can be proven that a C automatic variable is never accessed outside of the current function... allowing it to be promoted to a simple SSA value instead of a memory location.
+
The LLVM code representation is designed to be used in three +different forms: as an in-memory compiler IR, as an on-disk bytecode +representation (suitable for fast loading by a Just-In-Time compiler), +and as a human readable assembly language representation. This allows +LLVM to provide a powerful intermediate representation for efficient +compiler transformations and analysis, while providing a natural means +to debug and visualize the transformations. The three different forms +of LLVM are all equivalent. This document describes the human readable +representation and notation.
+ +The LLVM representation aims to be light-weight and low-level +while being expressive, typed, and extensible at the same time. It +aims to be a "universal IR" of sorts, by being at a low enough level +that high-level ideas may be cleanly mapped to it (similar to how +microprocessors are "universal IRs", allowing many source languages to +be mapped to them). By providing type information, LLVM can be used as +the target of optimizations: for example, through pointer analysis, it +can be proven that a C automatic variable is never accessed outside of +the current function, allowing it to be promoted to a simple SSA +value instead of a memory location.
+ ++
It is important to note that this document describes 'well formed' +LLVM assembly language. There is a difference between what the parser +accepts and what is considered 'well formed'. For example, the +following instruction is syntactically okay, but not well formed:
%x = add int 1, %x-...because only a phi node may refer to itself. The LLVM api provides a verification function (verify) that may be used to verify that a whole module or a single method is well formed. It is useful to validate whether an optimization pass performed a well formed transformation to the code.
- - -Describe the typesetting conventions here. +
...because the definition of %x does not dominate all of +its uses. The LLVM infrastructure provides a verification pass that may +be used to verify that an LLVM module is well formed. This pass is +automatically run by the parser after parsing input assembly, and by +the optimizer before it outputs bytecode. The violations pointed out +by the verifier pass indicate bugs in transformation passes or input to +the parser.
+-Identifiers - |
- -
- -LLVM requires the values start with a '%' sign for two reasons: Compilers don't need to worry about name clashes with reserved words, and the set of reserved words may be expanded in the future without penalty. Additionally, unnamed identifiers allow a compiler to quickly come up with a temporary variable without having to avoid symbol table conflicts.
- -Reserved words in LLVM are very similar to reserved words in other languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc...), for primitive type names ('void', 'uint', etc...), and others. These reserved words cannot conflict with variable names, because none of them may start with a '%' character.
- -Here is an example of LLVM code to multiply the integer variable '%X' by 8:
- -The easy way: -
- %result = mul int %X, 8 -- -After strength reduction: -
- %result = shl int %X, ubyte 3 -- -And the hard way: -
- add int %X, %X ; yields {int}:%0 - add int %0, %0 ; yields {int}:%1 - %result = add int %1, %1 -+
+
LLVM uses three different forms of identifiers, for different +purposes:
- -...and it also show a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment that defines the type and name of value produced. Comments are shown in italic text.
- - - +
LLVM requires that values start with a '%' sign for two reasons: +Compilers don't need to worry about name clashes with reserved words, +and the set of reserved words may be expanded in the future without +penalty. Additionally, unnamed identifiers allow a compiler to quickly +come up with a temporary variable without having to avoid symbol table +conflicts.
+Reserved words in LLVM are very similar to reserved words in other +languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc...), for primitive type names ('void', 'uint', +etc...), and others. These reserved words cannot conflict with +variable names, because none of them start with a '%' character.
+Here is an example of LLVM code to multiply the integer variable '%X' +by 8:
+The easy way:
+%result = mul uint %X, 8+
After strength reduction:
+%result = shl uint %X, ubyte 3+
And the hard way:
+add uint %X, %X ; yields {uint}:%0 + add uint %0, %0 ; yields {uint}:%1 + %result = add uint %1, %1+
This last way of multiplying %X by 8 illustrates several +important lexical features of LLVM:
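+   + Comments are delimited with a ';' and go until the end of line. + Unnamed temporaries are created when the result of a computation is not assigned to a named value. + Unnamed temporaries are numbered sequentially +  +...and it also shows a convention that we follow in this document. +When demonstrating instructions, we will follow an instruction with a +comment that defines the type and name of the value produced. Comments are +shown in italic text.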
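The three ways of multiplying by 8 shown above compute the same value. The following is a minimal Python sketch of that equivalence, not LLVM code. It assumes that 'uint' is a 32-bit unsigned integer with wraparound arithmetic; the function names and the MASK32 constant are hypothetical and exist only for this illustration.

```python
# Assumption: 'uint' is modelled as a 32-bit unsigned integer that wraps on overflow.
MASK32 = 0xFFFFFFFF


def mul_easy(x):
    """Mirrors: %result = mul uint %X, 8"""
    return (x * 8) & MASK32


def mul_shift(x):
    """Mirrors: %result = shl uint %X, ubyte 3 (after strength reduction)"""
    return (x << 3) & MASK32


def mul_adds(x):
    """Mirrors the three chained adds, with t0 and t1 standing in for %0 and %1."""
    t0 = (x + x) & MASK32    # add uint %X, %X  ; yields {uint}:%0
    t1 = (t0 + t0) & MASK32  # add uint %0, %0  ; yields {uint}:%1
    return (t1 + t1) & MASK32  # %result = add uint %1, %1


if __name__ == "__main__":
    for x in (0, 1, 5, 0x20000000, 0xFFFFFFFF):
        assert mul_easy(x) == mul_shift(x) == mul_adds(x)
    print(mul_adds(5))
```

The intermediate names t0 and t1 play the role of the sequentially numbered unnamed temporaries.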
The one non-intuitive notation for constants is the optional +hexadecimal form of floating point constants. For example, the form 'double +0x432ff973cafa8000' is equivalent to (but harder to read than) 'double +4.5e+15' which is also supported by the parser. The only time +hexadecimal floating point constants are useful (and the only time that +they are generated by the disassembler) is when an FP constant has to +be emitted that cannot be represented exactly as a decimal floating point +number. For example, NaNs, infinities, and other special cases are +represented in their IEEE hexadecimal format so that assembly and +disassembly do not cause any bits to change in the constants.
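The correspondence between the hexadecimal and decimal forms can be checked outside of LLVM. The following Python sketch assumes that the hexadecimal form is the raw 64-bit IEEE 754 double bit pattern written most significant byte first; the helper names hex_to_double and double_to_hex are hypothetical and are not part of any LLVM tool.

```python
import struct


def hex_to_double(bits):
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


def double_to_hex(value):
    """Return the 64-bit IEEE 754 bit pattern of a double as a hex string."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return "0x%016x" % bits


if __name__ == "__main__":
    # The example constant from the text round-trips without changing any bits.
    assert hex_to_double(0x432FF973CAFA8000) == 4.5e15
    assert double_to_hex(4.5e15) == "0x432ff973cafa8000"
    # Special values have no exact decimal spelling, so the bit pattern is used.
    print(double_to_hex(float("inf")))
```

Working on the bit pattern directly is what guarantees that a value survives a round trip unchanged, which a decimal spelling cannot promise for NaNs and infinities.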
+-Type System - |
- -The assembly language form for the type system was heavily influenced by the type problems in the C language1.
- - - +
The LLVM type system is one of the most important features of the +intermediate representation. Being typed enables a number of +optimizations to be performed on the IR directly, without having to do +extra analyses on the side before the transformation. A strong type +system makes it easier to read the generated code and enables novel +analyses and transformations that are not feasible to perform on normal +three address code representations.
+-Primitive Types - |
- -