X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FLangRef.html;h=47fc33ebbb7f59e08c2af983c7c4a044158143db;hb=812402019264506bea49749b23d0ead1be9fde7c;hp=86041a4d49f50ae1be99cc30c31d98c23f193fbb;hpb=2b7d320d90c8d7989d9c33bd6a6280fb3fd9d2c9;p=oota-llvm.git diff --git a/docs/LangRef.html b/docs/LangRef.html index 86041a4d49f..47fc33ebbb7 100644 --- a/docs/LangRef.html +++ b/docs/LangRef.html @@ -1,1579 +1,2599 @@ - -
llvm Assembly Language Reference Manual | -
-Abstract - |
- This document describes the LLVM assembly language. LLVM is an SSA based - representation that is a useful midlevel IR, providing type safety, low level - operations, flexibility, and the capability of representing 'all' high level - languages cleanly. -- - - +
This document is a reference manual for the LLVM assembly language. +LLVM is an SSA-based representation that provides type safety, +low-level operations, flexibility, and the capability of representing +'all' high-level languages cleanly. It is the common code +representation used throughout all phases of the LLVM compilation +strategy.
+-Introduction - |
- -The LLVM representation aims to be a light weight and low level while being -expressive, type safe, and extensible at the same time. It aims to be a -"universal IR" of sorts, by being at a low enough level that high level ideas -may be cleanly mapped to it (similar to how microprocessors are "universal -IR's", allowing many source languages to be mapped to them). By providing type -safety, LLVM can be used as the target of optimizations: for example, through -pointer analysis, it can be proven that a C automatic variable is never accessed -outside of the current function... allowing it to be promoted to a simple SSA -value instead of a memory location.
+
The LLVM code representation is designed to be used in three +different forms: as an in-memory compiler IR, as an on-disk bytecode +representation (suitable for fast loading by a Just-In-Time compiler), +and as a human readable assembly language representation. This allows +LLVM to provide a powerful intermediate representation for efficient +compiler transformations and analysis, while providing a natural means +to debug and visualize the transformations. The three different forms +of LLVM are all equivalent. This document describes the human readable +representation and notation.
+ +The LLVM representation aims to be light-weight and low-level +while being expressive, typed, and extensible at the same time. It +aims to be a "universal IR" of sorts, by being at a low enough level +that high-level ideas may be cleanly mapped to it (similar to how +microprocessors are "universal IRs", allowing many source languages to +be mapped to them). By providing type information, LLVM can be used as +the target of optimizations: for example, through pointer analysis, it +can be proven that a C automatic variable is never accessed outside of +the current function... allowing it to be promoted to a simple SSA +value instead of a memory location.
+ ++
It is important to note that this document describes 'well formed' +LLVM assembly language. There is a difference between what the parser +accepts and what is considered 'well formed'. For example, the +following instruction is syntactically okay, but not well formed:
%x = add int 1, %x-...because only a phi node may refer to itself. -The LLVM api provides a verification pass (created by the -createVerifierPass function) that may be used to verify that an LLVM -module is well formed. This pass is automatically run by the parser after -parsing input assembly, and by the optimizer before it outputs bytecode. The -violations pointed out by the verifier pass indicate bugs in transformation -passes or input to the parser.
- -Describe the typesetting conventions here. +
...because the definition of %x does not dominate all of +its uses. The LLVM infrastructure provides a verification pass that may +be used to verify that an LLVM module is well formed. This pass is +automatically run by the parser after parsing input assembly, and by +the optimizer before it outputs bytecode. The violations pointed out +by the verifier pass indicate bugs in transformation passes or input to +the parser.
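The dominance rule above can be illustrated with a small Python sketch. It is a toy model, not LLVM's verifier: the function undefined_uses and the tuple encoding of instructions are hypothetical names introduced here for illustration. It only handles a single straight-line basic block, where "the definition dominates the use" reduces to "the definition appears earlier in the block"; a real verifier has to reason over the whole control flow graph and treat phi nodes specially.

```python
# Toy well-formedness check for one straight-line basic block.
# Hypothetical helper, not part of LLVM: each instruction is encoded as
# (result_name, opcode, operands); operands starting with '%' are values.

def undefined_uses(instructions, arguments=()):
    """Return (result, operand) pairs where an operand is used before it is defined."""
    defined = set(arguments)
    problems = []
    for result, _opcode, operands in instructions:
        for operand in operands:
            if operand.startswith("%") and operand not in defined:
                problems.append((result, operand))
        # The result only becomes available after the instruction itself.
        defined.add(result)
    return problems


# '%x = add int 1, %x' parses, but %x is used before it is defined.
bad = [("%x", "add", ["1", "%x"])]
# Here every operand is either an argument or an earlier result.
good = [("%y", "add", ["1", "%a"]), ("%x", "add", ["%y", "%y"])]

print(undefined_uses(bad))                      # [('%x', '%x')]
print(undefined_uses(good, arguments=["%a"]))   # []
```

The sketch keeps the parser/verifier split described above: the tuple list stands in for input that has already parsed successfully, and the check is a separate pass over it.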
+-Identifiers - |
+
- -LLVM requires the values start with a '%' sign for two reasons: Compilers don't -need to worry about name clashes with reserved words, and the set of reserved -words may be expanded in the future without penalty. Additionally, unnamed -identifiers allow a compiler to quickly come up with a temporary variable -without having to avoid symbol table conflicts.
- -Reserved words in LLVM are very similar to reserved words in other languages. -There are keywords for different opcodes ('add', -'cast', 'ret', -etc...), for primitive type names ('void', -'uint', etc...), and others. These reserved -words cannot conflict with variable names, because none of them start with a '%' -character.
- -Here is an example of LLVM code to multiply the integer variable '%X' -by 8:
- -The easy way: -
- %result = mul uint %X, 8 -- -After strength reduction: -
- %result = shl uint %X, ubyte 3 -- -And the hard way: -
- add uint %X, %X ; yields {int}:%0 - add uint %0, %0 ; yields {int}:%1 - %result = add uint %1, %1 -- -This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:
+
LLVM uses three different forms of identifiers, for different +purposes:
- -...and it also show a convention that we follow in this document. When -demonstrating instructions, we will follow an instruction with a comment that -defines the type and name of value produced. Comments are shown in italic -text.
-
-The one unintuitive notation for constants is the optional hexidecimal form of
-floating point constants. For example, the form 'double
+
LLVM requires that values start with a '%' sign for two reasons: +Compilers don't need to worry about name clashes with reserved words, +and the set of reserved words may be expanded in the future without +penalty. Additionally, unnamed identifiers allow a compiler to quickly +come up with a temporary variable without having to avoid symbol table +conflicts.
+Reserved words in LLVM are very similar to reserved words in other +languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc...), for primitive type names ('void', 'uint', +etc...), and others. These reserved words cannot conflict with +variable names, because none of them start with a '%' character.
+Here is an example of LLVM code to multiply the integer variable '%X' +by 8:
+The easy way:
+%result = mul uint %X, 8+
After strength reduction:
+%result = shl uint %X, ubyte 3+
And the hard way:
+add uint %X, %X ; yields {uint}:%0 + add uint %0, %0 ; yields {uint}:%1 + %result = add uint %1, %1+
This last way of multiplying %X by 8 illustrates several +important lexical features of LLVM:
+...and it also shows a convention that we follow in this document. +When demonstrating instructions, we will follow an instruction with a +comment that defines the type and name of the value produced. Comments are +shown in italic text.
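As a quick sanity check on the three ways of multiplying by 8 shown above, the following Python sketch models each sequence and confirms they agree. It assumes 'uint' behaves as a 32-bit unsigned integer with wraparound; the function names and the MASK constant are hypothetical and exist only for this illustration.

```python
# Assumption: 'uint' is modeled as a 32-bit unsigned integer that wraps on overflow.
MASK = 0xFFFFFFFF

def mul_easy(x):
    """Models: %result = mul uint %X, 8"""
    return (x * 8) & MASK

def mul_shift(x):
    """Models: %result = shl uint %X, ubyte 3"""
    return (x << 3) & MASK

def mul_adds(x):
    """Models the three chained adds, with two unnamed temporaries."""
    t0 = (x + x) & MASK    # corresponds to %0
    t1 = (t0 + t0) & MASK  # corresponds to %1
    return (t1 + t1) & MASK

# All three forms agree, including on inputs that overflow 32 bits.
for x in (0, 1, 5, 0x20000000, 0xFFFFFFFF):
    assert mul_easy(x) == mul_shift(x) == mul_adds(x)

print(mul_adds(5))  # 40
```

The temporaries t0 and t1 play the role of the unnamed values %0 and %1 in the add sequence.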
+The one non-intuitive notation for constants is the optional +hexadecimal form of floating point constants. For example, the form 'double 0x432ff973cafa8000' is equivalent to (but harder to read than) 'double -4.5e+15' which is also supported by the parser. The only time hexadecimal -floating point constants are useful (and the only time that they are generated -by the disassembler) is when an FP constant has to be emitted that is not -representable as a decimal floating point number exactly. For example, NaN's, -infinities, and other special cases are represented in their IEEE hexadecimal -format so that assembly and disassembly do not cause any bits to change in the -constants.
- - +4.5e+15' which is also supported by the parser. The only time +hexadecimal floating point constants are useful (and the only time that +they are generated by the disassembler) is when an FP constant has to +be emitted that is not representable as a decimal floating point number +exactly. For example, NaN's, infinities, and other special cases are +represented in their IEEE hexadecimal format so that assembly and +disassembly do not cause any bits to change in the constants.
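The equivalence between the hexadecimal and decimal forms can be checked outside of LLVM. The Python sketch below reinterprets the 64 bits of the hexadecimal constant as an IEEE 754 double using the standard struct module; the helper names double_from_hex and hex_from_double are hypothetical and not part of any LLVM tool.

```python
import struct

def double_from_hex(hex_bits):
    """Interpret a 16-digit hex string as the raw bits of an IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_bits))[0]

def hex_from_double(value):
    """Return the raw IEEE 754 bits of a double as a 16-digit hex string."""
    return struct.pack(">d", value).hex()

# The constant from the text: both spellings denote the same double.
print(double_from_hex("432ff973cafa8000") == 4.5e15)  # True

# Infinity has no decimal spelling, but its bit pattern round-trips exactly.
print(hex_from_double(float("inf")))  # 7ff0000000000000
```

Converting in both directions at the bit level is what lets special values survive assembly and disassembly without any bits changing.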
+-Type System - |
- -The written form for the type system was heavily influenced by the syntactic -problems with types in the C language1.
- - - +
The LLVM type system is one of the most important features of the +intermediate representation. Being typed enables a number of +optimizations to be performed on the IR directly, without having to do +extra analyses on the side before the transformation. A strong type +system makes it easier to read the generated code and enables novel +analyses and transformations that are not feasible to perform on normal +three address code representations.
+-Primitive Types - |
- -