X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FLangRef.html;h=5dfc9539c9116ceddb37ac38fc94f2950acd4011;hb=31b24322edc0bcfb8c77277fc5eeebfe98267118;hp=cf514490a8a2de2ed084e4d2eee2bb2c1f352605;hpb=7faa88366572f1515cc4e5e818c8ea9e6d0f2f8b;p=oota-llvm.git diff --git a/docs/LangRef.html b/docs/LangRef.html index cf514490a8a..5dfc9539c91 100644 --- a/docs/LangRef.html +++ b/docs/LangRef.html @@ -1,1596 +1,2597 @@ - -
llvm Assembly Language Reference Manual | -
-Abstract - |
- This document describes the LLVM assembly language. LLVM is an SSA based - representation that is a useful midlevel IR, providing type safety, low level - operations, flexibility, and the capability to represent 'all' high level - languages cleanly. -- - - +
This document is a reference manual for the LLVM assembly language. +LLVM is an SSA based representation that provides type safety, +low-level operations, flexibility, and the capability of representing +'all' high-level languages cleanly. It is the common code +representation used throughout all phases of the LLVM compilation +strategy.
+-Introduction - |
- -The LLVM representation aims to be a light weight and low level while being -expressive, type safe, and extensible at the same time. It aims to be a -"universal IR" of sorts, by being at a low enough level that high level ideas -may be cleanly mapped to it (similar to how microprocessors are "universal -IR's", allowing many source languages to be mapped to them). By providing type -safety, LLVM can be used as the target of optimizations: for example, through -pointer analysis, it can be proven that a C automatic variable is never accessed -outside of the current function... allowing it to be promoted to a simple SSA -value instead of a memory location.
+
The LLVM code representation is designed to be used in three +different forms: as an in-memory compiler IR, as an on-disk bytecode +representation (suitable for fast loading by a Just-In-Time compiler), +and as a human readable assembly language representation. This allows +LLVM to provide a powerful intermediate representation for efficient +compiler transformations and analysis, while providing a natural means +to debug and visualize the transformations. The three different forms +of LLVM are all equivalent. This document describes the human readable +representation and notation.
+ +The LLVM representation aims to be light-weight and low-level +while being expressive, typed, and extensible at the same time. It +aims to be a "universal IR" of sorts, by being at a low enough level +that high-level ideas may be cleanly mapped to it (similar to how +microprocessors are "universal IRs", allowing many source languages to +be mapped to them). By providing type information, LLVM can be used as +the target of optimizations: for example, through pointer analysis, it +can be proven that a C automatic variable is never accessed outside of +the current function... allowing it to be promoted to a simple SSA +value instead of a memory location.
+ ++
It is important to note that this document describes 'well formed' +LLVM assembly language. There is a difference between what the parser +accepts and what is considered 'well formed'. For example, the +following instruction is syntactically okay, but not well formed:
%x = add int 1, %x-...because only a phi node may refer to itself. -The LLVM api provides a verification pass (created by the -createVerifierPass function) that may be used to verify that an LLVM -module is well formed. This pass is automatically run by the parser after -parsing input assembly, and by the optimizer before it outputs bytecode. Often, -violations pointed out by the verifier pass indicate bugs in transformation -passes.
- - -Describe the typesetting conventions here. +
...because the definition of %x does not dominate all of +its uses. The LLVM infrastructure provides a verification pass that may +be used to verify that an LLVM module is well formed. This pass is +automatically run by the parser after parsing input assembly, and by +the optimizer before it outputs bytecode. The violations pointed out +by the verifier pass indicate bugs in transformation passes or input to +the parser.
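As a rough illustration of the 'well formed' rule above, the following Python sketch flags values that are used before they are defined in a single straight-line sequence of instructions. It is not the LLVM verifier: the real pass reasons about dominance over the whole control flow graph, while this sketch only assumes one basic block with no phi nodes. The function name `undefined_uses` and the `live_in` parameter are hypothetical and exist only for this example.

```python
import re

# Hypothetical helper: a simplified, single-block stand-in for a dominance check.
# Assumption: instructions are given one per string, in execution order.
def undefined_uses(lines, live_in=()):
    """Return (instruction, name) pairs where %name is used before it is defined."""
    defined = set(live_in)      # values assumed to be defined before this block
    problems = []
    for line in lines:
        code = line.split(";", 1)[0]          # drop a trailing ';' comment
        dest, sep, rhs = code.partition("=")
        if not sep:                           # no '=': the instruction has no named result
            dest, rhs = "", code
        for name in re.findall(r"%[\w.$]+", rhs):
            if name not in defined:
                problems.append((line, name))
        match = re.match(r"\s*(%[\w.$]+)\s*$", dest)
        if match:                             # the result only becomes visible afterwards
            defined.add(match.group(1))
    return problems

print(undefined_uses(["%x = add int 1, %x"]))
```

The instruction from the text is reported because `%x` appears as an operand before its own definition has completed, which is the straight-line analogue of a definition that does not dominate its use.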
+-Identifiers - |
- -
- -LLVM requires the values start with a '%' sign for two reasons: Compilers don't -need to worry about name clashes with reserved words, and the set of reserved -words may be expanded in the future without penalty. Additionally, unnamed -identifiers allow a compiler to quickly come up with a temporary variable -without having to avoid symbol table conflicts.
- -Reserved words in LLVM are very similar to reserved words in other languages. -There are keywords for different opcodes ('add', -'cast', 'ret', -etc...), for primitive type names ('void', -'uint', etc...), and others. These reserved -words cannot conflict with variable names, because none of them start with a '%' -character.
- -Here is an example of LLVM code to multiply the integer variable '%X' -by 8:
- -The easy way: -
- %result = mul int %X, 8 -- -After strength reduction: -
- %result = shl int %X, ubyte 3 -+
- add int %X, %X ; yields {int}:%0 - add int %0, %0 ; yields {int}:%1 - %result = add int %1, %1 -- -This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:
+
LLVM uses three different forms of identifiers, for different +purposes:
- -...and it also show a convention that we follow in this document. When -demonstrating instructions, we will follow an instruction with a comment that -defines the type and name of value produced. Comments are shown in italic -text.
- - - +
LLVM requires that values start with a '%' sign for two reasons: +Compilers don't need to worry about name clashes with reserved words, +and the set of reserved words may be expanded in the future without +penalty. Additionally, unnamed identifiers allow a compiler to quickly +come up with a temporary variable without having to avoid symbol table +conflicts.
+Reserved words in LLVM are very similar to reserved words in other +languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc...), for primitive type names ('void', 'uint', +etc...), and others. These reserved words cannot conflict with +variable names, because none of them start with a '%' character.
+Here is an example of LLVM code to multiply the integer variable '%X' +by 8:
+The easy way:
+%result = mul uint %X, 8+
After strength reduction:
+%result = shl uint %X, ubyte 3+
And the hard way:
+add uint %X, %X ; yields {uint}:%0 + add uint %0, %0 ; yields {uint}:%1 + %result = add uint %1, %1+
This last way of multiplying %X by 8 illustrates several +important lexical features of LLVM:
+...and it also shows a convention that we follow in this document. +When demonstrating instructions, we will follow an instruction with a +comment that defines the type and name of the value produced. Comments are +shown in italic text.
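The three ways of multiplying by 8 shown above compute the same value. The Python sketch below checks this on a few inputs. It assumes that `uint` behaves as a 32-bit unsigned integer with wrap-around arithmetic, which this section does not state; the function names are hypothetical and only mirror the three LLVM snippets.

```python
MASK = 0xFFFFFFFF  # assumption: 'uint' is 32 bits wide and wraps on overflow

def mul8_easy(x):
    # mirrors: %result = mul uint %X, 8
    return (x * 8) & MASK

def mul8_shift(x):
    # mirrors: %result = shl uint %X, ubyte 3
    return (x << 3) & MASK

def mul8_adds(x):
    # mirrors the three chained adds; t0 and t1 play the role of %0 and %1
    t0 = (x + x) & MASK
    t1 = (t0 + t0) & MASK
    return (t1 + t1) & MASK

for value in (0, 1, 5, 0x7FFFFFFF, 0xFFFFFFFF):
    assert mul8_easy(value) == mul8_shift(value) == mul8_adds(value)
```

The temporaries `t0` and `t1` correspond to the unnamed values in the "hard way" version, which are numbered in the order they are created.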
+The one non-intuitive notation for constants is the optional +hexadecimal form of floating point constants. For example, the form 'double +0x432ff973cafa8000' is equivalent to (but harder to read than) 'double +4.5e+15' which is also supported by the parser. The only time +hexadecimal floating point constants are useful (and the only time that +they are generated by the disassembler) is when an FP constant has to +be emitted that is not representable as a decimal floating point number +exactly. For example, NaNs, infinities, and other special cases are +represented in their IEEE hexadecimal format so that assembly and +disassembly do not cause any bits to change in the constants.
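The equivalence between the hexadecimal and decimal forms can be checked outside of LLVM. The Python sketch below assumes that the hexadecimal digits are the big-endian IEEE 754 bit pattern of a 64-bit double, which is how the paragraph above describes them. The helper names `double_from_hex` and `hex_from_double` are hypothetical and are not part of any LLVM tool.

```python
import struct

def double_from_hex(bits):
    """Interpret 16 hex digits as the IEEE 754 bit pattern of a double."""
    return struct.unpack(">d", bytes.fromhex(bits))[0]

def hex_from_double(value):
    """Return the IEEE 754 bit pattern of a double as 16 hex digits."""
    return struct.pack(">d", value).hex()

# The example from the text: both spellings denote the same value.
assert double_from_hex("432ff973cafa8000") == 4.5e15
assert hex_from_double(4.5e15) == "432ff973cafa8000"

# Infinity has no decimal spelling, but its bit pattern is well defined.
print(hex_from_double(float("inf")))
```

Because the bit pattern is copied verbatim in both directions, a round trip through the hexadecimal form cannot change any bits, which is the property the text relies on for special values.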
+-Type System - |
- -The assembly language form for the type system was heavily influenced by the -type problems in the C language1.
- - - +
The LLVM type system is one of the most important features of the +intermediate representation. Being typed enables a number of +optimizations to be performed on the IR directly, without having to do +extra analyses on the side before the transformation. A strong type +system makes it easier to read the generated code and enables novel +analyses and transformations that are not feasible to perform on normal +three address code representations.
+-Primitive Types - |
- -