X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FReleaseNotes.html;h=2f83b9447d1d98613e3ac0451f8aef501a58bb0f;hb=a75ce9f5d2236d93c117e861e60e6f3f748c9555;hp=9b1c5788ca7db8922e252f1a9d3b8885a79e1b71;hpb=bb11771eb6ffaf6867486f47719c3d52c4a0d63e;p=oota-llvm.git
diff --git a/docs/ReleaseNotes.html b/docs/ReleaseNotes.html
index 9b1c5788ca7..2f83b9447d1 100644
--- a/docs/ReleaseNotes.html
+++ b/docs/ReleaseNotes.html
@@ -119,10 +119,27 @@ production-quality compiler for C, Objective-C, C++ and Objective-C++ on x86
In the LLVM 2.8 time-frame, the Clang team has made many improvements:
-
-- Surely these guys have done something
-- X86-64 abi improvements? Did they make it in?
-
+
+ - Clang C++ is now feature-complete with respect to the ISO C++ 1998 and 2003 standards.
+ - Added support for Objective-C++.
+ - Clang now uses LLVM-MC to directly generate object code and to parse inline assembly (on Darwin).
+ - Introduced many new warnings, including
-Wmissing-field-initializers
, -Wshadow
, -Wno-protocol
, -Wtautological-compare
, -Wstrict-selector-match
, -Wcast-align
, -Wunused
improvements, and greatly improved format-string checking.
+ - Introduced the "libclang" library, a C interface to Clang intended to support IDE clients.
+ - Added support for
#pragma GCC visibility
, #pragma align
, and others.
+ - Added support for SSE, AVX, ARM NEON, and AltiVec.
+ - Improved support for many Microsoft extensions.
+ - Implemented support for blocks in C++.
+ - Implemented precompiled headers for C++.
+ - Improved abstract syntax trees to retain more accurate source information.
+ - Added driver support for handling LLVM IR and bitcode files directly.
+ - Major improvements to compiler correctness for exception handling.
+ - Improved generated code quality in some areas:
+
+ - Good code generation for X86-32 and X86-64 ABI handling.
+ - Improved code generation for bit-fields, although important work remains.
+
+
+
@@ -180,7 +197,6 @@ optimizers, rather than just a handful.
Fortran programs using common variables now link correctly.
GNU OMP constructs no longer crash the compiler.
-
@@ -253,7 +269,7 @@ support new platforms, new languages, new architectures, and new features.
-libc++ is another new member of the LLVM
+libc++ is another new member of the LLVM
family. It is an implementation of the C++ standard library, written from the
ground up to specifically target the forthcoming C++'0X standard and focus on
delivering great performance.
@@ -267,6 +283,43 @@ looking forward to the C++ committee finalizing the C++'0x standard.
+
+
+
+
+
+
+KLEE is a symbolic execution framework for
+programs in LLVM bitcode form. KLEE tries to symbolically evaluate "all" paths
+through the application and records state transitions that lead to fault
+states. This allows it to construct testcases that lead to faults and can even
+be used to verify some algorithms.
+
+
+
Although KLEE does not have any major new features as of 2.8, we have made
+various minor improvements, particularly to ease development:
+
+ - Added support for LLVM 2.8. KLEE currently maintains compatibility with
+ LLVM 2.6, 2.7, and 2.8.
+ - Added a buildbot for 2.6, 2.7, and trunk. A 2.8 buildbot will be coming
+ soon following release.
+ - Fixed many C++ code issues to allow building with Clang++. Mostly
+ complete, except for the version of MiniSAT which is inside the KLEE STP
+ version.
+ - Improved support for building with separate source and build
+ directories.
+ - Added support for "long double" on x86.
+ - Initial work on KLEE support for using 'lit' test runner instead of
+ DejaGNU.
+ - Added configure support for using an external version of
+ STP.
+
+
+
+
+
External Open Source Projects Using LLVM 2.8
@@ -313,8 +366,8 @@ recompilation of larger parts of the compiler chain.
language and compiler written on top of LLVM, intended for producing
single-address-space managed code operating systems that
run faster than the equivalent multiple-address-space C systems.
-More in-depth blurb is available on
the wiki.
+More in-depth blurb is available on the
wiki.
@@ -325,14 +378,14 @@ href="http://www.quokforge.org/projects/horizon/wiki/Wiki">the wiki.
-Clam AntiVirus is an open source (GPL)
+Clam AntiVirus is an open source (GPL)
anti-virus toolkit for UNIX, designed especially for e-mail scanning on mail
gateways. Since version 0.96 it has bytecode
signatures that allow writing detections for complex malware. It
uses LLVM's JIT to speed up the execution of bytecode on
-X86,X86-64,PPC32/64, falling back to its own interpreter otherwise.
-The git version was updated to work with LLVM 2.8
+X86, X86-64, PPC32/64, falling back to its own interpreter otherwise.
+The git version was updated to work with LLVM 2.8.
The
Jade project is hosted as part of the Open
@@ -490,14 +543,14 @@ builds on LLVM 2.8.
DTMC provides support for
Transactional Memory, which is an easy-to-use and efficient way to synchronize
accesses to shared memory. Transactions can contain normal C/C++ code (e.g.,
-__transaction { list.remove(x); x.refCount--; }) and will be executed
+__transaction { list.remove(x); x.refCount--; }
) and will be executed
virtually atomically and isolated from other transactions.
@@ -547,23 +600,6 @@ in this section.
-
-
-
-
-
-
In addition to changes to the code, between LLVM 2.7 and 2.8, a number of
-organization changes have happened:
-
-
-
-- libc++ and lldb are new
-- Debugging optimized code support.
-
-
-
Major New Features
@@ -574,8 +610,16 @@ organization changes have happened:
LLVM 2.8 includes several major new capabilities:
-- llvm-diff
-- Direct .o file writing support for darwin/x86[64].
+- As mentioned above, libc++ and LLDB are major new additions to the LLVM collective.
+- LLVM 2.8 now has pretty decent support for debugging optimized code. You
+ should be able to reliably get debug info for function arguments, assuming
+ that the value is actually available where you have stopped.
+- A new 'llvm-diff' tool is available that does a semantic diff of .ll
+ files.
+- The MC subproject has made major progress in this release.
+ Direct .o file writing support for darwin/x86[-64] is now reliable and
+ support for other targets and object file formats is in progress.
@@ -590,13 +634,19 @@ organization changes have happened:
expose new optimization opportunities:
-
- memcpy, memmove, and memset now take address space qualified pointers + volatile.
- per-instruction debug info metadata is much faster and uses less space (new DebugLoc class).
- New "trap values" concept: http://llvm.org/docs/LangRef.html#trapvalues
- New linker_private_weak and linker_private_weak_def_auto linkage types
- Triples are now stored in normalized form. Triple::normalize.
-
+- The memcpy, memmove, and memset
+ intrinsics now take address space qualified pointers and a bit to indicate
+ whether the transfer is "volatile" or not.
+
+- Per-instruction debug info metadata is much faster and uses less memory by
+ using the new DebugLoc class.
+- LLVM IR now has a more formalized concept of "trap values", which allow the optimizer
+ to optimize more aggressively in the presence of undefined behavior, while
+ still producing predictable results.
+- LLVM IR now supports two new linkage
+ types (linker_private_weak and linker_private_weak_def_auto) which map
+ onto some obscure MachO concepts.
@@ -612,33 +662,38 @@ expose new optimization opportunities:
release includes a few major enhancements and additions to the optimizers:
+- As mentioned above, the optimizer now has support for updating debug
+ information as it goes. A key aspect of this is the new llvm.dbg.value
+ intrinsic. This intrinsic represents debug info for variables that are
+ promoted to SSA values (typically by mem2reg or the -scalarrepl passes).
+
+- The JumpThreading pass is now much more aggressive about implied value
+ relations, allowing it to thread conditions like "a == 4" when a is known to
+ be 13 in one of the predecessors of a block. It does this in conjunction
+ with the new LazyValueInfo analysis pass.
+- The new RegionInfo analysis pass identifies single-entry single-exit regions
+ in the CFG. You can play with it with the "opt -regions -analyze" or
+ "opt -view-regions" commands.
+- The loop optimizer has significantly improved strength reduction and analysis
+ capabilities. Notably it is able to build on the trap value and signed
+ integer overflow information to optimize <= and >= loops.
+- The CallGraphSCCPassManager now has some basic support for iterating within
+ an SCC when an optimizer devirtualizes a function call. This allows inlining
+ through indirect call sites that are devirtualized by store-load forwarding
+ and other optimizations.
+- The new -loweratomic pass is available
+ to lower atomic instructions into their non-atomic form. This can be useful
+ to optimize generic code that expects to run in a single-threaded
+ environment.
+
-
+
-
-
-
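The JumpThreading item in this hunk can be illustrated with a toy model. This is a minimal sketch in Python rather than LLVM IR, and the function names `original` and `threaded` are hypothetical. When the join block is reached from a predecessor where `a` is known to be 13, the test `a == 4` is already decided, so that predecessor can jump straight to the false successor.

```python
def original(flag, x):
    # Two predecessors: one assigns a constant, the other leaves a unknown.
    if flag:
        a = 13
    else:
        a = x
    # Join block: the comparison is evaluated for both predecessors.
    if a == 4:
        return "four"
    return "other"


def threaded(flag, x):
    if flag:
        # a is known to be 13 on this path, so "a == 4" is false:
        # jump directly to the false successor without testing.
        return "other"
    a = x
    if a == 4:
        return "four"
    return "other"
```

Both functions return the same result for every input; only the path through the constant predecessor skips the comparison.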
@@ -671,9 +726,9 @@ in.
The MC disassembler now fully supports ARM and Thumb. ARM assembler support
is still in early development though.
The X86 MC assembler now supports the X86 AES and AVX instruction set.
-Work on ELF and COFF support is well underway, but isn't useful yet in LLVM
- 2.8. Please contact the llvmdev mailing list if you're interested in
- this.
+Work on ELF and COFF object files and ARM target support is well underway,
+ but isn't useful yet in LLVM 2.8. Please contact the llvmdev mailing list
+ if you're interested in this.
For more information, please see the .
-
Target Independent Code Generator Improvements
@@ -697,35 +751,57 @@ infrastructure, which allows us to implement more aggressive algorithms and make
it run faster:
-
-
- MachineCSE tuned and on by default.
-
- Rewrote tblgen's type inference for backends to be more consistent and
- diagnose more target bugs. This also allows limited support for writing
- patterns for instructions that return multiple results, e.g. a virtual
- register and a flag result. Stuff that used 'parallel' before should use
- this.
-
- New -regalloc=fast, =local got removed
- New -regalloc=default option that chooses a register allocator based on the -O optimization level.
- New SubRegIndex tblgen class for targets -> jakob
-
- Bottom up fast isel. Simple Load reuse. No more machinedce.
- IR ABI: <3 x float> is passed as <4 x float> instead of 3 floats.
-
- New COPY instruction. copyRegToReg -> copyPhysReg, isMoveInstr is gone.
- RenderMachineFunction: -rendermf
- SplitKit?
- Evan: Teach bottom up pre-ra scheduler to track register pressure. Work in progress.
- Evan: Add an ILP scheduler. On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2 by 16%.
-
- New OptimizeExts+OptimizeCmps -> PeepholeOptimizer pass
- New LocalStackSlotAllocation.cpp pass (jimg)
- Atomics now get legalized when not natively supported (jim g)
-
- -ffunction-sections and -fdata-sections are supported on ELF targets.
- -momit-leaf-frame-pointer now supported.
+- The clang/gcc -momit-leaf-frame-pointer argument is now supported.
+- The clang/gcc -ffunction-sections and -fdata-sections arguments are now
+ supported on ELF targets (like GCC).
+- The MachineCSE pass is now tuned and on by default. It eliminates common
+ subexpressions that are exposed when lowering to machine instructions.
+- The "local" register allocator was replaced by a new "fast" register
+ allocator. This new allocator (which is often used at -O0) is substantially
+ faster and produces better code than the old local register allocator.
+- A new LLC "-regalloc=default" option is available, which automatically
+ chooses a register allocator based on the -O optimization level.
+- The common code generator code was modified to promote illegal argument and
+ return value vectors to wider ones when possible instead of scalarizing
+ them. For example, <3 x float> will now pass in one SSE register
+ instead of 3 on X86. This generates substantially better code since the
+ rest of the code generator was already expecting this.
+- The code generator uses a new "COPY" machine instruction. This speeds up
+ the code generator and eliminates the need for targets to implement the
+ isMoveInstr hook. Also, the copyRegToReg hook was renamed to copyPhysReg
+ and simplified.
+- The code generator now has a "LocalStackSlotPass", which optimizes stack
+ slot access for targets (like ARM) that have limited stack displacement
+ addressing.
+- A new "PeepholeOptimizer" is available, which eliminates sign and zero
+ extends, and optimizes away compare instructions when the condition result
+ is available from a previous instruction.
+- Atomic operations now get legalized into simpler atomic operations if not
+ natively supported, easing the implementation burden on targets.
+- We have added two new bottom-up pre-allocation register pressure aware schedulers:
+
+- The hybrid scheduler schedules aggressively to minimize schedule length when registers are available and avoids overscheduling in high pressure situations.
+- The instruction-level-parallelism scheduler schedules for maximum ILP when registers are available and avoids overscheduling in high pressure situations.
+
+- The tblgen type inference algorithm was rewritten to be more consistent and
+ diagnose more target bugs. If you have an out-of-tree backend, you may
+ find that it reports bugs in your target description. The rewrite also
+ allows limited support for writing patterns for instructions that return
+ multiple results (e.g. a virtual register and a flag result). The
+ 'parallel' modifier in tblgen was removed; you should use the new support
+ for multiple results instead.
+- A new (experimental) "-rendermf" pass is available which renders a
+ MachineFunction into HTML, showing live ranges and other useful
+ details.
+- The new SubRegIndex tablegen class allows subregisters to be indexed
+ symbolically instead of numerically. If your target uses subregisters you
+ will need to adapt to use SubRegIndex when you upgrade to 2.8.
+
+
+- The -fast-isel instruction selection path (used at -O0 on X86) was rewritten
+ to work bottom-up on basic blocks instead of top-down. This makes it
+ slightly faster (because the MachineDCE pass is not needed any longer) and
+ allows it to generate better code in some cases.
@@ -736,38 +812,46 @@ it run faster:
-
New features of the X86 target include:
+
New features and major changes in the X86 target include:
- The X86 backend now supports holding X87 floating point stack values
in registers across basic blocks, dramatically improving performance of code
- that uses long double, and when targetting CPUs that don't support SSE.
-
- New SSEDomainFix pass:
- On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a
- register in a different domain than where it was defined. Some instructions
- have equvivalents for different domains, like por/orps/orpd. The
- SSEDomainFix pass tries to minimize the number of domain crossings by
- changing between equvivalent opcodes where possible.
-
- X86 backend attempts to promote 16-bit integer operations to 32-bits to avoid
- 0x66 prefixes, which are slow on some microarchitectures and bloat the code
- on others.
-
- New support for X86 "thiscall" calling convention (x86_thiscallcc in IR) for windows.
-
- New llvm.x86.int intrinsic (for int $42 and int3)
-
- Verbose assembly decodes X86 shuffle instructions, e.g.:
- insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
- unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
- pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
+ that uses long double, and when targeting CPUs that don't support SSE.
+
+- The X86 backend now uses a SSEDomainFix pass to optimize SSE operations. On
+ Nehalem ("Core i7") and newer CPUs there is a 2 cycle latency penalty on
+ using a register in a different domain than where it was defined. This pass
+ optimizes away these stalls.
+
+- The X86 backend now promotes 16-bit integer operations to 32-bits when
+ possible. This avoids 0x66 prefixes, which are slow on some
+ microarchitectures and bloat the code on all of them.
+
+- The X86 backend now supports the Microsoft "thiscall" calling convention,
+ and a calling convention to support
+ ghc.
+
+- The X86 backend supports a new "llvm.x86.int" intrinsic, which maps onto
+ the X86 "int $42" and "int3" instructions.
+
+- At the IR level, the <2 x float> datatype is now promoted and passed
+ around as a <4 x float> instead of being passed and returned as an MMX
+ vector. If you have a frontend that uses this, please pass and return a
+ <2 x i32> instead (using bitcasts).
+
+- When printing .s files in verbose assembly mode (the default for clang -S),
+ the X86 backend now decodes X86 shuffle instructions and prints human
+ readable comments after the most inscrutable of them, e.g.:
+
+
+ insertps $113, %xmm3, %xmm0 # xmm0 = zero,xmm0[1,2],xmm3[1]
+ unpcklps %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+ pshufd $1, %xmm1, %xmm1 # xmm1 = xmm1[1,0,0,0]
+
+
- X86 ABI: <2 x float> in IR no longer maps onto MMX, it turns into <4 x float>
-
- new GHC calling convention
-
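The shuffle comments shown in this hunk can be reproduced by hand for the simplest case. The following is a minimal sketch, not the X86 backend's implementation, and `decode_pshufd` is a hypothetical helper name. It assumes the standard PSHUFD encoding, in which destination lane i takes the source lane selected by bits 2i and 2i+1 of the immediate.

```python
def decode_pshufd(imm, src="xmm1"):
    """Return the lane selection encoded in a PSHUFD immediate.

    Destination lane i (0..3) takes source lane (imm >> (2 * i)) & 3.
    """
    lanes = [(imm >> (2 * i)) & 3 for i in range(4)]
    return "%s[%s]" % (src, ",".join(str(lane) for lane in lanes))


print(decode_pshufd(1))  # xmm1[1,0,0,0]
```

For the immediate 1 this yields the same selection as the `pshufd $1, %xmm1, %xmm1` comment in the notes.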
@@ -782,14 +866,22 @@ it run faster:
-
- NEON: Better performance for QQQQ (4-consecutive Q register) instructions. New reg sequence abstraction?
- ARM: Better scheduling (list-hybrid, hybrid?)
- ARM: Tail call support.
- ARM: General performance work and tuning.
-
- ARM: Half float support through intrinsics LangRef.html#int_fp16
-- ARMGlobalMerge:
+- The ARM backend now optimizes tail calls into jumps.
+- Scheduling is improved through the new list-hybrid scheduler as well
+ as through better modeling of structural hazards.
+- Half float instructions are now
+ supported.
+- NEON support has been improved to model instructions which operate on
+ multiple consecutive registers more aggressively. This avoids lots of
+ extraneous register copies.
+- The ARM backend now uses a new "ARMGlobalMerge" pass, which merges several
+ global variables into one, saving extra address computation (all the global
+ variables can be accessed via the same base address) and potentially reducing
+ register pressure.
+
+- The ARM backend has received many minor improvements and tweaks which lead
+ to substantially better performance in a wide range of different scenarios.
+
- The ARM NEON intrinsics have been substantially reworked to reduce
redundancy and improve code generation. Some of the major changes are:
@@ -807,7 +899,7 @@ it run faster:
-
The llvm.arm.neon.vabdl and llvm.arm.neon.vabal intrinsics (lengthening
- vector absolute difference with and without accumlation) have been removed.
+ vector absolute difference with and without accumulation) have been removed.
They are represented using the llvm.arm.neon.vabd intrinsic (vector absolute
difference) followed by a vector zero-extend operation, and for vabal,
a vector add.
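The replacement for vabdl and vabal described above can be checked with a small model. This is a sketch under stated assumptions: vectors are Python lists of unsigned 8-bit lanes, and the helper names are hypothetical stand-ins for the intrinsics, not LLVM APIs.

```python
def vabd_u8(a, b):
    """Lane-wise absolute difference of two vectors of unsigned 8-bit lanes."""
    return [abs(x - y) & 0xFF for x, y in zip(a, b)]


def zext_to_u16(v):
    """Zero-extend each 8-bit lane to 16 bits."""
    return [x & 0xFFFF for x in v]


def vabdl_u8(a, b):
    """Reference: lengthening absolute difference computed directly in 16 bits."""
    return [abs(x - y) & 0xFFFF for x, y in zip(a, b)]


def vabal_u8(acc, a, b):
    """Accumulating form: vabd, zero-extend, then a 16-bit vector add."""
    widened = zext_to_u16(vabd_u8(a, b))
    return [(c + d) & 0xFFFF for c, d in zip(acc, widened)]
```

Because the absolute difference of two 8-bit lanes always fits in 8 bits, zero-extending the vabd result gives the same lanes as the direct lengthening form.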
@@ -839,39 +931,8 @@ it run faster:
-
-
-
-
-
-
-
-
-
This release includes a number of new APIs that are used internally, which
- may also be useful for external clients.
-
-
-
-
-
-
-
-
-
-
-
Other miscellaneous features include:
-
-
-
@@ -888,7 +949,7 @@ from the previous release.
- The build configuration machinery changed the output directory names. It
- wasn't clear to many people that "Release-Asserts" build was a release build
+ wasn't clear to many people that a "Release-Asserts" build was a release build
without asserts. To make this more clear, "Release" does not include
assertions and "Release+Asserts" does (likewise, "Debug" and
"Debug+Asserts").
@@ -903,6 +964,9 @@ from the previous release.
- If you're used to reading .ll files, you'll probably notice that .ll file
dumps don't produce #uses comments anymore. To get them, run a .bc file
through "llvm-dis --show-annotations".
+- Target triples are now stored in a normalized form, and all inputs from
+ humans are expected to be normalized by Triple::normalize before being
+ stored in a module triple or passed to another library.
@@ -928,7 +992,7 @@ API changes are:
operands are now address-space qualified.
If you were creating these intrinsic calls and prototypes yourself (as opposed
to using Intrinsic::getDeclaration), you can use
- UpgradeIntrinsicFunction/UpgradeIntrinsicCall to be portable accross releases.
+ UpgradeIntrinsicFunction/UpgradeIntrinsicCall to be portable across releases.
SetCurrentDebugLocation takes a DebugLoc now instead of a MDNode.
@@ -947,9 +1011,20 @@ API changes are:
LLVM. The Triple::normalize utility method has been added to help front-ends
deal with funky triples.
+
+ The signature of the GCMetadataPrinter::finishAssembly virtual
+ function changed: the raw_ostream and MCAsmInfo arguments
+ were dropped. GC plugins which compute stack maps must be updated to avoid
+ having the old definition overload the new signature.
+
+
+ The signature of MemoryBuffer::getMemBuffer changed. Unfortunately
+ calls intended for the old version still compile, but will not work correctly,
+ leading to a confusing error about an invalid header in the bitcode.
+
- Some APIs got renamed:
+ Some APIs were renamed:
- llvm_report_error -> report_fatal_error
- llvm_install_error_handler -> install_fatal_error_handler
@@ -958,10 +1033,56 @@ API changes are:
+
+ Some public headers were renamed:
+
+ - llvm/Assembly/AsmAnnotationWriter.h was renamed
+ to llvm/Assembly/AssemblyAnnotationWriter.h
+
+
+
+
+
+
+
+
This section lists changes to the LLVM development infrastructure. This
+mostly impacts users who actively work on LLVM or follow development on
+mainline, but may also impact users who leverage the LLVM build infrastructure
+or are interested in LLVM qualification.
+
+
+ - The default for make check is now to use
+ the lit testing tool, which is
+ part of LLVM itself. You can use lit directly as well, or use
+ the llvm-lit tool which is created as part of a Makefile or CMake
+ build (and knows how to find the appropriate tools). See the lit
+ documentation and the blog
+ post, and PR5217
+ for more information.
+
+ - The LLVM test-suite infrastructure has a new "simple" test format
+ (make TEST=simple). The new format is intended to require only a
+ compiler and not a full set of LLVM tools. This makes it useful for testing
+ released compilers, for running the test suite with other compilers (for
+ performance comparisons), and makes sure that we are testing the compiler as
+ users would see it. The new format is also designed to work using reference
+ outputs instead of comparison to a baseline compiler, which makes it run much
+ faster and makes it less system dependent.
+
+ - Significant progress has been made on a new interface to running the
+ LLVM test-suite (aka the LLVM "nightly tests") using
+ the LNT infrastructure. The LNT
+ interface to the test-suite brings significantly improved reporting
+ capabilities for monitoring the correctness and generated code quality
+ produced by LLVM over time.
+
+
@@ -993,7 +1114,7 @@ components, please contact us on the
LLVMdev list.
-- The Alpha, Blackfin, CellSPU, MicroBlaze, MSP430, MIPS, PIC16, SystemZ
+
- The Alpha, Blackfin, CellSPU, MicroBlaze, MSP430, MIPS, SystemZ
and XCore backends are experimental.
- llc "-filetype=obj" is experimental on all targets
other than darwin-i386 and darwin-x86_64.
@@ -1141,37 +1262,9 @@ Depending on it for anything serious is not advised.
4.2. If you are interested in Fortran, we recommend that you consider using
dragonegg instead.
-The llvm-gcc 4.2 Ada compiler has basic functionality. However, this is not a
-mature technology, and problems should be expected. For example:
-
-- The Ada front-end currently only builds on X86-32. This is mainly due
-to lack of trampoline support (pointers to nested functions) on other platforms.
-However, it also fails to build on X86-64
-which does support trampolines.
-- The Ada front-end fails to bootstrap.
-This is due to lack of LLVM support for setjmp/longjmp style
-exception handling, which is used internally by the compiler.
-Workaround: configure with --disable-bootstrap.
-- The c380004, c393010
-and cxg2021 ACATS tests fail
-(c380004 also fails with gcc-4.2 mainline).
-If the compiler is built with checks disabled then c393010
-causes the compiler to go into an infinite loop, using up all system memory.
-- Some GCC specific Ada tests continue to crash the compiler.
-- The -E binder option (exception backtraces)
-does not work and will result in programs
-crashing if an exception is raised. Workaround: do not use -E.
-- Only discrete types are allowed to start
-or finish at a non-byte offset in a record. Workaround: do not pack records
-or use representation clauses that result in a field of a non-discrete type
-starting or finishing in the middle of a byte.
-- The lli interpreter considers
-'main' as generated by the Ada binder to be invalid.
-Workaround: hand edit the file to use pointers for argv and
-envp rather than integers.
-- The -fstack-check option is
-ignored.
-
+The llvm-gcc 4.2 Ada compiler has basic functionality, but is no longer being
+actively maintained. If you are interested in Ada, we recommend that you
+consider using dragonegg instead.