From: Chris Lattner
Date: Wed, 21 Apr 2010 06:42:24 +0000 (+0000)
Subject: final hacking for tonight, still more to go.
X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=a54c1f70b8bffa78316d1447756d5ba400bda895;p=oota-llvm.git
final hacking for tonight, still more to go.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@101995 91177308-0d34-0410-b5e6-96231b3b80d8
---
diff --git a/docs/ReleaseNotes.html b/docs/ReleaseNotes.html
index 9b65c6f3d54..129a4057c7b 100644
--- a/docs/ReleaseNotes.html
+++ b/docs/ReleaseNotes.html
@@ -501,28 +501,48 @@ release includes a few major enhancements and additions to the optimizers:
-
...
-Inliner reuses array allocas when inlining multiple callers to reduce stack usage.
-Optimal Edge Profiling?
-Instcombine is now a library, has its own IRBuilder to simplify itself.
-Better code size analysis in loop unswitch, inliner code split out to a new
- CodeMetrics class for reuse.
-Many changes to the pass ordering for improved optimization effectiveness.
-BasicAA improved to be less dependent on "type safe" pointers, it can now look
- through bitcasts more aggressively.
-GVN PHI Translation improvements. blog post: http://blog.llvm.org/2009/12/advanced-topics-in-redundant-load.html
-New SCEV AA pass: -scev-aa
-Target data now has notion of 'native' integer data types which optimizations can use.
-Opt now works conservatively if no target data is set (is this fully working?)
-New Analysis/InstructionSimplify.h interface for simplifying instructions that don't exist.
-Jump threading is now much more aggressive at simplifying correlated
+
Inliner reuses array allocas when inlining multiple callers to reduce stack usage.
+
Instcombine is now a library, has its own IRBuilder to simplify itself.
+
Better code size analysis in loop unswitch, inliner code split out to a new
+ CodeMetrics class for reuse.
+
Many changes to the pass ordering for improved optimization
+ effectiveness.
+
BasicAA improved to be less dependent on "type safe" pointers, it can now look
+ through bitcasts more aggressively.
+
GVN PHI Translation improvements. blog post: http://blog.llvm.org/2009/12/advanced-topics-in-redundant-load.html
+
New SCEV AA pass: -scev-aa
+
Target data now has notion of 'native' integer data types which optimizations can use.
+
Opt now works conservatively if no target data is set (is this fully working?)
+
New Analysis/InstructionSimplify.h interface for simplifying instructions that don't exist.
+
Jump threading is now much more aggressive at simplifying correlated
conditionals and threading blocks with otherwise complex logic. CondProp pass
- removed (functionality merged into jump threading).
-New SSAUpdater and MachineSSAUpdater classes for unstructured ssa updating,
+ removed (functionality merged into jump threading).
+
New SSAUpdater and MachineSSAUpdater classes for unstructured ssa updating,
changed jump threading, GVN, etc. to use it, which simplified them and sped
- them up.
+ them up.
+
+The Optimal Edge Profiling implementation in 2.6 was more a proof of
+concept. The current implementation (the one that will go into 2.7) is
+now stable and (as far as my tests go) bug free.
+
+The profiling with instrumentation via "opt" and analysis via the tool
+"llvm-prof" should Work As Expected (TM).
+
+Two things are missing:
+
+*) Still missing is the modification of all -std-compile-opt passes to
+update the profiling information according to the changes made to the
+CFG; I'm planning to do this after my master's thesis is finished. This
+will enable all passes to use the ProfileInfo if available and base
+decisions on that information.
+
+*) GCC has the options "-pg", "-fprofile-arcs" and "--coverage" that
+insert profiling code and "-fprofile-use" to use them the next time
+during compilation. I guess these options should also work properly in
+llvm-gcc and clang?
+
@@ -568,25 +588,20 @@ it run faster:
New instruction selector [blog post?].
-
-Code generator MC'ized except for debug info and EH.
-
-New CodeGen Level CSE
-Combiner-AA improvements, why not on by default?
-Pre-regalloc tail duplication
-New LSR with "full strength reduction" mode. Description?
-Codegen level OptimizeExtsPass pass, takes advantage of x86 subregs.
-Support for the GCC option -fno-schedule-insns
-non-temporal load/store
-MachineSSAUpdater.h
-X86 and XCore support returning arbitrary return values, returning too many values is
- supported by returning through a hidden pointer.
-verbose-asm now produces information about spill slots and loop nests
-GHC Haskell ABI / calling conv support.
-Many improvements to debug info
-
-
-
...
+
New LSR with "full strength reduction" mode. Description?
+
Code generator MC'ized except for debug info and EH.
+
New CodeGen Level CSE
+
Combiner-AA improvements, why not on by default?
+
Pre-regalloc tail duplication
+
Codegen level OptimizeExtsPass pass, takes advantage of x86 subregs.
+
Support for the GCC option -fno-schedule-insns
+
Non-temporal load/store, only implemented on X86, see LangRef.html#i_load.
+
MachineSSAUpdater.h
+
X86 and XCore support returning arbitrary return values, returning too many values is
+ supported by returning through a hidden pointer.
+
verbose-asm now produces information about spill slots and loop nests
+
GHC Haskell ABI / calling conv support.
+
Many improvements to debug info
@@ -600,10 +615,13 @@ Many improvements to debug info
+
The X86 backend now optimizes tail calls much more aggressively for
+ functions that use the standard C calling convention.
+
The X86 backend now models scalar SSE registers as subregs of the SSE vector
+ registers, making the code generator more aggressive in cases where scalars
+ and vector types are mixed.
-
PostRA scheduler for X86?
-
x86 sibcall / tailcall optimization in CCC mode.
-
X86: XMM subreg modeling for extraction of the low element.
+
PostRA scheduler for X86? FIXME: is this on by default in 2.7?
@@ -917,9 +920,6 @@ compilation, and lacks support for debug information.
-
Support for the Advanced SIMD (Neon) instruction set is still incomplete
-and not well tested. Some features may not work at all, and the code quality
-may be poor in some cases.
Thumb mode works only on ARMv6 or higher processors. On sub-ARMv6
processors, thumb programs can crash or produce wrong
results (PR1388).