Target Independent Code Generator Improvements
@@ -715,35 +699,57 @@ infrastructure, which allows us to implement more aggressive algorithms and make
it run faster:
-
-
- MachineCSE tuned and on by default.
-
- Rewrote tblgen's type inference for backends to be more consistent and
- diagnose more target bugs. This also allows limited support for writing
- patterns for instructions that return multiple results, e.g. a virtual
- register and a flag result. Stuff that used 'parallel' before should use
- this.
-
- New -regalloc=fast, =local got removed
- New -regalloc=default option that chooses a register allocator based on the -O optimization level.
- New SubRegIndex tblgen class for targets -> jakob
-
- Bottom up fast isel. Simple Load reuse. No more machinedce.
- IR ABI: <3 x float> is passed as <4 x float> instead of 3 floats.
-
- New COPY instruction. copyRegToReg -> copyPhysReg, isMoveInstr is gone.
- RenderMachineFunction: -rendermf
- SplitKit?
- Evan: Teach bottom up pre-ra scheduler to track register pressure. Work in progress.
- Evan: Add an ILP scheduler. On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2 by 16%.
-
- New OptimizeExts+OptimizeCmps -> PeepholeOptimizer pass
- New LocalStackSlotAllocation.cpp pass (jimg)
- Atomics now get legalized when not natively supported (jim g)
-
- -ffunction-sections and -fdata-sections are supported on ELF targets.
- -momit-leaf-frame-pointer now supported.
+- The clang/gcc -momit-leaf-frame-pointer argument is now supported.
+- The clang/gcc -ffunction-sections and -fdata-sections arguments are now
+ supported on ELF targets (like GCC).
+- The MachineCSE pass is now tuned and on by default. It eliminates common
+ subexpressions that are exposed when lowering to machine instructions.
+- The "local" register allocator was replaced by a new "fast" register
+ allocator. This new allocator (which is often used at -O0) is substantially
+ faster and produces better code than the old local register allocator.
+- A new LLC "-regalloc=default" option is available, which automatically
+ chooses a register allocator based on the -O optimization level.
+- The target-independent code generator now promotes illegal argument and
+ return value vectors to wider ones when possible instead of scalarizing
+ them. For example, a <3 x float> value is now passed in one SSE register on
+ X86 instead of three. This generates substantially better code because the
+ rest of the code generator already expected this.
+- The code generator uses a new "COPY" machine instruction. This speeds up
+ the code generator and eliminates the need for targets to implement the
+ isMoveInstr hook. Also, the copyRegToReg hook was renamed to copyPhysReg
+ and simplified.
+- The code generator now has a "LocalStackSlotPass", which optimizes stack
+ slot access for targets (like ARM) that have limited stack displacement
+ addressing.
+- A new "PeepholeOptimizer" pass is available, which eliminates redundant sign
+ and zero extensions and optimizes away compare instructions when the
+ condition result is already available from a previous instruction.
+- Atomic operations now get legalized into simpler atomic operations if not
+ natively supported, easing the implementation burden on targets.
+- The bottom-up pre-allocation scheduler is now register pressure aware,
+ allowing it to avoid overscheduling in high pressure situations while still
+ aggressively scheduling when registers are available.
+- A new instruction-level-parallelism pre-allocation scheduler is available,
+ which is also register pressure aware. It is on by default and, on X86-64,
+ improved every test in CFP2000 and sped up 256.bzip2 by 16%.
+- The tblgen type inference algorithm was rewritten to be more consistent and
+ to diagnose more target bugs. If you have an out-of-tree backend, you may
+ find that it uncovers bugs in your target description. The rewrite also
+ provides limited support for writing patterns for instructions that return
+ multiple results (e.g. a virtual register and a flag result). The
+ 'parallel' modifier in tblgen was removed; use the new support for
+ multiple results instead.
+- A new (experimental) "-rendermf" pass is available, which renders a
+ MachineFunction into HTML, showing live ranges and other useful details.
+
+- The -fast-isel instruction selection path (used at -O0 on X86) was rewritten
+ to work bottom-up on basic blocks instead of top-down. This makes it
+ slightly faster (because the MachineDCE pass is no longer needed) and
+ allows it to generate better code in some cases.
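The PeepholeOptimizer item above removes compares whose result is already available from a previous instruction. The toy C++ sketch below illustrates only that idea on a hypothetical instruction list. The `Inst` struct, the "cmp0" opcode and `removeRedundantCompares` are invented for illustration. It is not the LLVM pass or its API, and it ignores control flow and the other flag bits that a real pass must check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, highly simplified instruction record used only for this
// sketch; it is not LLVM's MachineInstr.
struct Inst {
  std::string op;  // e.g. "sub", "add", "load", or "cmp0" (compare with 0)
  int def;         // register written, or -1 if none
  int use;         // register read, or -1 if none
  bool setsFlags;  // true if the op sets the flags from the value it writes
};

// Drops a "cmp0 rN" when the previous kept instruction wrote rN and already
// set the flags from that value. Assumes the flag consumers only test
// zero/sign, which both instructions would set identically.
std::vector<Inst> removeRedundantCompares(const std::vector<Inst>& in) {
  std::vector<Inst> out;
  for (const Inst& inst : in) {
    if (inst.op == "cmp0" && !out.empty()) {
      const Inst& prev = out.back();
      if (prev.setsFlags && prev.def == inst.use)
        continue;  // flags already describe rN; the compare is redundant
    }
    out.push_back(inst);
  }
  return out;
}
```

For example, in the sequence `sub r1; cmp0 r1` the compare is dropped. A `cmp0` that follows a load, which does not set flags, is kept.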
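The atomics item above mentions legalizing unsupported atomic operations into simpler ones. One common way to do that is to expand a read-modify-write operation the target lacks into a retry loop around compare-and-swap, which the target is assumed to support. The C++ sketch below shows that technique with a hypothetical `atomicFetchMax` helper. It does not reproduce the code LLVM actually emits.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: an atomic "fetch max" built only from load and
// compare-and-swap. Returns the value held before the operation.
int atomicFetchMax(std::atomic<int>& obj, int val) {
  int old = obj.load();
  // On failure, compare_exchange_weak stores the current value into `old`,
  // so the loop retries until the swap succeeds or no update is needed.
  // The weak form may also fail spuriously; the loop absorbs that too.
  while (old < val && !obj.compare_exchange_weak(old, val)) {
  }
  return old;
}
```

A weak compare-and-swap is used because the operation already sits inside a retry loop. A strong one would also be correct.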
@@ -860,24 +866,6 @@ it run faster:
-
-
-
-
This release includes a number of new APIs that are used internally, which
- may also be useful for external clients.
-
-
-
-
-
-
-