+<p>We have put a significant amount of work into the code generator
+infrastructure, which allows us to implement more aggressive algorithms and make
+it run faster:</p>
+
+<ul>
+<li>The clang/gcc -momit-leaf-frame-pointer argument is now supported.</li>
+<li>The clang/gcc -ffunction-sections and -fdata-sections arguments are now
+ supported on ELF targets (like GCC).</li>
+<li>The MachineCSE pass is now tuned and on by default. It eliminates common
+ subexpressions that are exposed when lowering to machine instructions.</li>
+<li>The "local" register allocator was replaced by a new "fast" register
+ allocator. This new allocator (which is often used at -O0) is substantially
+ faster and produces better code than the old local register allocator.</li>
+<li>A new LLC "-regalloc=default" option is available, which automatically
+ chooses a register allocator based on the -O optimization level.</li>
+<li>The common code generator code was modified to promote illegal argument and
+ return value vectors to wider ones when possible instead of scalarizing
+ them. For example, <3 x float> will now pass in one SSE register
+ instead of 3 on X86. This generates substantially better code since the
+ rest of the code generator was already expecting this.</li>
+<li>The code generator uses a new "COPY" machine instruction. This speeds up
+ the code generator and eliminates the need for targets to implement the
+ isMoveInstr hook. Also, the copyRegToReg hook was renamed to copyPhysReg
+ and simplified.</li>
+<li>The code generator now has a "LocalStackSlotPass", which optimizes stack
+ slot access for targets (like ARM) that have limited stack displacement
+ addressing.</li>
+<li>A new "PeepholeOptimizer" is available, which eliminates sign and zero
+ extends, and optimizes away compare instructions when the condition result
+ is available from a previous instruction.</li>
+<li>Atomic operations now get legalized into simpler atomic operations if not
+ natively supported, easing the implementation burden on targets.</li>
+<li>We have added two new bottom-up pre-allocation register pressure aware schedulers:
+<ol>
+<li>The hybrid scheduler schedules aggressively to minimize schedule length when registers are available and avoid overscheduling in high pressure situations.</li>
+<li>The instruction-level-parallelism scheduler schedules for maximum ILP when registers are available and avoid overscheduling in high pressure situations.</li>
+</ol></li>
+<li>The tblgen type inference algorithm was rewritten to be more consistent and
+ diagnose more target bugs. If you have an out-of-tree backend, you may
+ find that it finds bugs in your target description. This support also
+ allows limited support for writing patterns for instructions that return
+ multiple results (e.g. a virtual register and a flag result). The
+ 'parallel' modifier in tblgen was removed, you should use the new support
+ for multiple results instead.</li>
+<li>A new (experimental) "-rendermf" pass is available which renders a
+ MachineFunction into HTML, showing live ranges and other useful
+ details.</li>
+<li>The new SubRegIndex tablegen class allows subregisters to be indexed
+ symbolically instead of numerically. If your target uses subregisters you
+ will need to adapt to use SubRegIndex when you upgrade to 2.8.</li>
+<!-- SplitKit -->
+
+<li>The -fast-isel instruction selection path (used at -O0 on X86) was rewritten
+ to work bottom-up on basic blocks instead of top down. This makes it
+ slightly faster (because the MachineDCE pass is not needed any longer) and
+ allows it to generate better code in some cases.</li>
+
+</ul>