- <!-- to write:
- MachineCSE tuned and on by default.
- llvm.dbg.value: variable debug info for optimized code
- MC Assembler backend is now real, does relaxation, and its output is bitwise
- identical to the darwin assembler's in the vast majority of cases.
- new GHC calling convention
- New half float intrinsics LangRef.html#int_fp16
- Rewrote tblgen's type inference for backends to be more consistent and
- diagnose more target bugs. This also allows limited support for writing
- patterns for instructions that return multiple results, e.g. a virtual
- register and a flag result. Stuff that used 'parallel' before should use
- this.
- New ARM/Thumb disassembler support in MC.
- New SSEDomainFix pass:
- On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a
- register in a different domain than where it was defined. Some instructions
- have equivalents for different domains, like por/orps/orpd. The
- SSEDomainFix pass tries to minimize the number of domain crossings by
- changing between equivalent opcodes where possible.
- Support for the Intel AES instructions in the assembler.
- memcpy, memmove, and memset now take address space qualified pointers + volatile.
- per-instruction debug info metadata is much faster and uses less space (new DebugLoc class).
- -ffunction-sections and -fdata-sections are supported on ELF targets.
- Now iterate function passes when a cgsccpassmanager detects a devirtualization
- -momit-leaf-frame-pointer now supported.
- New -regalloc=fast; -regalloc=local got removed.
- New -regalloc=default option that chooses a register allocator based on the -O optimization level.
- New "trap values" concept: http://llvm.org/docs/LangRef.html#trapvalues
- Improved trip count analysis for <= and >= loops, and uses sign overflow info.
- REMOVED: SCCVN pass.
- X86 backend attempts to promote 16-bit integer operations to 32 bits to avoid
- 0x66 prefixes, which are slow on some microarchitectures and bloat the code
- on others.
- X87 fp stackifier is global!
- LTO debug info support?
- NEON: Better performance for QQQQ (4-consecutive Q register) instructions. New reg sequence abstraction?
- New support for X86 "thiscall" calling convention (x86_thiscallcc in IR).
- ARM: Better scheduling (list-hybrid, hybrid?)
- New SubRegIndex tblgen class for targets -> jakob
- ARM: Tail call support.
- AVX support in the MC assembler. Full compiler support not done yet.
- Atomics now get legalized when not natively supported (jim g)
- ARM: General performance work and tuning.
- Bottom up fast isel. Simple Load reuse. No more machinedce. Load folding at -O0?
- New linker_private_weak and linker_private_weak_def_auto linkage types
- compiler_rt softfloat support.
- X86 ABI: <2 x float> in IR no longer maps onto MMX, it turns into <4 x float>
- IR ABI: <3 x float> is passed as <4 x float> instead of 3 floats.
- renamed "Release" -> "Release+Asserts"; "Release-Asserts" -> "Release", etc.
- New COPY instruction. copyRegToReg -> copyPhysReg, isMoveInstr is gone.
- JumpThreading much more aggressive about implied value relations.
- New RegionInfo pass "opt -regions -analyze" or "opt -view-regions".
- mc assembler supports macros.
- RenderMachineFunction: -rendermf
- SplitKit?
- Evan: Teach bottom up pre-ra scheduler to track register pressure. Work in progress.
- Evan: Add an ILP scheduler. On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2 by 16%.
- RegisterPass<> -> INITIALIZE_PASS()
- llvm-diff?
- Preliminary work on TBAA but not usable in 2.8.
- Atomic lowering patch: -loweratomic (see Passes.html#loweratomic)
- compiler_rt now includes a fairly extensive testsuite for the blocks language feature and the blocks runtime.
- New OptimizeExts+OptimizeCmps -> PeepholeOptimizer pass
- Triples are now stored in normalized form. Triple::normalize.
- New LocalStackSlotAllocation.cpp pass (jimg)
- New llvm.x86.int intrinsic (for int $42 and int3)
- New CorrelatedValuePropagation pass, not on by default in 2.8 yet.
- Verbose assembly decodes X86 shuffle instructions, e.g.:
- insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
- unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
- pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
- -->
-