From: Chris Lattner
Date: Mon, 4 Oct 2010 04:39:25 +0000 (+0000)
Subject: checkpoint, the release notes are now feature complete.
X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=7714c91533b5039216f7330555bd45febd7d8fd0;p=oota-llvm.git

checkpoint, the release notes are now feature complete.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115495 91177308-0d34-0410-b5e6-96231b3b80d8
---

diff --git a/docs/ReleaseNotes.html b/docs/ReleaseNotes.html
index 48d5c6fe5cd..29de47c49ec 100644
--- a/docs/ReleaseNotes.html
+++ b/docs/ReleaseNotes.html
@@ -742,8 +742,9 @@ it run faster:

  • A new (experimental) "-rendermf" pass is available which renders a MachineFunction into HTML, showing live ranges and other useful details.
-
-
+  • The new SubRegIndex tablegen class allows subregisters to be indexed
+    symbolically instead of numerically. If your target uses subregisters you
+    will need to adapt to use SubRegIndex when you upgrade to 2.8.
  • The -fast-isel instruction selection path (used at -O0 on X86) was rewritten
@@ -760,7 +761,7 @@ it run faster:

-    New features of the X86 target include:
+    New features and major changes in the X86 target include:

      @@ -768,30 +769,38 @@ it run faster:

      in registers across basic blocks, dramatically improving performance of code that uses long double, and when targetting CPUs that don't support SSE.
- New SSEDomainFix pass:
- On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a
- register in a different domain than where it was defined. Some instructions
- have equvivalents for different domains, like por/orps/orpd. The
- SSEDomainFix pass tries to minimize the number of domain crossings by
- changing between equvivalent opcodes where possible.
-
- X86 backend attempts to promote 16-bit integer operations to 32-bits to avoid
- 0x66 prefixes, which are slow on some microarchitectures and bloat the code
- on others.
-
- New support for X86 "thiscall" calling convention (x86_thiscallcc in IR) for windows.
-
- New llvm.x86.int intrinsic (for int $42 and int3)
-
- Verbose assembly decodes X86 shuffle instructions, e.g.:
- insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
- unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
- pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
+    • The X86 backend now uses an SSEDomainFix pass to optimize SSE operations. On
+      Nehalem ("Core i7") and newer CPUs there is a 2-cycle latency penalty on
+      using a register in a different domain than where it was defined. This pass
+      optimizes away these stalls.
+
+    • The X86 backend now promotes 16-bit integer operations to 32 bits when
+      possible. This avoids 0x66 prefixes, which are slow on some
+      microarchitectures and bloat the code on all of them.
+
+    • The X86 backend now supports the Microsoft "thiscall" calling convention,
+      and a calling convention to support
+      ghc.
+
+    • The X86 backend supports a new "llvm.x86.int" intrinsic, which maps onto
+      the X86 "int $42" and "int3" instructions.
+
+    • At the IR level, the <2 x float> datatype is now promoted and passed
+      around as a <4 x float> instead of being passed and returned as an MMX
+      vector. If you have a frontend that uses this, please pass and return a
+      <2 x i32> instead (using bitcasts).
+
+    • When printing .s files in verbose assembly mode (the default for clang -S),
+      the X86 backend now decodes X86 shuffle instructions and prints
+      human-readable comments after the most inscrutable of them, e.g.:
+
+  insertps $113, %xmm3, %xmm0 # xmm0 = zero,xmm0[1,2],xmm3[1]
+  unpcklps %xmm1, %xmm0       # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+  pshufd   $1, %xmm1, %xmm1   # xmm1 = xmm1[1,0,0,0]
+
+
- X86 ABI: <2 x float> in IR no longer maps onto MMX, it turns into <4 x float>
-
- new GHC calling convention
-
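
Reviewer note: a minimal Python sketch of how the immediates in the three shuffle examples in this hunk map onto the lane comments. The function names and default register names are hypothetical, and this is not the X86 backend's decoder. It only follows the documented semantics of PSHUFD, UNPCKLPS and INSERTPS, and it lists one entry per lane rather than the compact xmm0[1,2] form used in the comments.

```python
# Toy lane decoder for the shuffle examples; hypothetical helper names,
# not LLVM code.

def decode_pshufd(imm, src="xmm1"):
    """PSHUFD: result lane i takes source lane (imm >> 2*i) & 3."""
    return [f"{src}[{(imm >> (2 * i)) & 3}]" for i in range(4)]


def decode_unpcklps(dst="xmm0", src="xmm1"):
    """UNPCKLPS: interleave the two low lanes of dst and src."""
    return [f"{dst}[0]", f"{src}[0]", f"{dst}[1]", f"{src}[1]"]


def decode_insertps(imm, dst="xmm0", src="xmm3"):
    """INSERTPS: imm[7:6] picks the source lane, imm[5:4] the destination
    lane, and imm[3:0] is a mask of result lanes forced to zero."""
    src_lane = (imm >> 6) & 3
    dst_lane = (imm >> 4) & 3
    zero_mask = imm & 0xF
    lanes = [f"{dst}[{i}]" for i in range(4)]
    lanes[dst_lane] = f"{src}[{src_lane}]"
    return ["zero" if (zero_mask >> i) & 1 else lane
            for i, lane in enumerate(lanes)]


if __name__ == "__main__":
    print("insertps $113:", ",".join(decode_insertps(113)))
    print("unpcklps     :", ",".join(decode_unpcklps()))
    print("pshufd   $1  :", ",".join(decode_pshufd(1)))
```

For the immediates quoted in the hunk ($113 and $1), the per-lane results agree with the comments in the added lines once adjacent lanes from the same register are merged.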
    @@ -806,14 +815,21 @@ it run faster:

    - - - - - -
    -

    Other miscellaneous features include:

    -
      -
    -