-
New features of the X86 target include:
+
New features and major changes in the X86 target include:
@@ -768,30 +769,38 @@ it run faster:
in registers across basic blocks, dramatically improving performance of code
that uses long double, and when targetting CPUs that don't support SSE.
- New SSEDomainFix pass:
- On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a
- register in a different domain than where it was defined. Some instructions
- have equvivalents for different domains, like por/orps/orpd. The
- SSEDomainFix pass tries to minimize the number of domain crossings by
- changing between equvivalent opcodes where possible.
-
- X86 backend attempts to promote 16-bit integer operations to 32-bits to avoid
- 0x66 prefixes, which are slow on some microarchitectures and bloat the code
- on others.
-
- New support for X86 "thiscall" calling convention (x86_thiscallcc in IR) for windows.
-
- New llvm.x86.int intrinsic (for int $42 and int3)
-
- Verbose assembly decodes X86 shuffle instructions, e.g.:
- insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
- unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
- pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
+- The X86 backend now uses a SSEDomainFix pass to optimize SSE operations. On
+ Nehalem ("Core i7") and newer CPUs there is a 2 cycle latency penalty on
+ using a register in a different domain than where it was defined. This pass
+ optimizes away these stalls.
+
+- The X86 backend now promote 16-bit integer operations to 32-bits when
+ possible. This avoids 0x66 prefixes, which are slow on some
+ microarchitectures and bloat the code on all of them.
+
+- The X86 backend now supports the Microsoft "thiscall" calling convention,
+ and a calling convention to support
+ ghc.
+
+- The X86 backend supports a new "llvm.x86.int" intrinsic, which maps onto
+ the X86 "int $42" and "int3" instructions.
+
+- At the IR level, the <2 x float> datatype is now promoted and passed
+ around as a <4 x float> instead of being passed and returns as an MMX
+ vector. If you have a frontend that uses this, please pass and return a
+ <2 x i32> instead (using bitcasts).
+
+- When printing .s files in verbose assembly mode (the default for clang -S),
+ the X86 backend now decodes X86 shuffle instructions and prints human
+ readable comments after the most inscrutible of them, e.g.:
+
+
+ insertps $113, %xmm3, %xmm0 # xmm0 = zero,xmm0[1,2],xmm3[1]
+ unpcklps %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+ pshufd $1, %xmm1, %xmm1 # xmm1 = xmm1[1,0,0,0]
+
+
- X86 ABI: <2 x float> in IR no longer maps onto MMX, it turns into <4 x float>
-
- new GHC calling convention
-
@@ -806,14 +815,21 @@ it run faster: