add a note that is important for some fp apps.

author Chris Lattner <sabre@nondot.org>

Mon, 7 Jan 2008 21:59:58 +0000 (21:59 +0000)

committer Chris Lattner <sabre@nondot.org>

Mon, 7 Jan 2008 21:59:58 +0000 (21:59 +0000)
diff --git a/lib/Target/X86/README.txt b/lib/Target/X86/README.txt
index 759c7acf389173e3d52de301f3720cb1855c3751..be37acfa8a240243c5aca551abcd8a055a5df2b9 100644
--- a/lib/Target/X86/README.txt
+++ b/lib/Target/X86/README.txt
@@ -1597,3 +1597,24 @@ a stride-4 IV, would allow all the scales in the loop to go away.
  This would result in smaller code and more efficient microops.
  
  //===---------------------------------------------------------------------===//
+
+In SSE mode, we turn abs and neg into an and (abs) or xor (neg) instruction
+whose sign-bit mask operand is loaded from the constant pool, for example:
+
+       xorpd   LCPI2_0-"L2$pb"(%esi), %xmm2
+
+However, if xmm2 gets spilled, we end up with really ugly code like this:
+
+       %xmm2 = reload [mem]
+       xorpd   LCPI2_0-"L2$pb"(%esi), %xmm2
+       store %xmm2 -> [mem]
+
+Since we know that this is a neg (or abs), we can fold the spill into the
+neg/abs instruction, turning it into an *integer* operation on memory, like this:
+
+       xorl $2147483648, [mem+4]    ## 2147483648 = (1 << 31), the sign bit of the double
+
+You could also use xorb $128, [mem+7], but xorl is less likely to lead to a
+partial register stall.
+
+//===---------------------------------------------------------------------===//
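For readers who want to check the bit manipulation, here is a small C sketch. It is not LLVM code, and all function names in it are hypothetical. It assumes IEEE 754 doubles and a little-endian layout as on x86, so the sign bit of a spilled double is bit 31 of the 32-bit word at offset 4 of the spill slot. neg_via_mask/abs_via_mask model what the xorpd/andpd sign-mask sequence does to the value in a register, while neg_in_memory/abs_in_memory model the folded integer form that works directly on the spill slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw 64-bit pattern of a double (IEEE 754 binary64 assumed). */
static uint64_t bits_of(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Register form: models xorpd with a sign-bit mask (flip bit 63). */
static double neg_via_mask(double x) {
    uint64_t b = bits_of(x) ^ (UINT64_C(1) << 63);
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Register form: models andpd with the inverted mask (clear bit 63). */
static double abs_via_mask(double x) {
    uint64_t b = bits_of(x) & ~(UINT64_C(1) << 63);
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Folded form: mirrors "xorl $2147483648, [mem+4]". Assumes little-endian
   layout, so bytes 4..7 of the slot hold bits 32..63, incl. the sign bit. */
static void neg_in_memory(double *slot) {
    unsigned char *p = (unsigned char *)slot;
    uint32_t hi;
    memcpy(&hi, p + 4, sizeof hi);
    hi ^= UINT32_C(2147483648);   /* 1 << 31 */
    memcpy(p + 4, &hi, sizeof hi);
}

/* Folded abs: the same idea with "andl $2147483647, [mem+4]". */
static void abs_in_memory(double *slot) {
    unsigned char *p = (unsigned char *)slot;
    uint32_t hi;
    memcpy(&hi, p + 4, sizeof hi);
    hi &= UINT32_C(2147483647);   /* ~(1 << 31) */
    memcpy(p + 4, &hi, sizeof hi);
}

/* Asserts that the folded in-memory forms match the register forms
   bit for bit for the given input. */
static void check_sign_fold(double x) {
    double slot = x;
    neg_in_memory(&slot);
    assert(bits_of(slot) == bits_of(neg_via_mask(x)));

    slot = x;
    abs_in_memory(&slot);
    assert(bits_of(slot) == bits_of(abs_via_mask(x)));
}
```

For example, after neg_in_memory on a slot holding 3.5, the slot holds -3.5, with no round trip through an XMM register.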