Programming Languages Research Group: Git

author	Sanjay Patel <spatel@rotateright.com>
	Tue, 11 Nov 2014 20:51:00 +0000 (20:51 +0000)
committer	Sanjay Patel <spatel@rotateright.com>
	Tue, 11 Nov 2014 20:51:00 +0000 (20:51 +0000)
commit	e7c966f0673009b82e87d3e9c50e1216efe721bb
tree	650c7d32e06bd88ea1a30b299b7bc5505759ad9e
parent	612f7d7e00c360f065775aa5d9e32cf40b5214c1

Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385).

This is a first step for generating SSE rcp instructions for reciprocal
calcs when fast-math allows it. This is very similar to the rsqrt optimization
enabled in D5658 ( http://reviews.llvm.org/rL220570 ).

For now, be conservative and only enable this for AMD btver2 where performance
improves significantly both in terms of latency and throughput.

We may never enable this codegen for Intel Core* chips because the divider circuits
are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the
21-cycle critical path for the rcp + mul + sub + mul + add estimate.
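
The rcp + mul + sub + mul + add sequence mentioned above corresponds to one
Newton-Raphson refinement of a reciprocal estimate, x1 = x0 + x0 * (1 - a * x0).
A minimal scalar sketch of that step follows. It is an illustration only, not
the code in X86ISelLowering.cpp: the function names are hypothetical, and
crude_recip is an assumed stand-in that imitates a low-precision hardware
estimate by truncating mantissa bits instead of issuing rcpss.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the hardware estimate (not rcpss itself):
// compute 1/a, then clear the low 12 mantissa bits so that only about
// 11 bits of precision remain.
float crude_recip(float a) {
    float r = 1.0f / a;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// One Newton-Raphson refinement step for 1/a, written as the
// mul + sub + mul + add chain that follows the initial estimate.
float refine_recip(float a, float x0) {
    float t = a * x0;    // mul: a * x0
    float e = 1.0f - t;  // sub: residual of the estimate
    float c = x0 * e;    // mul: correction term
    return x0 + c;       // add: refined estimate
}
```

Each operation depends on the one before it, so the chain cannot be overlapped
within a single reciprocal; that dependency is why the estimate's critical path
can exceed the latency of a fast hardware divider.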

Follow-on patches may allow configuration of the number of Newton-Raphson refinement
steps, add AVX512 support, and enable the optimization for more chips.

More background here: http://llvm.org/bugs/show_bug.cgi?id=21385

Differential Revision: http://reviews.llvm.org/D6175

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221706 91177308-0d34-0410-b5e6-96231b3b80d8

lib/Target/X86/X86.td
lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86Subtarget.h
test/CodeGen/X86/recip-fastmath.ll	[new file with mode: 0644]