Programming Languages Research Group: Git

author	Sanjay Patel <spatel@rotateright.com>
	Fri, 20 Mar 2015 21:19:52 +0000 (21:19 +0000)
committer	Sanjay Patel <spatel@rotateright.com>
	Fri, 20 Mar 2015 21:19:52 +0000 (21:19 +0000)
commit	39110ecd35f9ed643bf335b94789871b297bf03a
tree	0bd7fd0f1f38d5c987c12be1c4b0358edc01e0df
parent	5155a78d187b8bb9311be87aaf3f8f7046d7ca21

[X86] Prefer blendps over insertps codegen for one special case

With this patch, for this one exact case, we'll generate:

blendps %xmm0, %xmm1, $1

instead of:

insertps %xmm0, %xmm1, $0
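
For readers who want to see why the two forms are interchangeable here, the following is a rough scalar sketch of the register-to-register semantics of both instructions, written from the documented immediate encodings. It is not code from the patch: `Vec4`, `model_blendps`, `model_insertps`, and `check_special_case` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Scalar model of BLENDPS: bit i of imm selects src[i] over dst[i].
Vec4 model_blendps(Vec4 dst, const Vec4& src, std::uint8_t imm) {
    for (int i = 0; i < 4; ++i)
        if (imm & (1u << i)) dst[i] = src[i];
    return dst;
}

// Scalar model of register-form INSERTPS:
//   imm[7:6] = source element, imm[5:4] = destination element,
//   imm[3:0] = mask of destination elements to zero afterwards.
Vec4 model_insertps(Vec4 dst, const Vec4& src, std::uint8_t imm) {
    const int count_s = (imm >> 6) & 3;
    const int count_d = (imm >> 4) & 3;
    const unsigned zmask = imm & 0xFu;
    dst[count_d] = src[count_s];
    for (int i = 0; i < 4; ++i)
        if (zmask & (1u << i)) dst[i] = 0.0f;
    return dst;
}

// Illustrative check: blend mask 1 and insert immediate 0 both copy
// element 0 of src into element 0 of dst and leave the rest alone.
void check_special_case() {
    const Vec4 dst{1.0f, 2.0f, 3.0f, 4.0f};
    const Vec4 src{9.0f, 8.0f, 7.0f, 6.0f};
    assert(model_blendps(dst, src, 1) == model_insertps(dst, src, 0));
}
```

Under this model, any other insertps immediate (for example 0x10, which writes element 1) no longer matches a blend with mask 1, which is consistent with the patch being limited to one exact case.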

If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.
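
The selection rule described above can be sketched as a small predicate. This is only an illustration of the stated policy, not the code in X86ISelLowering.cpp: `LowOp`, `choose_low_element_insert`, and both parameter names are hypothetical, and how the real lowering detects "optimizing for size" or a foldable load is not shown here.

```cpp
#include <cassert>

enum class LowOp { BlendPS, InsertPS };

// Hypothetical helper: keep insertps only when a load can be folded
// into it AND the function is being optimized for size; otherwise
// prefer blendps for its higher throughput.
constexpr LowOp choose_low_element_insert(bool opt_for_size,
                                          bool can_fold_load) {
    return (opt_for_size && can_fold_load) ? LowOp::InsertPS
                                           : LowOp::BlendPS;
}

// Illustrative check of all four combinations.
void check_selection_rule() {
    assert(choose_low_element_insert(true, true) == LowOp::InsertPS);
    assert(choose_low_element_insert(true, false) == LowOp::BlendPS);
    assert(choose_low_element_insert(false, true) == LowOp::BlendPS);
    assert(choose_low_element_insert(false, false) == LowOp::BlendPS);
}
```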

The detailed performance data motivating this change may be found in D7866;
in summary, blendps has 2-3x the throughput of insertps on widely used chips.

Differential Revision: http://reviews.llvm.org/D8332

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232850 91177308-0d34-0410-b5e6-96231b3b80d8

lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/sse41.ll