[X86] Avoid folding scalar loads into unary sse intrinsics
author     Michael Kuperstein <michael.m.kuperstein@intel.com>
           Thu, 31 Dec 2015 09:45:16 +0000 (09:45 +0000)
committer  Michael Kuperstein <michael.m.kuperstein@intel.com>
           Thu, 31 Dec 2015 09:45:16 +0000 (09:45 +0000)
commit     4f73c427970321683a140fb4947673fe980f6916
tree       8304f01e3c59d44e1e3a29e0ef8a26d8b2fe88a4
parent     7b8bd88d4597339cec0103d6c1df3f29316ea867
[X86] Avoid folding scalar loads into unary sse intrinsics

Not folding these cases tends to avoid partial register update stalls:

  sqrtss (%eax), %xmm0

has a partial update of %xmm0, while

  movss (%eax), %xmm0
  sqrtss %xmm0, %xmm0

has a clobber of the high lanes immediately before the partial update,
avoiding a potential stall.
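
As a rough illustration (a sketch only, not the contents of the new test
file; the function name is made up), the kind of IR that produces these
sequences is a scalar load fed into the unary sqrt intrinsic:

  declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>)

  ; At default opt levels, isel should now emit movss + sqrtss for this
  ; instead of folding the load into sqrtss.
  define float @sqrtss_from_mem(float* %p) {
    %v = load float, float* %p
    %ins = insertelement <4 x float> undef, float %v, i32 0
    %sqrt = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %ins)
    %res = extractelement <4 x float> %sqrt, i32 0
    ret float %res
  }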

Given this, we only want to fold when optimizing for size.
This is consistent with the patterns we already have for some of
the fp/int converts, and with the existing behavior in
X86InstrInfo::foldMemoryOperandImpl().
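
Sketch of the optimize-for-size carve-out (again illustrative, not the
checked-in test): the same function marked optsize should keep the folded
memory form, since the single folded instruction is smaller than the
movss + sqrtss pair:

  declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>)

  ; With optsize we still fold, accepting the partial register update.
  define float @sqrtss_from_mem_optsize(float* %p) optsize {
    %v = load float, float* %p
    %ins = insertelement <4 x float> undef, float %v, i32 0
    %sqrt = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %ins)
    %res = extractelement <4 x float> %sqrt, i32 0
    ret float %res
  }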

Differential Revision: http://reviews.llvm.org/D15741

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256671 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/X86/X86InstrSSE.td
test/CodeGen/X86/fold-load-unops.ll