[InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
author Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net>
Thu, 11 Dec 2014 20:44:59 +0000 (20:44 +0000)
committer Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net>
Thu, 11 Dec 2014 20:44:59 +0000 (20:44 +0000)
commit f27500040bde2a604f08c13c0be7833924170d0a
tree 668483119b9b70d5d96a92456b8be98bcfcecdbf
parent ff3fa3dc00c6a792d8f7a09c62e740f4e09f042a

This patch teaches the instruction combiner how to fold a call to 'insertqi'
when the 'length' field (3rd operand) is zero, and when the sum of the
'length' and 'bit index' (4th operand) fields is greater than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for the 'insertqi' intrinsic
by adding extra checks that address both points 1 and 2.
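
The two rules quoted from the manual can be sketched as a pair of standalone
checks. This is only an illustration, not the code added to
InstCombineCalls.cpp: the helper names 'effectiveLength' and
'isResultUndefined' are hypothetical, and both fields are assumed to arrive as
8-bit immediates that are zero-extended before being summed.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the two rules from the AMD64 manual.
// They are NOT the functions used by the instruction combiner.

// Point 2: a value of zero in the length field means a length of 64 bits.
unsigned effectiveLength(std::uint8_t lengthField) {
  return lengthField == 0 ? 64u : static_cast<unsigned>(lengthField);
}

// Point 1: the result is undefined when bit index + length exceeds 64.
// Both fields are zero-extended to 'unsigned' first, so the sum cannot wrap.
bool isResultUndefined(std::uint8_t lengthField, std::uint8_t indexField) {
  return effectiveLength(lengthField) + static_cast<unsigned>(indexField) > 64u;
}
```

Under these assumptions, a zero length with a zero bit index describes a full
64-bit insertion and stays well defined, while a zero length with any nonzero
bit index falls under point 1.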

Differential Revision: http://reviews.llvm.org/D6583

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224054 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Transforms/InstCombine/InstCombineCalls.cpp
test/Transforms/InstCombine/vec_demanded_elts.ll