[x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing
author Chandler Carruth <chandlerc@gmail.com>
Thu, 19 Feb 2015 13:15:12 +0000 (13:15 +0000)
committer Chandler Carruth <chandlerc@gmail.com>
Thu, 19 Feb 2015 13:15:12 +0000 (13:15 +0000)
commit c3d7858505a102af24ea2338cf3ad23d576878e4
tree b79af98a8e414291bf36677c2c617fa7727d35d7
parent 3d4542ce3da1cb0782c65d38130556a00ed2586d
[x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing
them into permutes and a blend with the generic decomposition logic.
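
To make the idea of "permutes and a blend" concrete, here is a minimal
standalone sketch of how a two-input shuffle mask can be split into two
single-input permute masks plus a blend mask. The struct and function names
(`DecomposedShuffle`, `decomposeShuffle`) are hypothetical and exist only for
illustration; this is not the code in X86ISelLowering.cpp. It assumes the usual
mask convention: indices below the vector width select from the first operand,
indices at or above it select from the second, and -1 marks an undefined lane.

```cpp
#include <vector>

// Hypothetical illustration type; not part of LLVM.
struct DecomposedShuffle {
  std::vector<int> V1Mask;    // single-input permute of the first operand
  std::vector<int> V2Mask;    // single-input permute of the second operand
  std::vector<int> BlendMask; // per-lane choice between the permuted operands
};

// Split a two-input shuffle mask into two permutes and a blend.
// Lanes that are undefined (-1) in the input stay undefined everywhere.
DecomposedShuffle decomposeShuffle(const std::vector<int> &Mask) {
  const int Size = static_cast<int>(Mask.size());
  DecomposedShuffle D{std::vector<int>(Size, -1), std::vector<int>(Size, -1),
                      std::vector<int>(Size, -1)};
  for (int i = 0; i < Size; ++i) {
    if (Mask[i] >= 0 && Mask[i] < Size) {
      // Lane comes from the first operand: move the element into place
      // there, then have the blend take lane i of the first permute.
      D.V1Mask[i] = Mask[i];
      D.BlendMask[i] = i;
    } else if (Mask[i] >= Size) {
      // Lane comes from the second operand: rebase the index and have the
      // blend take lane i of the second permute.
      D.V2Mask[i] = Mask[i] - Size;
      D.BlendMask[i] = i + Size;
    }
  }
  return D;
}
```

Because every defined blend lane i selects lane i of one of its two inputs, the
final step never moves elements across lanes; all cross-lane movement is
confined to the two single-input permutes.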

This works really well in almost every case and lets the code only
manage the expansion of a single input into two v8i16 vectors to perform
the actual shuffle. The blend-based merging is often much nicer than the
pack-based merging that this replaces. The only place where it isn't is
when we end up blending between two packs where a single pack would do. To
handle that case, just teach the v2i64 lowering to handle these blends
by digging out the operands.
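
The single-input expansion mentioned above can be modeled in scalar code. The
sketch below assumes the bytes are zero-extended into two vectors of eight
16-bit lanes, that the shuffle is done on those word vectors, and that the
result is narrowed back with an unsigned saturating pack. The helper names
(`packUnsignedSat`, `shuffleViaV8i16`) are hypothetical, undefined lanes are
arbitrarily modeled as zero, and the instruction sequences the backend actually
emits may differ.

```cpp
#include <array>
#include <cstdint>

using V16i8 = std::array<uint8_t, 16>;
using V8i16 = std::array<int16_t, 8>;

// Scalar model of one lane of an unsigned saturating word-to-byte pack.
static uint8_t packUnsignedSat(int16_t W) {
  if (W < 0)
    return 0;
  if (W > 255)
    return 255;
  return static_cast<uint8_t>(W);
}

// Hypothetical model of a single-input v16i8 shuffle done via v8i16 vectors.
// Mask[i] is the source byte index for output byte i, or -1 for undefined.
V16i8 shuffleViaV8i16(const V16i8 &In, const std::array<int, 16> &Mask) {
  // Step 1: zero-extend the low and high eight bytes into two word vectors.
  V8i16 Lo{}, Hi{};
  for (int j = 0; j < 8; ++j) {
    Lo[j] = In[j];
    Hi[j] = In[8 + j];
  }

  // Step 2: build the two result word vectors. Each one is a word shuffle
  // that may draw from both Lo and Hi.
  auto Pick = [&](int Idx) -> int16_t {
    if (Idx < 0)
      return 0; // undefined lane, modeled as zero
    return Idx < 8 ? Lo[Idx] : Hi[Idx - 8];
  };
  V8i16 ResLo{}, ResHi{};
  for (int j = 0; j < 8; ++j) {
    ResLo[j] = Pick(Mask[j]);
    ResHi[j] = Pick(Mask[8 + j]);
  }

  // Step 3: pack the two word vectors back into sixteen bytes. Every word
  // is in 0..255 here, so the saturation never changes a value.
  V16i8 Out{};
  for (int j = 0; j < 8; ++j) {
    Out[j] = packUnsignedSat(ResLo[j]);
    Out[8 + j] = packUnsignedSat(ResHi[j]);
  }
  return Out;
}
```

Working in 16-bit lanes is what lets the byte shuffle reuse the existing v8i16
shuffle lowering, at the cost of the widening and narrowing steps on either
side.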

With this, the only shuffles that still cause an explosion of instructions
are essentially random permutations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229849 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/sse3.ll
test/CodeGen/X86/vector-shuffle-128-v16.ll