[x86] Add a slight variation on some of the other generic shuffle
lowerings -- one which decomposes into an initial blend followed by
a permute.
Particularly on newer chips, blends are handled independently of
shuffles and so this is much less bottlenecked on the single port that
floating point shuffles are executed with on Intel.
I'll be adding this lowering to a bunch of other code paths in
subsequent commits to handle still more places where we can effectively
leverage blends when they're available in the ISA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229292
91177308-0d34-0410-b5e6-
96231b3b80d8