[X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
vperm2* intrinsics are just shuffles.
In a few special cases (when the immediate's zero bits are set), they're not even shuffles.
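
As a rough illustration (not the code in this patch), here is a minimal
sketch of the immediate-to-shuffle-mask decoding, assuming the documented
VPERM2F128/VPERM2I128 encoding: imm bits [1:0] and [5:4] select one of the
four 128-bit source lanes for the destination's low and high lanes, and
bits [3] and [7] zero the corresponding lane instead. The helper name
decodeVPerm2Mask is made up for the example, not the name the patch adds:

#include <cstdint>
#include <cstdio>
#include <optional>
#include <vector>

// NumElts is the element count of one source vector (e.g. 4 for <4 x double>).
// Returns nullopt when a zero bit is set -- the case that is "not even a
// shuffle" and is left as a TODO here, mirroring the TODOs in the patch.
std::optional<std::vector<int>> decodeVPerm2Mask(uint8_t Imm, int NumElts) {
  if (Imm & 0x88)              // bits 3/7 request a zeroed destination lane
    return std::nullopt;
  int HalfElts = NumElts / 2;  // elements per 128-bit lane
  std::vector<int> Mask(NumElts);
  for (int Half = 0; Half != 2; ++Half) {
    int Sel = (Imm >> (4 * Half)) & 0x3; // which of the 4 source lanes
    for (int i = 0; i != HalfElts; ++i)
      Mask[Half * HalfElts + i] = Sel * HalfElts + i; // index into <V1, V2>
  }
  return Mask;
}

int main() {
  // 0x31: destination low half <- high lane of V1, high half <- high lane
  // of V2, i.e. the shufflevector mask <2, 3, 6, 7> over concat(V1, V2).
  if (auto Mask = decodeVPerm2Mask(0x31, 4))
    for (int Idx : *Mask)
      std::printf("%d ", Idx); // prints: 2 3 6 7
  std::printf("\n");
}
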
Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:
1. Doing this in the front-end would transform custom-written SSE intrinsic
code even at -O0, which makes vector coders really angry (and so I have
regrets about some patches from last week).
2. Mask-conversion logic in header files is hard to write and subsequently
hard to read.
There are a couple of TODOs in this patch to complete this optimization.
Differential Revision: http://reviews.llvm.org/D8486
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232852 91177308-0d34-0410-b5e6-96231b3b80d8