crypto: camellia-aesni-avx2 - tune assembly code for more performance
Add implementation tuned for more performance on real hardware. Changes are
mostly around the part mixing 128-bit extract and insert instructions and
AES-NI instructions. Also 'vpbroadcastb' instructions have been change to
'vpshufb with zero mask'.
Tests on Intel Core i5-4570:
tcrypt ECB results, old-AVX2 vs new-AVX2:
size 128bit key 256bit key
enc dec enc dec
256 1.00x 1.00x 1.00x 1.00x
1k 1.08x 1.09x 1.05x 1.06x
8k 1.06x 1.06x 1.06x 1.06x
tcrypt ECB results, AVX vs new-AVX2:
size 128bit key 256bit key
enc dec enc dec
256 1.00x 1.00x 1.00x 1.00x
1k 1.51x 1.50x 1.52x 1.50x
8k 1.47x 1.48x 1.48x 1.48x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>