crypto: cast6-avx - tune assembler code for more performance
authorJussi Kivilinna <jussi.kivilinna@mbnet.fi>
Tue, 28 Aug 2012 11:24:54 +0000 (14:24 +0300)
committerHerbert Xu <herbert@gondor.apana.org.au>
Thu, 6 Sep 2012 20:17:05 +0000 (04:17 +0800)
commitc09220e1bc97d83cae445cab8dcb057fabd62361
tree2f7a80b98592630c28350ebe4cb0f3be616cc8f8
parentddaea7869d29beb9e0042c96ea52c9cca2afd68a
crypto: cast6-avx - tune assembler code for more performance

Patch replaces 'movb' instructions with 'movzbl' to break false register
dependencies, interleaves instructions better for out-of-order scheduling
and merges constant 16-bit rotation with round-key variable rotation.

tcrypt ECB results:

Intel Core i5-2450M:

size    old-vs-new      new-vs-generic  old-vs-generic
        enc     dec     enc     dec     enc     dec
256     1.13x   1.19x   2.05x   2.17x   1.82x   1.82x
1k      1.18x   1.21x   2.26x   2.33x   1.93x   1.93x
8k      1.19x   1.19x   2.32x   2.33x   1.95x   1.95x

[v2]
 - Do instruction interleaving another way to avoid adding new FPU<=>CPU
   register moves as these cause performance drop on Bulldozer.
 - Improvements to round-key variable rotation handling.
 - Further interleaving improvements for better out-of-order scheduling.

Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
arch/x86/crypto/cast6-avx-x86_64-asm_64.S