author Evan Cheng <evan.cheng@apple.com>
Tue, 29 Mar 2011 01:56:09 +0000 (01:56 +0000)
committer Evan Cheng <evan.cheng@apple.com>
Tue, 29 Mar 2011 01:56:09 +0000 (01:56 +0000)
commit 78fe9ababead2168f7196c6a47402cf499a0aaf7
tree 625da1ee1c53c784a40e7160f7ef3faf6ea52fc6
parent 79abc9dd4a306d4ec42d09e2673a94abd225bcdc
Optimize (zext A + zext B) * C into (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extends and take advantage of the
no-stall back-to-back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6
 vmul  q0, q0, q1
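
A minimal NEON-intrinsics sketch of the factored form (the function name
is an assumption, not taken from the commit or its test): vmovl_u8 is the
zero-extend, so the operand seen by isel is (zext A + zext B) * (zext C),
which this change distributes into two VMULLs:

    #include <arm_neon.h>

    /* Illustrative only: (zext A + zext B) * C written with intrinsics. */
    uint16x8_t factored_form(uint8x8_t a, uint8x8_t b, uint8x8_t c) {
        uint16x8_t wa = vmovl_u8(a);             /* zext A */
        uint16x8_t wb = vmovl_u8(b);             /* zext B */
        uint16x8_t wc = vmovl_u8(c);             /* zext C */
        return vmulq_u16(vaddq_u16(wa, wb), wc); /* (zext A + zext B) * C */
    }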

This allows us to emit vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);
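
A self-contained version of that snippet (the wrapper name and signature
are assumptions, not from the commit or its test), which with this change
should compile to the back-to-back vmull.u8 + vmlal.u8 pair rather than
the slower vaddl + vmovl + vmul sequence above:

    #include <arm_neon.h>

    /* Illustrative wrapper around the snippet from the commit message. */
    uint16x8_t mull_mlal(uint8x16_t s, uint8x8_t c) {
        uint16x8_t f;
        f = vmull_u8(   vget_high_u8(s), c);
        f = vmlal_u8(f, vget_low_u8(s),  c);
        return f;
    }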

rdar://9197392

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128444 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/ARM/ARMISelLowering.cpp
test/CodeGen/ARM/vmul.ll