Always use inline-asm version on GCC/Clang
Summary:
The intrinsic version requires explicitly adding `__target__` attributes, which prevents the functions from being inlined into callers that lack the attribute. Although the code generated with the `__target__` attribute is strictly better, ensuring that it is applied to all the relevant functions is error-prone, so just use the inline assembly version on GCC/Clang so that it can be inlined elsewhere. MSVC inlines the intrinsic version without any issue.
This also marks the functions as `ALWAYS_INLINE`, since the diff being reverted made that change as well.
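
For illustration only, a minimal sketch of the two variants on GCC/Clang. This message does not name the affected instruction, so `popcnt` is an assumption, and `popcount_intrin`, `popcount_asm`, and `check_variants_agree` are hypothetical names rather than the functions touched by this diff. The raw `__always_inline__` attribute stands in for the `ALWAYS_INLINE` macro.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>

// Intrinsic variant (hypothetical name). The __target__ attribute is needed
// when the translation unit is not built with -mpopcnt, and GCC/Clang will
// not inline this into callers that lack the same target attribute.
__attribute__((__target__("popcnt")))
inline std::uint64_t popcount_intrin(std::uint64_t x) {
  return static_cast<std::uint64_t>(_mm_popcnt_u64(x));
}

// Inline-asm variant (hypothetical name). No __target__ attribute is
// required, so it can be force-inlined into any caller. The caller is
// responsible for running this only on CPUs that support popcnt.
__attribute__((__always_inline__))
inline std::uint64_t popcount_asm(std::uint64_t x) {
  std::uint64_t result;
  asm("popcnt %1, %0" : "=r"(result) : "r"(x) : "cc");
  return result;
}
#else
// Portable fallback so the sketch still builds on other compilers/targets.
inline std::uint64_t popcount_asm(std::uint64_t x) {
  std::uint64_t n = 0;
  for (; x != 0; x &= x - 1) {
    ++n; // clears the lowest set bit on each iteration
  }
  return n;
}
inline std::uint64_t popcount_intrin(std::uint64_t x) {
  return popcount_asm(x);
}
#endif

// Sanity check that both variants compute the same value.
inline void check_variants_agree(std::uint64_t x) {
  assert(popcount_asm(x) == popcount_intrin(x));
}
```

A caller compiled without `-mpopcnt` can inline `popcount_asm` directly, whereas `popcount_intrin` would stay an out-of-line call unless the caller also carried the `__target__` attribute.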
Reviewed By: yfeldblum, philippv, ot
Differential Revision: D3963935
fbshipit-source-id: 47175d64e7be351eb455a4d053b91ce9392bf152