[AArch64] Fix halfword load merging for big-endian targets

When we merge two halfword loads into a single word load on a big-endian
target, the halfword at the lower address ends up in the upper 16 bits of the
loaded value, the reverse of little-endian. The load-store optimiser therefore
needs to swap the destination registers of the two original loads.

This does not affect merging of two word loads, as we use ldp, which treats the
memory as two separate 32-bit words.
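
The effect can be illustrated with a small model of memory. This is only a
sketch and not the optimiser's code: the helper names (load, split_merged_word,
merged_halfword_loads, load_pair) are hypothetical, and the mask/shift stands in
for whatever extraction the pass actually emits.

```python
def load(mem: bytes, offset: int, size: int, byteorder: str) -> int:
    """Model a `size`-byte load from `mem` at `offset` ("little" or "big")."""
    return int.from_bytes(mem[offset:offset + size], byteorder)


def split_merged_word(mem: bytes, offset: int, byteorder: str):
    """Do one 32-bit load and return its (low 16 bits, high 16 bits)."""
    word = load(mem, offset, 4, byteorder)
    return word & 0xFFFF, word >> 16


def merged_halfword_loads(mem: bytes, offset: int, byteorder: str):
    """Return (value for first dest, value for second dest), where the first
    destination belonged to the original load at the lower address."""
    low, high = split_merged_word(mem, offset, byteorder)
    if byteorder == "big":
        # On big-endian the lower-address halfword sits in the high 16 bits,
        # so the destinations must be swapped.
        low, high = high, low
    return low, high


def load_pair(mem: bytes, offset: int, byteorder: str):
    """Model an ldp of two 32-bit words: each word is loaded on its own, so the
    first destination always receives the lower-address word."""
    return load(mem, offset, 4, byteorder), load(mem, offset + 4, 4, byteorder)
```

With the swap in place, merged_halfword_loads returns the same pair as two
separate halfword loads at offset and offset + 2 for either byte order, while
load_pair needs no such adjustment.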
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252597 91177308-0d34-0410-b5e6-96231b3b80d8