AMDGPU: Split x8 and x16 vector loads instead of scalarizing