Try using the last Deferred reader slot first
Summary:
When trying to find an empty deferred reader slot, getting the current CPU can
take quite a few cycles, e.g. >1% CPU on SMC (https://fburl.com/434646643).
Let's track the last slot used by this thread and try that slot first, before
reading the CPU id and doing the search.
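A minimal sketch of the idea, not the actual code in this diff. All names here
(kNumSlots, gSlots, expensiveCpuHint, acquireSlot, releaseSlot) are
hypothetical, and the CPU lookup is replaced by a thread-id hash with a call
counter so the saved lookup is observable:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical slot table; a value of 0 means the slot is empty.
constexpr std::size_t kNumSlots = 8;
constexpr std::size_t kNoSlot = kNumSlots;  // sentinel: no slot available

std::array<std::atomic<std::uintptr_t>, kNumSlots> gSlots{};
std::atomic<int> gCpuQueries{0};  // counts how often the slow path runs

// Stand-in (assumption) for the costly "which CPU am I on" query.
std::size_t expensiveCpuHint() {
  gCpuQueries.fetch_add(1, std::memory_order_relaxed);
  return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kNumSlots;
}

// Claims the slot only if it is currently empty.
bool tryClaim(std::size_t slot, std::uintptr_t token) {
  std::uintptr_t expected = 0;
  return gSlots[slot].compare_exchange_strong(expected, token);
}

// Returns the claimed slot index, or kNoSlot if every slot is taken.
// token must be non-zero (e.g. the address of the lock being read-acquired).
std::size_t acquireSlot(std::uintptr_t token) {
  thread_local std::size_t lastSlot = kNoSlot;

  // Fast path: retry the slot this thread used last time, skipping the
  // CPU lookup entirely.
  if (lastSlot != kNoSlot && tryClaim(lastSlot, token)) {
    return lastSlot;
  }

  // Slow path: pay for the CPU hint, then probe linearly from there.
  const std::size_t start = expensiveCpuHint();
  for (std::size_t i = 0; i < kNumSlots; ++i) {
    const std::size_t slot = (start + i) % kNumSlots;
    if (tryClaim(slot, token)) {
      lastSlot = slot;  // remember for the next acquisition on this thread
      return slot;
    }
  }
  return kNoSlot;
}

void releaseSlot(std::size_t slot) {
  gSlots[slot].store(0);
}
```

In this sketch, a thread that repeatedly acquires and releases pays for the
CPU hint only once, as long as its remembered slot is free when it comes back.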
Microbenchmark results generally seem to improve, though they are a bit noisy
and it is unclear how much to trust them. Results with this diff are on the
left side: P56648675
Reviewed By: nbronson
Differential Revision: D3857793
fbshipit-source-id: 8b1c005362c82e748a663100f889b0b99dc257fe