Try using the last Deferred reader slot first
author Qi Wang <qiwang@fb.com>
Mon, 19 Sep 2016 23:48:17 +0000 (16:48 -0700)
committer Facebook Github Bot 1 <facebook-github-bot-1-bot@fb.com>
Mon, 19 Sep 2016 23:53:45 +0000 (16:53 -0700)
commit 8d329050cc18c02716dda2d1dea45bbef8fc4c53
tree 5cadfa157a2256e8f85c839698c2207bdc06ee4d
parent 5bebf3c9ea5276ce8a099bcce9a3dfa9c6727bb8

Summary:
When searching for an empty deferred reader slot, getting the current CPU can
take quite a few cycles, e.g. >1% CPU on SMC (https://fburl.com/434646643).

Let's track the last slot used by this thread and try that slot first, before
reading the CPU id and doing the search.
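
A minimal sketch of the idea, not the actual folly/SharedMutex.h change. All
names here (g_slots, kNumSlots, tls_lastSlot, expensiveStartIndex, acquireSlot,
releaseSlot) are hypothetical, and the thread-id hash only stands in for the
costly "read the current CPU id" step:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical, simplified table of deferred reader slots; 0 means "empty".
constexpr std::size_t kNumSlots = 8;
std::atomic<std::uintptr_t> g_slots[kNumSlots];

// Hypothetical stand-in for the expensive "which CPU am I on?" lookup.
inline std::size_t expensiveStartIndex() {
  return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kNumSlots;
}

// Per-thread memory of the slot this thread claimed most recently.
thread_local std::size_t tls_lastSlot = 0;

// Claims `slot` for `token` only if the slot is currently empty.
inline bool tryClaim(std::size_t slot, std::uintptr_t token) {
  std::uintptr_t expected = 0;
  return g_slots[slot].compare_exchange_strong(expected, token);
}

// Returns the claimed slot index, or kNumSlots if every slot is taken.
std::size_t acquireSlot(std::uintptr_t token) {
  assert(token != 0);  // 0 is reserved for "empty"

  // Fast path: retry the slot this thread used last, skipping the CPU lookup.
  const std::size_t last = tls_lastSlot;
  if (tryClaim(last, token)) {
    return last;
  }

  // Slow path: pay for the lookup, then probe slots starting from that index.
  const std::size_t start = expensiveStartIndex();
  for (std::size_t i = 0; i < kNumSlots; ++i) {
    const std::size_t slot = (start + i) % kNumSlots;
    if (tryClaim(slot, token)) {
      tls_lastSlot = slot;  // remember it for the next acquisition
      return slot;
    }
  }
  return kNumSlots;
}

// Marks the slot as empty again.
void releaseSlot(std::size_t slot) {
  g_slots[slot].store(0);
}
```

In this sketch, a thread that repeatedly acquires and releases a slot without
contention keeps hitting the fast path and never reaches the lookup.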

Microbenchmark results generally seem to improve, though they are a bit noisy
and I am not sure how much to trust them. Results (with this diff on the left
side): P56648675

Reviewed By: nbronson

Differential Revision: D3857793

fbshipit-source-id: 8b1c005362c82e748a663100f889b0b99dc257fe
folly/SharedMutex.h
folly/test/SharedMutexTest.cpp