Fix a ThreadLocal bug: hold the meta lock when resizing the element vector
Summary:
There appears to be a race here. leizha reported issues with
a heavily recycled AtomicHashMap (ThreadCachedInt inside). It looks
like what's happening is this:
- Thread A: ~ThreadCachedInt from an AHM
  - meta lock is taken, and the ThreadElement list is iterated
  - all entries are zeroed, and the id is marked free
  - then releases the lock
- Thread B: someone is calling get() on an unrelated id
  - hit reserve: rallocm on the pointer or unsynchronized memcpy from
    the element vector
  - waits on the lock
  - when it gets the lock, it stores back the value that it read,
    overwriting the entry that A zeroed.
Later, someone reuses the id from the freelist, picks up the
previously freed pointer, and eventually double-frees it. (nullptr
is the signifier for "this thread doesn't have an instance of the
threadlocal yet", so a stale non-null entry looks like a live
instance.)
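The locking discipline from the title can be sketched roughly as below.
This is a minimal illustration, not the code in this diff: MetaSketch,
ThreadEntrySketch, reserveLocked and clearSlotLocked are hypothetical
names, and a std::vector stands in for the real element vector. The
point is only that the copy out of the old vector and the publish of the
new one happen under the same lock the destroying thread holds while
zeroing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical, simplified stand-ins; not the real ThreadLocal internals.
struct MetaSketch {
  std::mutex lock;  // plays the role of the "meta lock"
};

struct ThreadEntrySketch {
  // One slot per id; nullptr means "no instance for this thread yet".
  std::vector<void*> elements;
};

// Grow entry.elements so that index `id` is valid.
// Assumption: only the owning thread ever changes the vector's size, so
// the unlocked size check is safe; other threads only zero slots, and
// they do so under meta.lock.
inline void reserveLocked(MetaSketch& meta, ThreadEntrySketch& entry,
                          std::size_t id) {
  if (id < entry.elements.size()) {
    return;
  }
  // Allocate outside the lock to keep the critical section short.
  std::vector<void*> grown((id + 1) * 2, nullptr);
  {
    std::lock_guard<std::mutex> guard(meta.lock);
    // Copy and publish under the lock: a slot zeroed by a destroying
    // thread cannot be resurrected from a stale snapshot.
    std::copy(entry.elements.begin(), entry.elements.end(), grown.begin());
    entry.elements.swap(grown);
  }
  // `grown` now holds the old storage and is released after the unlock.
  assert(id < entry.elements.size());
}

// Destroy-side counterpart: zero the slot under the same lock and hand
// the old pointer back to the caller for cleanup.
inline void* clearSlotLocked(MetaSketch& meta, ThreadEntrySketch& entry,
                             std::size_t id) {
  std::lock_guard<std::mutex> guard(meta.lock);
  if (id >= entry.elements.size()) {
    return nullptr;
  }
  void* old = entry.elements[id];
  entry.elements[id] = nullptr;
  return old;
}
```

With the copy inside the critical section, Thread B either sees the slot
before A takes the lock (and A then zeroes the new vector), or after A
has released it (and B copies the zero).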
Test Plan:
leizha's test case doesn't segv after this diff; it was
reliably breaking with corruption in malloc before it. I'm working on
making that test case into a unit test to add to this diff, but I'm
putting it up early in case there's something wrong with the theory
above or in case someone has an idea for a better fix.
Reviewed By: tudorb@fb.com
FB internal diff: D928534