libatomic: perform write before context switch
Atomic actions block a thread at the "switch_to_master" function call, so under
the current structure (which isn't quite fit for relaxed modeling yet...) we
should perform the memory write before calling "switch_to_master".
If not, we can observe sequences like the following, where x is an atomic
variable. All actions are seq_cst:
Initially, x = 0
Thread  Action
------  ------
1       r1 = x;      // r1 = 0
1       x = r1 + 1;  // x = 1, not stored yet?
2       r2 = x;      // r2 = 0
2       x = r2 + 1;  // x = 1, not stored yet?
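Then, depending on scheduling, Thread 1 or Thread 2 might complete first, with
its write being performed only *after* it receives control again. Thread 2 can
thus read the stale value 0 even though Thread 1's write comes earlier in the
trace.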
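
The difference between the two orderings can be illustrated with a minimal,
single-threaded sketch. It is not the actual libatomic code: ModelThread,
resume, load, store and run are hypothetical names, the context switch is
reduced to a deferred write that is applied when the thread next runs, and
C++17 is assumed for std::optional.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of one thread: 'pending' holds a write that has been
// handed to the scheduler but not yet stored to memory.
struct ModelThread {
    std::optional<int> pending;
};

struct Result {
    int r1, r2, final_x;
};

// Models the thread regaining control after the switch returns:
// any deferred write is stored now.
static void resume(ModelThread &t, int &x) {
    if (t.pending) {
        x = *t.pending;
        t.pending.reset();
    }
}

static int load(ModelThread &t, int &x) {
    resume(t, x);
    return x;
}

static void store(ModelThread &t, int &x, int v, bool write_first) {
    if (write_first)
        x = v;          // new ordering: store, then switch
    else
        t.pending = v;  // old ordering: switch, store on resume
}

// Replays the schedule from the table: T1 read, T1 write, T2 read, T2 write.
Result run(bool write_first) {
    int x = 0;
    ModelThread t1, t2;
    int r1 = load(t1, x);
    store(t1, x, r1 + 1, write_first);
    int r2 = load(t2, x);
    store(t2, x, r2 + 1, write_first);
    resume(t1, x);  // both threads eventually receive control again
    resume(t2, x);
    assert(r1 == 0);
    return {r1, r2, x};
}
```

In this model, run(false) reproduces the trace above (r2 = 0, final x = 1),
while run(true) gives r2 = 1 and a final x of 2.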