1 -------------------------------------------------------------------------------
3 -------------------------------------------------------------------------------
5 Fences provide a few new modification order constraints as well as an
6 interesting extension to release sequences, detailed in 29.3 (p4-p7) and 29.8
7 (p2-p4). The specifications are pasted here in Appendix A and are applied to our
8 model-checker in these notes.
10 Section 29.3 details the modification order constraints established by
11 sequentially-consistent fences.
13 Section 29.8 details the behavior of release and acquire fences (note that
14 memory_order_seq_cst is both release and acquire).
16 The text of these rules are provided at the end of this document for reference.
18 *******************************
19 Backtracking requirements
20 *******************************
22 Because we maintain the seq-cst order as consistent with the execution order,
23 seq-cst fences cannot commute with each other, with seq-cst loads, nor with
24 seq-cst stores; we backtrack at all such pairs.
26 Fences extend release/acquire synchronization beyond just
27 store-release/load-acquire. We must backtrack with potentially-synchronizing
28 fences: that is, with any pair of store- or fence-release and load- or
29 fence-acquire, where the release comes after the acquire in the execution order
30 (the other ordering is OK, as we will explore both behaviors; where the pair
31 synchronize and where they don't).
33 Note that, for instance, a fence-release may synchronize with a fence-acquire
34 only in the presence of a appropriate load/store pair (29.8p2); but the
35 synchronization still occurs between the fences, so the backtracking
36 requirements are only placed on the release/acquire fences themselves.
38 *******************************
39 Seq-cst MO constraints (29.3 p4-p7)
40 *******************************
42 The statements given in the specification regarding sequentially consistent
43 fences can be transformed into the following 4 modification order constraints.
48 is_write(A) && is_read(B) && is_write(W) && is_fence(X) &&
49 is_seqcst(W) && is_seqcst(X) && A != W &&
56 Intuition/Implementation:
57 * We may (but don't currently) limit our considertion of W to only the most
58 recent (in the SC order) store to the same location as A and B prior to X
59 (note that all prior writes will be ordered prior to W in both SC and MO)
60 * We should consider the "most recent" seq-cst fence X that precedes B
61 * This search can be combined with the r_modification_order search, since we
62 already iterate through the necessary stores W
67 is_write(A) && is_read(B) && is_write(W) && is_fence(X) &&
68 is_seqcst(B) && is_seqcst(X) &&
76 Intuition/Implementation:
77 * We only need to examine the "most recent" seq-cst fence X from each thread
78 * We only need to examine the "most recent" qualifying store W that precedes X;
79 all other W will provide a weaker MO constraint
80 * This search can be combined with the r_modification_order search, since we
81 already iterate through the necessary stores W
86 is_write(A) && is_read(B) && is_write(W) && is_fence(X) && is_fence(Y) &&
87 is_seqcst(X) && is_seqcst(Y) &&
91 W --sb-> X --sc-> Y --sb-> B
95 Intuition/Implementation:
96 * We should consider only the "most recent" fence Y in the same thread as B
97 (prior fences may only yield the same or weaker constraints)
98 * We may then consider the "most recent" seq-cst fence X prior to Y (in SC order)
99 from each thread (prior fences may only yield the same or weaker constraints)
100 * We should consider only the "most recent" store W (to the same location as A,
101 B) in the same thread as X (prior stores may only yield the same or weaker
103 * This search can be combined with the r_modification_order search, since we
104 already iterate through the necessary stores W
109 is_write(A) && is_write(B) && is_fence(X) && is_fence(Y) &&
110 is_seqcst(X) && is_seqcst(Y) &&
112 A --sb-> X --sc-> Y --sb-> B
116 Intuition/Implementation:
117 * (Similar to 29.3p6 rules, except using A/B write/write) only search for the
118 most recent fence Y in the same thread; search for the most recent (prior to
119 Y) fence X from each thread; search for the most recent store A prior to X
120 * This search can be combined with the w_modification_order search, since we
121 already iterate through the necessary stores A
123 **********************************************************************
124 Release/acquire synchronization: extended to fences (29.3 p4-p7)
125 **********************************************************************
127 The C++ specification statements regarding release and acquire fences make
128 extensions to release sequences, using what they call "hypothetical release
129 sequences." These hypothetical release sequences are the same as normal release
130 sequences, except that the "head" doesn't have to be a release: it can have any
131 memory order (e.g., relaxed). This change actually simplifies our release
132 sequences (for the fence case), as we don't actually have to establish a
133 contiguous modification order all the way to a release operation; we just need
134 to reach the same thread (via a RMW chain, for instance).
136 The statements given in the specification regarding release and acquire fences
137 do not result in totally separable conditions, so I will write down my
138 semi-formal notation here along with some simple notes then present my
139 implementation notes at the end.
141 Note that we will use A --rs-> B to denote that B is in the release sequence
142 headed by A (we allow A = B, unless otherwise stated). The hypothetical release
143 sequence will be similarly denoted A --hrs-> B.
148 is_fence(A) && is_write(X) && is_write(W) && is_read(Y) && is_fence(B) &&
149 is_release(A) && is_acquire(B) &&
150 A --sb-> X --hrs-> W --rf-> Y --sb-> B
155 * The fence-release A does not result in any action on its own (i.e., when it
156 is first explored); it only affects later release operations, at which point
157 we may need to associate store X with A. Thus, for every store X, we eagerly
158 record the most recent fence-release, then this record can be utilized during
159 later (hypothetical) release sequence checks.
160 * The fence-acquire B is more troublesome, since there may be many qualifying
161 loads Y (loads from different locations; loads which read from different
162 threads; etc.). Each Y may read from different hypothetical release
163 sequences, ending in a different release A with which B should synchronize.
164 It is difficult (but not impossible) to find good stopping conditions at
165 which we should terminate our search for Y. However, we at least know we only
166 need to consder Y such that:
168 where V is a previous fence-acquire.
173 is_fence(A) && is_write(X) && is_write(W) && is_read(B) &&
174 is_release(A) && is_acquire(B) &&
175 A --sb-> X --hrs-> W --rf-> B
180 * See the note for fence-release A in 29.8p2
185 is_write(A) && is_write(W) && is_read(X) && is_fence(B) &&
186 is_release(A) && is_acquire(B) &&
187 A --rs-> W --rf-> X --sb-> B
192 * See the note for fence-acquire B in 29.8p2. The A, Y, and B in 29.8p2
193 correspond to A, X, and B in this rule (29.8p4).
197 Now, noting the overlap in implementation notes between 29.8p2,3,4 and the
198 similarity between release sequences and hypothetical release sequences, I can
199 extend our release sequence support to provide easy facilities for
200 release/acquire fence support.
202 I extend release sequence support to include fences first by distinguishing the
203 'acquire' from the 'load'; previously, release sequence searches were always
204 triggered by a load-acquire operation. Now, we may have a *fence*-acquire which
205 finds a previous load-*relaxed*, then follows any chain to a release sequence
206 (29.8p4). Any release heads found by our existing release sequence support must
207 synchronize with the fence-acquire. Any uncertain release sequences can be
208 stashed (along with both the fence-acquire and the load-relaxed) as "pending" in
211 Next I extend the release sequence support to include hypothetical release
212 sequences. Note that any search for a release sequence can also search for a
213 hypothetical release sequence with little additional effort (and even saving
214 effort in cases where a fence-release hides a prior store-release, whose release
215 sequence may be harder to establish eagerly). Because the "most recent"
216 fence-release is stashed in each ModelAction (see the fence-release note in
217 29.8p2), release sequence searches can easily add the most recent fence-release
218 to the release_heads vector as it explores a RMW chain. Then, once it reaches a
219 thread in which it looks for a store-release, it can perform this interesting
220 optimization: if the most recent store-release is sequenced before the most
221 recent fence-release, then we can ignore the store-release and simply
222 synchronize with the fence-release. This avoids a "contiguous MO" computation.
224 So, with hypothetical release sequences seamlessly integrated into the release
225 sequence code, we easily cover 29.8p3 (fence-release --sw-> load-acquire). Then,
226 it's a simple extension to see how 29.8p2 is just a combination of the rules
227 described for 29.8p3 and 29.8p4: a fence-acquire triggers a search for loads in
228 its same thread; these loads then launch a series of release sequence
229 searches--hypothetical (29.8p2) or real (29.8p4)--and synchronizes with all the
232 The best part about all of the preceding explanations: the lazy fixups, etc.,
233 can simply be re-used from existing release sequence code, with slight
234 adjustments for dealing the presence of a fence-acquire preceded by a
237 *******************************
239 *******************************
241 fence(memory_order_consume) acts like memory_order_release, so if we ever
242 support consume, we must alias consume -> release
244 fence(memory_order_relaxed) is a no-op
246 **************************************************
247 Appendix A: From the C++11 specification (N3337)
248 **************************************************
256 For an atomic operation B that reads the value of an atomic object M, if there
257 is a memory_order_seq_cst fence X sequenced before B, then B observes either
258 the last memory_order_seq_cst modification of M preceding X in the total order
259 S or a later modification of M in its modification order.
263 For atomic operations A and B on an atomic object M, where A modifies M and B
264 takes its value, if there is a memory_order_seq_cst fence X such that A is
265 sequenced before X and B follows X in S, then B observes either the effects of
266 A or a later modification of M in its modification order.
270 For atomic operations A and B on an atomic object M, where A modifies M and B
271 takes its value, if there are memory_order_seq_cst fences X and Y such that A
272 is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B
273 observes either the effects of A or a later modification of M in its
278 For atomic operations A and B on an atomic object M, if there are
279 memory_order_seq_cst fences X and Y such that A is sequenced before X, Y is
280 sequenced before B, and X precedes Y in S, then B occurs later than A in the
281 modification order of M.
289 A release fence A synchronizes with an acquire fence B if there exist atomic
290 operations X and Y, both operating on some atomic object M, such that A is
291 sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value
292 written by X or a value written by any side effect in the hypothetical release
293 sequence X would head if it were a release operation.
297 A release fence A synchronizes with an atomic operation B that performs an
298 acquire operation on an atomic object M if there exists an atomic operation X
299 such that A is sequenced before X, X modifies M, and B reads the value written
300 by X or a value written by any side effect in the hypothetical release sequence
301 X would head if it were a release operation.
305 An atomic operation A that is a release operation on an atomic object M
306 synchronizes with an acquire fence B if there exists some atomic operation X on
307 M such that X is sequenced before B and reads the value written by A or a value
308 written by any side effect in the release sequence headed by A.