The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:
-* The new C++0x ``<atomic>`` header. (`C++0x draft available here
- <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here
+* The new C++11 ``<atomic>`` header. (`C++11 draft available here
+ <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
<http://www.open-std.org/jtc1/sc22/wg14/>`_.)
* Proper semantics for Java-style memory, for both ``volatile`` and regular
shared variables. (`Java Specification
- <http://java.sun.com/docs/books/jls/third_edition/html/memory.html>`_)
+ <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)
* gcc-compatible ``__sync_*`` builtins. (`Description
- <http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html>`_)
+ <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)
* Other scenarios with atomic semantics, including ``static`` variables with
non-trivial constructors in C++.
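The ``__sync_*`` bullet above can be made concrete with a small sketch. It assumes a GCC- or Clang-compatible C++ compiler, and the names ``hits``, ``bump`` and ``try_claim`` are hypothetical. These builtins do not specify an ordering, so they are treated as SequentiallyConsistent.

```cpp
#include <cassert>

// Hypothetical hit counter and claim slot updated with the gcc-compatible
// __sync builtins (supported by both GCC and Clang). Clang lowers these
// builtins to SequentiallyConsistent atomic operations.
static long hits = 0;

// Atomically adds 1 to `hits` and returns the *new* value.
long bump() { return __sync_add_and_fetch(&hits, 1L); }

// Stores `owner` into *slot only if it still holds 0; returns true on success.
bool try_claim(int *slot, int owner) {
  assert(owner != 0 && "0 marks an unclaimed slot");
  return __sync_bool_compare_and_swap(slot, 0, owner);
}
```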
``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
atomic store (where the store is conditional for ``cmpxchg``), but no other
-memory operation can happen on any thread between the load and store. Note that
-LLVM's cmpxchg does not provide quite as many options as the C++0x version.
+memory operation can happen on any thread between the load and store.
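A minimal C++11 sketch of the conditional-store behavior of ``cmpxchg``, using ``std::atomic<int>::compare_exchange_strong`` (which Clang typically lowers to a ``cmpxchg`` instruction). The helper ``set_if_unset`` is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: publish `value` into `cell` only if the cell still
// holds `empty`. Returns the value that was in `cell` before the attempt.
int set_if_unset(std::atomic<int> &cell, int empty, int value) {
  assert(value != empty);
  int expected = empty;
  // Atomic load plus conditional store; no other memory operation on any
  // thread can intervene between them. On failure, `expected` is overwritten
  // with the value actually observed; on success it still equals `empty`.
  cell.compare_exchange_strong(expected, value, std::memory_order_seq_cst);
  return expected;
}
```

Either way the returned value is the old contents of ``cell``, which mirrors how ``cmpxchg`` always yields the loaded value alongside a success flag.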
A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation; it is normally used along with Monotonic memory operations.
A Monotonic load followed by an Acquire fence is roughly equivalent to an
-Acquire load.
+Acquire load, and a Monotonic store following a Release fence is roughly
+equivalent to a Release store. SequentiallyConsistent fences behave as both
+an Acquire and a Release fence, and offer some additional complicated
+guarantees; see the C++11 standard for details.
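The fence equivalences above can be sketched with C++11 ``std::atomic_thread_fence``, the source-level counterpart of ``fence``. The writer issues a Release fence followed by a relaxed (Monotonic) store; the reader issues a relaxed load followed by an Acquire fence. The function ``fence_message_pass`` and its variables are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical message-passing demo built only from Monotonic (relaxed)
// accesses plus standalone fences.
int fence_message_pass() {
  int payload = 0;                 // ordinary, non-atomic data
  std::atomic<bool> ready{false};  // flag accessed only with relaxed ops

  std::thread writer([&] {
    payload = 42;
    // Release fence, then a Monotonic store: roughly a Release store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
  });

  // Monotonic loads, then an Acquire fence: roughly an Acquire load.
  while (!ready.load(std::memory_order_relaxed)) {
  }
  std::atomic_thread_fence(std::memory_order_acquire);
  int seen = payload;  // the fences synchronize, so this read is race-free
  writer.join();
  return seen;
}
```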
Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
also expected to generate an i8 store as an i8 store, and not an instruction
which writes to surrounding bytes. (If you are writing a backend for an
architecture which cannot satisfy these restrictions and cares about
- concurrency, please send an email to llvmdev.)
+ concurrency, please send an email to llvm-dev.)
Unordered
---------
Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior. It also
-guarantees the operation to be lock-free, so it do not depend on the data being
-part of a special atomic structure or depend on a separate per-process global
-lock. Note that code generation will fail for unsupported atomic operations; if
-you need such an operation, use explicit locking.
+guarantees the operation to be lock-free, so it does not depend on the data
+being part of a special atomic structure or depend on a separate per-process
+global lock. Note that code generation will fail for unsupported atomic
+operations; if you need such an operation, use explicit locking.
Relevant standard
This is intended to match the Java memory model for shared variables.
address, a consistent ordering exists.
Relevant standard
- This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those
+ This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
standards for the exact definition.
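Monotonic maps to ``memory_order_relaxed``. In the hypothetical ``relaxed_count`` below, several threads increment one counter with a relaxed ``fetch_add``. Because every modification of a single address falls into one consistent order, no increment is lost, even though the operations impose no ordering on any other memory.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical demo: `threads` workers each perform `per_thread` relaxed
// increments on one shared counter.
long relaxed_count(int threads, int per_thread) {
  assert(threads > 0 && per_thread >= 0);
  std::atomic<long> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&counter, per_thread] {
      for (int i = 0; i < per_thread; ++i)
        // Monotonic read-modify-write: atomic, but orders nothing else.
        counter.fetch_add(1, std::memory_order_relaxed);
    });
  }
  for (auto &w : workers) w.join();
  // join() synchronizes with each worker, so this load sees every increment.
  return counter.load(std::memory_order_relaxed);
}
```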
Notes for frontends
other memory with normal loads and stores.
Relevant standard
- This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be
- used for C++0x/C1x ``memory_order_consume``.
+ This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
+ used for C++11/C11 ``memory_order_consume``.
Notes for frontends
If you are writing a frontend which uses this directly, use with caution.
release a lock.
Relevant standard
- This corresponds to the C++0x/C1x ``memory_order_release``.
+ This corresponds to the C++11/C11 ``memory_order_release``.
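Acquire and Release are meant to be used as a pair, as in the lock example in their descriptions. Below is a minimal C++11 test-and-set spinlock: taking the lock is an Acquire read-modify-write and dropping it is a Release store. The ``SpinLock`` class and ``locked_count`` driver are hypothetical, and a production lock would also need back-off.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock.
class SpinLock {
  std::atomic<bool> held_{false};

 public:
  void lock() {
    // Acquire: accesses after a successful exchange cannot move above it.
    while (held_.exchange(true, std::memory_order_acquire)) {
    }
  }
  void unlock() {
    // Release: accesses before this store cannot move below it.
    held_.store(false, std::memory_order_release);
  }
};

// Increments a plain (non-atomic) int under the lock from several threads.
int locked_count(int threads, int per_thread) {
  assert(threads > 0 && per_thread >= 0);
  SpinLock lock;
  int total = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        lock.lock();
        ++total;  // ordinary load and store, protected by the lock
        lock.unlock();
      }
    });
  }
  for (auto &w : workers) w.join();
  return total;
}
```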
Notes for frontends
If you are writing a frontend which uses this directly, use with caution.
barrier (for fences and operations which both read and write memory).
Relevant standard
- This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
+ This corresponds to the C++11/C11 ``memory_order_acq_rel``.
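A common use of an AcquireRelease read-modify-write is dropping a reference count. The Release half publishes this owner's writes, and the Acquire half lets whichever thread drops the last reference see every other owner's writes. A sketch, assuming a hypothetical ``Shared`` object in which each owner fills in one slot and the last owner sums them:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared object: each owner writes one slot; the owner that
// drops the last reference computes the final sum.
struct Shared {
  std::atomic<int> refs;
  std::vector<int> slots;  // element i is written only by owner i
  int sum = -1;            // written only by the last owner
  explicit Shared(int owners) : refs(owners), slots(owners, 0) {}
};

void drop_ref(Shared &s) {
  // AcquireRelease RMW: Release publishes this owner's slot write; Acquire
  // (for the last owner) makes every earlier owner's slot write visible.
  if (s.refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
    int total = 0;
    for (int v : s.slots) total += v;
    s.sum = total;
  }
}

// Runs `owners` threads; owner i stores i + 1 and then drops its reference.
int acq_rel_demo(int owners) {
  assert(owners > 0);
  Shared s(owners);
  std::vector<std::thread> workers;
  for (int i = 0; i < owners; ++i) {
    workers.emplace_back([&s, i] {
      s.slots[i] = i + 1;
      drop_ref(s);
    });
  }
  for (auto &w : workers) w.join();
  return s.sum;
}
```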
Notes for frontends
If you are writing a frontend which uses this directly, use with caution.
ordering exists between all SequentiallyConsistent operations.
Relevant standard
- This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and
+ This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
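The extra guarantee of SequentiallyConsistent shows up in the classic store-buffering litmus test. Each thread stores to one variable and then loads the other. With seq_cst accesses all four operations fall into one global order, so at least one load must see a 1. The functions ``sb_trial`` and ``count_forbidden`` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One trial with two threads. Returns true if both threads read 0,
// an outcome that seq_cst forbids.
bool sb_trial() {
  std::atomic<int> x{0}, y{0};
  int r1 = -1, r2 = -1;
  std::thread a([&] {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
  });
  std::thread b([&] {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
  });
  a.join();
  b.join();
  return r1 == 0 && r2 == 0;
}

// Runs `trials` trials and counts forbidden outcomes; always 0 for seq_cst.
int count_forbidden(int trials) {
  assert(trials >= 0);
  int bad = 0;
  for (int i = 0; i < trials; ++i)
    if (sb_trial()) ++bad;
  return bad;
}
```

If the stores are weakened to Release and the loads to Acquire, the (0, 0) outcome becomes legal and can be observed on x86, because a plain store may be delayed past a later load. This is why SequentiallyConsistent stores on x86 use ``XCHG`` rather than ``MOV``.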
Notes for frontends
that they return true for any operation which is volatile or at least
Monotonic.
+* ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on
+ orderings. They can be useful for passes that are aware of atomics, for
+ example to do DSE across a single atomic access, but not across a
+ release-acquire pair (see MemoryDependencyAnalysis for an example of this).
+
* Alias analysis: Note that AA will return ModRef for anything Acquire or
Release, and for the address accessed by any Monotonic operation.
* DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can
be DSE'ed in some cases, but it's tricky to reason about, and not especially
- important.
+ important. It is possible in some cases for DSE to operate across a stronger
+ atomic operation, but it is fairly tricky. DSE delegates this reasoning to
+ MemoryDependencyAnalysis (which is also used by other passes like GVN).
* Folding a load: Any atomic load from a constant global can be constant-folded,
because it cannot be observed. Similar reasoning allows scalarrepl with
Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
-ARM), appropriate fences are split out as the DAG is built.
+ARM), appropriate fences can be emitted by the AtomicExpand CodeGen pass if
+``setInsertFencesForAtomic()`` was used.
The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
generator is not very helpful here at the moment, but hopefully that will
change.)
-The implementation of atomics on LL/SC architectures (like ARM) is currently a
-bit of a mess; there is a lot of copy-pasted code across targets, and the
-representation is relatively unsuited to optimization (it would be nice to be
-able to optimize loops involving cmpxchg etc.).
-
On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
on the users of the result, some ``atomicrmw`` operations can be translated into
operations like ``LOCK AND``, but that does not work in general.
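The distinction above depends on whether the old value is used. In the sketch below both functions perform an ``atomicrmw and``. Only the first, whose result is discarded, is a candidate for a single ``LOCK AND``. The exact instructions chosen depend on the compiler version and optimization level. The global ``flag_word`` and both functions are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag word shared between threads.
std::atomic<unsigned> flag_word{0xFFu};

// Result ignored: on x86 this can become a single `LOCK AND`.
void clear_bits(unsigned mask) {
  assert(mask != 0u);
  flag_word.fetch_and(~mask);
}

// Result used: x86 has no single AND instruction that also returns the old
// value, so this typically needs a compare-exchange loop instead.
unsigned clear_bits_and_get_old(unsigned mask) {
  return flag_word.fetch_and(~mask);
}
```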
-On ARM, MIPS, and many other RISC architectures, Acquire, Release, and
-SequentiallyConsistent semantics require barrier instructions for every such
+On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
+and SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions. ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
-on ARM, etc.). At the moment, the IR does not provide any way to represent a
-weak ``cmpxchg`` which would not require a loop.
+on ARM, etc.).
+
+It is often easiest for backends to use AtomicExpandPass to lower some of the
+atomic constructs. Here are some lowerings it can do:
+
+* ``cmpxchg`` -> loop with load-linked/store-conditional
+ by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``,
+ and ``emitStoreConditional()``
+* large loads/stores -> LL/SC or ``cmpxchg``
+ by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
+* strong atomic accesses -> monotonic accesses + fences
+ by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()``
+ and ``emitTrailingFence()``
+* ``atomicrmw`` -> loop with ``cmpxchg`` or load-linked/store-conditional
+ by overriding ``expandAtomicRMWInIR()``
+
+For an example of all of these, look at the ARM backend.
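At the source level, the "``atomicrmw`` -> loop with ``cmpxchg``" lowering corresponds to the familiar compare-exchange retry loop. The sketch below builds an atomic fetch-max, the semantics of ``atomicrmw max``, from ``compare_exchange_weak``. The helper ``fetch_max`` is hypothetical. The shape (initial load, then retry until the exchange succeeds) is meant to resemble such an expansion, not to reproduce AtomicExpandPass output exactly.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically performs obj = max(obj, value) and returns
// the previous value, using a compare-exchange retry loop.
int fetch_max(std::atomic<int> &obj, int value) {
  int old = obj.load(std::memory_order_relaxed);
  // Retry until the exchange installs max(old, value) over an unchanged `old`.
  // On failure, compare_exchange_weak reloads `old` with the current value;
  // the weak form may also fail spuriously, like a store-conditional.
  while (!obj.compare_exchange_weak(old, std::max(old, value),
                                    std::memory_order_seq_cst,
                                    std::memory_order_relaxed)) {
  }
  return old;
}
```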