+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="int_atomics">Atomic Operations and Synchronization Intrinsics</a>
+</div>
+
+<div class="doc_text">
+<p>
+ These intrinsic functions expand the "universal IR" of LLVM to represent
+ hardware constructs for atomic operations and memory synchronization. This
+ provides an interface to the hardware, not an interface to the programmer. It
+ is aimed at a low enough level to allow any programming models or APIs
+ (Application Programming Interfaces) which
+ need atomic behaviors to map cleanly onto it. It is also modeled primarily on
+ hardware behavior. Just as hardware provides a "universal IR" for source
+ languages, it also provides a starting point for developing a "universal"
+ atomic operation and synchronization IR.
+</p>
+<p>
+ These do <em>not</em> form an API such as high-level threading libraries,
+ software transaction memory systems, atomic primitives, and intrinsic
+ functions as found in BSD, GNU libc, atomic_ops, APR, and other system and
+ application libraries. The hardware interface provided by LLVM should allow
+ a clean implementation of all of these APIs and parallel programming models.
+ No one model or paradigm should be selected above others unless the hardware
+ itself ubiquitously does so.
+
+</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="int_memory_barrier">'<tt>llvm.memory.barrier</tt>' Intrinsic</a>
+</div>
+<div class="doc_text">
+<h5>Syntax:</h5>
+<pre>
+declare void @llvm.memory.barrier( i1 <ll>, i1 <ls>, i1 <sl>, i1 <ss>,
+i1 <device> )
+
+</pre>
+<h5>Overview:</h5>
+<p>
+ The <tt>llvm.memory.barrier</tt> intrinsic guarantees ordering between
+ specific pairs of memory access types.
+</p>
+<h5>Arguments:</h5>
+<p>
+ The <tt>llvm.memory.barrier</tt> intrinsic requires five boolean arguments.
+ The first four arguments enables a specific barrier as listed below. The fith
+ argument specifies that the barrier applies to io or device or uncached memory.
+
+</p>
+ <ul>
+ <li><tt>ll</tt>: load-load barrier</li>
+ <li><tt>ls</tt>: load-store barrier</li>
+ <li><tt>sl</tt>: store-load barrier</li>
+ <li><tt>ss</tt>: store-store barrier</li>
+ <li><tt>device</tt>: barrier applies to device and uncached memory also.
+ </ul>
+<h5>Semantics:</h5>
+<p>
+ This intrinsic causes the system to enforce some ordering constraints upon
+ the loads and stores of the program. This barrier does not indicate
+ <em>when</em> any events will occur, it only enforces an <em>order</em> in
+ which they occur. For any of the specified pairs of load and store operations
+ (f.ex. load-load, or store-load), all of the first operations preceding the
+ barrier will complete before any of the second operations succeeding the
+ barrier begin. Specifically the semantics for each pairing is as follows:
+</p>
+ <ul>
+ <li><tt>ll</tt>: All loads before the barrier must complete before any load
+ after the barrier begins.</li>
+
+ <li><tt>ls</tt>: All loads before the barrier must complete before any
+ store after the barrier begins.</li>
+ <li><tt>ss</tt>: All stores before the barrier must complete before any
+ store after the barrier begins.</li>
+ <li><tt>sl</tt>: All stores before the barrier must complete before any
+ load after the barrier begins.</li>
+ </ul>
+<p>
+ These semantics are applied with a logical "and" behavior when more than one
+ is enabled in a single memory barrier intrinsic.
+</p>
+<p>
+ Backends may implement stronger barriers than those requested when they do not
+ support as fine grained a barrier as requested. Some architectures do not
+ need all types of barriers and on such architectures, these become noops.
+</p>
+<h5>Example:</h5>
+<pre>
+%ptr = malloc i32
+ store i32 4, %ptr
+
+%result1 = load i32* %ptr <i>; yields {i32}:result1 = 4</i>
+ call void @llvm.memory.barrier( i1 false, i1 true, i1 false, i1 false )
+ <i>; guarantee the above finishes</i>
+ store i32 8, %ptr <i>; before this begins</i>
+</pre>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="int_atomic_cmp_swap">'<tt>llvm.atomic.cmp.swap.*</tt>' Intrinsic</a>
+</div>
+<div class="doc_text">
+<h5>Syntax:</h5>
+<p>
+ This is an overloaded intrinsic. You can use <tt>llvm.atomic.cmp.swap</tt> on
+ any integer bit width and for different address spaces. Not all targets
+ support all bit widths however.</p>
+
+<pre>
+declare i8 @llvm.atomic.cmp.swap.i8.p0i8( i8* <ptr>, i8 <cmp>, i8 <val> )
+declare i16 @llvm.atomic.cmp.swap.i16.p0i16( i16* <ptr>, i16 <cmp>, i16 <val> )
+declare i32 @llvm.atomic.cmp.swap.i32.p0i32( i32* <ptr>, i32 <cmp>, i32 <val> )
+declare i64 @llvm.atomic.cmp.swap.i64.p0i64( i64* <ptr>, i64 <cmp>, i64 <val> )
+
+</pre>
+<h5>Overview:</h5>
+<p>
+ This loads a value in memory and compares it to a given value. If they are
+ equal, it stores a new value into the memory.
+</p>
+<h5>Arguments:</h5>
+<p>
+ The <tt>llvm.atomic.cmp.swap</tt> intrinsic takes three arguments. The result as
+ well as both <tt>cmp</tt> and <tt>val</tt> must be integer values with the
+ same bit width. The <tt>ptr</tt> argument must be a pointer to a value of
+ this integer type. While any bit width integer may be used, targets may only
+ lower representations they support in hardware.
+
+</p>
+<h5>Semantics:</h5>
+<p>
+ This entire intrinsic must be executed atomically. It first loads the value
+ in memory pointed to by <tt>ptr</tt> and compares it with the value
+ <tt>cmp</tt>. If they are equal, <tt>val</tt> is stored into the memory. The
+ loaded value is yielded in all cases. This provides the equivalent of an
+ atomic compare-and-swap operation within the SSA framework.
+</p>
+<h5>Examples:</h5>
+
+<pre>
+%ptr = malloc i32
+ store i32 4, %ptr
+
+%val1 = add i32 4, 4
+%result1 = call i32 @llvm.atomic.cmp.swap.i32.p0i32( i32* %ptr, i32 4, %val1 )
+ <i>; yields {i32}:result1 = 4</i>
+%stored1 = icmp eq i32 %result1, 4 <i>; yields {i1}:stored1 = true</i>
+%memval1 = load i32* %ptr <i>; yields {i32}:memval1 = 8</i>
+
+%val2 = add i32 1, 1
+%result2 = call i32 @llvm.atomic.cmp.swap.i32.p0i32( i32* %ptr, i32 5, %val2 )
+ <i>; yields {i32}:result2 = 8</i>
+%stored2 = icmp eq i32 %result2, 5 <i>; yields {i1}:stored2 = false</i>
+
+%memval2 = load i32* %ptr <i>; yields {i32}:memval2 = 8</i>
+</pre>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="int_atomic_swap">'<tt>llvm.atomic.swap.*</tt>' Intrinsic</a>
+</div>
+<div class="doc_text">
+<h5>Syntax:</h5>
+
+<p>
+ This is an overloaded intrinsic. You can use <tt>llvm.atomic.swap</tt> on any
+ integer bit width. Not all targets support all bit widths however.</p>
+<pre>
+declare i8 @llvm.atomic.swap.i8.p0i8( i8* <ptr>, i8 <val> )
+declare i16 @llvm.atomic.swap.i16.p0i16( i16* <ptr>, i16 <val> )
+declare i32 @llvm.atomic.swap.i32.p0i32( i32* <ptr>, i32 <val> )
+declare i64 @llvm.atomic.swap.i64.p0i64( i64* <ptr>, i64 <val> )
+
+</pre>
+<h5>Overview:</h5>
+<p>
+ This intrinsic loads the value stored in memory at <tt>ptr</tt> and yields
+ the value from memory. It then stores the value in <tt>val</tt> in the memory
+ at <tt>ptr</tt>.
+</p>
+<h5>Arguments:</h5>
+
+<p>
+ The <tt>llvm.atomic.swap</tt> intrinsic takes two arguments. Both the
+ <tt>val</tt> argument and the result must be integers of the same bit width.
+ The first argument, <tt>ptr</tt>, must be a pointer to a value of this
+ integer type. The targets may only lower integer representations they
+ support.
+</p>
+<h5>Semantics:</h5>
+<p>
+ This intrinsic loads the value pointed to by <tt>ptr</tt>, yields it, and
+ stores <tt>val</tt> back into <tt>ptr</tt> atomically. This provides the
+ equivalent of an atomic swap operation within the SSA framework.
+
+</p>
+<h5>Examples:</h5>
+<pre>
+%ptr = malloc i32
+ store i32 4, %ptr
+
+%val1 = add i32 4, 4
+%result1 = call i32 @llvm.atomic.swap.i32.p0i32( i32* %ptr, i32 %val1 )
+ <i>; yields {i32}:result1 = 4</i>
+%stored1 = icmp eq i32 %result1, 4 <i>; yields {i1}:stored1 = true</i>
+%memval1 = load i32* %ptr <i>; yields {i32}:memval1 = 8</i>
+
+%val2 = add i32 1, 1
+%result2 = call i32 @llvm.atomic.swap.i32.p0i32( i32* %ptr, i32 %val2 )
+ <i>; yields {i32}:result2 = 8</i>
+
+%stored2 = icmp eq i32 %result2, 8 <i>; yields {i1}:stored2 = true</i>
+%memval2 = load i32* %ptr <i>; yields {i32}:memval2 = 2</i>
+</pre>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="int_atomic_load_add">'<tt>llvm.atomic.load.add.*</tt>' Intrinsic</a>
+
+</div>
+<div class="doc_text">
+<h5>Syntax:</h5>
+<p>
+ This is an overloaded intrinsic. You can use <tt>llvm.atomic.load.add</tt> on any
+ integer bit width. Not all targets support all bit widths however.</p>
+<pre>
+declare i8 @llvm.atomic.load.add.i8..p0i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.add.i16..p0i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.add.i32..p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.add.i64..p0i64( i64* <ptr>, i64 <delta> )
+
+</pre>
+<h5>Overview:</h5>
+<p>
+ This intrinsic adds <tt>delta</tt> to the value stored in memory at
+ <tt>ptr</tt>. It yields the original value at <tt>ptr</tt>.
+</p>
+<h5>Arguments:</h5>
+<p>
+
+ The intrinsic takes two arguments, the first a pointer to an integer value
+ and the second an integer value. The result is also an integer value. These
+ integer types can have any bit width, but they must all have the same bit
+ width. The targets may only lower integer representations they support.
+</p>
+<h5>Semantics:</h5>
+<p>
+ This intrinsic does a series of operations atomically. It first loads the
+ value stored at <tt>ptr</tt>. It then adds <tt>delta</tt>, stores the result
+ to <tt>ptr</tt>. It yields the original value stored at <tt>ptr</tt>.
+</p>
+
+<h5>Examples:</h5>
+<pre>
+%ptr = malloc i32
+ store i32 4, %ptr
+%result1 = call i32 @llvm.atomic.load.add.i32.p0i32( i32* %ptr, i32 4 )
+ <i>; yields {i32}:result1 = 4</i>
+%result2 = call i32 @llvm.atomic.load.add.i32.p0i32( i32* %ptr, i32 2 )
+ <i>; yields {i32}:result2 = 8</i>
+%result3 = call i32 @llvm.atomic.load.add.i32.p0i32( i32* %ptr, i32 5 )
+ <i>; yields {i32}:result3 = 10</i>
+%memval1 = load i32* %ptr <i>; yields {i32}:memval1 = 15</i>
+</pre>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="int_atomic_load_sub">'<tt>llvm.atomic.load.sub.*</tt>' Intrinsic</a>
+
+</div>
+<div class="doc_text">
+<h5>Syntax:</h5>
+<p>
+ This is an overloaded intrinsic. You can use <tt>llvm.atomic.load.sub</tt> on
+ any integer bit width and for different address spaces. Not all targets
+ support all bit widths however.</p>
+<pre>
+declare i8 @llvm.atomic.load.sub.i8.p0i32( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.sub.i16.p0i32( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.sub.i32.p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.sub.i64.p0i32( i64* <ptr>, i64 <delta> )
+
+</pre>
+<h5>Overview:</h5>
+<p>
+ This intrinsic subtracts <tt>delta</tt> to the value stored in memory at
+ <tt>ptr</tt>. It yields the original value at <tt>ptr</tt>.
+</p>
+<h5>Arguments:</h5>
+<p>
+
+ The intrinsic takes two arguments, the first a pointer to an integer value
+ and the second an integer value. The result is also an integer value. These
+ integer types can have any bit width, but they must all have the same bit
+ width. The targets may only lower integer representations they support.
+</p>
+<h5>Semantics:</h5>
+<p>
+ This intrinsic does a series of operations atomically. It first loads the
+ value stored at <tt>ptr</tt>. It then subtracts <tt>delta</tt>, stores the
+ result to <tt>ptr</tt>. It yields the original value stored at <tt>ptr</tt>.
+</p>
+
+<h5>Examples:</h5>
+<pre>
+%ptr = malloc i32
+ store i32 8, %ptr
+%result1 = call i32 @llvm.atomic.load.sub.i32.p0i32( i32* %ptr, i32 4 )
+ <i>; yields {i32}:result1 = 8</i>
+%result2 = call i32 @llvm.atomic.load.sub.i32.p0i32( i32* %ptr, i32 2 )
+ <i>; yields {i32}:result2 = 4</i>
+%result3 = call i32 @llvm.atomic.load.sub.i32.p0i32( i32* %ptr, i32 5 )
+ <i>; yields {i32}:result3 = 2</i>
+%memval1 = load i32* %ptr <i>; yields {i32}:memval1 = -3</i>
+</pre>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="int_atomic_load_and">'<tt>llvm.atomic.load.and.*</tt>' Intrinsic</a><br>
+ <a name="int_atomic_load_nand">'<tt>llvm.atomic.load.nand.*</tt>' Intrinsic</a><br>
+ <a name="int_atomic_load_or">'<tt>llvm.atomic.load.or.*</tt>' Intrinsic</a><br>
+ <a name="int_atomic_load_xor">'<tt>llvm.atomic.load.xor.*</tt>' Intrinsic</a><br>
+
+</div>
+<div class="doc_text">
+<h5>Syntax:</h5>
+<p>
+ These are overloaded intrinsics. You can use <tt>llvm.atomic.load_and</tt>,
+ <tt>llvm.atomic.load_nand</tt>, <tt>llvm.atomic.load_or</tt>, and
+ <tt>llvm.atomic.load_xor</tt> on any integer bit width and for different
+ address spaces. Not all targets support all bit widths however.</p>
+<pre>
+declare i8 @llvm.atomic.load.and.i8.p0i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.and.i16.p0i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.and.i32.p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.and.i64.p0i64( i64* <ptr>, i64 <delta> )
+
+</pre>
+
+<pre>
+declare i8 @llvm.atomic.load.or.i8.p0i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.or.i16.p0i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.or.i32.p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.or.i64.p0i64( i64* <ptr>, i64 <delta> )
+
+</pre>
+
+<pre>
+declare i8 @llvm.atomic.load.nand.i8.p0i32( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.nand.i16.p0i32( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.nand.i32.p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.nand.i64.p0i32( i64* <ptr>, i64 <delta> )
+
+</pre>
+
+<pre>
+declare i8 @llvm.atomic.load.xor.i8.p0i32( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.xor.i16.p0i32( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.xor.i32.p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.xor.i64.p0i32( i64* <ptr>, i64 <delta> )
+
+</pre>
+<h5>Overview:</h5>
+<p>
+ These intrinsics bitwise the operation (and, nand, or, xor) <tt>delta</tt> to
+ the value stored in memory at <tt>ptr</tt>. It yields the original value
+ at <tt>ptr</tt>.
+</p>
+<h5>Arguments:</h5>
+<p>
+
+ These intrinsics take two arguments, the first a pointer to an integer value
+ and the second an integer value. The result is also an integer value. These
+ integer types can have any bit width, but they must all have the same bit
+ width. The targets may only lower integer representations they support.
+</p>
+<h5>Semantics:</h5>
+<p>
+ These intrinsics does a series of operations atomically. They first load the
+ value stored at <tt>ptr</tt>. They then do the bitwise operation
+ <tt>delta</tt>, store the result to <tt>ptr</tt>. They yield the original
+ value stored at <tt>ptr</tt>.
+</p>
+
+<h5>Examples:</h5>
+<pre>
+%ptr = malloc i32
+ store i32 0x0F0F, %ptr
+%result0 = call i32 @llvm.atomic.load.nand.i32.p0i32( i32* %ptr, i32 0xFF )
+ <i>; yields {i32}:result0 = 0x0F0F</i>
+%result1 = call i32 @llvm.atomic.load.and.i32.p0i32( i32* %ptr, i32 0xFF )
+ <i>; yields {i32}:result1 = 0xFFFFFFF0</i>
+%result2 = call i32 @llvm.atomic.load.or.i32.p0i32( i32* %ptr, i32 0F )
+ <i>; yields {i32}:result2 = 0xF0</i>
+%result3 = call i32 @llvm.atomic.load.xor.i32.p0i32( i32* %ptr, i32 0F )
+ <i>; yields {i32}:result3 = FF</i>
+%memval1 = load i32* %ptr <i>; yields {i32}:memval1 = F0</i>
+</pre>
+</div>
+
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="int_atomic_load_max">'<tt>llvm.atomic.load.max.*</tt>' Intrinsic</a><br>
+ <a name="int_atomic_load_min">'<tt>llvm.atomic.load.min.*</tt>' Intrinsic</a><br>
+ <a name="int_atomic_load_umax">'<tt>llvm.atomic.load.umax.*</tt>' Intrinsic</a><br>
+ <a name="int_atomic_load_umin">'<tt>llvm.atomic.load.umin.*</tt>' Intrinsic</a><br>
+
+</div>
+<div class="doc_text">
+<h5>Syntax:</h5>
+<p>
+ These are overloaded intrinsics. You can use <tt>llvm.atomic.load_max</tt>,
+ <tt>llvm.atomic.load_min</tt>, <tt>llvm.atomic.load_umax</tt>, and
+ <tt>llvm.atomic.load_umin</tt> on any integer bit width and for different
+ address spaces. Not all targets
+ support all bit widths however.</p>
+<pre>
+declare i8 @llvm.atomic.load.max.i8.p0i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.max.i16.p0i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.max.i32.p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.max.i64.p0i64( i64* <ptr>, i64 <delta> )
+
+</pre>
+
+<pre>
+declare i8 @llvm.atomic.load.min.i8.p0i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.min.i16.p0i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.min.i32..p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.min.i64..p0i64( i64* <ptr>, i64 <delta> )
+
+</pre>
+
+<pre>
+declare i8 @llvm.atomic.load.umax.i8.p0i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.umax.i16.p0i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.umax.i32.p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.umax.i64.p0i64( i64* <ptr>, i64 <delta> )
+
+</pre>
+
+<pre>
+declare i8 @llvm.atomic.load.umin.i8..p0i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.load.umin.i16.p0i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.load.umin.i32..p0i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.load.umin.i64..p0i64( i64* <ptr>, i64 <delta> )
+
+</pre>
+<h5>Overview:</h5>
+<p>
+ These intrinsics takes the signed or unsigned minimum or maximum of
+ <tt>delta</tt> and the value stored in memory at <tt>ptr</tt>. It yields the
+ original value at <tt>ptr</tt>.
+</p>
+<h5>Arguments:</h5>
+<p>
+
+ These intrinsics take two arguments, the first a pointer to an integer value
+ and the second an integer value. The result is also an integer value. These
+ integer types can have any bit width, but they must all have the same bit
+ width. The targets may only lower integer representations they support.
+</p>
+<h5>Semantics:</h5>
+<p>
+ These intrinsics does a series of operations atomically. They first load the
+ value stored at <tt>ptr</tt>. They then do the signed or unsigned min or max
+ <tt>delta</tt> and the value, store the result to <tt>ptr</tt>. They yield
+ the original value stored at <tt>ptr</tt>.
+</p>
+
+<h5>Examples:</h5>
+<pre>
+%ptr = malloc i32
+ store i32 7, %ptr
+%result0 = call i32 @llvm.atomic.load.min.i32.p0i32( i32* %ptr, i32 -2 )
+ <i>; yields {i32}:result0 = 7</i>
+%result1 = call i32 @llvm.atomic.load.max.i32.p0i32( i32* %ptr, i32 8 )
+ <i>; yields {i32}:result1 = -2</i>
+%result2 = call i32 @llvm.atomic.load.umin.i32.p0i32( i32* %ptr, i32 10 )
+ <i>; yields {i32}:result2 = 8</i>
+%result3 = call i32 @llvm.atomic.load.umax.i32.p0i32( i32* %ptr, i32 30 )
+ <i>; yields {i32}:result3 = 8</i>
+%memval1 = load i32* %ptr <i>; yields {i32}:memval1 = 30</i>
+</pre>
+</div>
+