3 #ifndef _CDS_URCU_DETAILS_BASE_H
4 #define _CDS_URCU_DETAILS_BASE_H
6 #include <cds/algo/atomic.h>
7 #include <cds/gc/details/retired_ptr.h>
8 #include <cds/details/allocator.h>
9 #include <cds/os/thread.h>
10 #include <cds/details/marked_ptr.h>
13 /// User-space Read-Copy Update (URCU) namespace
14 /** @ingroup cds_garbage_collector
17 This namespace contains declarations for different types of Read-Copy Update (%RCU)
18 synchronization primitive and data structures developed for RCU.
19 In <b>libcds</b> %RCU is used as garbage collector.
22 - [2009] M.Desnoyers "Low-Impact Operating System Tracing" PhD Thesis,
23 Chapter 6 "User-Level Implementations of Read-Copy Update"
24 - [2011] M.Desnoyers, P.McKenney, A.Stern, M.Dagenias, J.Walpole "User-Level
25 Implementations of Read-Copy Update"
27 <b>Informal intruduction to user-space %RCU</b>
29 [<i>From Desnoyer's papers</i>] %RCU is a synchronization mechanism that was added to the
30 Linux kernel in October of 2002. %RCU achieves scalability improvements by allowing
31 reads to occur concurrently with updates. In contrast to conventional locking
32 primitives that ensure mutual exclusion among concurrent threads regardless of whether
33 they be readers or updaters, or with reader-writer locks that allow concurrent reads
34 but not in the presence of updates, %RCU supports concurrency between a single updater
35 and multiple readers. %RCU ensures that reads are coherent by maintaining multiple
36 versions of objects and ensuring that they are not freed up until all pre-existing readside
37 critical sections complete. %RCU defines and uses efficient and scalable mechanisms
38 for publishing and reading new versions of an object, and also for deferring reclamation
39 of old versions. These mechanisms distribute the work among read and update
40 paths in such a way as to make read paths extremely fast.
42 %RCU readers execute within %RCU read-side critical sections. Each such critical section begins with
43 \p rcu_read_lock(), ends with \p rcu_read_unlock() (in \p libcds these primitives are the methods of
44 GC class and are usually called \p access_lock and \p access_unlock respectively). Read-side
45 critical sections can be nested.
46 The performance benefits of %RCU are due to the fact that \p rcu_read_lock()
47 and \p rcu_read_unlock() are exceedingly fast.
49 When a thread is not in an %RCU read-side critical section, it is in a quiescent state.
50 A quiescent state that persists for a significant time period is an extended quiescent state.
51 Any time period during which every thread has been in at least one quiescent state
52 is a grace period; this implies that every %RCU read-side critical section
53 that starts before a grace period must end before that grace period does.
54 Distinct grace periods may overlap, either partially or completely. Any time period
55 that includes a grace period is by definition itself a grace period.
56 Each grace period is guaranteed to complete as long as all read-side critical sections
57 are finite in duration; thus even a constant flow of such critical sections is unable to
58 extend an %RCU grace period indefinitely.
60 Suppose that readers enclose each of their data-structure traversals in
61 an %RCU read-side critical section. If an updater first removes an element
62 from such a data structure and then waits for a grace period, there can be
63 no more readers accessing that element. The updater can then carry out destructive
64 operations, for example freeing the element, without disturbing any readers.
66 The %RCU update is split into two phases, a removal phase and a reclamation phase.
67 These two phases must be separated by a grace period, for example via the \p synchronize_rcu()
68 primitive, which initiates a grace period and waits for it to finish.
69 During the removal phase, the %RCU update removes elements from a shared data structure.
70 The removed data elements will only be accessible to read-side critical sections
71 that ran concurrently with the removal phase, which are guaranteed to complete before the
72 grace period ends. Therefore the reclamation phase can safely free the data elements
73 removed by the removal phase.
75 Desnoyers describes several classes of user-space %RCU implementations:
76 - The Quiescent-State-Based Reclamation (QSBR) %RCU implementation offers
77 the best possible read-side performance, but requires that each thread periodically
78 calls a function to announce that it is in a quiescent state, thus strongly
79 constraining the application design. This type of %RCU is not implemented in \p libcds.
80 - The general-purpose %RCU implementation places almost no constraints on the application
\92s
81 design, thus being appropriate for use within a general-purpose library, but it has
82 relatively higher read-side overhead. The \p libcds contains several implementations of general-purpose
83 %RCU: \ref general_instant, \ref general_buffered, \ref general_threaded.
84 - The signal-handling %RCU presents an implementation having low read-side overhead and
85 requiring only that the application give up one POSIX signal to %RCU update processing.
86 The \p libcds contains several implementations if signal-handling %RCU: \ref signal_buffered,
89 @note The signal-handled %RCU is defined only for UNIX-like systems, not for Windows.
92 <b>RCU implementation type</b>
94 There are several internal implementation of RCU (all declared in \p %cds::urcu namespace):
95 - \ref general_instant - general purpose RCU with immediate reclamation
96 - \ref general_buffered - general purpose RCU with deferred (buffered) reclamation
97 - \ref general_threaded - general purpose RCU with special reclamation thread
98 - \ref signal_buffered - signal-handling RCU with deferred (buffered) reclamation
99 - \ref signal_threaded - signal-handling RCU with special reclamation thread
101 You cannot create an object of any of those classes directly.
102 Instead, you should use wrapper classes.
103 The wrapper simplifies creation and usage of RCU singleton objects
104 and has the reacher interface that combines interfaces of wrapped class i.e. RCU global part like
105 \p synchronize, and corresponding RCU thread-specific interface like \p access_lock, \p access_unlock and \p retire_ptr.
108 There are several wrapper classes (all declared in \p %cds::urcu namespace)
109 - \ref cds_urcu_general_instant_gc "gc<general_instant>" - general purpose RCU with immediate reclamation,
110 include file <tt><cds/urcu/general_instant.h></tt>
111 - \ref cds_urcu_general_buffered_gc "gc<general_buffered>" - general purpose RCU with deferred (buffered) reclamation,
112 include file <tt><cds/urcu/general_buffered.h></tt>
113 - \ref cds_urcu_general_threaded_gc "gc<general_threaded>" - general purpose RCU with special reclamation thread
114 include file <tt><cds/urcu/general_threaded.h></tt>
115 - \ref cds_urcu_signal_buffered_gc "gc<signal_buffered>" - signal-handling RCU with deferred (buffered) reclamation
116 include file <tt><cds/urcu/signal_buffered.h></tt>
117 - \ref cds_urcu_signal_threaded_gc "gc<signal_threaded>" - signal-handling RCU with special reclamation thread
118 include file <tt><cds/urcu/signal_threaded.h></tt>
120 Any RCU-related container in \p libcds expects that its \p RCU template parameter is one of those wrapper.
122 @anchor cds_urcu_tags
123 For simplicity, in some algorithms instead of using RCU implementation type
124 you should specify corresponding RCU tags (all declared in \p %cds::urcu namespace):
125 - \ref general_instant_tag - for \ref general_instant
126 - \ref general_buffered_tag - for \ref general_buffered
127 - \ref general_threaded_tag - for \ref general_threaded
128 - \ref signal_buffered_tag - for \ref signal_buffered
129 - \ref signal_threaded_tag - for \ref signal_threaded
131 @anchor cds_urcu_performance
134 As a result of our experiments we can range above %RCU implementation in such order,
135 from high to low performance:
136 - <tt>gc<general_buffered></tt> - high
137 - <tt>gc<general_threaded></tt>
138 - <tt>gc<signal_buffered></tt>
139 - <tt>gc<signal_threaded></tt>
140 - <tt>gc<general_instant></tt> - low
142 This estimation is very rough and depends on many factors:
143 type of payload - mostly read-only (seeking) or read-write (inserting and deleting), -
144 a hardware, your application, and so on.
146 @anchor cds_urcu_howto
149 Usually, in your application you use only one \ref cds_urcu_gc "type of RCU" that is the best for your needs.
150 However, the library allows to apply several RCU singleton in one application.
151 The only limitation is that only one object of each RCU type can be instantiated.
152 Since each RCU type is a template class the creation of two object of one RCU type class
153 with different template arguments is an error and is not supported.
154 However, it is correct when your RCU objects relates to different RCU types.
156 @note If you want to use \p %signal_buffered and \p %signal_threaded RCU in your application simultaneously,
157 you should specify different signal number for each signal-handled RCU type on construction time,
158 for example, \p SIGUSR1 and \p SIGUSR2 respectively. By default, both signal-handled RCU implementation
159 share \p SIGUSR1 signal and cannot be applied together.
161 In \p libcds, many GC-based ordered list, set and map template classes have %RCU-related specializations
162 that hide the %RCU specific details.
164 RCU GC is initialized in usual way: you should declare an object of type \p cds::urcu::gc< RCU_type >,
167 #include <cds/urcu/general_buffered.h>
169 typedef cds::urcu::gc< cds::urcu::general_buffered<> > rcu_gpb;
175 // Initialize general_buffered RCU
178 // If main thread uses lock-free containers
179 // the main thread should be attached to libcds infrastructure
180 cds::threading::Manager::attachThread();
182 // Now you can use RCU-based containers in the main thread
190 Each thread that deals with RCU-based container should be initialized first:
192 #include <cds/urcu/general_buffered.h>
193 int myThreadEntryPoint(void *)
195 // Attach the thread to libcds infrastructure
196 cds::threading::Manager::attachThread();
198 // Now you can use RCU-based containers in the thread
201 // Detach thread when terminating
202 cds::threading::Manager::detachThread();
208 # if CDS_OS_INTERFACE == CDS_OSI_UNIX || defined(CDS_DOXYGEN_INVOKED)
209 # define CDS_URCU_SIGNAL_HANDLING_ENABLED 1
212 /// General-purpose URCU type
213 struct general_purpose_rcu {
215 static uint32_t const c_nControlBit = 0x80000000;
216 static uint32_t const c_nNestMask = c_nControlBit - 1;
220 # ifdef CDS_URCU_SIGNAL_HANDLING_ENABLED
221 /// Signal-handling URCU type
222 struct signal_handling_rcu {
224 static uint32_t const c_nControlBit = 0x80000000;
225 static uint32_t const c_nNestMask = c_nControlBit - 1;
230 /// Tag for general_instant URCU
231 struct general_instant_tag: public general_purpose_rcu {
232 typedef general_purpose_rcu rcu_class ; ///< The URCU type
235 /// Tag for general_buffered URCU
236 struct general_buffered_tag: public general_purpose_rcu
238 typedef general_purpose_rcu rcu_class ; ///< The URCU type
241 /// Tag for general_threaded URCU
242 struct general_threaded_tag: public general_purpose_rcu {
243 typedef general_purpose_rcu rcu_class ; ///< The URCU type
246 # ifdef CDS_URCU_SIGNAL_HANDLING_ENABLED
247 /// Tag for signal_buffered URCU
248 struct signal_buffered_tag: public signal_handling_rcu {
249 typedef signal_handling_rcu rcu_class ; ///< The URCU type
252 /// Tag for signal_threaded URCU
253 struct signal_threaded_tag: public signal_handling_rcu {
254 typedef signal_handling_rcu rcu_class ; ///< The URCU type
258 ///@anchor cds_urcu_retired_ptr Retired pointer, i.e. pointer that ready for reclamation
259 typedef cds::gc::details::retired_ptr retired_ptr;
261 /// Pointer to function to free (destruct and deallocate) retired pointer of specific type
262 typedef cds::gc::details::free_retired_ptr_func free_retired_ptr_func;
265 /// Implementation-specific URCU details
267 /// forward declarations
268 template <typename RCUtag>
271 template <typename RCUtag>
274 template <typename RCUtag >
278 class singleton_vtbl {
280 virtual ~singleton_vtbl()
283 virtual void retire_ptr( retired_ptr& p ) = 0;
289 template <typename MarkedPtr> using atomic_marked_ptr = atomics::atomic<MarkedPtr>;
294 template <typename ThreadData>
295 struct thread_list_record {
296 ThreadData * m_pNext ; ///< Next item in thread list
297 atomics::atomic<OS::ThreadId> m_idOwner ; ///< Owner thread id; 0 - the record is free (not owned)
301 , m_idOwner( cds::OS::c_NullThreadId )
304 ~thread_list_record()
310 template <typename RCUtag, class Alloc = CDS_DEFAULT_ALLOCATOR >
313 typedef thread_data<RCUtag> thread_record;
314 typedef cds::details::Allocator< thread_record, Alloc > allocator_type;
317 atomics::atomic<thread_record *> m_pHead;
329 thread_record * alloc()
331 thread_record * pRec;
332 cds::OS::ThreadId const nullThreadId = cds::OS::c_NullThreadId;
333 cds::OS::ThreadId const curThreadId = cds::OS::get_current_thread_id();
335 // First try to reuse a retired (non-active) HP record
336 for ( pRec = m_pHead.load( atomics::memory_order_acquire ); pRec; pRec = pRec->m_list.m_pNext ) {
337 cds::OS::ThreadId thId = nullThreadId;
338 if ( !pRec->m_list.m_idOwner.compare_exchange_strong( thId, curThreadId, atomics::memory_order_seq_cst, atomics::memory_order_relaxed ) )
343 // No records available for reuse
344 // Allocate and push a new record
345 pRec = allocator_type().New();
346 pRec->m_list.m_idOwner.store( curThreadId, atomics::memory_order_relaxed );
348 atomics::atomic_thread_fence( atomics::memory_order_release );
350 thread_record * pOldHead = m_pHead.load( atomics::memory_order_acquire );
352 pRec->m_list.m_pNext = pOldHead;
353 } while ( !m_pHead.compare_exchange_weak( pOldHead, pRec, atomics::memory_order_release, atomics::memory_order_relaxed ));
358 void retire( thread_record * pRec )
360 assert( pRec != nullptr );
361 pRec->m_list.m_idOwner.store( cds::OS::c_NullThreadId, atomics::memory_order_release );
366 thread_record * pNext = nullptr;
367 cds::OS::ThreadId const nullThreadId = cds::OS::c_NullThreadId;
369 for ( thread_record * pRec = m_pHead.load(atomics::memory_order_acquire); pRec; pRec = pNext ) {
370 pNext = pRec->m_list.m_pNext;
371 if ( pRec->m_list.m_idOwner.load(atomics::memory_order_relaxed) != nullThreadId ) {
377 thread_record * head( atomics::memory_order mo ) const
379 return m_pHead.load( mo );
386 CDS_DEBUG_ONLY( cds::OS::ThreadId const nullThreadId = cds::OS::c_NullThreadId; )
387 CDS_DEBUG_ONLY( cds::OS::ThreadId const mainThreadId = cds::OS::get_current_thread_id() ;)
389 thread_record * p = m_pHead.exchange( nullptr, atomics::memory_order_seq_cst );
391 thread_record * pNext = p->m_list.m_pNext;
393 assert( p->m_list.m_idOwner.load( atomics::memory_order_relaxed ) == nullThreadId
394 || p->m_list.m_idOwner.load( atomics::memory_order_relaxed ) == mainThreadId
395 || !cds::OS::is_thread_alive( p->m_list.m_idOwner.load( atomics::memory_order_relaxed ) )
406 template <class ThreadGC>
409 typedef ThreadGC thread_gc;
410 typedef typename thread_gc::rcu_tag rcu_tag;
414 scoped_lock(bool bLock = true)
418 thread_gc::access_lock();
424 thread_gc::access_unlock();
428 } // namespace details
433 template <typename RCUimpl> class gc;
436 /// Epoch-based retired ptr
438 Retired pointer with additional epoch field that prevents early reclamation.
439 This type of retired pointer is used in buffered RCU implementations.
441 struct epoch_retired_ptr: public retired_ptr
443 uint64_t m_nEpoch ; ///< The epoch when the object has been retired
450 /// Constructor creates a copy of \p rp retired pointer and saves \p nEpoch reclamation epoch.
451 epoch_retired_ptr( retired_ptr const& rp, uint64_t nEpoch )
460 #endif // #ifndef _CDS_URCU_DETAILS_BASE_H